November 26, 2025
File this under chaos
Compressed filesystems à la language models
AI runs a pretend hard drive; devs swoon, skeptics warn of GPU bills
TLDR: A developer trained an AI to mimic a filesystem, hitting ~98% test accuracy and sparking big reactions. Nostalgia and builder excitement collided with warnings about GPU costs, short context windows, and text-only support—making this a flashy proof-of-concept that has the community split on real-world usefulness.
A coder trained a small AI to act like a “pretend hard drive,” answering file reads and writes like a chatty librarian, and the crowd went off. Old-school fans cheered the vibe: one user quoted the line that every engineer secretly wants to build a filesystem and dropped a nostalgic tale of TRS‑80 storage woes. Builders got hyped too; one commenter said this was exactly the weekend project they were about to start, turning the thread into a virtual hackathon.

But the buzz wasn’t all rosy. The top skeptic listed brutal caveats: you need a big AI model and probably a pricey graphics card (GPU); the model’s short context window means it forgets things fast; and it only handles text. Translation: cool demo, questionable practicality.

Meanwhile, nerd-culture cameos arrived: someone linked Fabrice Bellard’s experimental ts_zip as a spiritual predecessor, and a glorious “mgddbsbdbd ddfk,d ,” comment became the meme of the day (“the filesystem compressed their sentence”). For context: the author trained on simulated file actions (via FUSE, a way to build pretend filesystems in userspace) with neat XML snapshots, hitting ~98% test accuracy on a small Qwen model. The vibe? Hackers dream, skeptics simmer, memes flourish.
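To make the “neat XML snapshots” concrete, here is a minimal sketch of how a filesystem tree could be serialized to XML with Python’s stdlib ElementTree. The actual schema isn’t spelled out in the discussion, so the `dir`/`file` tags and `name` attributes here are illustrative assumptions, not the author’s real format:

```python
import xml.etree.ElementTree as ET

def snapshot(tree, name="/"):
    """Serialize a nested dict (dirs are dicts, files are string contents)
    into an XML element. Schema is hypothetical, chosen for readability."""
    if isinstance(tree, dict):
        node = ET.Element("dir", name=name)
        for child_name, child in sorted(tree.items()):
            node.append(snapshot(child, child_name))
    else:
        node = ET.Element("file", name=name)
        node.text = tree
    return node

fs = {"home": {"notes.txt": "hello"}, "tmp": {}}
xml_str = ET.tostring(snapshot(fs), encoding="unicode")
print(xml_str)
# → <dir name="/"><dir name="home"><file name="notes.txt">hello</file></dir><dir name="tmp" /></dir>
```

The appeal of XML here is that nesting mirrors the directory hierarchy and malformed model output fails loudly at parse time, which is presumably the “parsing reliability” the author was after.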
Key Points
- A loopback FUSE filesystem with logging was built to generate reference training data.
- A simulator produced diverse prompt–completion pairs, capturing minimal read/write operations and full filesystem state.
- XML was selected to represent the filesystem tree in prompts for clarity and parsing reliability.
- The model (Qwen3-4b) was fine-tuned via SFT on Modal, achieving ~98% accuracy on a hold-out eval after 8 epochs and ~15,000 examples.
- A minimal FUSE filesystem was implemented in which every operation is delegated to the LLM, which generates the responses.
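The data-generation step above can be sketched as a toy simulator: apply minimal read/write operations to an in-memory tree and pair each operation (plus the current state) with the result and resulting state. Everything here is an assumption for illustration — the function names are invented, and JSON stands in for the author’s XML state encoding for brevity:

```python
import json

def apply_op(fs, op):
    """Apply a minimal read/write op to an in-memory filesystem dict.
    Returns the file contents for a read ('ENOENT' if missing), 'ok' for a write."""
    if op["kind"] == "write":
        fs[op["path"]] = op["data"]
        return "ok"
    return fs.get(op["path"], "ENOENT")  # read

def make_pair(fs, op):
    """Build one prompt-completion training example: the prompt carries the
    pre-op state plus the op; the completion carries the result and new state."""
    prompt = json.dumps({"state": dict(fs), "op": op}, sort_keys=True)
    result = apply_op(fs, op)  # mutates fs
    completion = json.dumps({"result": result, "state": fs}, sort_keys=True)
    return {"prompt": prompt, "completion": completion}

fs = {}
ops = [
    {"kind": "write", "path": "/a.txt", "data": "hi"},
    {"kind": "read", "path": "/a.txt"},
    {"kind": "read", "path": "/missing"},
]
pairs = [make_pair(fs, op) for op in ops]
```

Running many randomized sequences of such ops would yield the “diverse prompt–completion pairs” the bullets describe; the serving side is then the inverse, with the FUSE layer handing each incoming operation to the fine-tuned model and trusting its predicted state.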