Compressed filesystems à la language models

AI runs a pretend hard drive; devs swoon, skeptics warn of GPU bills

TLDR: A developer fine-tuned an AI to mimic a filesystem, hitting ~98% test accuracy and sparking big reactions. Nostalgia and builder excitement collided with warnings about GPU costs, context-window limits, and text-only support, making this a flashy proof of concept that has the community split on real-world usefulness.

A coder trained a small AI to act like a “pretend hard drive,” answering file reads and writes like a chatty librarian, and the crowd went off. Old-school fans cheered the vibe, with one user quoting the line that every engineer secretly wants to build a filesystem and dropping a nostalgic tale of TRS‑80 storage woes. Builders got hyped too: one commenter said this was exactly the weekend project they were about to start, turning the thread into a virtual hackathon.

But the buzz wasn’t all rosy. The top skeptic listed brutal caveats: you need a big AI model, probably a pricey graphics card (GPU), the model’s short context window means it forgets things fast, and it only handles text. Translation: cool demo, questionable practicality. Meanwhile, nerd culture cameos arrived: someone linked Fabrice Bellard’s experimental ts_zip as a spiritual predecessor, and a glorious “mgddbsbdbd ddfk,d ,” comment became the meme of the day (“the filesystem compressed their sentence”).

For context: the author trained on simulated file actions (via FUSE, a way to build pretend filesystems in userspace) with neat XML snapshots, hitting ~98% test accuracy on a small Qwen model. The vibe? Hackers dream, skeptics simmer, memes flourish.

Key Points

  • A loopback FUSE filesystem with logging was built to generate reference training data.
  • A simulator produced diverse prompt–completion pairs, capturing minimal read/write operations and full filesystem state.
  • XML was selected to represent the filesystem tree in prompts for clarity and parsing reliability.
  • The model (Qwen3-4b) was fine-tuned via supervised fine-tuning (SFT) on Modal, achieving ~98% accuracy on a held-out eval after 8 epochs on ~15,000 examples.
  • A minimal FUSE filesystem was implemented where every operation is delegated to the LLM for responses.
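The prompt side of the pipeline described above (an XML snapshot of the filesystem tree paired with a file operation, fed to the model for a completion) can be sketched roughly like this. This is a minimal illustration, not the author’s actual format: the `tree_to_xml`/`build_prompt` names, the `<dir>`/`<file>`/`<state>`/`<op>` schema, and the stub model are all assumptions.

```python
import xml.etree.ElementTree as ET

def tree_to_xml(node, name="root"):
    """Serialize a nested dict (directories) / str (file contents)
    tree into an XML element. Schema is illustrative only."""
    if isinstance(node, dict):
        elem = ET.Element("dir", name=name)
        for child_name, child in sorted(node.items()):
            elem.append(tree_to_xml(child, child_name))
    else:
        elem = ET.Element("file", name=name)
        elem.text = node
    return elem

def build_prompt(tree, operation):
    """Pair an XML snapshot of the current filesystem state with the
    requested operation, forming the model's prompt."""
    snapshot = ET.tostring(tree_to_xml(tree), encoding="unicode")
    return f"<state>{snapshot}</state>\n<op>{operation}</op>"

def handle_op(tree, operation, model):
    """Delegate one filesystem operation to a language model.
    In the real project this would be the fine-tuned Qwen3-4b;
    here `model` is any callable taking a prompt string."""
    return model(build_prompt(tree, operation))

# Tiny in-memory filesystem and a stub model for demonstration.
fs = {"home": {"notes.txt": "hello"}}
prompt = build_prompt(fs, "read /home/notes.txt")
```

A FUSE wrapper (e.g. via fusepy) would then map each `read`/`write` callback onto `handle_op`, which is where the caveats in the thread bite: every operation ships the whole state through the context window.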

Hottest takes

"Every systems engineer at some point in their journey yearns to write a filesystem" — PaulHoule
"this was the first step that i was gonna work on this weekend" — endofreach
"you need an LLM, likely a GPU, all your data is in the context window (which we know scales poorly), and this only works on text data." — N_Lens
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.