LM Studio 0.4.0

Local AI goes pro: no app needed, faster chats, and a spicy CLI — commenters pick sides

TLDR: LM Studio 0.4.0 breaks out its core as “llmster,” a server you can run anywhere, adds a CLI, faster parallel chats, and a stateful chat API. Comments split between fans hyping local privacy and control and skeptics who say cloud models are better, with a side of Ollama rivalry drama.

LM Studio 0.4.0 just dropped and the comments went supernova. Fans are buzzing because the team split the app from its brain — meet “llmster,” a server-style core you can run anywhere — and added a throwback command line chat. One commenter, jiqiren, rattled off the new goods: faster multi-request replies (one model answering several people at once), a stateful chat API for apps, and a slicker UI with Split View and chat exports. Check the details on LM Studio.

That CLI sparked major nostalgia. “Makes things come full circle,” sighed minimaxir, while power users cheered the no-GUI option for cloud rigs and home GPUs. Then the rivalry drama hit: syntaxing called it “what I want from Ollama,” blasting Ollama as slow and off-mission — cue Local AI turf war memes.

Not everyone’s sold. Skeptics like saberience asked what normal folks get from local models if paid cloud ones are better — is it privacy, control, or just, ahem, “adult” chats? The thread split between privacy diehards and pragmatists who just want the best answers, wherever they live. A tiny hiccup popped up when anonym29 thought Developer Mode had vanished — then edited to say it was just a settings mix-up. Meanwhile, devs drooled over parallel requests and the upgraded llama.cpp 2.0.0 engine under the hood. Verdict: LM Studio just leveled up from comfy desktop app to DIY mini-cloud, and the competition felt it.

Key Points

  • LM Studio 0.4.0 introduces llmster, a server-native core enabling non-GUI deployments and standalone daemon operation.
  • Parallel requests to the same model are supported via llama.cpp 2.0.0 and continuous batching, with new options for Max Concurrent Predictions and Unified KV Cache (see the concurrency sketch after this list).
  • A new stateful REST endpoint (/v1/chat) enables conversation continuity using response_id and previous_response_id, with detailed response metrics (see the /v1/chat sketch after this list).
  • The UI is refreshed with chat export (PDF/markdown/text), Split View for side-by-side sessions, Developer Mode, and in-app documentation.
  • A new CLI experience centers on the lms chat command, alongside install scripts and runtime update commands for llama.cpp and MLX.
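
Since parallel serving is the headline performance change, here's a minimal client-side sketch of what “one model answering several people at once” looks like: a handful of chats fired concurrently at a single loaded model, which the server can interleave thanks to continuous batching. The localhost:1234 OpenAI-compatible endpoint is LM Studio's usual local-server default and the model name is a placeholder, so treat this as an illustration under those assumptions rather than official sample code.

```python
# Concurrency sketch: several chats hit the same loaded model at once.
# Assumptions: LM Studio's local server is running on its usual port
# (1234) with its OpenAI-compatible /v1/chat/completions route, and
# "your-local-model" stands in for whatever model you have loaded.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

BASE_URL = "http://localhost:1234/v1/chat/completions"
MODEL = "your-local-model"  # placeholder

def ask(question: str) -> str:
    payload = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
    }).encode("utf-8")
    req = urllib.request.Request(
        BASE_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

questions = [
    "Summarize continuous batching in one sentence.",
    "What is a KV cache?",
    "Name three uses for a local LLM server.",
]

# With Max Concurrent Predictions above 1, the server can interleave
# these instead of answering them strictly one at a time.
with ThreadPoolExecutor(max_workers=len(questions)) as pool:
    for question, answer in zip(questions, pool.map(ask, questions)):
        print(f"Q: {question}\nA: {answer}\n")
```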
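
And a rough sketch of the stateful /v1/chat flow: the first call returns a response_id, and sending it back as previous_response_id lets the server carry the conversation instead of the client resending the full history. The endpoint path and those two fields come straight from the release notes; the other request fields ("model", "input") are assumptions, so check the in-app docs for the real schema.

```python
# Stateful chat sketch against the new /v1/chat endpoint.
# response_id / previous_response_id are from the release notes; the
# "model" and "input" fields below are assumed, not confirmed.
import json
import urllib.request

CHAT_URL = "http://localhost:1234/v1/chat"  # assumed default local port
MODEL = "your-local-model"                  # placeholder for a loaded model

def chat(text: str, previous_response_id: str | None = None) -> dict:
    payload = {"model": MODEL, "input": text}
    if previous_response_id:
        # Link this turn to the earlier one so the conversation state
        # stays on the server side.
        payload["previous_response_id"] = previous_response_id
    req = urllib.request.Request(
        CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

first = chat("Remember the number 42.")
follow_up = chat("What number did I ask you to remember?",
                 previous_response_id=first["response_id"])
print(follow_up)
```

If the detailed response metrics mentioned in the release notes are part of that JSON, they would show up in the printed dict as well.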

Hottest takes

"LMStudio introducing a command line interface makes things come full circle" — minimaxir
"It’s essentially what I want from ollama… Ollama has deviated so much" — syntaxing
"Is this just for privacy conscious people? Or is this just for ‘adult’ chats" — saberience
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.