LingBot-Map: Streaming 3D reconstruction with geometric context transformer

LingBot‑Map promises live 3D maps — but users want receipts

TLDR: LingBot‑Map shows live 3D mapping with small memory and ~20 FPS, plus demos and open code. Commenters immediately pressed for hardware details and head‑to‑head benchmarks—especially versus Depth Anything 3—turning the launch into a “cool tech, but prove it” moment that matters for anyone chasing real‑time AI mapping.

Robbyant just dropped LingBot‑Map, a tool that turns video into a live 3D map while you move. The team says it keeps memory tiny by storing just the "important bits" — an anchor for scale, a small local window, and a compressed trail of past frames — and still runs at roughly 20 frames per second at 518×378 resolution. There's a slick demo, a tech report, and open code. On paper, it's "real‑time 3D without the meltdown," and the internet took notice.
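To make the "important bits" idea concrete, here is a minimal sketch of what such a bounded frame store could look like. This is a hypothetical illustration, not the actual LingBot‑Map implementation: the class name, the window size, and the halve‑and‑double‑stride trail compaction are all assumptions chosen to show how per‑frame memory can stay constant no matter how long the sequence runs.

```python
from collections import deque

class StreamingMapMemory:
    """Illustrative constant-memory frame store (hypothetical sketch):
    one anchor frame to fix global scale, a small sliding window of
    recent frames, and a bounded, progressively subsampled trail of
    older frames."""

    def __init__(self, window_size=8, trail_cap=32):
        self.anchor = None                       # first frame, fixes global scale
        self.window = deque(maxlen=window_size)  # recent local frames
        self.trail = []                          # compressed history
        self.trail_cap = trail_cap
        self._stride = 1                         # keep every stride-th evicted frame
        self._since_kept = 0

    def add(self, frame):
        if self.anchor is None:
            self.anchor = frame
            return
        if len(self.window) == self.window.maxlen:
            # deque will drop the leftmost frame; compress it first
            self._compress(self.window[0])
        self.window.append(frame)

    def _compress(self, frame):
        self._since_kept += 1
        if self._since_kept >= self._stride:
            self.trail.append(frame)
            self._since_kept = 0
        if len(self.trail) > self.trail_cap:
            # halve the trail and double the stride: memory stays bounded
            self.trail = self.trail[::2]
            self._stride *= 2

    def size(self):
        return (self.anchor is not None) + len(self.window) + len(self.trail)
```

Feeding this 10,000 frames leaves at most 1 + 8 + 32 items in memory, which is the property the "constant memory on 10,000+ frame sequences" claim depends on.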

Then the comments lit up. The top vibe: show us the hardware. User avaer zoomed in on the headline stat — “~20 FPS” — and basically said, cool number, but what GPU is doing the heavy lifting? Is this laptop‑friendly, or a secret spaceship under the desk? Another user, squanch, cut to the chase: how does this stack up against Depth Anything 3 in streaming mode — the popular go‑to for fast depth? Cue the classic memes: “post your rig or it didn’t happen,” and “benchmarks, not vibes.” Fans are impressed by the constant‑memory wizardry; skeptics say a real‑time claim without clear specs is just a teaser. Verdict so far: hot demo, hotter debate — and everyone wants head‑to‑head charts, yesterday.

Key Points

  • Robbyant introduces LingBot-Map, a streaming 3D reconstruction system.
  • Core method is Geometric Context Attention (GCA) with anchor, pose‑reference window, and trajectory memory.
  • Per‑frame memory and compute are kept nearly constant on 10,000+ frame sequences at ~20 FPS.
  • Pipeline: DINO backbone, alternating Frame Attention and GCA, outputs camera pose and depth maps.
  • Resources provided: arXiv tech report, GitHub code, and model releases on Hugging Face and ModelScope; demos span indoor, aerial, and driving scenes.
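The alternating Frame Attention / GCA pipeline above can be sketched in schematic form. This is a toy NumPy illustration under stated assumptions, not the released model: the attention steps are single tied-weight softmax(QKᵀ)V operations, and the pose and depth readouts are placeholders standing in for the real prediction heads.

```python
import numpy as np

def frame_attention(tokens):
    """Self-attention within the current frame's tokens
    (schematic: one softmax(QK^T)V step with tied weights)."""
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens

def geometric_context_attention(tokens, memory_tokens):
    """Cross-attention from the frame's tokens to the geometric
    context (anchor + window + trail), in the same schematic form."""
    scores = tokens @ memory_tokens.T / np.sqrt(tokens.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ memory_tokens

def process_frame(frame_tokens, memory_tokens, num_blocks=2):
    """Alternate Frame Attention and GCA with residual connections,
    then read out a pose vector and per-token depth (placeholders)."""
    x = frame_tokens
    for _ in range(num_blocks):
        x = x + frame_attention(x)
        x = x + geometric_context_attention(x, memory_tokens)
    pose = x.mean(axis=0)[:6]   # placeholder 6-DoF pose readout
    depth = x[:, 0]             # placeholder per-token depth readout
    return pose, depth
```

The point of the sketch is the data flow: each frame attends to itself, then to the bounded geometric context, so per-frame compute does not grow with sequence length.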

Hottest takes

But on what hardware? — avaer
Any information how the performance is compared to depth anything 3 in streaming mode? — squanch
it would be nice if projects like this grounded the numbers — avaer
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.