March 10, 2026
From Street View to Street Spy?
LoGeR – 3D reconstruction from extremely long videos (DeepMind, UC Berkeley)
New AI can rebuild whole streets from video — fans see sci‑fi, critics see Big Brother
TLDR: A new AI called LoGeR can turn long, ordinary videos into detailed 3D maps of entire streets, no special hardware needed. The community is split between gushing over sci‑fi-level tech and warning it’s basically a future tool for mass surveillance and ultra-precise corporate mapping.
DeepMind and UC Berkeley just dropped LoGeR, an AI that can turn really long, shaky everyday videos into detailed 3D maps of entire streets — no fancy lasers, just your camera. The tech world went, “Wow,” and the comments section went, “Wait… should we be terrified?”
On one side, users are hyped. One commenter imagines Google Street View using this to create insanely detailed 3D cities, calling it a “wonderful time” for video‑to‑3D and cheering that every month “a new brick is put in place.” Another says it feels straight out of Cyberpunk 2077, comparing it to those high-tech “braindance” crime-scene replays where you fly through frozen memories.
But then the paranoia hits. The top skeptical voice basically asks: are these researchers blind to the fact this screams mass surveillance? Others question whether this is just an over-engineered way to copy what car-mounted laser scanners (lidar) already do, warning that AI “guesswork” could hallucinate fake details into the world. And of course, there’s drama about the code not being properly released yet — one commenter shrugs that it’s probably unusable “research code” unless you’re a wizard. In classic internet fashion, the community is split: half ready to jack into the 3D matrix, half convinced it’s just building a prettier cage.
Key Points
- LoGeR introduces a chunk-based hybrid architecture that decouples short-range alignment from long-range global anchoring for long-context 3D reconstruction.
- The method uses causal chunk-wise processing with a Hybrid Memory Module that maintains fast weights via an apply-then-update procedure.
- It addresses two barriers to long-context scaling: an architectural context wall and a training data wall.
- On KITTI, LoGeR achieves an average ATE (Absolute Trajectory Error) of 18.65; on a 19k-frame VBR dataset, it shows a 30.8% relative improvement over prior feedforward methods.
- LoGeR remains competitive on short sequences, achieving state-of-the-art accuracy while running significantly faster than full-attention baselines like VGGT.
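To make the "apply-then-update" idea in the key points concrete, here is a minimal sketch of causal chunk-wise processing with a fast-weight memory. All names (`FastWeightMemory`, `process_video`) are illustrative, not LoGeR's actual API, and the outer-product update is a simplified linear-attention-style stand-in for whatever the Hybrid Memory Module really does:

```python
import numpy as np

class FastWeightMemory:
    """Hypothetical fast-weight store: a matrix updated online per chunk."""

    def __init__(self, dim):
        self.W = np.zeros((dim, dim))  # fast weights, start empty

    def apply(self, queries):
        # Read step: retrieve long-range context for the current chunk
        # using the memory state from *before* this chunk is written.
        return queries @ self.W.T

    def update(self, keys, values):
        # Write step: Hebbian-style outer-product update so that
        # later chunks can anchor against this one.
        self.W += values.T @ keys

def process_video(chunks):
    """Causal chunk-wise pass: each chunk sees only past memory."""
    dim = chunks[0].shape[1]
    memory = FastWeightMemory(dim)
    outputs = []
    for chunk in chunks:
        # Stand-in for learned key/value/query projections.
        keys = values = queries = chunk
        context = memory.apply(queries)   # apply first...
        memory.update(keys, values)       # ...then update
        outputs.append(chunk + context)   # fuse local features + global anchor
    return outputs

rng = np.random.default_rng(0)
chunks = [rng.standard_normal((4, 8)) for _ in range(3)]
outs = process_video(chunks)
```

Because the memory is read before it is written, the first chunk gets zero global context, and every chunk is processed strictly causally — which is what lets the model stream through arbitrarily long videos without full attention over all frames.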