Apple releases open-source model that instantly turns 2D photos into 3D views

Internet split: mind-blowing 3D magic or realtor-core fakery

TLDR: Apple open-sourced SHARP, a tool that converts a single photo into a 3D view in under a second. Commenters are split between wowed fans tying it to iPhone “Spatial Scenes” and skeptics calling it realtor-core fakery that blurs reality, with meta chatter noting this debate already happened.

Apple just dropped SHARP, an open-source trick that turns a single photo into a 3D scene in under a second. The demos are live — cue link-drops to examples and the paper — and the vibe is equal parts “wow” and “wait.” Fans immediately asked if it’s the same sorcery behind Apple’s new iPhone “Spatial Scenes,” with one calling it “wildly impressive.” For the non-nerds: think a photo that becomes a little world you can peek around, like moving through a cloud of colored dots that somehow looks real. Fast, flashy, and free to try — internet catnip.

Then the snark arrived. One commenter declared this the final boss of “realtor content” — slow pans of empty rooms with sad music — and warned it’s just more “abstracted reality,” another step toward not knowing what’s real. Others rolled their eyes at the repost drama, linking to the HN thread from 11 days ago like, “We’ve been here.” The community split hard: the “this changes everything” crowd vs. the “who asked for Zillow-core deepfakes?” brigade. Jokes flew, hot takes simmered, and between the awe and the angst, one truth emerged: turning any photo into a tiny 3D world is cool — and a little scary.

Key Points

  • SHARP converts a single image into a metric 3D Gaussian scene representation via a single feedforward pass in under a second on a standard GPU.
  • The representation enables real-time, high-resolution rendering of nearby views with absolute scale and supports metric camera movements.
  • Experiments show robust zero-shot generalization and state-of-the-art performance, reducing LPIPS by 25–34% and DISTS by 21–43% versus prior models, with synthesis time reduced by three orders of magnitude.
  • The project provides a Python CLI (sharp) to predict 3D Gaussian splats (.ply), auto-downloads model checkpoints, and supports manual checkpoint specification.
  • Video rendering of camera trajectories currently requires a CUDA GPU; outputs use OpenCV coordinates and are compatible with public 3DGS renderers.
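The last two bullets can be illustrated concretely: PLY files are self-describing, so a few lines of stdlib Python are enough to see what a 3DGS renderer will find in a predicted splat file. This is a minimal sketch; the property names in the demo file (`x`, `y`, `z`, `opacity`, and so on) follow the common 3D Gaussian Splatting convention and are illustrative, not confirmed output of SHARP's checkpoints.

```python
import os
import tempfile


def read_ply_header(path):
    """Parse a PLY header and return (storage_format, vertex_count, property_names).

    Works for ASCII and binary PLY files alike, because the header itself
    is always ASCII text terminated by an 'end_header' line.
    """
    fmt, count, props = None, 0, []
    with open(path, "rb") as f:
        if f.readline().strip() != b"ply":
            raise ValueError("not a PLY file")
        for raw in f:
            line = raw.decode("ascii").strip()
            if line.startswith("format"):
                fmt = line.split()[1]           # "ascii" or "binary_little_endian"
            elif line.startswith("element vertex"):
                count = int(line.split()[2])    # one vertex per Gaussian in 3DGS files
            elif line.startswith("property"):
                props.append(line.split()[-1])  # e.g. x, y, z, opacity, scale_0, rot_0
            elif line == "end_header":
                break
    return fmt, count, props


if __name__ == "__main__":
    # Tiny stand-in file; real 3DGS splats carry many more per-Gaussian properties.
    demo = (
        b"ply\nformat ascii 1.0\nelement vertex 2\n"
        b"property float x\nproperty float y\nproperty float z\n"
        b"property float opacity\nend_header\n0 0 0 1\n1 1 1 1\n"
    )
    with tempfile.NamedTemporaryFile(suffix=".ply", delete=False) as tmp:
        tmp.write(demo)
    print(read_ply_header(tmp.name))  # ('ascii', 2, ['x', 'y', 'z', 'opacity'])
    os.remove(tmp.name)
```

Since only the header is inspected, this works even on large binary splat files without loading the Gaussian data itself.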

Hottest takes

"Easier for real estate agents to show slow panning around a room, with lame music" — b112
"If so, it’s been wildly impressive" — gjsman-1000
"HN discussion 11 days ago:" — bertili

Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.