Meta Segment Anything Model Audio

Text-to-mute for real? Fans buzz, musicians swoon, skeptics say it broke on espresso

TLDR: Meta’s SAM Audio lets you type or click to pull specific sounds from a clip, with a new way to target moments. Commenters split: some cheer accessibility and music stems, while others say the demo failed on everyday noise, sparking a hype vs. reality debate.

Meta just dropped SAM Audio, a “text-to-separate” sound tool that promises you can type “only the singer” and it pulls the voice clean. The crowd is split between wow and hmm. User htrp gushed about the “super amazing” demo, then asked if you have to spell out the exact noise to zap. Accessibility champions cheered; ortusdux imagined it paired with smart glasses for the hearing impaired and people with CAPD (central auditory processing disorder). Music nerd hbn begged for more song stems: isolated harmonies, hidden riffs, the whole candy shop.

Then came the crash test: mwmisner tried isolating an espresso machine and a train in the sample video and “it seemed to fail.” Cue the classic “is this a demo trick or real life?” debate. One spicy reply got [flagged], which only poured gas on the hype-vs-skeptic fire.

Meta says you can select sounds by clicking in the video, use “span prompting” to target a specific moment, and lean on an open evaluation set and a judge model tied closely to human listening. Meta’s announcement also highlights partners who see potential for hearing aids (Starkey) and disabled founders (2GI). The vibe: bold new mute button, but don’t toss your earplugs just yet.

Key Points

•Meta introduced SAM Audio, a model for separating target and residual sounds from audio and audiovisual sources across general sound, music, and speech.
•SAM Audio supports text-based prompts and click-based selection in videos, and introduces span prompting to choose points within a timespan.
•PE-AV is released as a new open-source model that adds audio capabilities to Meta’s Perception Encoder.
•SAM Audio includes a first-of-its-kind open-source evaluation dataset and a judge model correlated with human subjective evaluation.
•Partners 2gether-International and Starkey highlighted potential applications of open models like SAM Audio in startups and hearing technology.

Hottest takes

"super amazing demo performance being able separate out music voice and background noises" — htrp

"Would be great for the hearing impaired... when combined with Meta glasses" — ortusdux

"I tried to Isolate just the espresso machine and the train... it seemed to fail" — mwmisner

December 18, 2025

Mute Reality? Or just the demo?
