January 2, 2026
Alexa, grill my finals
Fighting Fire with Fire: Scalable Oral Exams
Students Face Robot Examiner: Black Mirror vibes or finally fair?
TLDR: A university ran live oral finals with a voice AI to catch AI-written homework, delivering cheap, fast grading that mostly matched humans. Commenters split between calling it dehumanizing robo‑school and praising it as fair, with side debates about privacy, deepfakes, and whether AI should teach too.
A college AI class just handed final exams to a talking robot, and the internet lit up. The school used a voice bot (think a very stern Siri) to grill students in real time about their own projects and a class case. Why? Because homework now looks suspiciously like it was ghostwritten by big chatbots, and live answers are harder to fake. It was cheap, too: about $0.42 per student for a 25-minute average exam, and the bot's grades landed within a point of the humans' 89% of the time.
Cue the comment chaos. One camp cheered: if you can’t explain your own work, the bot will expose it, and that’s fairness for honest students. Another camp hissed “dehumanizing,” likening it to a call-center nightmare where your future is decided by a script. Some went full “Black Mirror,” grateful they graduated “in the before time.” Others pushed the logic to the edge: if AI can test you, why not let it teach the whole course? And with deepfake video getting good, skeptics asked what webcam recordings even prove anymore.
There were spicy pragmatists, too: one hiring manager said take-home tasks always revealed who couldn’t defend their answers. Another suggested posting the entire AI exam format ahead of time — if it’s truly robust, surprise shouldn’t matter. Meanwhile, meme-makers dubbed it “Press 1 to dispute your grade,” and bean-counters called it “the cheapest TA ever.”
Key Points
- Instructors observed that polished take-home submissions often outstripped students’ ability to explain their own work, a sign that LLM assistance was slipping past traditional assessments.
- The course implemented a two-part oral exam administered by a voice AI built with ElevenLabs Conversational AI, testing real-time reasoning and the ability to defend decisions (a minimal setup sketch follows this list).
- A multi-agent workflow handled authentication, project-specific questioning (with injected per-student context), and case-based questioning, with RAG and NYU SSO integration planned.
- Setup was reported as fast, with dynamic variables and workflows keeping each conversation structured and preventing drift.
- Pilot metrics: 36 students over 9 days; 25-minute average exam (range 9–64 minutes); ~65 messages per conversation; ~$0.42 per student ($15 total); 89% of LLM grades within 1 point of the human grade (quick arithmetic check below).
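The post doesn’t include the course’s code, but the ElevenLabs Python SDK exposes the pieces the key points name: a conversational agent, dynamic variables for per-student context, and callbacks for transcripts. Below is a minimal sketch assuming the SDK’s quickstart-style `Conversation` interface; the agent ID, the variable names (`student_name`, `project_summary`), and the injected context are hypothetical placeholders, not the course’s actual configuration.

```python
# Sketch: launch one oral-exam session with the ElevenLabs Conversational AI
# Python SDK. AGENT_ID and the dynamic variables below are illustrative
# placeholders, not the course's real setup.
import os

from elevenlabs.client import ElevenLabs
from elevenlabs.conversational_ai.conversation import (
    Conversation,
    ConversationInitiationData,
)
from elevenlabs.conversational_ai.default_audio_interface import (
    DefaultAudioInterface,
)

AGENT_ID = "your-exam-agent-id"  # hypothetical: the pre-configured exam agent

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

# Dynamic variables let one agent ask project-specific questions: the agent's
# prompt template references {{student_name}} and {{project_summary}}, and
# each session injects that student's own context.
config = ConversationInitiationData(
    dynamic_variables={
        "student_name": "Ada Lovelace",
        "project_summary": "Fine-tuned a small LLM to route support tickets.",
    }
)

conversation = Conversation(
    client,
    AGENT_ID,
    config=config,
    requires_auth=True,  # gate sessions; the course planned NYU SSO for this
    audio_interface=DefaultAudioInterface(),
    # Capture both sides of the exam for grading and auditing.
    callback_agent_response=lambda text: print(f"Examiner: {text}"),
    callback_user_transcript=lambda text: print(f"Student: {text}"),
)

conversation.start_session()
```

Because the per-student context rides in as variables rather than prompt edits, one agent definition scales across the whole roster, which is presumably what kept setup fast.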
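The cost figure checks out against the totals in the bullet above; a quick back-of-the-envelope, using only the reported numbers:

```python
# Sanity-check the pilot's reported per-student cost and grading agreement.
students = 36
total_cost_usd = 15.00
within_one_point = 0.89  # share of LLM grades within 1 point of the human grade

print(f"Cost per student: ${total_cost_usd / students:.2f}")  # -> $0.42
disagreements = round(students * (1 - within_one_point))
print(f"Grades off by more than a point: {disagreements} of {students}")  # -> 4
```

At roughly 25 minutes per exam, that works out to under two cents a minute, which is the number behind the “cheapest TA ever” jokes.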