February 6, 2026
Monkeys, models, and mayhem
Show HN: BioTradingArena – Benchmark for LLMs to predict biotech stock movements
Commenters cry "monkeys with darts" as skeptics grill AI stock picker
TLDR: BioTradingArena lets chatbots read biotech press releases and predict stock reactions, but commenters pounced: markets are efficient, biotech is unpredictable, and models may “know” historical outcomes. The mood is playful yet pointed—fun benchmark for research, not a magic money machine—highlighting the high stakes of AI-meets-finance experiments.
A new “Show HN” project, BioTradingArena, dares large language models (LLMs) to read biotech press releases and call the stock move in plain-English buckets from very_positive to very_negative. It’s a neat testbed with an API and editable prompts—but the comments turned into a cage match. The top vibe? Skepticism with a side of popcorn.
One camp went full reality check. “Markets are efficient,” warned a veteran, dropping the now-iconic line about monkeys throwing darts—translation: bots won’t outsmart Wall Street just by pattern-spotting headlines. Another voice said biotech is chaos wrapped in lab coats: science timelines are messy, and sentiment can be a misleading signal. Then came the plot twist: a former fund engineer admitted their multi-data, all-in approach “didn’t make a dent” in predicting winners. Ouch.
The nerdiest drama centered on data leakage—as in, LLMs might “know the future outcome” if trained on the very events they’re scored against. That’s like letting the quiz show contestant see the answer key. Meanwhile, a lone bright spot championed human expertise: one analyst built a business mapping regulatory mazes and tracking approval rhythms—think old-school edge, not sci-fi Maestro Database.
Between quants vs. vibes, “stonks vs. science” jokes, and dart-throwing monkey memes, the crowd’s verdict is loud: cool benchmark, fun science project—but don’t bet the lab on it.
Key Points
- BioTradingArena provides a benchmark for LLMs to predict biotech stock impacts from press releases, focusing on an oncology dataset.
- Users can configure strategies, edit prompts, and access an API to run benchmarks and create custom approaches.
- The task is a single-prompt classification into seven categories, each tied to approximate stock move ranges.
- Guidance emphasizes conservative classification, reserving extreme labels for clearly exceptional or catastrophic news.
- Outputs must follow a strict JSON schema including predicted impact, score, confidence, reasoning, and highlights.
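
For the curious, here is a rough Python sketch of what checking a model's answer against that kind of strict JSON schema could look like. It is not BioTradingArena's actual code. The field names and types are guesses based on the list above, and only the `very_positive` and `very_negative` labels come from the project description; the five labels in between are hypothetical placeholders.

```python
import json

# Only "very_positive" and "very_negative" appear in the project description.
# The five labels in between are hypothetical stand-ins for the seven buckets.
LABELS = [
    "very_positive", "positive", "slightly_positive", "neutral",
    "slightly_negative", "negative", "very_negative",
]

# Hypothetical field names and types, inferred from the summary above.
REQUIRED_FIELDS = {
    "predicted_impact": str,
    "score": (int, float),
    "confidence": (int, float),
    "reasoning": str,
    "highlights": list,
}


def validate_prediction(raw: str) -> dict:
    """Parse a model reply and reject anything that breaks the assumed schema."""
    data = json.loads(raw)  # raises a ValueError subclass on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"wrong type for field: {name}")
    if data["predicted_impact"] not in LABELS:
        raise ValueError(f"unknown label: {data['predicted_impact']}")
    return data
```

A check like this only confirms that the reply is well formed. It says nothing about whether the prediction is any good, which is the part the commenters are arguing about.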