We gave 5 LLMs $100K to trade stocks for 8 months

AI bots turned $100K into ‘paper profits’ — Grok tops, traders roast the backtest

TLDR: Five "AI traders" each ran $100K through an eight-month backtest. Grok finished on top and Gemini last, and most of the bots got there by loading up on big tech. Commenters roasted it as paper-trading theater and demanded real-money tests and better benchmarks like SPXL, while others noted the bots mostly just moved with the market.

Five chatty AIs were each handed $100K to "trade" for eight months in a simulated stock market, and the internet immediately yelled "fake money, fake flex." The project, AI Trade Arena, backtested from February to October 2025, using time-filtered news and prices to avoid spoilers. Results: Grok came out on top and DeepSeek was close behind, while Gemini finished last after going light on tech as everyone else piled into the usual Silicon Valley darlings.

But the comments? Pure cage match. Skeptics slammed the whole thing as paper-trading cosplay, arguing that backtests aren't real life and ignore market impact, slippage, and panic. "Stopped reading after 'paper money,'" sneered a self-described quant. Another user dared the researchers to put real cash on the line. A link to the nof1.ai leaderboard poured gasoline on the thread: live results there look meh, with AIs day-trading the "Magnificent Seven" tech stocks and "losing money with gusto."

Not everyone went full doom. A few asked for better benchmarks, with one commenter wanting to see a 3x leveraged S&P 500 ETF like SPXL charted alongside the bots, and others noted that the bots seemed to move with the overall market. But the meme of the day was brutal: AI invents a time machine, discovers "buy tech."

Key Points

  • Five LLMs (GPT-5, Claude Sonnet 4.5, Gemini 2.5 Pro, Grok 4, DeepSeek) each traded $100K in paper money over an eight-month backtest.
  • The simulation ran from February 3 to October 20, 2025, with daily trading in major stocks and no options.
  • Data sources were time-segmented to prevent future leakage, and testing began after each model's training cutoff date.
  • Grok achieved the best performance; DeepSeek was close behind, while Gemini placed last due to a larger non-tech allocation.
  • The team outlined a next-step plan: expand backtests, run live paper trading, and move toward real-world trading to isolate performance factors.

Hottest takes

"Stopped reading after ‘paper money’" — chroma205
"If you’re convinced, put real money and see what happens" — deadbabe
"Would like to see a 3X S&P 500 ETF like SPXL charted" — chongli

Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.