February 16, 2026
Oh-See-Rage in the comments
Rolling your own serverless OCR in 40 lines of code
DIY “serverless” OCR sparks a brawl: cloud hack, old GPUs, and “is this even legal” vibes
TLDR: A dev built a cheap, on-demand cloud setup to run DeepSeek's OCR model and make books searchable without buying new hardware. The thread explodes over what "serverless" really means, the legality of mass data generation, and whether newer OCR models beat DeepSeek, plus a Tesseract-vs-AI dogfight.
An engineer shows how to turn book pages into searchable text with a “serverless” setup: run the open DeepSeek-OCR model on an on‑demand GPU via Modal, pay by the second, and batch pages for speed. It’s clever, cheap‑ish, and avoids buying new hardware. But the comments? Absolute chaos.
First punch: the word “serverless.” One user quips it should mean “run it on your own laptop,” not “spin up a cloud server.” Cue the classic meme: serverless = someone else’s computer. Others pile on asking how this stacks up against old‑faithful Tesseract; the thread quickly turns into a bake‑off between “simple and free” vs “fancy AI that needs a pricey graphics chip.”
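For reference, the "simple and free" side of that bake-off really is a few lines of CPU-only code. A minimal sketch, assuming pytesseract, a system Tesseract install, and a hypothetical page image named page_001.png:

```python
# The "simple and free" contender: Tesseract via pytesseract, CPU-only,
# no model weights to download (assumes tesseract is installed locally).
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open("page_001.png"))
print(text)
```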
Then the legal siren goes off. A quote from DeepSeek's paper—"200k+ pages per day" to generate training data—has folks side‑eyeing the legality. One commenter deadpans: "That… doesn't sound legal." Meanwhile, infrastructure nerds clutch their Dockerfiles, asking whether autoscaling pods keep re‑downloading models from Hugging Face and advocating baking the model into the container image for faster cold starts.
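On that last point, the usual pattern is to download the weights during the image build so autoscaled containers never re-fetch them at startup. A minimal sketch, assuming Modal's `Image.run_function` and the `deepseek-ai/DeepSeek-OCR` Hugging Face repo id:

```python
import modal

def fetch_weights():
    # Runs once at image-build time, not at container start, so each
    # autoscaled pod boots with the weights already baked into a layer.
    from huggingface_hub import snapshot_download

    snapshot_download("deepseek-ai/DeepSeek-OCR")  # assumed repo id

image = (
    modal.Image.debian_slim(python_version="3.11")
    .pip_install("huggingface_hub")
    .run_function(fetch_weights)
)
```

The other common approach, when rebuilding the image per model update is too heavy, is caching the download in a `modal.Volume` shared across containers.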
Finally, the hot take scorches: DeepSeek isn’t even top dog anymore. Commenters point to the OCR Arena leaderboard and shout out newer contenders like dots and olmOCR. Between cost debates, legality jitters, and leaderboard flexing, the code is neat—but the comments are the real show.
Key Points
- The article builds a serverless OCR service using Modal to run DeepSeek-OCR on a cloud GPU (A100); the scaffolding is sketched after this list.
- A custom container image based on NVIDIA CUDA 11.8 with Python 3.11 installs torch 2.6.0, torchvision 0.21.0, transformers 4.46.3, PyMuPDF, Pillow, and NumPy.
- Modal's decorators (@app.function and @modal.asgi_app) provision GPUs, build containers, and route HTTP requests without server management.
- The FastAPI app loads the DeepSeek-OCR model and tokenizer once per container, moves the model to CUDA in bfloat16 eval mode, and reuses it across requests (see the second sketch below).
- An /ocr_batch endpoint performs batched inference on multiple images to speed up OCR throughput compared to single-page processing.
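A minimal sketch of that scaffolding with the versions listed above, using Modal's documented `modal.App`, `Image.from_registry`, and `pip_install` APIs; the base-image tag and app name are illustrative:

```python
import modal

# CUDA 11.8 base with Python 3.11 plus the article's pinned dependencies.
# (The exact registry tag is an assumption; any CUDA 11.8 image would do.)
image = (
    modal.Image.from_registry(
        "nvidia/cuda:11.8.0-devel-ubuntu22.04", add_python="3.11"
    ).pip_install(
        "torch==2.6.0",
        "torchvision==0.21.0",
        "transformers==4.46.3",
        "pymupdf",            # PyMuPDF, used to rasterize PDF pages
        "pillow",
        "numpy",
        "fastapi[standard]",  # needed for the ASGI endpoint below
    )
)

app = modal.App("deepseek-ocr", image=image)
```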
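Continuing the sketch, the web app: the decorators request an A100 and expose the FastAPI app over HTTP, and everything created inside `serve()` lives for the life of the container, so the model loads once and is reused across requests. The repo id is assumed, and `run_batch` is a hypothetical stub standing in for DeepSeek-OCR's model-specific inference call (loaded via `trust_remote_code`), which this summary doesn't show:

```python
@app.function(gpu="A100")
@modal.asgi_app()
def serve():
    import torch
    from fastapi import FastAPI, UploadFile
    from transformers import AutoModel, AutoTokenizer

    MODEL_ID = "deepseek-ai/DeepSeek-OCR"  # assumed Hugging Face repo id

    # Executed once per container start; every request routed to this
    # container reuses the already-loaded weights.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = (
        AutoModel.from_pretrained(
            MODEL_ID, trust_remote_code=True, torch_dtype=torch.bfloat16
        )
        .cuda()
        .eval()
    )

    def run_batch(image_bytes: list[bytes]) -> list[str]:
        # Hypothetical stub: stands in for the model's custom inference
        # method. Batching pages here amortizes per-call GPU overhead
        # versus invoking the model once per page.
        raise NotImplementedError

    web = FastAPI()

    @web.post("/ocr_batch")
    async def ocr_batch(files: list[UploadFile]) -> list[str]:
        pages = [await f.read() for f in files]
        return run_batch(pages)

    return web
```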