May 11, 2026
Pastel de nata meets chatbot chaos
Amália and the Future of European Portuguese LLMs
Portugal spent millions on an AI for its own Portuguese, and the internet is fighting
TLDR: Portugal announced a €5.5 million AI built for European Portuguese, but commenters are split over whether it’s a smart cultural investment or a costly niche project. The loudest debate is over missing public releases, limited local language data, and whether a country-specific AI can really compete.
Portugal’s shiny new AI project, AMÁLIA, was supposed to be a national pride moment: a €5.5 million state-backed chatbot meant to treat European Portuguese like a first-class language instead of an afterthought. The researchers behind it earned praise for the effort, but the comment section immediately turned into a full-blown family dinner argument. The biggest drama? Critics say the project calls itself “open” while key pieces — like the model itself, the training data, and the testing tools — are still nowhere to be found. For internet spectators, that’s less “open source hero” and more “pics or it didn’t happen.”
Then came the language war. One camp basically yelled, why spend millions teaching a small AI one country’s version of Portuguese when bigger global models already exist? User hartator went scorched-earth with “What a waste of time and money,” while others worried Portugal simply may not have enough homegrown text to make the system truly smart. Another hot take suggested a shortcut so chaotic it almost sounds like a meme: just convert Brazilian Portuguese into European Portuguese first and call it a day. Meanwhile, the anti-specialization crowd declared that country-specific or niche AIs are doomed anyway, because smaller focused models will never match broad world knowledge. Still, supporters can point to one awkward fact for the haters: despite all the skepticism, AMÁLIA reportedly beats some major rivals on Portuguese tests. So yes, the science is real — but the comments are asking whether Portugal built a cultural landmark, or just the internet’s most expensive language argument.
Key Points
- The Portuguese government announced AMÁLIA in December 2024 with a €5.5 million investment to build a large language model for European Portuguese.
- AMÁLIA was developed by a collaboration including NOVA, IST, IT, and FCT, and was built by continuing EuroLLM pre-training rather than training a model from scratch.
- The project increased European Portuguese data across training stages, using Arquivo.pt in pre-training and synthetic Portuguese data in supervised fine-tuning.
- The team created four new benchmarks for European Portuguese, with ALBA highlighted as the main benchmark.
- According to the article, AMÁLIA used 107 billion pre-training tokens, including 5.8 billion clearly European Portuguese tokens from Arquivo.pt, and it reportedly outperformed Qwen3-8B on most Portuguese benchmarks.
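For a sense of scale, the token figures quoted above imply that clearly European Portuguese text was only a small slice of the pre-training mix. A quick back-of-the-envelope check, using only the numbers reported in the article:

```python
# Back-of-the-envelope: what share of AMÁLIA's pre-training mix
# was clearly European Portuguese? (Both figures come from the
# article's Key Points; nothing else is assumed.)
total_tokens_b = 107.0  # total pre-training tokens, in billions
ep_tokens_b = 5.8       # European Portuguese tokens from Arquivo.pt, in billions

ep_share = ep_tokens_b / total_tokens_b * 100
print(f"European Portuguese share: {ep_share:.1f}% of pre-training tokens")
# Prints roughly 5.4% -- small, which is exactly the data-scarcity
# worry commenters raised about homegrown Portuguese text.
```

That single-digit percentage is why continued pre-training on top of EuroLLM, rather than training from scratch, was the pragmatic choice: the broad base comes from elsewhere, and the scarce local data only has to steer it.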