Meituan Trains a 1.6T-Parameter MoE Model on 50,000 Chinese Chips and Open-Sources It
A trillion-parameter model trained end-to-end on non-NVIDIA hardware changes the supply-chain calculus for frontier AI. With the weights, training stack, and inference engine all open-sourced, and API pricing set below that of GPT-4 or Claude, the cost of building code-native agents drops sharply.
Meituan's LongCat-2.0 is a 1.6-trillion-parameter Mixture-of-Experts model trained from scratch on 50,000 domestic AI accelerators, a first for a model of this scale. It activates an average of 48 billion parameters per token and uses a zero-computation expert mechanism to allocate compute dynamically, so simple queries stay cheap while complex code and long-document analysis remain computationally feasible. The model natively handles a 1-million-token context window, enough to ingest an entire codebase in one pass.
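The article does not describe how the zero-computation experts work internally, so the following is only a minimal sketch of the general idea, assuming a zero-computation expert is an identity function that sits alongside ordinary feed-forward experts in the router's candidate pool. All names here (`route_token`, `make_ffn_expert`, `identity_expert`) are hypothetical, and the "experts" are toy scalings rather than real networks.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of router logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def make_ffn_expert(scale):
    # Toy stand-in for a feed-forward expert (assumption: a real one is an MLP).
    def expert(x):
        return [scale * v for v in x]
    return expert

def identity_expert(x):
    # Assumed form of a zero-computation expert: pass the input through unchanged.
    return list(x)

def route_token(x, router_logits, experts, is_zero, top_k):
    # Pick the top_k experts by router probability and mix their outputs.
    probs = softmax(router_logits)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    real_used = 0  # number of compute-bearing experts this token activated
    for i in chosen:
        weight = probs[i] / norm
        y = experts[i](x)
        out = [o + weight * v for o, v in zip(out, y)]
        if not is_zero[i]:
            real_used += 1
    return out, real_used

experts = [make_ffn_expert(2.0), make_ffn_expert(3.0), identity_expert, identity_expert]
is_zero = [False, False, True, True]

# "Easy" token: the router favors the identity experts, so no FFN runs.
easy_out, easy_cost = route_token([1.0, 2.0], [0.0, 0.0, 5.0, 5.0], experts, is_zero, top_k=2)
# "Hard" token: the router favors the real experts, so both FFNs run.
hard_out, hard_cost = route_token([1.0, 2.0], [5.0, 5.0, 0.0, 0.0], experts, is_zero, top_k=2)
```

Under these assumptions, the per-token compute depends on how many of the selected experts are real ones, which is one way an average activated-parameter count could sit below the maximum a token can use.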
Before its official release, a preview version was released anonymously on OpenRouter, where it climbed into the top three by global call volume within two months. It hit number one on the Hermes agent platform and number two on Claude Code, signaling strong real-world demand for a model that prioritizes autonomous coding and tool use over chat.
Meituan is pairing the launch with a full open-source release of the training framework, a domestic-hardware inference engine, and the model weights themselves. API pricing undercuts comparable Western models, with cache-hit input at 0.04 yuan per million tokens.
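To put the cache-hit price in perspective, here is a rough calculation for an agent loop that repeatedly re-reads the same long context. It uses only the 0.04 yuan per million tokens figure quoted above. Cache-miss input and output prices are not given in this article, so they are left out, and the scenario (a 1-million-token context served from cache on 50 calls) is a hypothetical workload, not a measured one.

```python
# Cache-hit input price quoted in the article, in yuan per million tokens.
CACHE_HIT_YUAN_PER_MILLION = 0.04

def cache_hit_input_cost(cached_tokens_per_call, calls):
    # Cost in yuan of input tokens served from cache across all calls.
    # Excludes cache-miss input and output tokens (prices not stated in the article).
    total_tokens = cached_tokens_per_call * calls
    return total_tokens / 1_000_000 * CACHE_HIT_YUAN_PER_MILLION

# Hypothetical agent session: 50 calls, each re-reading a 1M-token cached context.
session_cost = cache_hit_input_cost(1_000_000, 50)
print(f"{session_cost:.2f} yuan")  # about 2 yuan for 50M cached input tokens
```

The real bill for such a session would be higher once uncached input and generated output are included.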
Training a 1.6T model entirely on domestic accelerators, with no NVIDIA hardware, is an existence proof that reshapes assumptions about chip-export controls and frontier-model feasibility.
Anonymously dropping a preview on OpenRouter and letting raw usage data speak is a hard-nosed go-to-market tactic that sidesteps benchmark gaming.
The zero-computation expert mechanism is a practical cost lever: it makes the model cheap on easy tokens and capable on hard ones, which is what a code agent workload actually needs.
Open-sourcing the training infrastructure alongside the weights targets the bottleneck most Western open models ignore: the ability to retrain or fine-tune at scale on non-CUDA hardware.
LongCat's immediate traction on Claude Code and Hermes suggests developers are switching based on cost and context length, not just benchmark scores.