AI Programming

Meituan Trains a 1.6T-Parameter MoE Model on 50,000 Chinese Chips and Open-Sources It

By ZJPRENO · Jul 1, 2026

Read original on juejin.cn ↗ Google Translate ↗ Alt translation

A trillion-parameter model trained end-to-end on non-NVIDIA hardware changes the supply-chain calculus for frontier AI. When the weights, training stack, and inference engine are all open-sourced at a price floor below GPT-4 or Claude, the cost of building code-native agents drops sharply.

Summary

Meituan's LongCat-2.0 is a 1.6-trillion-parameter Mixture-of-Experts model trained from scratch on 50,000 domestic AI accelerators, a first for a model of this scale. It activates an average of 48 billion parameters per token and uses a zero-computation expert mechanism that dynamically allocates resources, making simple queries cheap and complex code or long-document analysis computationally feasible. The model natively handles a 1-million-token context window, enough to ingest an entire codebase in one pass.

Before its official release, a preview version was dropped anonymously onto OpenRouter, where it climbed to the top three in global call volume within two months. It hit number one on the Hermes agent platform and number two on Claude Code, signaling strong real-world demand for a model that prioritizes autonomous coding and tool use over chat.

Meituan is pairing the launch with a full open-source release of the training framework, a domestic-hardware inference engine, and the model weights themselves. API pricing undercuts comparable Western models, with cache-hit input at 0.04 yuan per million tokens.

Takeaways

— LongCat-2.0 totals 1.6 trillion parameters under a Mixture-of-Experts architecture, activating an average of 48 billion parameters per token.

— A self-developed zero-computation expert mechanism adjusts per-token compute: simple text gets fewer resources, complex code or long context gets more.

— The model was fully pre-trained and runs inference on a cluster of 50,000 domestic AI accelerators, with no overseas GPUs used.

— Native context window reaches 1 million tokens, enabling single-pass processing of entire code repositories and million-word documents.

— Pre-training data exceeds 30 trillion tokens, including multilingual text, open-source code, tool-use logs, and agent interaction data.

— Preview version ranked in the global top three by call volume on OpenRouter, hitting #1 on Hermes and #2 on Claude Code agent platforms.

— Cache-hit API pricing is 0.04 yuan per million input tokens; output costs about 8 yuan per million tokens.

— Meituan is open-sourcing the distributed training framework, the domestic-hardware inference engine, and the full model weights.

Conclusions

Training a 1.6T model entirely on domestic accelerators without NVIDIA hardware is an existence proof that reshapes assumptions about chip-export controls and frontier-model feasibility.

Anonymously dropping a preview on OpenRouter and letting raw usage data speak is a hard-nosed go-to-market tactic that sidesteps benchmark gaming.

The zero-computation expert mechanism is a practical cost lever: it makes the model cheap on easy tokens and capable on hard ones, which is what a code agent workload actually needs.

Open-sourcing the training infrastructure alongside the weights targets the bottleneck most Western open models ignore — the ability to retrain or fine-tune at scale on non-CUDA hardware.

LongCat's immediate traction on Claude Code and Hermes suggests developers are switching based on cost and context length, not just benchmark scores.

Concepts & terms

Mixture-of-Experts (MoE)

A model architecture where only a subset of parameters (experts) are activated for each input token, reducing computation compared to a dense model of equivalent total size.

Zero-Computation Expert Mechanism

A dynamic routing strategy that allocates fewer computational resources to simple tokens and more to complex ones, lowering average inference cost without degrading performance on hard tasks.

1M-Token Context Window

The model can process up to one million tokens in a single forward pass, roughly equivalent to three full-length novels or a large software repository, without truncating or forgetting earlier content.

NPU Deterministic Computation

Ensuring that neural processing units produce identical outputs for identical inputs across a distributed cluster, a hard requirement for fault-tolerant training at scale on non-GPU hardware.

ScMoE

A cross-layer shortcut connection design for MoE models that improves information flow between layers, specifically tuned for code generation and long-context reasoning.

Source: juejin.cn ↗ Google Translate ↗ Backup ↗