Introduction
On May 18, a report titled “Understanding Artificial Intelligence Must Start with Understanding Tokens” was published by Xinhua Daily.
In early 2026, a set of data sparked heated discussion across the global AI industry. OpenRouter, the world’s largest AI model API aggregation platform, reported that from February 9 to 15, token usage of Chinese large models reached 41.2 trillion, surpassing the 29.4 trillion of U.S. models for the first time. The lead held for several weeks, with the figure passing 73 trillion by mid-to-late March and four of the top five models globally coming from China.
The point of these figures is not to keep score but to highlight a quiet revolution in the AI industry’s fundamental unit of measurement: tokens are becoming the “kilowatt-hour” of the intelligent era. Six dimensions (models, computing power, data, applications, industry, and governance) are being profoundly reshaped by this unit. Understanding AI in 2026 must start with understanding tokens.
Sixfold Reconstruction Brought by a Measurement Unit
The measurement unit of the industrial revolution was the “kilowatt-hour,” allowing energy to be precisely measured, priced, and transmitted across domains. The information revolution’s units were “bits” and “bandwidth,” enabling information to be packaged, transmitted, and billed. The measurement unit of the intelligent revolution is “tokens,” allowing intelligence to be segmented, measured, priced, and traded for the first time.
The popularization of the token concept and its rapid growth in usage are gradually pushing intelligence towards industrialization, marketization, and circulation.
Models
The economic value of large models is shifting from one-time training costs to long-term inference output. Model vendors no longer simply “sell capability” but directly “sell tokens,” with per-million-token pricing for input and output becoming a global industry standard. The asset attribute of models is shifting from “weight files” to “the ability to continuously produce tokens.”
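To make per-million-token pricing concrete, here is a minimal sketch of how the cost of a single request might be computed. The `TokenPrice` class, the `request_cost` function, and the prices used are hypothetical values chosen for illustration; they do not come from any vendor’s actual price sheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical price sheet: currency units per one million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(price: TokenPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request when input and output tokens are priced separately."""
    total = (input_tokens * price.input_per_million
             + output_tokens * price.output_per_million)
    return total / 1_000_000


# Illustrative prices only (assumption): 2.0 per million input tokens,
# 8.0 per million output tokens.
example_price = TokenPrice(input_per_million=2.0, output_per_million=8.0)
cost = request_cost(example_price, input_tokens=1_500, output_tokens=500)
print(f"{cost:.4f}")  # prints 0.0070
```

Under this kind of pricing, what a vendor earns scales with the volume of tokens it serves rather than with a one-time license, which is the sense in which vendors “sell tokens.”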
Computing Power
The focus is shifting from “training computing power” to “inference computing power.” Training computing power is pulsed and centralized, while inference computing power is continuous and distributed, which introduces new requirements for latency, energy efficiency, and geographical distribution. Cloud-edge-device collaboration, specialized inference chips, silicon photonics interconnects, and computing networks are becoming the new focus of infrastructure. JPMorgan predicts that China’s inference token consumption in 2030 will be more than two orders of magnitude higher than in 2025.
Data
A raw energy source must be processed into standardized fuel before it can generate power; similarly, data entering large models must be cleaned, labeled, and tokenized. In long-tail scenarios such as autonomous driving, robot training, and scientific discovery, synthetic data generated through simulation is already applied at scale. The construction of a data factor market is entering a substantive stage, where “trainability” and “token output density,” rather than mere data scale, are becoming the new metrics for pricing data assets. This shift is significant: the valuation of data is beginning to be tied to its actual contribution to the token production chain, providing a more solid economic basis for the market allocation of data factors.
Applications
The focus is moving from “functional delivery” to “token consumption.” Traditional software charges based on seats or functions; today’s applications bill based on token usage and business outcomes. Intelligent agents are becoming the main consumers of tokens, with complex tasks potentially consuming hundreds of thousands or even millions of tokens. The “intelligent agent as a service” market is rapidly expanding, with performance-based billing models being implemented at scale in customer service, marketing, compliance, and programming. The essence of applications is shifting from “delivering functions” to “consuming intelligence.”
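A rough sketch shows why a single agent task can reach millions of tokens. It assumes a simple agent loop in which the entire conversation history is re-sent as input at every step and no prompt caching or context trimming is applied; real systems may behave differently. The function name and all step sizes are hypothetical.

```python
def agent_task_tokens(system_prompt_tokens, step_output_tokens, tool_result_tokens):
    """Return (total_input, total_output) tokens for a multi-step agent run.

    Assumption: the full context is re-sent as input on every step.
    """
    context = system_prompt_tokens
    total_input = 0
    total_output = 0
    for output, tool_result in zip(step_output_tokens, tool_result_tokens):
        total_input += context            # the whole history counts as input again
        total_output += output
        context += output + tool_result   # history grows by model output and tool result
    return total_input, total_output


# Hypothetical task: 50 steps, each producing 500 output tokens and
# receiving a 1,500-token tool result, on top of a 2,000-token system prompt.
steps = 50
total_in, total_out = agent_task_tokens(2_000, [500] * steps, [1_500] * steps)
print(total_in, total_out)  # prints 2550000 25000
```

In this sketch the model writes only 25,000 tokens, yet the input side exceeds 2.5 million because the context grows with every step, which is one reason intelligent agents consume far more tokens than single-turn chat.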
Industry
The industry is evolving from a “software industry chain” to a “token industry chain.” A new industry chain is forming around token production (models and computing power), distribution (inference networks, APIs, intelligent agent protocols), consumption (applications and intelligent agents), and measurement (evaluation benchmarks, auditing, and trust verification). The boundaries between model layers, inference service layers, intelligent agent middleware layers, and industry application layers are becoming increasingly clear, with industry-specific intelligent agents becoming mainstream investments. Model vendors, cloud vendors, chip manufacturers, green power operators, and content distribution network vendors are forming a collaborative ecosystem in the token industry chain. According to the China Academy of Information and Communications Technology, the scale of China’s core AI industry is expected to exceed 1.2 trillion yuan by 2026, with the synergistic effects of the entire industry chain becoming evident.
Governance
The governance focus is shifting from “algorithm governance” to “full-chain governance of tokens.” As the AI industry develops, the objects of governance are expanding from “algorithms and code” to the entire chain of token production, circulation, consumption, and cross-border flow. Token traceability, identification of synthetic content, cross-border token movement, constraints on computing power and energy consumption, and trustworthy evaluation and benchmarks all call for new governance tools and rules. The year 2026 may become a key year for the concentrated implementation of global AI governance rules.
China’s Position in the Global Token Wave
In the global wave brought by tokens, China is forming a unique position supported by multiple factors.
On the production side, domestic models are rising in clusters. Domestic models from MiniMax, Moonshot AI, DeepSeek, Zhipu, Alibaba (Qwen), and ByteDance (Doubao) are leveraging mixture-of-experts architectures and extreme engineering optimization to keep improving performance while cutting inference costs to a fraction of those of comparable international models. On the OpenRouter platform, U.S. users account for 47% and Chinese users for only about 6%, yet usage volume is led by Chinese models. This is recognition earned from global developers voting with their feet.
On the consumption side, applications are reaching deeper than ever, and tokens are entering daily life at unprecedented speed. A general practitioner in a county hospital, faced with a suspicious lung CT, can have AI circle the nodules and offer differential diagnosis suggestions in a few seconds and a few thousand tokens, compressing what used to take two weeks into a single consultation. A farmer in Shouguang, Shandong, can photograph a curled cucumber, and a smart agriculture app draws on tokenized agricultural knowledge to tell him whether the cause is thrips or a viral disease and which treatment to use. An elderly person living alone can tell a smart speaker in dialect, “I feel tight in my chest,” and after a few thousand tokens of dialogue, their children’s phones receive an alert and a location shared for emergency services. Delivery riders now hear navigation instructions that are no longer merely mechanical but are planned around real-time traffic and elevator wait times. AI assistants in government service halls answer inquiries about medical insurance transfers and property registration around the clock, replacing “people running errands” with “tokens running errands.” Tokens are becoming the “invisible labor force” across industries.
At the industry chain level, a full-stack collaborative ecosystem is rapidly taking shape. From domestic chips such as Ascend, Cambricon, and Hygon to inference service platforms such as Volcano Engine, Alibaba Cloud, and Tencent Cloud, along with a range of open-source middleware and industry-specific intelligent agents, an industry chain covering chips, computing power, models, middleware, and applications is maturing quickly. The “Eastern Data, Western Computing” project provides low-cost computing power, and green electricity supplied directly to data centers strengthens the energy foundation.
However, it must be recognized that significant room for improvement remains in the originality of cutting-edge models, the foundations of high-end computing power, cross-language and cross-cultural ecosystem influence, and participation in global rule-making.
In the second half of the token wave, the story is not one of “having already won” but of “just beginning.” In the global landscape unfolding from the small token, China is not only a massive market but also a proactive builder and a responsible participant in co-governance. Understanding tokens is key to understanding the next phase of artificial intelligence.