NVIDIA handed chip design over to AI, and Huawei Cloud said tokens should be industrialized. In that same week, the second half of enterprise AI officially kicked off.

It began on June 1.

That day, NVIDIA announced that Cadence's chip design AI super agent had gone to work. Not "assisting designers in drawing," but "designing chips on its own," around the clock. NVIDIA itself was the first customer: it used this AI to verify its own chip designs.

Then, on June 2 in Beijing, a list called "China AI Agent Leaders" was released at the BCS 2026 Cybersecurity Conference. It featured AI agents from over 100 companies, covering everything from bank risk control to highway inspections to pig herd health monitoring, and spanning nearly every industry you can imagine.

On June 5 in Shanghai, Huawei Cloud CEO Zhou Yuefeng stood on the stage of the INSPIRE Innovator Conference and delivered a very Huawei-style line: the era of Agentic AI has arrived, and tokens need to be industrially produced. The numbers behind it: a computing cluster with 200 EFLOPS, 100,000 cards, and a throughput of 5 million tokens per second per thousand cards.

Three events in five days. Look at them together, though, and you'll find they are about the same thing: AI is evolving from a "tool that needs someone watching it" into a "digital colleague that works on its own without supervision," and the underlying logic of enterprise IT infrastructure is being transformed along with it.

NVIDIA didn't talk about graphics cards this time; it talked about "digital colleagues"

Let's first look at what NVIDIA did. It released not a single product but a toolkit, the Agent Toolkit, which contains four things:

NemoClaw Blueprint: an open-source framework for building agents. Building an AI agent capable of orchestrating multi-tool workflows used to take enterprises weeks; NVIDIA says that with NemoClaw it takes just hours.

Nemotron 3 Ultra Model: a 550-billion-parameter mixture-of-experts model designed specifically for long-running AI agents. It offers 5x faster inference and 30% lower cost than similar models. It is open source starting June 4.

OpenShell Secure Runtime: the piece that many people underestimate, though it may be the most important. For AI agents to "act autonomously" in an enterprise environment, where are the security boundaries? OpenShell provides an answer: privacy policies, query anonymization, and support for local, hybrid, and multi-cloud deployments. Microsoft, Canonical, Red Hat, and SAP have all integrated with it.

CUDA-X libraries as AI agent skills: cuDF for data processing, cuOpt for route optimization, AI-Q for knowledge retrieval, and PhysicsNeMo for engineering simulation. These libraries were originally designed as APIs for human developers to call, but now they serve directly as "skill packs" for AI agents.

Real-world scenarios have already emerged:


The signal is clear: AI is shifting from "answering your questions" to "working for you." And not just any work, but work at the level of designing chips, running factories, and fixing security vulnerabilities.

Huawei Cloud's Same Week: Token Industrialization

If NVIDIA's release can be understood as "equipping AI agents with a complete toolbox," then the Huawei Cloud INSPIRE conference did something else: it told you what kind of foundation these tools need to run on.

Zhou Yuefeng's original statement was "The era of Agentic AI is triggering a fundamental shift in computing paradigms." In plain terms: "Previously, we were stacking machines to run models; now, AI agents are continuously producing and consuming tokens every second, at a scale on a different order of magnitude from before."

The logic behind the four new products from Huawei Cloud is very clear:

AICS Lingqu Intelligent Computing Cluster: 200 EFLOPS of total computing power at 100,000-card scale, token generation latency below 10 milliseconds, and 5 million tokens per second per thousand cards. This is not a lab metric; it means, "If your enterprise needs to run agents at scale, I can provide you with a stable production supply of tokens." Online availability reaches 99.95%.

AMS Agentic Memory Storage: petabyte-scale memory space and hierarchical KV Cache pooling, enabling agents to run "day-level" long tasks without forgetting. The problem it solves: most current AI agents can only handle tasks at the "single-session" level, and once they need to run business processes lasting hours or even days, contextual memory becomes the bottleneck.

CCE VolcanoNext Unified Scheduling for General and Intelligent Computing: a single scheduler for both general-purpose and intelligent computing workloads, supporting millions of concurrent operations per minute. Enterprises will not run just one agent; when dozens or hundreds of agents run simultaneously, scheduling efficiency directly determines the user experience.

AgentSphere Runtime Environment: a secure "room" for agents. To some extent, this is comparable to NVIDIA's OpenShell: enterprises need to know what their AI is doing, what it can do, and what it cannot do.
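To make the idea of hierarchical KV Cache pooling mentioned for AMS more concrete, here is a minimal Python sketch of a two-tier cache. It is an illustration only, not Huawei Cloud's implementation: the class name `TieredKVCache`, the tier names, and the least-recently-used demotion rule are all assumptions. A small "hot" tier stands in for fast accelerator memory, and a large "pool" tier stands in for slower pooled storage, so old context is moved aside rather than forgotten.

```python
from collections import OrderedDict


class TieredKVCache:
    """Hypothetical two-tier cache: a small hot tier plus a large pooled tier."""

    def __init__(self, hot_capacity: int):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # stands in for fast accelerator memory
        self.pool = {}            # stands in for large, slower pooled storage

    def put(self, block_id, kv_block):
        """Store a block in the hot tier, demoting old blocks if it overflows."""
        self.pool.pop(block_id, None)   # a block lives in exactly one tier
        self.hot[block_id] = kv_block
        self.hot.move_to_end(block_id)  # mark as most recently used
        self._demote_overflow()

    def get(self, block_id):
        """Return a block, promoting it to the hot tier if it was pooled."""
        if block_id in self.hot:
            self.hot.move_to_end(block_id)
            return self.hot[block_id]
        if block_id in self.pool:
            kv_block = self.pool.pop(block_id)
            self.put(block_id, kv_block)  # promote back to the hot tier
            return kv_block
        return None  # never stored

    def _demote_overflow(self):
        # Move least recently used blocks to the pool instead of dropping them.
        while len(self.hot) > self.hot_capacity:
            old_id, old_block = self.hot.popitem(last=False)
            self.pool[old_id] = old_block


cache = TieredKVCache(hot_capacity=2)
for step in ("a", "b", "c"):
    cache.put(step, f"kv-for-{step}")
restored = cache.get("a")  # "a" was demoted earlier, yet it is still retrievable
```

The point of the sketch is the last line: a block pushed out of fast memory hours earlier can still be brought back, which is what lets a long-running task keep its context.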

Huawei Cloud's target numbers are: enterprise daily consumption of 100 trillion Tokens, a 10x reduction in Token cost per watt, millions of concurrent requests per minute, and long-term memory expanded from 128K to 100M.
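A quick back-of-the-envelope check helps put these figures side by side. The Python sketch below uses only numbers quoted in this article; the assumptions are mine: that throughput scales linearly with card count, that the quoted rate can be sustained all day, and that "128K" and "100M" are token counts with K meaning 1,000. Treat the results as rough estimates, not vendor specifications.

```python
# Figures quoted in the article; everything derived from them is an estimate.
CLUSTER_CARDS = 100_000
TOKENS_PER_SEC_PER_1000_CARDS = 5_000_000
DAILY_TOKEN_TARGET = 100 * 10**12  # 100 trillion tokens per day
AVAILABILITY = 0.9995
SECONDS_PER_DAY = 86_400


def cluster_tokens_per_second(cards: int = CLUSTER_CARDS) -> int:
    """Aggregate throughput, assuming it scales linearly with card count."""
    return cards * TOKENS_PER_SEC_PER_1000_CARDS // 1000


def cards_for_daily_target(daily_tokens: int = DAILY_TOKEN_TARGET) -> int:
    """Cards needed to sustain the daily target at the quoted rate, rounded up."""
    tokens_per_card_per_day = TOKENS_PER_SEC_PER_1000_CARDS * SECONDS_PER_DAY // 1000
    return -(-daily_tokens // tokens_per_card_per_day)  # integer ceiling division


def downtime_hours_per_year(availability: float = AVAILABILITY) -> float:
    """Hours per year that the availability figure leaves room for."""
    return (1 - availability) * 365 * 24


def context_growth(old_tokens: int = 128_000, new_tokens: int = 100_000_000) -> float:
    """How many times larger the target long-term memory is."""
    return new_tokens / old_tokens


daily_output = cluster_tokens_per_second() * SECONDS_PER_DAY
print(f"one cluster: {cluster_tokens_per_second():,} tokens/s, {daily_output:,} tokens/day")
print(f"cards for the daily target: {cards_for_daily_target():,}")
print(f"downtime budget: {downtime_hours_per_year():.2f} hours/year")
print(f"memory growth: {context_growth():.2f}x")
```

Under these assumptions, one 100,000-card cluster produces 500 million tokens per second, or about 43.2 trillion per day, so the 100 trillion daily target would take roughly 231,000 cards at today's quoted rate. A 99.95% availability figure leaves about 4.4 hours of downtime per year, and going from 128K to 100M is a jump of about 781x.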

I went through the data presented on site: Huawei Cloud now has 8.5 million developers and more than 50,000 partners. Yunnan Communications Investment Group is already a user: the accuracy of its traffic flow prediction and congestion identification has improved by 9.91%, and it has developed over 20 agents for niche scenarios.

The work of these two companies is actually highly complementary. NVIDIA has "equipped the agent with a brain and limbs," while Huawei Cloud has "paved the roads and laid the power grid for the agent." One is working on the capability layer, the other on the infrastructure layer. But both point to the same conclusion: AI is no longer a tool; it is a colleague.

What Enterprise CIOs Really Need to Care About

Three major events in five days. For enterprise digital leaders, four issues are now unavoidable:

First, the design logic of your IT infrastructure needs to change. In the past, enterprise IT followed a three-tier structure of "application-database-server." Now a new layer, the "Agent Runtime," must be added. An agent is not just a piece of code; it is a continuously running software entity with memory that can make decisions on its own. Its demand for computing resources follows not a "request-response" model but a "continuous dialogue + continuous reasoning" model. Server architectures built around request-response were not designed for this.
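The difference between the two models can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not any vendor's runtime: `handle_request`, `MonitoringAgent`, the threshold of 80, and the sensor readings are all hypothetical. The stateless handler judges each reading in isolation, while the agent keeps a short memory between calls, so the runtime has to keep that state alive for as long as the agent runs.

```python
from collections import deque


def handle_request(reading: float, threshold: float = 80.0) -> str:
    """Request-response style: every call is judged on its own, no state kept."""
    return "alert" if reading > threshold else "ok"


class MonitoringAgent:
    """Hypothetical long-running agent that decides from remembered readings."""

    def __init__(self, window: int = 3, threshold: float = 80.0):
        self.memory = deque(maxlen=window)  # state that must survive between steps
        self.threshold = threshold

    def step(self, reading: float) -> str:
        self.memory.append(reading)
        average = sum(self.memory) / len(self.memory)
        return "alert" if average > self.threshold else "ok"


readings = [70, 85, 95, 90]  # made-up sensor values
stateless = [handle_request(r) for r in readings]

agent = MonitoringAgent()
stateful = [agent.step(r) for r in readings]

print(stateless)  # reacts to each spike immediately
print(stateful)   # reacts only once the remembered trend crosses the threshold
```

The same inputs lead to different decisions because the agent carries context forward, and that persistent context is what a dedicated runtime layer has to host, schedule, and recover.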

Second, security boundaries need to be redrawn. This is a problem pointed out by both BCS 2026 and NVIDIA OpenShell. When AI can autonomously operate enterprise systems—sending emails, modifying configurations, approving processes—security is no longer just about "preventing external attacks." You need to be able to see what the Agent is doing in real time, shut it down at any time, and manage permissions based on policies. The emergence of OpenShell and AgentSphere indicates that the industry is already taking this issue seriously.
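Those three requirements (visibility, a kill switch, and policy-based permissions) can be sketched as a single gatekeeper that every agent action passes through. The Python below is a minimal illustration under my own assumptions; `AgentGuard` and its methods are hypothetical names and do not reflect the actual design of OpenShell or AgentSphere.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentGuard:
    """Hypothetical gatekeeper that every agent action must pass through."""

    allowed_actions: set[str]
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)
    halted: bool = False

    def halt(self) -> None:
        """Kill switch: refuse every action from now on."""
        self.halted = True

    def execute(self, agent_id: str, action: str, handler: Callable[[], object]) -> object:
        """Run handler only if policy allows it; log the decision either way."""
        if self.halted:
            decision = "blocked: agent halted"
        elif action not in self.allowed_actions:
            decision = "blocked: not permitted by policy"
        else:
            decision = "allowed"
        self.audit_log.append((agent_id, action, decision))  # record before acting
        if decision != "allowed":
            raise PermissionError(f"{agent_id} {action}: {decision}")
        return handler()


guard = AgentGuard(allowed_actions={"send_email"})
result = guard.execute("agent-1", "send_email", lambda: "sent")

try:
    guard.execute("agent-1", "modify_config", lambda: "changed")
except PermissionError as err:
    print(err)  # the attempt is refused, and it is still in the audit log

guard.halt()  # from here on, even permitted actions are refused
```

Logging the decision before the action runs is a deliberate choice in this sketch: blocked attempts are often the most useful entries when someone later asks what the agent was trying to do.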

Third, be prepared to accept a more fundamental change: your team members will include "digital humans." Cadence is already using AI to design chips, and NVIDIA itself is a customer. In Foxconn's factories, MoMClaw is monitoring sensor data to make decisions. This is not a distant vision; it is happening right now. Corporate organizational structures, performance evaluations, and human-machine collaboration processes—all of these need to be rethought.

Fourth, don't forget IDC's reminder. In the same week, IDC released the report "China AI-Enhanced Enterprise ERP Market Share, 2025". Over 60% of manufacturing companies have already incorporated AI capabilities into their ERP selection criteria. However, the report contains a sentence that many people overlook: first assess how solid your data foundation is, then decide which AI-ERP module to start with. After LIGAO FOODS adopted YonSuite from Yonyou, its production order efficiency increased by 81.5%, but only because it had first put its master data in order.

Key figures: Huawei Cloud AICS total computing power, 200 EFLOPS at 100,000-card scale; maximum single-cluster card count, 500,000; throughput, 5 million tokens per second per thousand cards; token demand target for the Agent era, 100 trillion tokens per day.

Writing this reminds me of a saying: In 2024, everyone was discussing "what AI can do"; in 2025, the discussion shifted to "whether AI should enter enterprises"; by June 2026, the topic had become "AI is already working autonomously within enterprises—can your infrastructure keep up?"

This change is happening much faster than most people expected.
