The Agent Protocol Quartet: MCP, A2A, AG-UI, and A2UI Are Not Competitors — They're Layers
What? I Haven't Even Learned A2A or MCP, and Now AG-UI and A2UI Are Here...
The de facto "Agent Protocol Quartet" for 2026 has already taken shape, and its four members are complementary layers, not competitors:
| Layer | Protocol | What It Manages |
|---|---|---|
| Tools | MCP | Agent ↔ Tools / Data |
| Agent-to-Agent | A2A (Google → Linux Foundation) | Agent ↔ Agent |
| Agent ↔ User / Frontend | AG-UI (CopilotKit, just raised $27M in May 2026; Google / Microsoft / AWS / Oracle all have public support or integration statements) | Sends the Agent's streaming events, state, and frontend tool calls to the UI |
| Rendering Payload | A2UI (Google, declarative, flat adjacency list, optimized for LLM incremental generation; "Q1 2026 React renderer" according to the roadmap) | The description format for UI generated by the Agent, running on top of A2A / AG-UI |
Don't let this table scare you.
Many people's real state right now is: they just barely know the gist of MCP, haven't looked closely at A2A, and then AG-UI and A2UI show up. So their first reaction is: Can I ever finish learning all this? Will I fall behind if I don't? Do I have to memorize all these protocols to build AI apps in the future?
My personal advice is to take a breath first.
These things aren't meant for you to integrate all at once, nor does every project need the whole suite. They represent the process of Agent applications moving from demos to real software, breaking apart several things that were previously lumped together: how tools are connected, how Agents collaborate, how the backend runtime is synchronized to the frontend, and how to describe the UI when an Agent needs to generate it.
Once you can tell these four edges apart, MCP, A2A, AG-UI, and A2UI become easy. Without that distinction, they all seem to be fighting for the same turf; with it, you can see that most of the time they aren't competitors but interface conventions at different positions in the system.
These facts together indicate a change: Agent engineering is no longer just about "whether the model can answer," but about how each edge in the software system can be controllable, debuggable, and replaceable.
Why Ordinary People Get Anxious About These Terms
Many people's anxiety doesn't come from the technology being difficult. It comes from hearing a stream of buzzwords in the news without ever getting a map they can keep in their heads.
For example, after MCP became popular, many people just understood that "models can call tools," and then they immediately saw A2A, saying Agents also need a protocol; then they saw AG-UI, saying the frontend also needs a protocol; then they saw A2UI, saying the UI can also be generated by an Agent. If you treat all of them as "new technologies I must master immediately," of course you can't keep up.
But that's not how I would teach it.
I would have you first ask a more fundamental question: What kind of "connection cost" does this protocol actually solve?
MCP solves the cost of connecting tools. In the past, every model application had to connect databases, files, APIs, and internal systems on its own, with different connection methods, permissions, and tool descriptions. MCP abstracts this into host, client, server, and basic objects like tools, resources, and prompts. An MCP server exposes tools and context, and the MCP client in the AI application connects to it.
A2A solves the cost of Agent collaboration. An Agent doesn't necessarily have to do everything itself; it might need to hand off a task to another remote Agent. A2A has an Agent Card, which describes who this Agent is, what it can do, its endpoint, and authentication requirements; it has a Task, representing a stateful work order; and it has Message, Part, and Artifact for passing messages, files, and structured results.
AG-UI solves the cost of synchronizing the user interface. An Agent isn't a traditional API endpoint. Traditional endpoints are mostly "request in, response out," whereas an Agent thinks, calls tools, changes state, and waits for user confirmation, all within a single run. AG-UI breaks this process into events so the frontend can show it in real time: a tool call is starting, parameters are streaming in, tool results are back, state has changed, and the user can interrupt or approve.
A2UI solves the security and maintainability cost of UI generated by Agents. Having the model directly output HTML, JSX, or JavaScript might be fast for a demo, but it's very risky for a production system. A2UI's approach is to have the Agent output declarative component descriptions. The client only allows it to use a pre-approved component catalog, and then the client's own React, Angular, Flutter, Lit, or native components render it.
See, the four protocols aren't a mess. They respectively handle tools, collaboration, interaction, and rendering.
MCP: The Agent Must First Learn to Use Tools Safely
Let's start with MCP, because most people are more likely to encounter it.
In the official MCP specification, the core structure is host, client, and server. The Host is an AI application like an IDE, chat app, or Agent platform; the client is the connector within the host responsible for connecting to a specific MCP server; the server is the end that exposes capabilities, which can be a local process or a remote service.
An MCP server mainly exposes three types of things:
- tools: Callable actions, like querying a database, calling a CRM, running a search, or executing a calculation.
- resources: Contextual resources, like files, database schemas, business configurations, or project materials.
- prompts: Reusable prompt templates, like a fixed way of asking for a certain business process.
Don't mix these three. Tools are "let the model do an action," resources are "show the model some material," and prompts are "give the model a structured way to say something." If you stuff everything into tools, the system gets messy; if you disguise an action as a resource, you also create a security problem, because a side effect ends up hiding behind something that looks read-only.
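To make the distinction concrete, here is a minimal sketch of how the three kinds of requests differ on the wire. MCP communicates via JSON-RPC 2.0, and the method names `tools/call`, `resources/read`, and `prompts/get` come from the MCP specification. Everything else is an assumption for illustration: the helper `rpc` is not part of any SDK, and the tool name, URI, and prompt name are made-up placeholders.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids only need to be unique within a session


def rpc(method: str, params: dict) -> dict:
    """Build a JSON-RPC 2.0 request envelope (hypothetical helper, not an SDK call)."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}


# A tool: the model asks the server to *do* something.
call_tool = rpc("tools/call", {
    "name": "search_flights",  # placeholder tool name
    "arguments": {"from": "SHA", "to": "SZX"},
})

# A resource: the client *reads* material identified by a URI.
read_resource = rpc("resources/read", {"uri": "policy://travel/2026"})  # placeholder URI

# A prompt: the client fetches a reusable template and fills in its arguments.
get_prompt = rpc("prompts/get", {
    "name": "approval_request",  # placeholder prompt name
    "arguments": {"trip": "Shenzhen"},
})

for request in (call_tool, read_resource, get_prompt):
    print(json.dumps(request))
```

The three requests look almost identical, which is exactly why the boundary has to be a design decision: nothing in the envelope stops you from hiding an action behind the wrong primitive.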
The official documentation also has a very practical security reminder: MCP tools are discoverable and automatically callable by the model, so the application should clearly inform the user which tools are exposed, tool calls should have visual cues, and sensitive operations should require human confirmation.
This is crucial for enterprises. Because once an Agent can call tools, it's no longer a "chatbot"; it can modify data, send requests, and affect business systems. The further you go, the more you can't just care about "how smart the model is"; you also need to care about tool boundaries, permissions, logging, and approvals.
So, if you're a junior or mid-level developer, don't start by memorizing the SDK when learning MCP. First, remember this judgment:
If it can be read, try to make it a resource; if it has side effects, it must be a tool with permissions and confirmation; reusable task descriptions can be solidified into prompts.
This is more valuable than copying an MCP server demo.
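The rule of thumb above can be written down in a few lines. This is a deliberately crude sketch, not MCP SDK code: `Capability`, `classify`, and `run_tool` are hypothetical names, and in a real system some read-only operations (a parameterized search, for example) are still reasonably modeled as tools.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Capability:
    name: str
    has_side_effects: bool = False
    is_reusable_instruction: bool = False


def classify(cap: Capability) -> str:
    """Side effects -> tool; reusable task description -> prompt; otherwise resource."""
    if cap.has_side_effects:
        return "tool"
    if cap.is_reusable_instruction:
        return "prompt"
    return "resource"


def run_tool(cap: Capability, action: Callable[[], str],
             confirm: Callable[[str], bool]) -> str:
    """Run a side-effecting action only after a human has confirmed it."""
    if classify(cap) != "tool":
        raise ValueError(f"{cap.name} is not a tool")
    if not confirm(f"Allow the agent to run '{cap.name}'?"):
        return "rejected by user"
    return action()


refund = Capability("issue_refund", has_side_effects=True)  # placeholder names
schema = Capability("orders_schema")

print(classify(refund), classify(schema))
# The confirm callback stands in for a real approval dialog; here the user says no.
print(run_tool(refund, lambda: "refund issued", confirm=lambda question: False))
```

The point of passing `confirm` in from outside is that the approval decision belongs to the application, not to the model or the tool.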
A2A: When the Agent No Longer Works Alone
A2A is a problem at a different layer.
Imagine an enterprise has a Sales Agent, a Legal Agent, a Finance Agent, and a Customer Service Agent. The Sales Agent receives a customer contract question. It shouldn't pretend to know legal matters; a more reasonable approach is to hand off the task related to contract terms to the Legal Agent and then bring the result back.
Several problems arise here:
How is this Legal Agent discovered? What exactly can it do? Where are requests sent? What authentication is needed? Can it stream progress back? Is its output a simple message or a document that can be saved by the system?
A2A handles these types of problems.
It has an Agent Card, like a business card, stating identity, capabilities, endpoint, and authentication requirements. It has a Task, representing a trackable work order. It has a Message, representing a single turn of conversation. It has a Part, which can carry text, file references, binary content, or structured JSON. It also has an Artifact, representing the concrete result of a task, like a report, table, file, or structured data.
The official specification also distinguishes between simple responses and complex tasks: simple interactions can directly return a Message; complex processing can return a Task and then update the status and artifact via streaming events.
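If it helps to see these objects as data, here is a rough model in Python. The class names mirror the A2A concepts above, but the fields are heavily simplified assumptions of mine and do not match the wire schema in the specification; `find_agent` and `summarize` are hypothetical helpers, and the endpoints are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class AgentCard:      # the "business card": identity, skills, endpoint, auth
    name: str
    skills: list[str]
    endpoint: str
    auth: str


@dataclass
class Part:           # one piece of content: text, file reference, or structured data
    kind: str
    content: object


@dataclass
class Message:        # a single conversational turn
    role: str
    parts: list[Part]


@dataclass
class Artifact:       # a concrete result of a task, such as a report
    name: str
    parts: list[Part]


@dataclass
class Task:           # a stateful, trackable work order
    id: str
    state: str
    artifacts: list[Artifact] = field(default_factory=list)


def find_agent(cards: list[AgentCard], skill: str) -> Optional[AgentCard]:
    """Discovery: pick the first agent whose card advertises the skill."""
    return next((card for card in cards if skill in card.skills), None)


def summarize(response: Union[Message, Task]) -> str:
    """Simple interactions come back as a Message, complex work as a Task."""
    if isinstance(response, Message):
        return "message: " + " ".join(str(p.content) for p in response.parts)
    return f"task {response.id} is {response.state} with {len(response.artifacts)} artifact(s)"


cards = [
    AgentCard("legal-agent", ["contract_review"], "https://legal.example.com/a2a", "oauth2"),
    AgentCard("finance-agent", ["budget_check"], "https://finance.example.com/a2a", "oauth2"),
]
print(find_agent(cards, "contract_review").name)
print(summarize(Message("agent", [Part("text", "Clause 4 looks standard.")])))
print(summarize(Task("task-1", "completed",
                     [Artifact("review.md", [Part("text", "Full review text")])])))
```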
The significance for enterprises is straightforward: when working across teams, platforms, and vendors, Agents can't just rely on "I'll write an HTTP endpoint for you to call." As the number of interfaces grows, discovery, authentication, state, artifacts, and versioning all become costs. A2A aims to standardize these basic actions.
But I also don't recommend that beginners immediately break all their functions into multiple Agents.
If your system is just a few modules calling each other within a single backend service, don't rush to adopt A2A. Multi-Agent sounds advanced, but it's also more troublesome to debug. A2A is more suitable for remote Agents, heterogeneous Agents, and collaboration between Agents from different organizations or vendors. If you're not at that stage yet, first do a good job with the tool boundaries and logging of your single Agent.
AG-UI: What the Frontend Needs to See is the Agent's Runtime Process
AG-UI is one of the key points of this article.
Many people, when building an Agent frontend for the first time, treat it like a regular chat: the user sends a message, the backend returns a piece of text, and the frontend renders it. At this stage, using SSE or WebSocket to stitch things together yourself can work.
But real products quickly encounter these situations:
- The Agent is generating a response, and the frontend needs to display it word by word.
- The Agent is about to call a tool, and the frontend needs to tell the user what it's going to look up.
- The tool parameters are long and may not be fully generated yet, so the frontend needs to display them as they arrive.
- The tool result is back, and the frontend needs to attach the result to the conversation or a specific business panel.
- The Agent modifies a shared state, like filter criteria, form fields, or the current step.
- The user wants to interrupt, approve, modify, or retry.
At this point, if you continue using the approach of "the backend spits out a bit of random JSON, and the frontend does a bit of random if-else," it's fine for small projects, but as the project grows, it becomes painful. Every agent framework has different events, every tool result has a different format, and every frontend state synchronization method is different. In the end, you have a pile of glue code.
This is where AG-UI's value lies.
The AG-UI official documentation describes it as an open, lightweight, event-based protocol for standardizing the connection between an Agent and the user's frontend. The objects it cares about include messages, tool calls, state management, frontend tools, human confirmation, and interruptions.
Taking tool calls as an example, the AG-UI documentation has events like ToolCallStart, ToolCallArgs, ToolCallEnd, and ToolCallResult. A tool call doesn't suddenly produce a result out of nowhere; the frontend can see the entire process:
```json
{ "type": "TOOL_CALL_START", "toolCallId": "t1", "toolCallName": "searchFlights" }
{ "type": "TOOL_CALL_ARGS", "toolCallId": "t1", "delta": "{\"from\":\"SHA\"" }
{ "type": "TOOL_CALL_ARGS", "toolCallId": "t1", "delta": ",\"to\":\"SFO\"}" }
{ "type": "TOOL_CALL_END", "toolCallId": "t1" }
```
The point of this isn't to make you memorize event names; it's to illustrate AG-UI's approach: the Agent's runtime process should be transformed into events that the frontend can understand, display, and debug.
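To show what "events the frontend can understand" means in practice, here is a small sketch of how a client might fold that event stream into displayable state. It uses only the four events from the example above; `reduce_tool_calls` and the shape of the resulting record are my own assumptions, not part of any AG-UI SDK.

```python
import json


def reduce_tool_calls(events: list[dict]) -> dict[str, dict]:
    """Fold a stream of tool-call events into one record per toolCallId."""
    calls: dict[str, dict] = {}
    for event in events:
        kind = event["type"]
        call_id = event.get("toolCallId")
        if kind == "TOOL_CALL_START":
            calls[call_id] = {"name": event["toolCallName"],
                              "raw_args": "", "args": None, "done": False}
        elif kind == "TOOL_CALL_ARGS":
            # Deltas are fragments of a JSON string; keep the raw text for display.
            calls[call_id]["raw_args"] += event["delta"]
        elif kind == "TOOL_CALL_END":
            call = calls[call_id]
            # Parse only once the arguments are complete.
            call["args"] = json.loads(call["raw_args"] or "{}")
            call["done"] = True
    return calls


events = [
    {"type": "TOOL_CALL_START", "toolCallId": "t1", "toolCallName": "searchFlights"},
    {"type": "TOOL_CALL_ARGS", "toolCallId": "t1", "delta": "{\"from\":\"SHA\""},
    {"type": "TOOL_CALL_ARGS", "toolCallId": "t1", "delta": ",\"to\":\"SFO\"}"},
    {"type": "TOOL_CALL_END", "toolCallId": "t1"},
]

state = reduce_tool_calls(events)
print(state["t1"]["name"], state["t1"]["args"])
```

Between START and END the UI can already show `raw_args` as it grows, which is the "parameters are streaming in" experience described earlier.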
There's another point about AG-UI that is often underestimated: frontend-defined tools.
Backend tools are easy to understand, like querying a database, calling a search, or running a calculation. But some actions should happen on the frontend, like asking the user to confirm, opening a popup, centering a map on a location, or letting the user select an option on the interface. AG-UI allows the frontend to pass these tools to the Agent at runtime, letting the Agent call frontend capabilities, but the final execution authority remains with the application.
This is crucial for enterprise products. Users won't feel safe just because the Agent says "I'll be careful"; trust comes from the interface exposing key actions: what's happening now, which actions need confirmation, and which results can be undone.
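Here is a rough sketch of the idea that frontend tools are offered to the Agent while execution authority stays with the application. It is written in Python only to keep one language across the examples in this article; a real frontend would use TypeScript or similar. `FrontendTools`, the tool names, and the flight number are all hypothetical, and this is not the AG-UI SDK's API.

```python
from typing import Callable


class FrontendTools:
    """Registry of UI-side tools the application chooses to expose (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: dict[str, Callable[[dict], str]] = {}

    def register(self, name: str, description: str,
                 handler: Callable[[dict], str]) -> dict:
        self._handlers[name] = handler
        # Only this description is passed to the agent; the handler stays local.
        return {"name": name, "description": description}

    def execute(self, name: str, args: dict) -> str:
        # The app, not the agent, decides whether a requested call actually runs.
        if name not in self._handlers:
            return f"refused: '{name}' was never offered to the agent"
        return self._handlers[name](args)


tools = FrontendTools()
offered = [
    tools.register("confirmBooking", "Ask the user to confirm a booking",
                   lambda args: f"user confirmed {args['flight']}"),
]

print(offered)                                               # what the agent gets to see
print(tools.execute("confirmBooking", {"flight": "CZ3456"}))  # placeholder flight number
print(tools.execute("deleteAccount", {}))                     # never registered, so refused
```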
A2UI: Don't Let the Model Write UI Code Directly
A2UI solves a different problem: when an Agent needs to generate UI, what exactly should it generate?
The most obvious approach is to have the model directly generate HTML, JSX, React components, or even a piece of JavaScript. This approach is very satisfying for demos, and I understand why people like it, because the immediate effect is fast.
But this path calls for real caution once you head into a production system.
If you let the model directly generate executable UI code, you are effectively handing over security, styling, component permissions, performance, accessibility, cross-platform consistency, and even some business constraints to a single generation result. It might work, or it might start misbehaving under certain edge conditions. More troublesome is that once the code is output by the model, how do you audit it, reproduce it, and restrict it to only use company-approved components?
A2UI's approach is more conservative and more like an engineering system.
It has the Agent output declarative data, not arbitrary code. The client maintains a component catalog, such as Column, Text, Button, Card, Chart, Map. The Agent can only say "I want a Column with these child components; I want a Button that triggers a certain action; I want a piece of Text bound to a certain field in the data model." The actual rendering is done by the client's own component library.
The A2UI v1.0 specification includes messages like createSurface, updateComponents, and updateDataModel. A surface can be understood as a UI area; components is a list of components; dataModel is the data driving the UI.
A minimal structure might look like this:
```json
{
  "version": "v1.0",
  "updateComponents": {
    "surfaceId": "trip_plan",
    "components": [
      { "id": "root", "component": "Column", "children": ["title", "day1"] },
      { "id": "title", "component": "Text", "text": "Three-Day, Two-Night Itinerary" },
      { "id": "day1", "component": "Text", "text": "Day 1: Hotel Check-in + Nearby Dinner" }
    ]
  }
}
```
The key point isn't how pretty this JSON is, but its constraints: the components are ones you've approved, the structure is verifiable, the client can reject unknown components, unify styling into its own design system, and log every update.
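Those constraints are easy to enforce in code. Below is a minimal sketch of a client-side check, assuming a catalog of four approved components; `APPROVED_CATALOG` and `validate_components` are my own names, not part of an A2UI renderer. It validates a complete batch, so a client that renders while components are still streaming in would probably defer the child-reference check.

```python
APPROVED_CATALOG = {"Column", "Text", "Button", "Card"}  # assumption: this client's catalog


def validate_components(components: list[dict],
                        catalog: set[str] = APPROVED_CATALOG) -> list[str]:
    """Return a list of problems; an empty list means the payload is acceptable."""
    problems = []
    ids = [comp.get("id") for comp in components]
    known = set(ids)
    if len(known) != len(ids):
        problems.append("duplicate component id")
    for comp in components:
        if comp.get("component") not in catalog:
            problems.append(f"{comp.get('id')}: unknown component {comp.get('component')!r}")
        for child in comp.get("children", []):
            if child not in known:
                problems.append(f"{comp.get('id')}: child {child!r} is not defined")
    return problems


good = [
    {"id": "root", "component": "Column", "children": ["title", "day1"]},
    {"id": "title", "component": "Text", "text": "Three-Day, Two-Night Itinerary"},
    {"id": "day1", "component": "Text", "text": "Day 1: Hotel Check-in + Nearby Dinner"},
]
bad = [
    {"id": "root", "component": "Column", "children": ["ad"]},
    {"id": "widget", "component": "Iframe"},  # not in the catalog, so it is rejected
]

print(validate_components(good))
print(validate_components(bad))
```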
A2UI also has a very interesting design: component relationships are represented using a flat list with ID references, rather than having the model generate a deeply nested tree all at once. The official documentation explains that deep nesting requires the LLM to handle brackets, hierarchy, and update points all at once, making errors costly; a flat list is easier for incremental generation and easier to update a specific component by its ID.
This is where "designed for LLM generation" comes in. It doesn't pretend that the model is better at writing UI than a frontend engineer; it acknowledges the characteristics of LLMs: generation can be streamed, partial updates are possible, but don't make it bear the responsibility of too much uncontrollable code.
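A short sketch makes the benefit of the flat list visible: components are stored by ID, children are resolved only at render time, and a later message can replace one component without resending the tree. `apply_update` and `render_text` are hypothetical helpers, the text output stands in for a real renderer, and the sketch does not guard against cyclic child references.

```python
def apply_update(store: dict[str, dict], components: list[dict]) -> None:
    """Upsert components by id; components not mentioned keep their current value."""
    for comp in components:
        store[comp["id"]] = comp


def render_text(store: dict[str, dict], node_id: str = "root", depth: int = 0) -> list[str]:
    """Resolve child ids into a tree and flatten it into indented lines."""
    comp = store[node_id]
    label = comp["component"] + (f": {comp['text']}" if "text" in comp else "")
    lines = ["  " * depth + label]
    for child_id in comp.get("children", []):
        if child_id in store:  # a referenced child may not have streamed in yet
            lines.extend(render_text(store, child_id, depth + 1))
    return lines


store: dict[str, dict] = {}

# First chunk: the root and the title have arrived, day1 has not.
apply_update(store, [
    {"id": "root", "component": "Column", "children": ["title", "day1"]},
    {"id": "title", "component": "Text", "text": "Three-Day, Two-Night Itinerary"},
])
print("\n".join(render_text(store)))

# Later chunks: day1 arrives, then the title alone is replaced by its id.
apply_update(store, [{"id": "day1", "component": "Text",
                      "text": "Day 1: Hotel Check-in + Nearby Dinner"}])
apply_update(store, [{"id": "title", "component": "Text", "text": "Shenzhen Itinerary"}])
print("\n".join(render_text(store)))
```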
How AG-UI and A2UI Work Together
Now let's look at them together.
AG-UI manages "how the Agent and the frontend communicate." A2UI manages "what structure the UI generated by the Agent has."
Let's use a corporate travel assistant as a teaching example. I'm not saying this is a project I've done; it's just to help you connect the dots.
The user says: "Help me arrange a business trip to Shenzhen next week, budget 3000, preferably no red-eye flights."
Step 1: The frontend sends the user's request to the Agent backend. This can use AG-UI's run input.
Step 2: The Agent needs to look up flights, hotels, and the company's travel policy. These tools and materials can be exposed through an MCP server: searching flights is a tool, the travel policy is a resource, and the company's common approval phrasing might be a prompt.
Step 3: If the enterprise has a dedicated Approval Agent or Finance Agent, the main Agent can use A2A to hand off the "is the budget compliant" task to it, and the other Agent returns the status and artifact using a Task.
Step 4: While the Agent is running, it sends events to the frontend via AG-UI: starting to search for flights, what the parameters are, which results were found, which plan needs user confirmation, and what the current state is.
Step 5: The Agent decides that plain text isn't enough and wants to give the user an editable itinerary card: flights, hotel, budget, approval prompts, and a replace button. This UI can be described using A2UI. The A2UI payload is then sent to the frontend via a transport method like AG-UI, and the frontend renders it using its own component library.
So, AG-UI and A2UI should not be conflated.
AG-UI is more like an event bus and interaction protocol, while A2UI is more like a UI payload format. You can use AG-UI to transmit text, state, tool events, and also declarative UI like A2UI. A2UI doesn't have to go through AG-UI either; it can run on A2A, WebSocket, REST, or related forms of MCP. Google's A2UI documentation also mentions multiple transport methods.
In a nutshell: AG-UI is responsible for "how to send, how to synchronize, how to interact," and A2UI is responsible for "how the UI being sent is described, verified, and rendered."
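The division of labor can be sketched as a single event handler on the frontend. The transport events follow the naming style of the AG-UI examples earlier in this article, but the envelope that carries the A2UI payload here (a `CUSTOM` event with `name` set to `a2ui`) is my assumption for illustration, not an official binding between the two protocols. `handle_event` and the `ui` dictionary are likewise hypothetical.

```python
def handle_event(event: dict, ui: dict) -> None:
    """Route one transport event; A2UI payloads are handed over as plain data."""
    kind = event["type"]
    if kind == "TEXT_MESSAGE_CONTENT":
        ui["chat"] += event["delta"]
    elif kind == "TOOL_CALL_START":
        ui["status"] = f"calling {event['toolCallName']}"
    elif kind == "CUSTOM" and event.get("name") == "a2ui":
        update = event["value"]["updateComponents"]  # the A2UI message itself
        surface = ui["surfaces"].setdefault(update["surfaceId"], {})
        for comp in update["components"]:
            surface[comp["id"]] = comp               # stored by id for the renderer


ui = {"chat": "", "status": "idle", "surfaces": {}}
stream = [
    {"type": "TOOL_CALL_START", "toolCallId": "t1", "toolCallName": "searchFlights"},
    {"type": "TEXT_MESSAGE_CONTENT", "messageId": "m1", "delta": "Here is "},
    {"type": "TEXT_MESSAGE_CONTENT", "messageId": "m1", "delta": "your plan."},
    {"type": "CUSTOM", "name": "a2ui", "value": {
        "version": "v1.0",
        "updateComponents": {
            "surfaceId": "trip_plan",
            "components": [
                {"id": "root", "component": "Column", "children": ["title"]},
                {"id": "title", "component": "Text", "text": "Shenzhen Business Trip"},
            ],
        },
    }},
]
for event in stream:
    handle_event(event, ui)

print(ui["chat"])
print(ui["status"])
print(sorted(ui["surfaces"]["trip_plan"]))
```

Notice that the handler never inspects what a `Column` or `Text` means. The transport layer only delivers the payload; interpreting it is the renderer's job.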
Why Enterprises Care About This
If you're an ordinary developer, you might ask: What does this have to do with me? I'm not a platform vendor.
It does have to do with you, but not in the sense that you need to integrate all four today.
Enterprises care about it because once an Agent enters a business system, the most expensive part is usually not the model API call cost; the cost of modification and maintenance is much more frightening.
First, write less one-time glue code.
Each team defining its own tool format, state events, UI payload, and Agent collaboration interface is the fastest in the short term but the most expensive in the long term. When people leave, no one dares to modify it; when the framework changes, everything gets rewritten. The value of a protocol is to fix common actions, allowing you to change models, frameworks, or frontends without everything shaking at once.
Second, be able to investigate problems.
When an Agent execution fails, you need to know if it was due to tool permissions, wrong parameters, a remote Agent rejecting the request, a frontend state synchronization issue, or an invalid UI payload. If everything is mixed into a piece of natural language, debugging is just guessing. After protocolization, at least every step has an object, an ID, a state, and an event.
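As a small illustration of why IDs and events make debugging tractable, the sketch below scans an event log for tool calls that started but never reported a result. The event type names follow the AG-UI tool-call events mentioned earlier; the `content` field, the tool names, and the helper `unfinished_tool_calls` are assumptions for the example.

```python
from collections import defaultdict


def unfinished_tool_calls(log: list[dict]) -> list[str]:
    """Return the ids of tool calls that started but never reported a result."""
    seen: dict[str, set[str]] = defaultdict(set)
    for event in log:
        if "toolCallId" in event:
            seen[event["toolCallId"]].add(event["type"])
    return sorted(
        call_id for call_id, kinds in seen.items()
        if "TOOL_CALL_START" in kinds and "TOOL_CALL_RESULT" not in kinds
    )


log = [
    {"type": "TOOL_CALL_START", "toolCallId": "t1", "toolCallName": "searchFlights"},
    {"type": "TOOL_CALL_END", "toolCallId": "t1"},
    {"type": "TOOL_CALL_RESULT", "toolCallId": "t1", "content": "3 flights found"},
    {"type": "TOOL_CALL_START", "toolCallId": "t2", "toolCallName": "checkBudget"},
    {"type": "TOOL_CALL_END", "toolCallId": "t2"},
]

print(unfinished_tool_calls(log))  # t2 was requested but no result ever came back
```

With free-form text logs, answering the same question means reading transcripts; with typed events it is a ten-line query.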
Third, be able to control critical operations.
Enterprises won't accept an Agent arbitrarily modifying orders, sending emails, or deleting data. In MCP, tool calls need to be visible and confirmable by a human; in AG-UI, the frontend can make approval, editing, and retry first-class interactions; in A2UI, the Agent can only use approved components; in A2A, the capabilities and authentication of a remote Agent are written in the Agent Card. These are all about turning "AI can do things" into "AI does things within a controllable scope."
Fourth, product forms will change.
In the past, many enterprise systems had fixed forms, fixed lists, and fixed processes. With the introduction of Agents, the interface will become more dynamic: sometimes it's a chat, sometimes a form, sometimes an approval card, sometimes a chart, sometimes a temporarily generated operation panel. AG-UI and A2UI are concerned with this layer: the user isn't just looking at answers, but participating, confirming, modifying, and taking over during the Agent's runtime.
How Deeply Should You Learn?
Here's a tiered suggestion; don't try to swallow it all at once.
If you're a junior developer, first learn the conceptual boundaries.
You don't need to immediately write an A2A server or hand-craft an A2UI renderer. You need to be able to answer: Why is MCP not A2A? Why is AG-UI not A2UI? When should you use a tool, when should you use a resource, when is it just a frontend event, and when do you need to generate UI?
If you're a mid-level developer, learn enough to build a minimal viable prototype.
Build an Agent, use MCP to connect a real tool, like querying an internal knowledge base or calling a business API; use an event stream on the frontend to display tool call start, args, and result; then create a very small declarative UI payload, like an approval card or an itinerary card. The key isn't to pile on features; first, separate the three things: tools, events, and UI.
If you're an architect or tech lead, focus on boundaries and governance.
How are tool permissions graded? Which tools require user confirmation? How are Agent run events recorded? How is the identity of a remote Agent verified? Who maintains the A2UI component catalog? Which UI actions can have side effects? These questions are more important than "which SDK to use."
If you're non-technical but follow AI news, just remember one sentence:
These protocols are transforming Agents from "chatty models" into "software components that can connect to systems, collaborate, enter interfaces, and be controlled by humans."
This sentence is more useful than memorizing four acronyms.
A Conservative Implementation Order
I would personally follow this order, not chasing novelty and not pretending to have the whole suite.
Step 1: First, secure the tool layer.
If the Agent can't even safely access business systems, talking about collaboration and UI is pointless. Prioritize MCP, clearly distinguish tools, resources, and prompt templates, and do a good job with permissions, confirmation, and logging.
Step 2: Then, stabilize the frontend events.
As long as your Agent is user-facing, not a backend batch process, you will eventually have to handle streaming text, tool calls, state updates, and user confirmation. Here, you can study AG-UI. Even if you don't use it directly, learn its event decomposition approach.
Step 3: Only look at A2UI when you actually have rich interactive UI.
If your product is just Q&A and markdown, there's no need to force A2UI for the sake of trendiness. Look at it when you really need the Agent to generate forms, approval cards, editable tables, charts, or maps. A2UI's value isn't in being "new"; it's in "preventing the model from writing dangerous code directly."
Step 4: Only look at A2A when you actually have remote Agent collaboration.
If it's just a few functions calling each other within a single service, don't wrap it as multi-Agent first. Look at A2A seriously when you need cross-team, cross-cloud, cross-vendor collaboration, or when a remote Agent needs to produce a trackable artifact.
This order isn't cool, but it saves money.
Final Wrap-up
If AG-UI and A2UI are what caught your eye, you're looking in the right direction, but you need to tell them apart.
AG-UI manages the events, state, tool calls, and interaction process between the Agent and the user interface.
A2UI manages how, when an Agent generates UI, it describes the interface using a declarative, verifiable, incrementally updatable structure.
MCP manages tools and data, and A2A manages Agent-to-Agent collaboration. Together, they indicate that Agent software is beginning to become a system engineering discipline with boundaries, interfaces, and governance requirements.
Ordinary developers don't need to be anxious. First, remember the four edges, then learn according to your project stage.
If you're building a demo, MCP + simple event stream is enough.
If you're building a user-facing Agent product, look at event models like AG-UI.
If you're building Agent-generated interfaces, look at declarative UI like A2UI.
If you're building a multi-Agent platform, look at A2A.
Don't pile up protocol names just to sound knowledgeable about AI. It's okay to learn technology slowly; the real danger is memorizing a bunch of terms without a map in hand to guide your choices.
References
- MCP Official Specification: MCP is an open protocol for LLM applications to connect to external data sources and tools, using a host, client, server structure and communicating via JSON-RPC 2.0. https://modelcontextprotocol.io/specification/2025-11-25
- MCP Tools: MCP servers can expose tools that models can discover and call; for security, tool exposure and calls should be shown to the user, and confirmation should be provided for operations. https://modelcontextprotocol.io/specification/2025-06-18/server/tools
- MCP Resources: Resources are identified by URIs and used to provide context like files, database schemas, and application materials to the model. https://modelcontextprotocol.io/specification/2025-06-18/server/resources
- Linux Foundation A2A Announcement: A2A was created by Google and later moved to the Linux Foundation for governance, used for Agent-to-Agent communication and collaboration. https://www.linuxfoundation.org/press/linux-foundation-launches-the-agent2agent-protocol-project-to-enable-secure-intelligent-communication-between-ai-agents
- A2A Key Concepts: Agent Card, Task, Message, Part, and Artifact are key A2A objects; supports polling, SSE streaming, and push notifications. https://a2a-protocol.org/latest/topics/key-concepts/
- A2A Specification: Complex tasks can return a Task and stream status or artifact updates; simple interactions can directly return a Message. https://a2a-protocol.org/latest/specification/
- AG-UI Official Documentation: AG-UI is an open, lightweight, event-based protocol for connecting user frontends and Agent backends. https://docs.ag-ui.com/introduction
- AG-UI Events: Tool calls use events like ToolCallStart, ToolCallArgs, ToolCallEnd, and ToolCallResult to express the process. https://docs.ag-ui.com/concepts/events
- AG-UI Tools: AG-UI distinguishes between backend-defined tools and frontend-provided runtime tools; frontend tools can be used for confirmation, UI actions, and user engagement flows. https://docs.ag-ui.com/concepts/tools
- CopilotKit Series A: CopilotKit announced a $27M Series A on 2026-05-05, stating that AG-UI has been adopted by Google, Microsoft, Amazon, and Oracle. This is CopilotKit's official statement, treated as a market signal in the article. https://www.copilotkit.ai/blog/series-a
- A2UI Official Homepage: A2UI lets Agents generate rich interactive UI, rendering natively across web, mobile, and desktop without executing arbitrary code; v0.9.1 is the current production release, v1.0 is a candidate. https://a2ui.org/
- A2UI Components: A2UI uses a flat component list with ID references to express component hierarchy, facilitating LLM incremental generation and updates by ID. https://a2ui.org/concepts/components/
- A2UI v1.0 Specification: updateComponents sends a component list, updateDataModel updates the data model, and the client can render progressively. https://a2ui.org/specification/v1.0-a2ui/
- Google Developers Blog A2UI v0.9: A2UI can be used via transport methods like MCP, WebSockets, REST, AG-UI, and A2A, and supports incremental parsing and repair of LLM output. https://developers.googleblog.com/a2ui-v0-9-generative-ui/
This article was translated from the original Chinese by Guibai.