Why Agent‑to‑Agent Communication Should Be Wrapped in CloudEvents
Or: how to stop your multi‑agent system from becoming distributed shell scripts.
Most agent‑to‑agent (A2A) communication today looks deceptively simple.
One agent emits some JSON.
Another agent reads it.
Maybe there’s an HTTP call. Maybe a message queue. Maybe a shared memory buffer.
It works in demos. It even works in early production.
And then one day you ask a very reasonable question:
“Why did this agent do that?”
At that moment, most agent systems collapse under inspection.
This article makes a straightforward claim:
If agents are autonomous actors, then their communication must be treated as events of record.
Wrapping A2A protocols in CloudEvents is how you get there.
This isn’t about adding ceremony.
It’s about making agentic systems operable.
A2A Is Not a Chat Problem. It’s a Distributed Systems Problem.
There’s a persistent category error in how we design agent systems.
We think of A2A communication as “chat between models.”
In reality, it has far more in common with microservices talking to each other under partial failure.
Agents are:
- autonomous
- stateful
- non‑deterministic internally
- long‑lived in intent, even if their processes are ephemeral
Once you have more than one agent, you inherit all the classic distributed systems problems:
- causality
- retries
- duplication
- partial failure
- ordering
- observability
- auditability
Most A2A protocols today ignore this.
They optimize for immediacy, not history.
And history is where systems break.
The Failure Mode of “Raw” A2A Protocols
A typical A2A message might look like this:
{
  "from": "planner-agent",
  "to": "executor-agent",
  "task": "deploy_service",
  "params": {
    "service": "billing-api",
    "env": "prod"
  }
}
Seems fine. Until you ask:
- Was this retried?
- Was it acted on twice?
- What caused it?
- What version of the planner sent it?
- What happened after it was sent?
- Can I replay this interaction in a test environment?
There is no answer, because there is no event of record.
This is the same mistake we made in early microservices: RPC everywhere, state nowhere, history lost.
Agent systems are now repeating that mistake—faster.
Agents Aren’t Features. They’re Actors.
Once you accept that agents are actors, a few things become obvious:
- Their actions have consequences
- Those consequences must be observable
- Their decisions must be inspectable after the fact
- Their interactions must survive time
An agent that emits an instruction is not “sending a message.”
It is causing an event in the world.
And events deserve structure.
CloudEvents: The Boring Thing That Fixes This
CloudEvents is not a message bus.
It’s not Kafka. It’s not NATS. It’s not SQS.
It’s a minimal, transport‑agnostic event envelope with a deliberately small contract:
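- specversion: which version of the CloudEvents spec the envelope follows
- id: unique identity
- source: where the event came from
- type: what kind of event this is
- subject: what the event is about
- time: when it happened
- data: the payload
- datacontenttype: how to interpret the payload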
That’s it.
Which is exactly why it works.
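To make the size of that contract concrete, here is a minimal sketch that builds an envelope using only the Python standard library. The helper names `make_event` and `missing_required` are illustrative and not part of any SDK. The one firm fact it relies on is that the CloudEvents 1.0 spec makes `specversion`, `id`, `source`, and `type` required and treats the other attributes as optional.

```python
import json
import uuid
from datetime import datetime, timezone

# Context attributes the CloudEvents 1.0 spec marks as required.
REQUIRED = ("specversion", "id", "source", "type")


def make_event(source, type_, data, subject=None):
    """Build a CloudEvents 1.0 envelope as a plain dict (illustrative helper)."""
    event = {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": type_,
        # RFC 3339 timestamp in UTC
        "time": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "datacontenttype": "application/json",
        "data": data,
    }
    if subject is not None:
        event["subject"] = subject
    return event


def missing_required(event):
    """Return the required attributes that are absent or empty."""
    return [name for name in REQUIRED if not event.get(name)]


event = make_event(
    source="agent://research-agent",
    type_="agent.task.requested",
    subject="document/doc-417",
    data={"task": "summarize_document"},
)
print(json.dumps(event, indent=2))
```

A dict and a four-name check are enough to start with. A real deployment would probably use an official CloudEvents SDK, but the envelope itself is this small.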
CloudEvents doesn’t tell you how to move events.
It tells you what an event is.
And that distinction matters enormously for agents.
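As a rough illustration of that separation, the sketch below renders one envelope in the two content modes defined by the CloudEvents HTTP protocol binding. Structured mode puts the whole envelope in the body, and binary mode maps context attributes to `ce-` headers. The function names are made up for this example. It also skips the header-value escaping the binding requires for non-ASCII characters, so treat it as a sketch rather than a compliant encoder.

```python
import json


def to_http_structured(event):
    """Structured mode: the whole envelope travels in the request body."""
    headers = {"Content-Type": "application/cloudevents+json; charset=utf-8"}
    return headers, json.dumps(event).encode("utf-8")


def to_http_binary(event):
    """Binary mode: context attributes become ce-* headers, data is the body."""
    headers = {
        f"ce-{name}": str(value)
        for name, value in event.items()
        if name not in ("data", "datacontenttype")
    }
    # datacontenttype maps onto the ordinary Content-Type header.
    headers["Content-Type"] = event.get("datacontenttype", "application/json")
    return headers, json.dumps(event["data"]).encode("utf-8")


event = {
    "specversion": "1.0",
    "type": "agent.task.requested",
    "source": "agent://research-agent",
    "id": "9c2f1c4e-2c5d-4c4d-9c4a-1d92cbd3b8fa",
    "time": "2026-03-26T09:12:41Z",
    "subject": "document/doc-417",
    "datacontenttype": "application/json",
    "data": {"task": "summarize_document"},
}

structured_headers, structured_body = to_http_structured(event)
binary_headers, binary_body = to_http_binary(event)
print(structured_headers)
print(binary_headers)
```

The event is identical in both cases. Only its representation on the wire changes, which is the property that lets the same envelope move over a queue or a file later.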
Why CloudEvents Fits A2A Communication Almost Too Well
Let’s map real A2A requirements to CloudEvents primitives:
| A2A Requirement | CloudEvents Provides |
|-----------------|----------------------|
| Agent identity | source, subject |
| Intent | type |
| Causality | id, correlation extensions |
| Replayability | Stable, immutable envelopes |
| Transport flexibility | HTTP, Kafka, NATS, SQS, files |
| Governance & audit | Metadata + schema |
A key insight is simple but powerful:
Agents reason over time. CloudEvents preserves time.
Without time, causality collapses.
Without causality, debugging becomes mythology.
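One way to carry causality is with extension attributes, sketched below. The names `correlationid` and `causationid` are assumptions chosen for this example, not attributes defined by the core spec. CloudEvents only requires that extension names be lowercase alphanumeric. The event types other than `agent.task.requested`, and the `summarizer-agent` source, are likewise hypothetical.

```python
import uuid
from datetime import datetime, timezone


def new_event(source, type_, data, cause=None):
    """Build an envelope and link it to `cause` when one is given."""
    event_id = str(uuid.uuid4())
    event = {
        "specversion": "1.0",
        "id": event_id,
        "source": source,
        "type": type_,
        "time": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "datacontenttype": "application/json",
        "data": data,
        # Hypothetical extension: shared by every event in one workflow.
        "correlationid": cause["correlationid"] if cause else event_id,
    }
    if cause:
        # Hypothetical extension: id of the event that triggered this one.
        event["causationid"] = cause["id"]
    return event


def causal_chain(event, store):
    """Walk causationid links back to the root; return events oldest first."""
    chain = [event]
    while "causationid" in chain[-1]:
        parent = store.get(chain[-1]["causationid"])
        if parent is None:  # the cause was never recorded; stop at the gap
            break
        chain.append(parent)
    return chain[::-1]


plan = new_event("agent://planner-agent", "agent.plan.created", {"goal": "brief execs"})
request = new_event("agent://research-agent", "agent.task.requested",
                    {"task": "summarize_document"}, cause=plan)
done = new_event("agent://summarizer-agent", "agent.task.completed",
                 {"status": "ok"}, cause=request)

store = {e["id"]: e for e in (plan, request, done)}
for e in causal_chain(done, store):
    print(e["type"], "from", e["source"])
```

With these two attributes, "why did this agent do that?" becomes a lookup over recorded events.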
Raw A2A vs CloudEvents‑Wrapped A2A
Naive A2A Message
{
  "action": "summarize_document",
  "document_id": "doc-417",
  "requested_by": "research-agent"
}
CloudEvents‑Wrapped A2A Event
{
  "specversion": "1.0",
  "type": "agent.task.requested",
  "source": "agent://research-agent",
  "id": "9c2f1c4e-2c5d-4c4d-9c4a-1d92cbd3b8fa",
  "time": "2026-03-26T09:12:41Z",
  "subject": "document/doc-417",
  "datacontenttype": "application/json",
  "data": {
    "task": "summarize_document",
    "constraints": {
      "length": "short",
      "audience": "exec"
    }
  }
}
Nothing magical happened.
But suddenly you can:
- replay this event
- trace what caused it, once a causation or correlation extension attribute is attached
- audit who asked for it
- correlate it with downstream actions
- simulate the system offline
- enforce policy at the boundary
That’s the difference between a demo and a system.
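A stable identity also answers the earlier question, "Was it acted on twice?" The CloudEvents spec lets consumers treat two events with the same `source` and `id` as duplicates, so a thin wrapper can make any handler idempotent. The sketch below assumes an in-memory set and a hypothetical `IdempotentHandler` class. A real system would need a durable store shared across restarts.

```python
class IdempotentHandler:
    """Run `handler` at most once per (source, id) pair (illustrative sketch)."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # assumption: in-memory only; lost on restart

    def handle(self, event):
        key = (event["source"], event["id"])
        if key in self._seen:
            return False  # duplicate delivery or retry: skip side effects
        self._handler(event)
        # Record the key only after the handler succeeds, so a failed
        # attempt can still be retried later.
        self._seen.add(key)
        return True


executed = []
consumer = IdempotentHandler(lambda e: executed.append(e["data"]["task"]))

event = {
    "specversion": "1.0",
    "type": "agent.task.requested",
    "source": "agent://research-agent",
    "id": "9c2f1c4e-2c5d-4c4d-9c4a-1d92cbd3b8fa",
    "data": {"task": "summarize_document"},
}

first = consumer.handle(event)   # processed
second = consumer.handle(event)  # same source + id, so ignored
print(first, second, executed)
```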
“Isn’t This Overkill?”
This objection always appears. It appeared with:
- structured logging
- distributed tracing
- event sourcing
- schema registries
Let’s address it directly.
“It adds latency”
Your LLM call takes hundreds of milliseconds, often more.
A CloudEvents envelope adds a handful of small fields to serialize, which is typically microseconds of work.
Latency is not your bottleneck. Opacity is.
“Agents are ephemeral”
Exactly.
Which is why their effects must not be.
Processes die. Events persist.
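Persistence can start very simply. The sketch below appends each envelope to a JSON Lines file and later replays the file through a handler. The file name and the helpers `append_event` and `replay` are assumptions for illustration. A production system would more likely use a log or broker with retention.

```python
import json
import os
import tempfile


def append_event(path, event):
    """Append one envelope as a single JSON line; existing lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(event, sort_keys=True) + "\n")


def replay(path, handler):
    """Feed every recorded event to `handler` in write order; return the count."""
    count = 0
    with open(path, encoding="utf-8") as log:
        for line in log:
            if line.strip():
                handler(json.loads(line))
                count += 1
    return count


replayed = []
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "events.jsonl")  # hypothetical file name
    for event_id in ("evt-1", "evt-2"):
        append_event(log_path, {
            "specversion": "1.0",
            "id": event_id,
            "source": "agent://research-agent",
            "type": "agent.task.requested",
            "data": {"task": "summarize_document"},
        })
    # The agent process that wrote these could be gone; the log still replays.
    count = replay(log_path, replayed.append)

print(count, [e["id"] for e in replayed])
```

Pointing `replay` at a test handler instead of the real executor is one way to re-run an interaction in a test environment.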
“This is enterprise architecture creeping in”
No.
This is the minimum architecture required for systems that evolve.
The cost of not having an event of record is always paid later: during incidents, audits, rewrites, and blame.
With interest.
A2A + CloudEvents Is How Agents Scale Beyond a Single Team
Once A2A communication is event‑native, new doors open:
- Multi‑team agent ecosystems
- Cross‑org agent collaboration
- Safety and policy enforcement
- Deterministic replay and simulation
- Compliance without rewriting everything
- Tooling reuse across agent frameworks
This mirrors exactly what happened with microservices:
The systems that survived were the ones that treated events as first‑class.
Agents are not exempt from this law.
If anything, they need it more.
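Policy enforcement is a good example of what becomes possible once every message has the same envelope, because a gateway can decide on metadata alone without parsing agent-specific payloads. The sketch below assumes a hypothetical allowlist that maps each `source` to the event types it may emit. The `agent.deploy.requested` type and the `check_policy` helper are invented for illustration.

```python
# Hypothetical policy: which event types each agent is allowed to emit.
ALLOWED_TYPES = {
    "agent://planner-agent": {"agent.task.requested", "agent.deploy.requested"},
    "agent://research-agent": {"agent.task.requested"},
}


def check_policy(event, allowed=ALLOWED_TYPES):
    """Return (ok, reason) using only envelope metadata, never the payload."""
    if event.get("specversion") != "1.0":
        return False, "unsupported specversion"
    source = event.get("source")
    if source not in allowed:
        return False, f"unknown source: {source}"
    if event.get("type") not in allowed[source]:
        return False, f"{source} may not emit {event.get('type')}"
    return True, "ok"


summary_request = {
    "specversion": "1.0",
    "id": "evt-1",
    "source": "agent://research-agent",
    "type": "agent.task.requested",
}
deploy_request = dict(summary_request, id="evt-2", type="agent.deploy.requested")

print(check_policy(summary_request))
print(check_policy(deploy_request))
```

The same check works for any agent framework that emits the envelope, which is where the tooling reuse comes from.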
Don’t Let Agents Become the New Shell Scripts
Shell scripts worked—until they didn’t.
They were powerful, flexible, and completely ungovernable.
Agent systems built without event discipline are headed the same way.
If you expect your agents to:
- operate autonomously
- coordinate reliably
- survive production
- be understood six months later
then agent‑to‑agent communication must be event‑native.
Wrapping A2A protocols in CloudEvents isn’t abstraction for its own sake.
It’s how agent systems grow up.
If you’re building agent infrastructure today, treat this as a design constraint, not an optimization. Your future self will thank you.