Tags: #ai_and_agents #software_engineering

Agentic Systems Don’t Have a Workflow Problem. They Have a Distributed Systems Problem.

Much of today’s thinking about agentic systems—particularly in the context of autonomous enterprise backend platforms—starts from the wrong abstraction.

We start with workflows.

We talk about agents as if they are steps in a flowchart.
We design orchestration graphs.
We invent DSLs that look suspiciously like BPMN with an LLM bolted on.

And then we are surprised when these systems collapse under scale, complexity, or audit requirements.

This is a category error.

Agentic systems are not primarily a workflow problem. They are a distributed systems problem. Until we treat them as such, enterprise‑grade autonomy will remain fragile, opaque, and fundamentally unscalable.

The Workflow Trap

Workflow‑centric thinking assumes three things:

A predefined execution path
A central orchestrator
Deterministic execution semantics

These assumptions work tolerably well for batch jobs, ETL pipelines, and centrally orchestrated, CRUD‑heavy microservices.

They fail the moment you introduce agents.

Why?

Because agents are not steps. They are actors.

Agents are:

Autonomous
Stateful
Internally non‑deterministic
Long‑lived in intent, even if their processes are ephemeral

The moment you have more than one agent, you no longer have a workflow.

You have a distributed system.

The Persistent Category Error in Agent Design

There is a recurring mistake in how agent systems are designed and discussed:

We think of agent‑to‑agent (A2A) communication as “chat between models.”

This framing is intuitive—and deeply misleading.

In reality, A2A communication has far more in common with microservices talking to each other under partial failure than with conversational interfaces.

As soon as agents interact, you inherit all the classic distributed systems problems:

Causality
Retries
Duplication
Partial failure
Ordering
Observability
Auditability

Most A2A protocols today ignore this.

They optimize for immediacy, not history.

And history is where systems break.
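To make the retry and duplication problem concrete, here is a minimal sketch of an idempotent message handler. It assumes every message carries a sender-assigned unique ID; the names Message, IdempotentHandler, and message_id are hypothetical, and the in-memory set stands in for what would need to be durable storage in a real system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    message_id: str   # hypothetical unique ID assigned by the sender
    payload: str


@dataclass
class IdempotentHandler:
    """Applies each message at most once, even if it is delivered repeatedly."""
    seen: set = field(default_factory=set)      # assumption: durable in a real system
    applied: list = field(default_factory=list)

    def handle(self, msg: Message) -> bool:
        # A retry or duplicate delivery carries the same message_id,
        # so it is detected here and skipped.
        if msg.message_id in self.seen:
            return False
        self.seen.add(msg.message_id)
        self.applied.append(msg.payload)
        return True


handler = IdempotentHandler()
first = Message("m-1", "reserve stock")
# Simulate at-least-once delivery: the same message arrives twice.
results = [
    handler.handle(first),
    handler.handle(first),
    handler.handle(Message("m-2", "send invoice")),
]
```

The second delivery of m-1 is dropped, so the effect is applied once. A chat-style exchange between models has no equivalent of this check unless someone builds it.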

Why Predefined Orchestration Fails

Traditional orchestration assumes you can specify the control flow up front.

Agents violate this assumption by design.

They may discover new sub‑tasks dynamically.
They may defer action based on evolving context.
They may operate concurrently, competitively, or redundantly.
They may revise earlier conclusions.

Trying to force this behavior into a static workflow graph is like trying to model the internet as a single call stack.

The result is brittle systems that work in demos and fail in production.

The alternative is not better workflows.

The alternative is emergent orchestration.

From Workflows to Swarms

If agentic systems are a distributed systems problem, the natural architectural endpoint is not orchestration. It is swarm computing.

A swarm‑based agentic platform has three defining properties.

1. Mixed Determinism

Not everything should be an agent.

Most things shouldn’t be.

A scalable agentic system deliberately combines:

Deterministic microservices: for validation, enforcement, enrichment, persistence, and guarantees.
Non‑deterministic agents: for reasoning, planning, interpretation, and decision‑making under uncertainty.

Agents think.
Microservices execute.

The capability of the system emerges from their interaction, not from pretending everything is an agent.
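One way to picture this split is an agent that proposes and a deterministic service that decides whether the proposal may proceed. The sketch below is illustrative only: agent_propose_refund is a hypothetical stand-in that uses a random number generator where a real agent would call a model, and the refund policy is invented for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RefundProposal:
    order_id: str
    amount: float


def agent_propose_refund(order_id: str, order_total: float, rng: random.Random) -> RefundProposal:
    """Stand-in for a non-deterministic agent (hypothetical; a real one would call a model)."""
    # The agent may propose anything, including an amount that breaks policy.
    return RefundProposal(order_id, round(order_total * rng.uniform(0.0, 1.5), 2))


def validate_refund(proposal: RefundProposal, order_total: float) -> bool:
    """Deterministic enforcement: the same input gives the same verdict every time."""
    return 0 < proposal.amount <= order_total


rng = random.Random()
proposal = agent_propose_refund("order-42", 100.0, rng)
approved = validate_refund(proposal, 100.0)
```

The proposal varies from run to run, but the verdict for any given proposal never does. That is the guarantee the deterministic side contributes.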

2. Event‑First Communication

In a swarm, actors do not call each other directly.

They emit events.
They consume events.

They react to what they observe, not to who sent it.

Communication happens through events interlinked by correlation identifiers, not through predefined call graphs or orchestration plans.

There is:

No central orchestrator
No global workflow definition
No single source of truth for control flow

Instead, orchestration emerges from the event topology itself.

What happens next is defined by which actors are listening, what state they carry, and how they interpret the event history so far.

This is not chaos.

This is how distributed systems actually work.
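A small in-memory sketch can show what emergent orchestration means in practice. EventBus here is a hypothetical stand-in for a real broker, and the actor names and event types are invented for illustration. Note that no actor references another actor; each one only subscribes to an event type and carries the correlation ID forward.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    type: str
    correlation_id: str
    data: dict


class EventBus:
    """In-memory stand-in for a broker; actors subscribe by event type only."""

    def __init__(self) -> None:
        self.subscribers = defaultdict(list)
        self.log = []          # every event emitted, in emission order
        self.queue = deque()

    def subscribe(self, event_type: str, actor: Callable) -> None:
        self.subscribers[event_type].append(actor)

    def emit(self, event: Event) -> None:
        self.log.append(event)
        self.queue.append(event)

    def run(self) -> None:
        # Deliver events until no actor has anything left to react to.
        while self.queue:
            event = self.queue.popleft()
            for actor in self.subscribers[event.type]:
                actor(event, self)


def risk_agent(event: Event, bus: EventBus) -> None:
    # Reacts to the event type, not to whoever emitted it.
    verdict = "low" if event.data["amount"] < 1000 else "high"
    bus.emit(Event("risk.assessed", event.correlation_id, {"risk": verdict}))


def ledger_service(event: Event, bus: EventBus) -> None:
    if event.data["risk"] == "low":
        bus.emit(Event("payment.booked", event.correlation_id, {}))


bus = EventBus()
bus.subscribe("payment.requested", risk_agent)
bus.subscribe("risk.assessed", ledger_service)
bus.emit(Event("payment.requested", "corr-1", {"amount": 250}))
bus.run()
trace = [e.type for e in bus.log if e.correlation_id == "corr-1"]
```

The sequence payment.requested, risk.assessed, payment.booked is never written down anywhere as a plan. It falls out of who is subscribed to what, and adding a third subscriber changes the flow without editing the existing actors.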

3. History as a First‑Class Primitive

Swarm systems treat history as infrastructure, not metadata.

Every meaningful action produces an event of record.
Every decision can be reconstructed from the events that preceded it.
Every agent’s behavior is explainable after the fact.

This requires:

Durable event logs (for example, Kafka)
Strong correlation semantics
Explicit causality tracking
Reproducible state transitions

Without this, autonomy becomes an excuse for opacity.
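As a rough illustration of causality tracking and reproducible state transitions, the sketch below keeps an append-only list of records, each pointing at the event that caused it. The Record fields (event_id, correlation_id, causation_id) and the event types are assumptions made for this example, and a plain Python list stands in for a durable log.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    event_id: str
    correlation_id: str
    causation_id: Optional[str]   # event_id of the direct cause, None for a root event
    type: str
    data: dict


def replay(log: list) -> dict:
    """Rebuild state by folding recorded events in order; no agent is re-invoked."""
    state = {"balance": 0, "decisions": []}
    for rec in log:
        if rec.type == "funds.deposited":
            state["balance"] += rec.data["amount"]
        elif rec.type == "withdrawal.approved":
            state["balance"] -= rec.data["amount"]
            state["decisions"].append(rec.event_id)
    return state


def causal_chain(log: list, event_id: str) -> list:
    """Walk causation links back to the root to explain why an event happened."""
    by_id = {rec.event_id: rec for rec in log}
    chain = []
    current = by_id.get(event_id)
    while current is not None:
        chain.append(current.event_id)
        current = by_id.get(current.causation_id)
    return list(reversed(chain))


log = [
    Record("e1", "c1", None, "funds.deposited", {"amount": 100}),
    Record("e2", "c1", "e1", "withdrawal.requested", {"amount": 30}),
    Record("e3", "c1", "e2", "withdrawal.approved", {"amount": 30}),
]
state = replay(log)
why_e3 = causal_chain(log, "e3")
```

Replay works from what was recorded, so it gives the same state every time even if the agent that approved the withdrawal would answer differently when asked again.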

Why Most Agent Platforms Break at Enterprise Scale

Most current agent frameworks were designed for:

Single‑process execution
Short‑lived interactions
Human‑in‑the‑loop supervision
Minimal compliance and audit requirements

Enterprises need the opposite:

Long‑running autonomous behavior
Partial failure tolerance
Auditability by default
Deterministic replay of recorded events and decisions
Change control via GitOps
Clear operational ownership

These properties cannot be bolted on later.

They must be native to the platform.

The Case for a Swarm Computing Platform

This is why we argue for a swarm computing platform for autonomous enterprise backends.

At a high level:

Agents and microservices are first‑class actors
Actors are dynamically launched and executed in containers
All communication happens via CloudEvents over Kafka
Correlation, not orchestration, defines system behavior
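For readers unfamiliar with CloudEvents, the following sketch builds an envelope as a plain dictionary, without assuming any SDK or Kafka client. The attributes specversion, id, source, and type are the required context attributes of CloudEvents 1.0. The correlationid and causationid attributes are hypothetical extension attributes chosen for this example and are not defined by the core specification. The event type and source values are likewise invented.

```python
import json
import uuid
from datetime import datetime, timezone


def make_cloudevent(event_type, source, data, correlation_id, causation_id=None):
    """Build a CloudEvents 1.0 envelope as a plain dict (no SDK assumed)."""
    event = {
        # Required context attributes in CloudEvents 1.0.
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        # Optional context attributes.
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        # Hypothetical extension attribute; not part of the core spec.
        "correlationid": correlation_id,
        "data": data,
    }
    if causation_id is not None:
        event["causationid"] = causation_id  # also a hypothetical extension
    return event


event = make_cloudevent(
    event_type="com.example.invoice.reviewed",   # invented type name
    source="/agents/invoice-reviewer",           # invented source
    data={"invoice": "inv-7", "verdict": "approve"},
    correlation_id="corr-1",
)
# JSON bytes like these could serve as the value of a Kafka record.
payload = json.dumps(event).encode("utf-8")
```

Because the correlation identifier travels in the envelope and not in the payload, any actor or audit tool can follow a thread of events without understanding the business data inside them.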

The platform provides the harness that individual actors should not have to reinvent:

Auditability
Logging
State management
Context transfer
Retry semantics
Observability
GitOps‑based deployment and evolution

Agents get autonomy without being allowed to bypass operational reality.
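A harness of this kind can be pictured as a wrapper around every actor invocation. The sketch below is a deliberately small illustration, not a description of any particular platform: run_with_harness, its parameters, and the shape of the audit entries are all hypothetical, and a real harness would also need backoff, dead-lettering, and durable audit storage.

```python
from typing import Callable


def run_with_harness(actor: Callable[[dict], dict], event: dict,
                     audit: list, max_attempts: int = 3) -> dict:
    """Hypothetical platform wrapper: bounded retries plus an audit entry per attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = actor(event)
            audit.append({"event": event["id"], "attempt": attempt, "outcome": "ok"})
            return result
        except Exception as exc:
            audit.append({"event": event["id"], "attempt": attempt,
                          "outcome": f"error: {exc}"})
    # Retries exhausted: surface the failure instead of retrying forever.
    raise RuntimeError(f"actor failed after {max_attempts} attempts")


calls = {"n": 0}


def flaky_actor(event: dict) -> dict:
    # Fails on the first call only, to simulate a transient fault.
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("model endpoint timed out")
    return {"handled": event["id"]}


audit_trail = []
result = run_with_harness(flaky_actor, {"id": "evt-1"}, audit_trail)
```

The actor contains no retry or logging code of its own. The failed first attempt and the successful second one both end up in the audit trail, whether or not the actor's author thought about either.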

Emergence Is Not Optional

If your agentic system does not allow for emergent behavior, it is not truly agentic.

And if it allows emergence without the infrastructure to observe, constrain, and explain it, it is not enterprise‑ready.

Swarm architectures accept emergence as inevitable—and then design for it.

A Final Reframe

Stop asking:

“How do we design better workflows for agents?”

Start asking:

“What does a distributed system look like when some of its nodes can think?”

The answer is not orchestration.

The answer is a swarm.