Thought Primitives: An Architecture of Durable Reasoning Systems
Most of the conversation around AI systems today revolves around three ideas: context, memory, and autonomy. How much context can a model hold? How should it remember? How should agents share state across long-running tasks? These are important questions, but they are increasingly being asked as if there ought to be a single general-purpose answer.
There probably should not be.
For broad consumer use, generic memory and generic context windows are often good enough. But for high-value, domain-specific work, the problem is different. When an AI system is involved in engineering software, designing infrastructure, coordinating operations, validating compliance, or navigating months-long chains of dependent decisions, what matters is not simply whether the model can continue generating. What matters is whether the process can be structured, reviewed, replayed, audited, interrupted, resumed, challenged, and improved.
That shift in emphasis is more important than it appears. It suggests that the future of serious AI systems will not be defined by ever longer opaque runs of generation. It will be defined by the creation of explicit, materialised graphs that capture decomposition itself: how a requirement was broken down, which branches were explored, which constraints were introduced, which validations were applied, and how the eventual outcome emerged from many smaller decisions.
In other words, instead of treating planning as a transient prelude to execution, we should treat planning as an artefact. And once planning becomes an artefact, a graph becomes the natural form.
The problem with long-running generative execution
Many agentic systems today are effectively black-box loops. A user provides a prompt, the system creates a plan internally, spawns subtasks, calls tools, backtracks, retries, edits files, and eventually produces an output. This can be impressive, and sometimes genuinely useful. But it also reveals a serious limitation in the current generation of tools: most of the value produced during the run is discarded.
The output is kept. The process is lost.
This is acceptable for trivial tasks. It is not acceptable for complex ones.
If an AI system generates a toy application in twenty minutes, perhaps we can tolerate limited visibility into how it got there. But if the same paradigm is applied to the development of a banking workflow, a supply chain orchestration engine, a clinical operations process, or a large enterprise codebase, then the absence of a durable planning structure becomes a structural weakness.
Without explicit intermediate structure, you lose at least six things:
- Observability: You cannot properly inspect how the system reasoned about the work.
- Auditability: You cannot verify which assumptions or constraints shaped the result.
- Replayability: You cannot restart from meaningful checkpoints without rerunning everything.
- Prioritisation: You cannot intervene in the task hierarchy before execution proceeds.
- Coordination: Multiple humans and agents cannot operate cleanly over the same work structure.
- Learning: The system has very little reusable process memory beyond prompts, logs, and outputs.
This is not just a UX issue. It is an architectural issue.
A surprising amount of today’s agent tooling still behaves as if the important thing is to let the model run farther. But in complex work, the challenge is rarely just generation depth. The challenge is controlled decomposition under constraints.
What data engineering already understood
There is a useful parallel here from data engineering, especially in fintech, banking, and other high-integrity systems.
In modern data platforms, we often celebrate continuous processing, streaming, low-latency enrichment, and end-to-end automation. But the most robust systems also know when to stop. They know when to materialise state. They know when to let data come to rest.
Why? Because some tasks are only possible, or only trustworthy, when intermediate results become explicit artefacts. Validation, reconciliation, auditing, backfills, root-cause analysis, exception handling, regulatory review, and downstream certification all depend on this. Even when a pipeline is theoretically replayable from the beginning, teams still introduce specific points of persistence and explicit state transition because these checkpoints make operations comprehensible.
The medallion model is one example of this mentality. Raw data is not treated as equivalent to validated data, and validated data is not treated as equivalent to business-ready data. State is staged. Meaning is progressively added. Intermediates matter.
This way of thinking has not been properly imported into AI-building systems.
We are still too willing to let agentic systems flow uninterrupted from prompt to outcome, as if continuity alone were evidence of capability. But continuity is not enough. In high-stakes systems, there is enormous value in engineered pauses, explicit representations, and durable transition points. Those are not bureaucratic overhead. They are the infrastructure of trust.
From token flow to artefact flow
The dominant interaction pattern in agentic AI today is token flow: prompt in, chain of generation, output out.
What I am arguing for is artefact flow.
In an artefact-flow system, the model does not merely continue generating. It progressively externalises structure. Requirements become task graphs. Task graphs become reviewed plans. Reviewed plans become execution graphs. Execution graphs emit outputs, validations, defects, decisions, and revisions. Each layer can be stored, traversed, queried, compared, and reused.
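As a toy illustration of artefact flow, each stage can be persisted with a lineage link back to the artefact it was derived from; every identifier and field name below is hypothetical:

```python
# Each stage of the flow is stored with lineage, never discarded.
store = {}

def persist(artefact_id, kind, payload, derived_from=None):
    store[artefact_id] = {"kind": kind, "payload": payload,
                          "derived_from": derived_from}
    return artefact_id

req = persist("REQ-1", "requirement", "Add payment reconciliation")
tg = persist("TG-1", "task_graph",
             {"nodes": {"T1": "design schema", "T2": "build matcher"},
              "edges": [("T1", "T2")]},
             derived_from=req)
plan = persist("PLAN-1", "reviewed_plan", {"approved_by": ["alice"]},
               derived_from=tg)

def lineage(artefact_id):
    """Walk back through derived_from links to the originating artefact."""
    chain = [artefact_id]
    while store[chain[-1]]["derived_from"]:
        chain.append(store[chain[-1]]["derived_from"])
    return chain

assert lineage(plan) == ["PLAN-1", "TG-1", "REQ-1"]
```

The point of the sketch is only that every layer is stored and traversable, so the reviewed plan can always be traced back to the requirement that produced it.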
This changes the nature of the AI system.
The system is no longer merely a machine for producing end artefacts such as code, reports, or designs. It becomes a machine for producing structured intermediate representations that other humans and other agents can act upon.
That is a far more powerful idea.
A plan that can be reviewed before execution is more valuable than a plan that only existed inside a model’s hidden chain of thought. A hierarchy of subtasks with dependencies is more valuable than a monolithic execution trace. A stored decomposition of a requirement is more valuable than an impressive but irreproducible sprint of generation.
This is why the task graph matters. It materialises the decomposition process itself.
The graph is not a visualisation layer. It is the product.
Most systems treat graphs as downstream conveniences: a way to visualise work after the fact, perhaps on a dashboard or in a planning tool. That framing is too weak.
For complex AI building, the graph should not be an afterthought. It should be a primary artefact of the system.
Consider what happens when a requirement is turned into a graph rather than a transient prompt expansion. You can now:
- represent parent-child decomposition explicitly
- track dependencies between tasks
- attach assumptions, evidence, constraints, and owners to nodes
- branch alternative strategies without losing lineage
- mark validation gates before implementation
- prioritise implementation paths
- assign subtasks to specialised agents or humans
- preserve rejected options instead of discarding them
- support partial replay from selected nodes
- compare multiple decompositions over time
- accumulate reusable structures across projects
At that point, the graph is doing more than task management. It is becoming a system of record for thoughtful work.
This is the crucial move. Instead of asking an agent to “go build the thing,” we ask the system first to produce a legible, manipulable structure of the thing-to-be-built. The graph is the durable memory of decomposition.
That memory is useful before execution, during execution, and after execution.
Why decomposition itself is valuable
There is a tendency in software circles to treat decomposition merely as a means to an end. Break the problem into parts so it can be built faster. Once the work is done, the decomposition can be forgotten.
But this view underestimates the value embedded in how a system was broken down.
The decomposition of work tells you what the system believed mattered. It reveals hidden architecture, implicit constraints, anticipated failure modes, organisational priorities, and assumptions about sequencing. It is a representation of intent.
For a complex product, that representation often matters almost as much as the final implementation.
Ask any experienced engineer why two systems with similar outputs can feel radically different to maintain. The answer is rarely just code quality. It is usually that one system has legible internal structure and the other does not. The same applies to AI-driven work. If the decomposition is explicit, you can inspect not just what the system built, but what conceptual model it used while building it.
That becomes even more important when systems are re-engineered. The task graph from version one is not obsolete once version one ships. It becomes a strategic artefact for redesign, migration, governance, and future automation.
This is why the comparison to JIRA is useful but incomplete. JIRA gives us issue hierarchies, status, assignments, and workflow. But what I am describing is not just ticketing. It is generative decomposition into a persistent graph of intent, constraints, alternatives, and execution readiness.
A JIRA tree is a management artefact. A materialised task graph can be an epistemic artefact.
A coding machine should build plans before it builds code
The current generation of AI coding tools has shown that it is possible to get surprisingly far with weak planning. A single prompt can trigger a cascade of edits that produce an application, a prototype, or even a large experimental codebase.
But this has created a misleading impression: that planning is optional because generation is getting stronger.
For serious systems, the opposite is true. As generation becomes cheaper and faster, planning artefacts become more important, not less. If execution is abundant, then the scarcest resource shifts to judgment: what should be built, in what order, under what constraints, with what interfaces, and with what verification points?
A genuine coding machine should therefore not begin by writing code. It should begin by building and refining a graph.
A requirement enters the system. The system decomposes it into major components. Each component is recursively expanded into subproblems. Dependencies, unknowns, interfaces, validation criteria, and risk tags are attached. Alternative decompositions may coexist. Humans review. Other agents challenge assumptions. Cost or token budgets can be allocated at the node level. Only then does execution begin.
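That loop can be sketched as a recursive expansion with budgets allocated per node. The `STUB_DECOMPOSITIONS` table below stands in for a model call that splits a task into subtasks; the task names and budget figures are invented for illustration:

```python
# Hypothetical recursive decomposition with node-level budgets.
STUB_DECOMPOSITIONS = {
    "payment service": ["api layer", "ledger core"],
    "ledger core": ["schema", "posting engine"],
}

def expand(graph, task, parent=None, budget_tokens=10_000, depth=0, max_depth=3):
    node = {"task": task, "parent": parent, "budget": budget_tokens,
            "children": []}
    graph.append(node)
    if depth >= max_depth:
        return node
    children = STUB_DECOMPOSITIONS.get(task, [])
    # Budgets are allocated per node, so expensive branches stay bounded.
    per_child = budget_tokens // max(len(children), 1)
    for child in children:
        node["children"].append(
            expand(graph, child, parent=task,
                   budget_tokens=per_child, depth=depth + 1))
    return node

graph = []
expand(graph, "payment service")
assert [n["task"] for n in graph] == [
    "payment service", "api layer", "ledger core", "schema", "posting engine"]
assert graph[2]["budget"] == 5_000   # ledger core received half the root budget
```

In a real system the stub would be a model call and the nodes would also carry unknowns, interfaces, validation criteria, and risk tags, but the control structure is the same: expansion halts at explicit boundaries rather than running open-ended.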
Once you have this, several powerful things become possible:
- You can stop execution before expensive branches are pursued.
- You can inspect whether the decomposition is sensible before code exists.
- You can reassign parts of the graph to different agents with specialised capabilities.
- You can replay from a failed node without restarting the whole system.
- You can audit why one implementation path was chosen over another.
- You can reuse graph fragments from prior projects as domain templates.
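Partial replay, in particular, falls out naturally once results are keyed to nodes: only the failed node and its downstream dependants are recomputed. A sketch with hypothetical task IDs:

```python
# Persisted results from a prior run; T3 failed.
results = {"T1": "schema v1", "T2": "matcher v1", "T3": None}
edges = {"T1": ["T2"], "T2": ["T3"], "T3": []}   # task -> dependants

def downstream(node):
    """All tasks reachable from `node`, including itself."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(edges[n])
    return seen

def replay_from(node, run_task):
    to_rerun = downstream(node)
    for task in ["T1", "T2", "T3"]:          # topological order, assumed known
        if task in to_rerun:
            results[task] = run_task(task)   # only this subgraph is recomputed
        # upstream results are reused as-is

replay_from("T3", run_task=lambda t: f"{t} rerun")
assert results == {"T1": "schema v1", "T2": "matcher v1", "T3": "T3 rerun"}
```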
This is much closer to how large-scale engineering actually works.
No serious team builds a skyscraper by pressing “generate building.” They create layered plans, approved work packages, dependencies, validation gates, and change records. Software and agentic systems have been allowed to behave more casually largely because the artefacts are digital and generation is cheap. But the underlying coordination problem has not disappeared.
Beyond software: plans as industrial artefacts
Once decomposition is treated as a graph artefact, the idea generalises immediately beyond code.
Buildings can be decomposed into structural, mechanical, electrical, regulatory, procurement, and sequencing graphs. Factories can be represented through process flow, asset dependency, maintenance, staffing, and safety graphs. Logistics networks can be understood through route graphs, capacity graphs, disruption graphs, and policy graphs. Military systems, spacecraft, legal workflows, energy grids, pharmaceutical pipelines, and complex financial products all exhibit the same pattern: they are too complex to be managed well through linear documents alone.
What ties these domains together is not that they all need “AI agents.” It is that they all require durable representations of interdependence.
The graph is the natural substrate for this because it captures more than hierarchy. It captures relation.
Hierarchy tells you that task B belongs under project A. Graph structure tells you that task B depends on design review C, introduces regulatory risk D, affects component E, and must be validated against dataset F. In real systems, these relational structures matter more than checklist order.
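The same example can be written down directly as typed edges, where hierarchy is just one relation among several; the edge-type names are illustrative:

```python
# Typed edges capture relation, not just hierarchy.
edges = [
    ("task:B", "depends_on", "review:C"),
    ("task:B", "introduces_risk", "risk:D"),
    ("task:B", "affects", "component:E"),
    ("task:B", "validated_against", "dataset:F"),
    ("task:B", "child_of", "project:A"),   # hierarchy is one edge type of many
]

def neighbours(node, edge_type):
    return [dst for src, et, dst in edges if src == node and et == edge_type]

assert neighbours("task:B", "depends_on") == ["review:C"]
assert neighbours("task:B", "introduces_risk") == ["risk:D"]
```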
This is where the argument becomes broader than AI planning. Materialised graphs are not just better planning tools for models. They are better representations for complex coordinated reality.
From task graphs to multi-dimensional graphs
At this point, the concept needs to expand.
A task graph is only the first useful layer.
Most complex systems are not adequately represented by one graph alone. They are better understood as stacks of interacting graphs, or what we might call multi-dimensional graphs.
Take software as an example.
You can represent:
- the code graph: modules, functions, services, interfaces, dependencies
- the task graph: features, subtasks, milestones, planned changes
- the bug graph: defects, root causes, impact surfaces, recurrence patterns
- the runtime graph: services, traces, latencies, bottlenecks, failures
- the team graph: ownership, expertise, review routes, escalation paths
- the requirement graph: business goals, policies, customer promises, constraints
- the security graph: permissions, threat surfaces, controls, exceptions
- the data graph: schemas, lineage, transformations, quality signals
Each of these is a legitimate graph in its own right. But the real power comes from linking them.
A bug node can attach to code nodes, runtime nodes, and task nodes. A regulatory requirement can constrain tasks, data flows, and interface design. A performance incident can trigger reprioritisation in the task graph and expose a flaw in the architecture graph. A human reviewer can be connected not just to ownership, but to prior decisions, known expertise, and past failure patterns.
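A minimal way to sketch this layering is to let each layer keep its own nodes while cross-layer edges connect them; all node IDs and edge types below are invented:

```python
# Each layer keeps its own nodes and semantics; cross-layer edges link them.
layers = {
    "code":    {"svc.payments": {"kind": "service"}},
    "runtime": {"trace-991": {"latency_ms": 2400}},
    "task":    {"T7": {"title": "fix reconciliation timeout"}},
    "bug":     {"BUG-42": {"severity": "high"}},
}
cross_edges = [
    ("bug", "BUG-42", "manifests_in", "runtime", "trace-991"),
    ("bug", "BUG-42", "located_in",   "code",    "svc.payments"),
    ("bug", "BUG-42", "tracked_by",   "task",    "T7"),
]

def linked(layer, node_id):
    """Everything a node touches in other layers."""
    return [(dst_layer, dst_node)
            for src_layer, src_node, _, dst_layer, dst_node in cross_edges
            if src_layer == layer and src_node == node_id]

assert linked("bug", "BUG-42") == [
    ("runtime", "trace-991"), ("code", "svc.payments"), ("task", "T7")]
```

The layers never merge into one undifferentiated graph; the bug layer keeps bug semantics, the runtime layer keeps runtime semantics, and the cross edges are what make the stack navigable.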
In other words, the graph of the system is not singular. It is layered.
This is what makes the term multi-dimensional useful. We are not merely drawing richer diagrams. We are constructing a substrate in which different representations of the same system can interact while preserving their own semantics.
That matters for AI because it enables more grounded intervention. An agent no longer acts over a vague textual brief. It acts over typed structure.
Why this matters for agent design
One reason many agent systems remain brittle is that they operate over under-structured environments. They are given broad objectives, a few tools, a scratchpad, and a memory store, then expected to coordinate across ambiguity.
This works poorly at scale because the agent is forced to recreate structure on the fly during every run.
Materialised graphs change that. They move structure out of the agent’s temporary context and into persistent shared infrastructure.
This has several consequences.
First, it reduces the cognitive burden on the model. The agent does not need to keep the entire decomposition active in its working context because the graph externalises it.
Second, it enables specialised subagents. A validation agent, security agent, planning agent, estimation agent, or implementation agent can operate over the same shared graph, each updating different node types or edge types.
Third, it makes intervention composable. Humans can review only the regulatory branch, or the cost branch, or the critical path, without needing to inspect the whole project linearly.
Fourth, it enables deterministic control surfaces around probabilistic generation. You can allow models to be creative within bounded regions of the graph while enforcing hard constraints at the graph level.
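The fourth point can be made concrete: the model proposes freely inside a node, while hard constraints attached to the graph are checked by plain deterministic code before a proposal is accepted. The rule names and proposal shape here are hypothetical:

```python
# Graph-level constraints checked deterministically around generation.
def check(node, proposal, rules):
    """Return the names of every constraint the proposal violates."""
    return [name for name in node["constraints"]
            if not rules[name](proposal)]

rules = {
    "max_files_touched": lambda p: len(p["files"]) <= 3,
    "tests_included": lambda p: any(f.startswith("tests/") for f in p["files"]),
}
node = {"id": "T9", "constraints": ["max_files_touched", "tests_included"]}

ok = {"files": ["svc/match.py", "tests/test_match.py"]}
bad = {"files": ["svc/a.py", "svc/b.py", "svc/c.py", "svc/d.py"]}

assert check(node, ok, rules) == []
assert check(node, bad, rules) == ["max_files_touched", "tests_included"]
```

The generation inside the node can be as probabilistic as it likes; acceptance is not.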
Fifth, it supports institutional memory. Not memory as a bag of retrieved text snippets, but memory as accumulated structure: prior decompositions, accepted patterns, common failure branches, reusable node templates, and validated graph fragments.
This is a profoundly different model from today’s prompt-heavy agent orchestration.
Thought primitives
This is where I think the idea becomes most interesting.
If materialised graphs become the operating substrate of complex AI systems, then over time certain graph structures will recur. Certain decomposition patterns will repeatedly prove useful in banking, compliance, industrial design, software delivery, cybersecurity, healthcare operations, logistics, and other domains.
These recurring graph structures are what I call thought primitives.
A thought primitive is not just a prompt template. It is not simply an ontology. It is not merely a workflow. It is a reusable structural pattern for decomposing and reasoning about a class of problems.
A thought primitive may specify:
- common node types for a domain
- common edge types and dependencies
- standard validation checkpoints
- typical failure branches
- escalation patterns
- replay boundaries
- interfaces between human and machine review
- canonical decomposition depths
- mappings between planning artefacts and execution artefacts
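A thought primitive of this kind could be written down as a schema and stamped into concrete graphs. The structure below is a deliberately simplified sketch; every field and gate name is illustrative:

```python
# Hypothetical schema for one thought primitive: a reusable structural
# pattern for decomposing a class of problems, not a prompt template.
FEATURE_DELIVERY = {
    "name": "feature_delivery",
    "node_types": ["interface_design", "data_model_change", "service_contract",
                   "test_plan", "rollout", "observability", "security_review"],
    "edge_types": ["depends_on", "validated_by", "constrains"],
    "validation_gates": {"data_model_change": "schema review",
                         "rollout": "security_review signed off"},
    "replay_boundaries": ["data_model_change", "rollout"],
    "max_depth": 4,
}

def instantiate(primitive, requirement):
    """Stamp a concrete task list out of the primitive's pattern."""
    return [{"task": f"{requirement}: {node_type}",
             "gate": primitive["validation_gates"].get(node_type)}
            for node_type in primitive["node_types"]]

graph = instantiate(FEATURE_DELIVERY, "payment reconciliation")
assert graph[1] == {"task": "payment reconciliation: data_model_change",
                    "gate": "schema review"}
```

Adapting the primitive to a new organisation then means editing the schema, not rediscovering the decomposition from scratch.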
In software, a thought primitive might encode how to decompose a new product feature into interface design, data model changes, service contracts, test plans, rollout strategy, observability, and security review.
In banking, a thought primitive might encode the graph needed to assess a new regulated process: source data, transformations, approval points, audit evidence, control ownership, exception pathways, and reporting outputs.
In construction, a thought primitive might encode the recurring relationship between design packages, compliance approvals, procurement dependencies, and sequencing constraints.
The value here is enormous. Once such primitives exist, organisations no longer start from scratch every time they face a complex planning problem. They begin from structured patterns that can be adapted, validated, and extended.
That is how AI building becomes less like improvised prompting and more like engineering.
The future platform is not just a model host
If this thesis is correct, then the next important AI platforms will not merely host models, run tools, and store chats. They will manage graph-native work artefacts.
They will let users:
- generate task graphs from requirements
- recursively refine those graphs
- attach evidence, constraints, and policy
- review branches before execution
- assign nodes to humans or specialised agents
- track lineage across decompositions
- preserve rejected branches
- replay from chosen checkpoints
- link planning graphs to execution, runtime, compliance, and defect graphs
- mine recurring graph shapes into reusable primitives
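The last capability, mining recurring graph shapes, can be sketched as canonicalising each decomposition's structure and counting recurrences. The canonicalisation below is a deliberately crude stand-in for real frequent-subgraph mining, and the project data is invented:

```python
# Sketch of primitive mining: signature each decomposition's shape by
# node type (ignoring labels and child order) and count what recurs.
from collections import Counter

def shape(node):
    """Canonical signature of a decomposition subtree."""
    children = sorted(shape(c) for c in node.get("children", []))
    return f'{node["type"]}({",".join(children)})'

past_projects = [
    {"type": "feature", "children": [{"type": "design"}, {"type": "tests"}]},
    {"type": "feature", "children": [{"type": "tests"}, {"type": "design"}]},
    {"type": "bugfix",  "children": [{"type": "repro"}]},
]

counts = Counter(shape(root) for root in past_projects)
# The two feature decompositions share one shape despite different ordering.
assert counts["feature(design(),tests())"] == 2
assert counts["bugfix(repro())"] == 1
```

Shapes that recur across enough projects become candidate thought primitives for human review.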
This is a much more serious vision of AI infrastructure than today’s chat-first interface patterns.
The core UI of such systems may not be a chat box at all. It may be a navigable graph workspace with selective conversational overlays. Chat becomes one interaction mode among many, not the primary container of intelligence.
That would be a meaningful step forward, because chat is a poor long-term container for complex coordinated work. Graphs, by contrast, are inspectable, branchable, and persistent.
Why this approach is more aligned with trust and control
There is also a governance argument here.
As AI systems take on more operational responsibility, institutions will not accept systems that cannot explain how work was decomposed, what validations were inserted, where human review occurred, and which branches were abandoned. Black-box autonomy may be tolerable for low-stakes convenience tasks. It is not viable for regulated, safety-critical, or strategically important systems.
Materialised graph structures provide a better foundation for trust because they expose the architecture of the work itself. They create a boundary object that both humans and machines can inspect.
This is especially important because the real danger in complex automation is not only wrong outputs. It is invisible wrong process.
A system that produces the correct answer for the wrong reasons is hard to trust. A system that exposes its planning graph, validation graph, and decision lineage is easier to challenge, govern, and improve.
In that sense, graph materialisation is not just a productivity pattern. It is a control pattern.
A different definition of memory
Much of the current discourse around agent memory assumes that the central challenge is how to help models remember more text across longer horizons. But perhaps memory in serious systems should be defined differently.
Perhaps memory is not primarily the persistence of context tokens.
Perhaps memory is the persistence of structured artefacts.
A stored task graph is memory. A linked runtime incident graph is memory. A decomposition template that worked in prior projects is memory. A record of which validation gates caught errors is memory. A preserved branch showing why an alternative plan was rejected is memory.
This kind of memory is more expensive to design than a vector store or a larger context window. But it is also more useful, because it supports action, coordination, and explanation.
It is closer to institutional memory than conversational recall.
The deeper implication
The deeper implication is that AI systems may become most valuable not when they directly generate final artefacts, but when they help materialise the intermediate representations from which many artefacts can later be produced.
That is a subtle but important inversion.
Instead of asking AI to directly write the code, we may ask it to first produce the graph from which code, tests, plans, tickets, documentation, validations, and review workflows can all be derived.
Instead of asking AI to directly plan a factory, we may ask it to first produce the interlocking graphs through which procurement, sequencing, compliance, maintenance, and simulation can be coordinated.
Instead of asking AI to directly execute a long and opaque chain of work, we may ask it to build the durable substrate on which many more constrained and intelligible executions can run.
That substrate is what turns generation into engineering.
Conclusion
The next leap in AI systems will not come only from larger context windows, longer autonomous loops, or more aggressive execution. It will come from better externalisation of structure.
For simple tasks, direct generation will remain useful. But for complex, high-value work, the future belongs to systems that can materialise decomposition, preserve decision structure, and coordinate many forms of intervention over shared artefacts.
The task graph is the beginning of that future, not the end of it.
Once you accept that decomposition should be persistent, the graph starts to spread. It extends from tasks to code, from code to bugs, from bugs to runtime, from runtime to policy, from policy to teams, from teams to operations. The system stops looking like a workflow and starts looking like a multi-dimensional graph of reality.
And from those repeated structures emerge reusable patterns for thought itself.
That is what thought primitives are: not prompts, not just ontologies, and not merely workflows, but materialised structures for reasoning about complex domains in ways that humans and machines can share.
If we build these systems well, the most important artefact an AI produces may not be the final answer.
It may be the graph that made the answer possible.