Dominic Plouffe (CTO)

Big data + agents. Less hype, more systems.

Category: Product

  • Agents Need Contracts, Not More Brains

    Why the next decade of agents will be decided by their contracts, not their brains

    There’s a familiar pattern I keep seeing whenever a hot new agent platform shows up: breathless demos of planning and autonomy, a bunch of infrastructure scaffolding, and then—inevitably—confusion the first time that agent needs to call a real tool in a real workplace.

    Two things landed in the last 48 hours that make this obvious in a useful way. One is chatter about a big vendor shipping an open agent platform. The other is a clear, practical writeup of the ReAct/tool-calling loop where you explicitly model state, tool schemas, and transitions. Together they highlight a simple truth: agents aren’t just models + compute. They are contracts between a thinking thing and the systems it touches.

    What I mean by “contracts”

By contract I mean the agreed shape of interaction—the inputs, the outputs, the error modes, and who is responsible for recovery. Contracts sit between three actors: the LLM (the reasoner), the tool set (APIs, DBs, UIs), and the business that owns the outcome. A good contract makes the interaction predictable. A bad one lets subtle failures hide until they become disasters.
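To make that concrete, here is one minimal sketch of what a tool contract can look like in code. Every name here (`UpdateBidRequest`, `ErrorCode`, the fields) is hypothetical, not a real API—the point is that inputs, outputs, and failure semantics are all typed and explicit:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class ErrorCode(Enum):
    RATE_LIMITED = "rate_limited"  # safe to retry after backoff
    CONFLICT = "conflict"          # state changed underneath us; re-plan
    FORBIDDEN = "forbidden"        # out of scope; escalate to a human

@dataclass(frozen=True)
class UpdateBidRequest:
    campaign_id: str
    sku: str
    new_bid_cents: int       # integer cents, never floats for money
    idempotency_key: str     # caller-supplied; dedupes retries

@dataclass(frozen=True)
class UpdateBidResult:
    applied: bool            # explicit: did the change actually land?
    previous_bid_cents: int  # enough prior state to build an undo
    error: Optional[ErrorCode] = None
```

Notice that the result carries `applied` and the previous value explicitly: no "vague success messages," and the undo path is fed by the contract itself.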

    Think of it like a marketplace listing. A great item description tells buyers precisely what they’ll get, what’s excluded, and what happens if something’s damaged in shipping. Tools need the same thing when agents use them: clear schemas, explicit side effects, and well-defined failure semantics.

    Why this matters more than model size right now

    Everyone wants to argue about parameter counts, token limits, or who trained what on which dataset. Those debates matter for capabilities, but not for production reliability. In practice, the majority of outages, hallucinations, and compliance incidents I see happen at the boundary—when an agent takes an action that touches people, money, or private data.

    Here’s the mental model: the LLM is the planner, but the world is deterministic only if you make it so. The agent’s brain can generate a plan, but unless the tool contract guarantees idempotency, transactional boundaries, and clear error codes, the plan will meet chaos. That’s not an ML problem; it’s a systems design problem.

    An everyday example

    Imagine a marketing agent that updates bids in a PPC campaign. The agent decides to raise bids on a promoted SKU because conversion metrics looked good. If the API call is retried without idempotency, your bids could double or worse. If the tool returns vague success messages, the agent may assume the change applied when it didn’t. That’s a measurable revenue leak you’ll notice on Monday morning.

    Three contract-level guarantees you should design for

    When you build agent-enabled systems, prioritize these guarantees before you tune models:

    • Idempotency: Every state-changing call should be safe to retry. If a request can’t be retried, make the contract explicit and force human confirmation.
    • Observability: Tools must emit machine-readable events for every action and every failure. The agent sees events; humans can trace them; alerting works.
    • Authority & scope: Each agent action must be scoped to an account/role and limited in blast radius. Prefer explicit capability tokens over vague “write” permissions.
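The second and third guarantees can be sketched together: every agent action emits a machine-readable event, and a capability token names exactly which actions and accounts are in scope. This is an illustrative sketch, not any particular platform's API:

```python
import json
import time

EVENTS: list = []   # stand-in for a real event bus or log pipeline

def emit(event_type: str, **fields) -> None:
    # Machine-readable event for every action and every failure.
    EVENTS.append(json.dumps({"ts": time.time(), "type": event_type, **fields}))

def authorized(token: dict, action: str, account: str) -> bool:
    # Capability token names exactly what the agent may touch.
    return action in token["actions"] and account in token["accounts"]

def agent_write(token: dict, account: str, action: str, payload: dict) -> bool:
    if not authorized(token, action, account):
        emit("action_denied", action=action, account=account)
        return False
    emit("action_applied", action=action, account=account, payload=payload)
    return True
```

Denials get logged too—an agent probing outside its scope is exactly the signal you want in your alerting.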

    Where ReAct-style graphs help

    The recent practical guides on ReAct-style loops show that if you treat the agent’s reasoning loop as explicit state transitions, you get two big wins:

    • You can instrument and replay the loop. When something goes wrong you can reconstruct the exact decisions and tool outputs that led there.
    • You can encode stop conditions and human-handoff points. Instead of a monolithic “do it all” agent, you get a graph that can pause, ask, or escalate based on variables you control.
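Both wins fall out of the same design choice: the loop is an explicit state machine whose transitions are recorded. A toy sketch, with the actual planning and tool calls stubbed out, since only the structure matters here:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    state: str    # e.g. "act", "observe", "handoff"
    detail: str

@dataclass
class AgentRun:
    trace: list = field(default_factory=list)

    def transition(self, state: str, detail: str) -> None:
        self.trace.append(Step(state, detail))

def run_loop(observations: list, max_steps: int = 5) -> AgentRun:
    # Toy ReAct-style loop: every transition is recorded so a failed run
    # can be replayed and inspected step by step.
    run = AgentRun()
    for i, obs in enumerate(observations):
        if i >= max_steps:                       # explicit stop condition
            run.transition("handoff", "step budget exhausted")
            break
        run.transition("act", f"call tool for: {obs}")
        run.transition("observe", obs)
    return run
```

The `trace` is the replay artifact, and the `max_steps` check is the simplest possible human-handoff point; real graphs add escalation on specific tool errors or confidence thresholds.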

    That’s operational gold. When a business runs thousands of agent actions per day, being able to replay a single mistaken sequence until you understand the failure is what turns a reactive firefight into a continuous-improvement cycle.

    Design patterns that reduce risk

    I use a few patterns repeatedly when I’m driving product decisions for agent features:

    • Shadow mode first: Let the agent propose actions and write them to a queue or audit log instead of taking them. Let humans confirm or run a verification pass that replays the tools in sandboxed mode.
    • Progressive capability rollout: Start with read-only and scheduled writes, then add real-time write capabilities after you’ve observed behavior in production for a while.
    • Explicit compensation paths: Every destructive action needs a defined undo or compensation workflow. Build the undo API before you let agents touch the live system.
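Shadow mode is the easiest of these to start with, and it needs very little machinery: the agent writes proposals to a queue instead of executing them, and nothing touches the live system until something with authority flips the switch. A minimal sketch with hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    action: str
    payload: dict
    approved: bool = False

QUEUE: list = []   # stand-in for a durable audit queue

def propose(action: str, payload: dict) -> Proposal:
    # Shadow mode: record what the agent *would* do; execute nothing.
    p = Proposal(action, payload)
    QUEUE.append(p)
    return p

def approve_and_apply(p: Proposal, apply_fn) -> None:
    # A human (or a sandboxed verification pass) gates the live write.
    p.approved = True
    apply_fn(p.action, p.payload)
```

Progressive rollout then becomes a policy question about which `apply_fn` a proposal is allowed to reach, rather than a rewrite of the agent.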

    Why open agent platforms raise the stakes

    Open agent frameworks and vendor platforms both make it easier to stitch together LLMs and tools. That’s great for innovation, but it increases the surface area for misunderstandings. An open platform with lots of connectors makes it easy to accidentally expose a tool without the right contract guarantees.

    Platforms will succeed when they treat connectors as first-class citizens: packaged with schemas, test harnesses, and safety gates. The platform’s job is not just to let you wire a model to an API; it is to help you ship a predictable contract that survives scale.
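"Connectors as first-class citizens" can be enforced mechanically: the platform refuses to register a connector whose manifest doesn't declare its contract. A sketch of that gate, with the required keys being my own illustrative choice:

```python
# Hypothetical registration gate: a connector without declared schemas
# and failure semantics never makes it into the catalog.
REQUIRED_KEYS = {"input_schema", "output_schema", "error_codes", "side_effects"}

def validate_connector(manifest: dict) -> list:
    # Returns the missing contract pieces; an empty list means shippable.
    return sorted(REQUIRED_KEYS - set(manifest))
```

The point is not the four specific keys—it's that the platform, not each integrator, owns the checklist.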

    Product implications

    For product folks, the practical question is: what do you ship first? My bias: ship the guardrails before the autonomy. Customers will forgive an agent that’s slow or conservative if it doesn’t break things. They will not forgive silent data leakage or thundering financial changes.

So make autonomy a premium feature, not the default. Build visibility, role-based control, and sandboxing into the product experience. Then sell the autonomy story with a track record: “we ran this in shadow for 30 days and reduced handle time by 24% without any live write errors.” That is a believable, valuable pitch.

    Short checklist for launch

    • Define the API contract (inputs, outputs, error codes).
    • Implement idempotency and audit events on every write path.
    • Run shadow-mode validation and collect replayable traces.
    • Roll out capabilities progressively with human-in-the-loop gates.
    • Document compensation workflows and test them under load.

    The long game

    In the long run we’ll get better models, and those models will make more credible plans. That’s exciting. But the thing that separates an agent demo from a sustainable product is not how clever the planner is—it’s whether the world it touches behaves in predictable, testable ways.

    If you’re building agent features this year, treat the tool boundary like a core product surface. Ship contracts, not conveniences. Build the undo before you build the action. And if you want a quick win, instrument the loop so you can replay, debug, and iterate without a blame game.

    One retail/personalization analogy

    In retail analytics, a bad data contract is like asking for nightly sales numbers but getting different definitions of “sale” from each store. Decisions computed on that data are brittle. Agents face the same trap: if each connector reports success differently, your agent’s decisions will be brittle—and customers will notice where it hurts their margins.

    What to do tomorrow

    Pick one high-impact agent action in your product and apply the checklist above. Run it in shadow for two weeks, capture traces, and see how often your contract ambiguity shows up. Fix those gaps before you turn the knob to full autonomy.

    That is boring work. It’s also what buys you a future where agents are a feature users trust instead of a liability they tolerate.