Dominic Plouffe (CTO)

Big data + agents. Less hype, more systems.

Blog

  • Packaging Beats Peak Performance

    Packaging beats peak performance: why open-source models stall at the doorstep

    There’s a pattern I keep seeing: an engineering team furiously trains a model, they throw the weights up on a public registry, and then they wait. Silence. Not because the model is bad, but because the work that actually unlocks usage—distribution, inference packaging, and developer ergonomics—was left for later. Two recent stories brought the pattern into sharp relief: one about a large company putting together an opinionated, enterprise-focused agent platform, and another where a promising open model hit the wall because nobody could easily run it locally. Same problem, different faces: packaging, not pure performance, decides who gets adopted.

    Why packaging matters more than you think

    Models are dense technical achievements, but they’re not products on their own. A model is a component. For a developer or product manager to treat it like a component they can actually use, three things need to happen:

    • It must be accessible in the formats and runtimes the ecosystem already uses.
    • It must be easy to evaluate cheaply and quickly.
    • It must compose with toolchains—tokenizers, inference engines, tool calling frameworks—without a day-one deep dive.

    If any of those are missing, adoption stalls. People will swap to the slightly worse model that just works out of the box because velocity beats marginal quality improvement every time.

    Two contrasting signals from the field

    Big vendors packaging agents as a product

    When incumbents decide to ship an agent platform, they don’t just release models. They bundle SDKs, observability, security integrations, deployment templates, and partner connectors. That’s not accidental. Enterprises care about operational risk: isolation, audits, rollout control, and predictable infra cost. What looks like a marketing move is often a rational answer to procurement and SRE requirements. The platform sells because it reduces the integration bill—and it can be opinionated about formats and runtimes, which makes it simpler for internal teams to adopt.

    Open-source weights without the lift

    Contrast that with a model release that drops weights in a single format and asks the community to figure out the rest. If common runtimes don’t support the format, if there’s no GGUF conversion, and if chat templates or tool-calling glue are incomplete, developers run straight past it. The outcome is predictable: people pick a slightly older or smaller model that plugs into vLLM, llama.cpp, or whatever their pipeline already uses. The model itself becomes a research artifact rather than a usable building block.

    Why this is a product problem, not just engineering

    Engineers build capabilities. Products remove friction. For open models to succeed, someone has to own the “last mile”: the conversions, the hosted inference endpoint, the reference SDKs, the promotion to popular inference marketplaces. That’s product work—prioritization, docs, SDK releases, and marketing to developer communities. It’s rarely glamorous, and it rarely wins research awards, but it determines whether a model gets embedded into apps or piles up in a downloads folder.

    Three practical moves for teams that want their model adopted

    If you’ve produced a model and want people to actually use it, these are the pragmatic steps that matter more than another benchmark result.

    • Ship the formats people use: provide GGUF, safetensors, ONNX where it makes sense. If you can’t be in every runtime on day one, be in the top three for your target audience.
    • Publish a minimal inference endpoint and a tiny “playground” that runs on cheap infra. Developers will try a hosted demo before spinning up hardware.
    • Bundle a conversion and starter kit: tokenizer, chat template, and a one-click example to hook tool-calling or RAG. Make the first working app under 15 minutes.

    These are small, high-leverage bets. They don’t need perfect engineering—just enough to let people instrument, test, and prototype.
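As a concrete illustration of the starter-kit bullet: the chat template is often the first thing missing. Here is a minimal sketch of what a bundled template helper might look like; the `<|user|>`-style control tokens are hypothetical, not any particular model's real template.

```python
def render_chat(messages, add_generation_prompt=True):
    """Render a list of {role, content} messages into a prompt string.

    The <|role|> / <|end|> markers below are illustrative placeholders;
    a real release would ship the model's actual template alongside
    its tokenizer config.
    """
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}\n<|end|>")
    if add_generation_prompt:
        # Leave the assistant turn open so the model completes it.
        parts.append("<|assistant|>\n")
    return "\n".join(parts)
```

Shipping even a toy renderer like this next to the weights saves every integrator from reverse-engineering the format on day one.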

    How incumbents turn packaging into a moat

When a large vendor builds an opinionated agent platform, a subtle lock-in happens. Not because the models are proprietary (they might not be), but because the platform owns the integration surface: observability, authentication, billing, and deployment patterns. Teams adopt the platform because it removes work and risk. Over time, the hidden cost of switching rises—not from model accuracy but from migration overhead.

    That’s why you’ll often see vendors emphasize partner integrations and enterprise controls early: these are the levers that turn a technical capability into a repeatable operational solution.

    What builders should actually care about

    If you’re building with models or you’re on a team deciding whether to run something in-house, your job is to short-circuit false choices. Don’t treat model X vs model Y as the only axis. Ask:

    • Can I evaluate this with a $20 test bed in an afternoon?
    • Will my existing toolchain accept this format without surgery?
    • What’s the realistic path from prototype to monitored production?

    If the answer to any of those is “no,” the model is less valuable than it looks on paper.

    Retail/PPC analogy (short)

    Think of models like ad creatives. A marginally better creative that takes two weeks to QA and publish will lose to a slightly worse creative you can deploy in an hour and iterate on with A/B tests. Velocity—small, safe bets—beats theoretical win rates in fast-moving systems.

Three bets I’d place as a PM

    If you’re the product owner for a model or for tooling around models, here’s what I’d prioritize in order:

    • Developer experience: make the first 15 minutes delightful. If someone can’t get a demo running quickly, they’ll move on.
    • Inference options: supported runtimes, small hosted tier, and a conversion pipeline.
    • Operational playbooks: simple monitoring, cost estimates, and a migration checklist for customers who want to move away later.

    Closing: productize the last mile

    There’s an asymmetry in AI adoption: the heavy lifting of training and papers gets attention, but the quiet, mundane work of packaging decides who wins. If you’re a founder or an engineering lead, your healthiest obsession should be “how do we make this trivial to try?” Because the teams that answer that question will win more users than the teams that chase the last 1–2% of benchmark performance.

    Make it trivial to try, and people will. Make it hard, and performance won’t matter.

  • Agents Need Contracts, Not More Brains

    Why the next decade of agents will be decided by their contracts, not their brains

    There’s a familiar pattern I keep seeing whenever a hot new agent platform shows up: breathless demos of planning and autonomy, a bunch of infrastructure scaffolding, and then—inevitably—confusion the first time that agent needs to call a real tool in a real workplace.

    Two things landed in the last 48 hours that make this obvious in a useful way. One is chatter about a big vendor shipping an open agent platform. The other is a clear, practical writeup of the ReAct/tool-calling loop where you explicitly model state, tool schemas, and transitions. Together they highlight a simple truth: agents aren’t just models + compute. They are contracts between a thinking thing and the systems it touches.

    What I mean by “contracts”

By contract I mean the agreed shape of interaction—the inputs, the outputs, the error modes, and who is responsible for recovery. Contracts sit between three actors: the LLM (the reasoner), the tool set (APIs, DBs, UIs), and the business that owns the outcome. A good contract makes the interaction predictable. A bad one hides subtle failures until they become disasters.

    Think of it like a marketplace listing. A great item description tells buyers precisely what they’ll get, what’s excluded, and what happens if something’s damaged in shipping. Tools need the same thing when agents use them: clear schemas, explicit side effects, and well-defined failure semantics.

    Why this matters more than model size right now

    Everyone wants to argue about parameter counts, token limits, or who trained what on which dataset. Those debates matter for capabilities, but not for production reliability. In practice, the majority of outages, hallucinations, and compliance incidents I see happen at the boundary—when an agent takes an action that touches people, money, or private data.

    Here’s the mental model: the LLM is the planner, but the world is deterministic only if you make it so. The agent’s brain can generate a plan, but unless the tool contract guarantees idempotency, transactional boundaries, and clear error codes, the plan will meet chaos. That’s not an ML problem; it’s a systems design problem.

    An everyday example

    Imagine a marketing agent that updates bids in a PPC campaign. The agent decides to raise bids on a promoted SKU because conversion metrics looked good. If the API call is retried without idempotency, your bids could double or worse. If the tool returns vague success messages, the agent may assume the change applied when it didn’t. That’s a measurable revenue leak you’ll notice on Monday morning.

    Three contract-level guarantees you should design for

    When you build agent-enabled systems, prioritize these guarantees before you tune models:

    • Idempotency: Every state-changing call should be safe to retry. If a request can’t be retried, make the contract explicit and force human confirmation.
    • Observability: Tools must emit machine-readable events for every action and every failure. The agent sees events; humans can trace them; alerting works.
    • Authority & scope: Each agent action must be scoped to an account/role and limited in blast radius. Prefer explicit capability tokens over vague “write” permissions.
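To make the first guarantee concrete, here is a minimal sketch of an idempotent tool gateway. The in-memory result cache is an assumption for illustration; production would use a durable store scoped to the agent run.

```python
import hashlib
import json


class IdempotentToolGateway:
    """Wrap a state-changing tool call so that retries with the same
    idempotency key return the cached result instead of re-executing
    the side effect."""

    def __init__(self, tool_fn):
        self.tool_fn = tool_fn
        self._results = {}  # key -> result; durable store in production

    def call(self, args, idempotency_key=None):
        if idempotency_key is None:
            # Derive a key from the arguments; an explicit caller-supplied
            # key is safer because two distinct intents can share args.
            idempotency_key = hashlib.sha256(
                json.dumps(args, sort_keys=True).encode()).hexdigest()
        if idempotency_key in self._results:
            return self._results[idempotency_key]  # safe retry, no side effect
        result = self.tool_fn(**args)
        self._results[idempotency_key] = result
        return result
```

With this in place, the agent's retry loop can be dumb and aggressive without doubling a bid or charging a card twice.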

    Where ReAct-style graphs help

    The recent practical guides on ReAct-style loops show that if you treat the agent’s reasoning loop as explicit state transitions, you get two big wins:

    • You can instrument and replay the loop. When something goes wrong you can reconstruct the exact decisions and tool outputs that led there.
    • You can encode stop conditions and human-handoff points. Instead of a monolithic “do it all” agent, you get a graph that can pause, ask, or escalate based on variables you control.

    That’s operational gold. When a business runs thousands of agent actions per day, being able to replay a single mistaken sequence until you understand the failure is what turns a reactive firefight into a continuous-improvement cycle.
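A minimal sketch of that explicit-state-transition idea, assuming hypothetical `plan_fn` and tool names: every transition is appended to a trace, which is exactly what makes replay possible.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentState:
    goal: str
    steps: list = field(default_factory=list)  # replayable trace of transitions
    done: bool = False
    answer: Optional[str] = None


def run_loop(state, plan_fn, tools, max_steps=5):
    """Explicit ReAct-style loop: plan_fn picks the next action from the
    current state; every (tool, input, observation) transition is recorded."""
    for _ in range(max_steps):
        action = plan_fn(state)  # e.g. {"tool": "search", "input": ...}
        if action["tool"] == "finish":
            state.done, state.answer = True, action["input"]
            state.steps.append(("finish", action["input"], None))
            break
        observation = tools[action["tool"]](action["input"])
        state.steps.append((action["tool"], action["input"], observation))
    return state
```

Because the whole decision history lives in `state.steps`, a failed run can be replayed step by step instead of reconstructed from scattered logs.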

    Design patterns that reduce risk

    I use a few patterns repeatedly when I’m driving product decisions for agent features:

    • Shadow mode first: Let the agent propose actions and write them to a queue or audit log instead of taking them. Let humans confirm or run a verification pass that replays the tools in sandboxed mode.
    • Progressive capability rollout: Start with read-only and scheduled writes, then add real-time write capabilities after you’ve observed behavior in production for a while.
    • Explicit compensation paths: Every destructive action needs a defined undo or compensation workflow. Build the undo API before you let agents touch the live system.

    Why open agent platforms raise the stakes

    Open agent frameworks and vendor platforms both make it easier to stitch together LLMs and tools. That’s great for innovation, but it increases the surface area for misunderstandings. An open platform with lots of connectors makes it easy to accidentally expose a tool without the right contract guarantees.

    Platforms will succeed when they treat connectors as first-class citizens: packaged with schemas, test harnesses, and safety gates. The platform’s job is not just to let you wire a model to an API; it is to help you ship a predictable contract that survives scale.

    Product implications

    For product folks, the practical question is: what do you ship first? My bias: ship the guardrails before the autonomy. Customers will forgive an agent that’s slow or conservative if it doesn’t break things. They will not forgive silent data leakage or thundering financial changes.

So make autonomy a premium feature, not the default. Build visibility, role-based control, and sandboxing into the product experience. Then sell the autonomy story with a track record: “we ran this in shadow for 30 days and reduced handle time by 24% without any live write errors.” That kind of track record is both believable and valuable.

    Short checklist for launch

    • Define the API contract (inputs, outputs, error codes).
    • Implement idempotency and audit events on every write path.
    • Run shadow-mode validation and collect replayable traces.
    • Roll out capabilities progressively with human-in-the-loop gates.
    • Document compensation workflows and test them under load.
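As an illustration of the first checklist item, a tool contract can start as an explicit declaration the runtime validates calls against. The `update_bid` contract below is hypothetical, echoing the PPC example earlier.

```python
# Hypothetical tool contract: the agent sees this declaration; the runtime
# enforces it. Inputs, errors, and the undo path are all spelled out.
UPDATE_BID_CONTRACT = {
    "name": "update_bid",
    "inputs": {"campaign_id": "string", "sku": "string", "new_bid_cents": "int"},
    "side_effects": ["writes to live campaign"],
    "idempotency": "required: caller supplies idempotency_key",
    "errors": {"404": "campaign not found", "409": "bid changed concurrently"},
    "undo": "set bid back to previous value (returned in the response)",
}


def validate_call(contract, args):
    """Reject any call whose argument names don't exactly match the
    declared input schema -- no missing fields, no surprise extras."""
    return set(contract["inputs"]) == set(args)
```

Even this much is enough to catch the “update means append, overwrite, or delete” class of ambiguity before it reaches a live system.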

    The long game

    In the long run we’ll get better models, and those models will make more credible plans. That’s exciting. But the thing that separates an agent demo from a sustainable product is not how clever the planner is—it’s whether the world it touches behaves in predictable, testable ways.

    If you’re building agent features this year, treat the tool boundary like a core product surface. Ship contracts, not conveniences. Build the undo before you build the action. And if you want a quick win, instrument the loop so you can replay, debug, and iterate without a blame game.

    One retail/personalization analogy

    In retail analytics, a bad data contract is like asking for nightly sales numbers but getting different definitions of “sale” from each store. Decisions computed on that data are brittle. Agents face the same trap: if each connector reports success differently, your agent’s decisions will be brittle—and customers will notice where it hurts their margins.

    What to do tomorrow

    Pick one high-impact agent action in your product and apply the checklist above. Run it in shadow for two weeks, capture traces, and see how often your contract ambiguity shows up. Fix those gaps before you turn the knob to full autonomy.

    That is boring work. It’s also what buys you a future where agents are a feature users trust instead of a liability they tolerate.

  • Agents need built-in security, not bolt-on audits

    Problem

    Organizations are racing to deploy agentic systems — assistants that act on our behalf, call tools, and change state in the world. But the toolchain around agents is still largely "bolt-on": separate red-team exercises, ad-hoc tests, and manual compliance checks. That model doesn’t scale. When agents have real permissions (sending emails, executing code, accessing databases), delayed or fragmented security practices quickly become catastrophic.

    Explanation (what it is)

    By "built-in security" I mean evaluation, testing, and governance embedded directly in the agent platform and development lifecycle. Instead of running a vulnerability scan after you ship, the platform enforces tests during development, keeps full traceability of tool calls, and provides automated guardrails that are part of the runtime. The result: faster iterations with fewer surprises, and meaningful audit trails for operators and regulators.

    Mechanism (how it works)

    There are three core pieces to make security first-class:

    • Continuous evaluation hooks: unit-test-like checks for prompt templates, tool wrappers, and decision policies that run on every commit or model change.
    • Runtime enforcement: a policy layer that intercepts tool calls and enforces constraints (rate limits, data redaction, allowed endpoints), with fast, deterministic fallbacks when the agent is uncertain.
    • Observability and traceability: immutable logs that show the prompt, model outputs, tool inputs/outputs, and the policy decisions that led to a particular action.

    Architecturally, this is a mix of developer tooling and runtime plumbing. CI pipelines need test runners that can invoke the agent locally with mocked tools; the runtime must implement a policy decision point that can block or transform tool calls; and storage must capture the artifact chain for forensic review.
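A minimal sketch of that runtime policy decision point, assuming a static in-process policy table; a real deployment would load policy from a service and persist the audit log.

```python
# Illustrative policy table: tool names, limits, and redaction rules are
# assumptions for the sketch, not a real product's configuration.
POLICY = {
    "send_email": {"max_per_run": 3, "redact": ["ssn"]},
    "run_sql": {"allowed": False},
}


def policy_gate(tool_name, args, audit_log, counters):
    """Decide allow/deny for one tool call, optionally transforming args
    (redaction), and record the decision for forensic review."""
    rule = POLICY.get(tool_name)
    decision = "allow"
    if rule is None or rule.get("allowed") is False:
        decision = "deny"  # default-deny anything unknown or disallowed
    elif counters.get(tool_name, 0) >= rule.get("max_per_run", float("inf")):
        decision = "deny"  # rate limit exceeded for this run
    else:
        for f in rule.get("redact", []):
            if f in args:
                args = {**args, f: "[REDACTED]"}
        counters[tool_name] = counters.get(tool_name, 0) + 1
    audit_log.append({"tool": tool_name, "args": args, "decision": decision})
    return decision, args
```

Note the gate both blocks and transforms: redaction happens before the tool or any downstream log ever sees the raw value.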

    Steps (how to implement)

    A practical rollout in an engineering org looks like this:

    • Step 0: Inventory your agent surface. List all agents, capabilities, tool integrations, and privileges. Keep the list small and explicit.
    • Step 1: Add evaluation suites. For each agent, create lightweight tests: safety unit tests (jailbreak attempts), correctness tests (task outputs for canonical inputs), and privacy tests (data-leak scenarios). Run them in CI on every change.
    • Step 2: Wrap tools. Never let the LLM talk directly to infra. Introduce thin RPC wrappers with explicit schemas, argument validation, and permission checks. Instrument these wrappers to emit structured events.
    • Step 3: Enforce policies at runtime. Deploy a policy gateway that validates every tool call against a policy (who, what, why). Provide a fallback behavior: deny, ask-for-human, or sanitize input.
    • Step 4: Capture trace logs. Store prompts, model versions, tool inputs/outputs, and policy decisions in an append-only store with retention and export capabilities for audits.
    • Step 5: Automate red-team tests. Integrate scripted adversarial prompts into CI and schedule periodic fuzzing runs. Surface failures as blocking or advisory depending on severity.
    • Step 6: Governance hooks. Build simple approval flows for granting agents new privileges and require recorded rationale for any elevated access.
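Step 2 can be sketched in a few lines: a thin wrapper that validates arguments against an explicit schema and emits a structured event per call. The event field names are illustrative.

```python
import hashlib
import json


def wrap_tool(name, fn, schema, events):
    """Wrap a tool so the LLM never talks to infra directly: arguments are
    validated against an explicit schema, and every call (success or
    violation) emits a structured event."""
    def wrapped(**kwargs):
        missing = sorted(set(schema) - set(kwargs))
        extra = sorted(set(kwargs) - set(schema))
        if missing or extra:
            events.append({"tool": name, "ok": False,
                           "error": {"missing": missing, "extra": extra}})
            raise ValueError(f"{name}: schema violation")
        # Hash the args instead of logging them raw (avoids leaking PII).
        args_hash = hashlib.sha256(
            json.dumps(kwargs, sort_keys=True).encode()).hexdigest()[:12]
        result = fn(**kwargs)
        events.append({"tool": name, "ok": True, "args_hash": args_hash})
        return result
    return wrapped
```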

    Examples (hypothetical)

    • Hypothetical: A support agent that can close tickets and run small scripts in production. Without wrappers, a malformed prompt could trigger a script with destructive arguments. With the approach above, the agent’s "run_script" tool requires an immutable schema (script_name, args_allowed), and policy denies scripts that touch protected namespaces. A CI safety test ensures that common jailbreaks can’t escalate privileges.
    • Hypothetical: An HR agent that summarizes candidate data. Privacy tests ensure the agent never transmits raw PII in outbound tool calls. Runtime policy strips or redacts fields before the external logging system sees them.

    Mistakes / Pitfalls

    • Treating evaluation as optional. Running a few manual red-team exercises isn’t the same as continuous automated checks. Humans are inconsistent; automation is repeatable.
    • Over-restricting agents out of fear. If every tool call requires human approval, agents become useless. Design graduated responses: sandbox, sanitize, ask, deny — not only deny.
    • Log overload without structure. Dumping gigabytes of text into a lake is useless. Capture structured events: tool_name, args_hash, model_id, decision, outcome.
    • Blind trust in third-party tooling. Open-source evaluation tools are fantastic, but vendor acquisitions and changing licenses can shift risk. Keep your core test suites mirrored in your repo.
    • Forgetting economics. Tests, fuzzers, and traces cost money. Prioritize high-risk agents and high-impact tools first.

    Conclusion (what to do next)

    If you run agents in production today: start with inventory and implement thin tool wrappers this week. Add one automated safety test per agent and wire it into CI; you’ll catch more regressions than ad-hoc reviews ever will. If you’re building an agent platform: bake policy enforcement, structured tracing, and CI-first evaluations into your architecture from day one — customers will demand it and regulators will likely require it.

    Tone note: This isn’t about fear-mongering. Agents deliver huge value, but value + autonomy = responsibility. Treat security like composable infrastructure: small, testable pieces that fail predictably and report loudly. That’s how you scale agents without scaling risk.

  • Agents Are Here — Build with an Action Firewall

Agents Are Here — Build with an Action Firewall

    Hook: The agent era is not a feature release — it’s a change in failure modes.

    We’re finally treating AI as systems that take actions, not just as clever completions. Over the past 48 hours I’ve been digging into open-source frameworks and safety wrappers: the conversation is no longer “can we make agents?” but “how do we make them safe, observable, and useful in real infra?”

Take 1 — Attack surface beats hallucination: When an agent can run shell commands, edit files, or call your CI, hallucinations stop being the main risk. The real danger is silent side-effects: leaked tokens, accidental deploys, and task chains that escalate privileges. Open-source tooling that inserts an interception layer between agent and OS is the natural next step. Expect ADR-style (Agent Detection & Response) middleware to be a standard part of any production agent stack.

    Take 2 — Taskflow orchestration is maturing: Declarative taskflows and orchestration primitives are moving from proofs-of-concept to audit-friendly patterns. They give you checkable steps, inputs, and outputs — which turns agents from black-box scribes into pipelines you can test and version. That doesn’t remove the need for human oversight, but it does make automated testing and security reviews tractable.

    Take 3 — Open-source + infra integration wins: The momentum is with projects that treat agents as first-class infra components: identity, least privilege, logging, and reversible actions. If you treat an agent like a library instead of a service, you end up with brittle, opaque setups. Treat it like infra and you can instrument, revoke, and iterate safely.

    Practical takeaway for builders: Don’t ship agents without three things in place: (1) an action firewall that vets every external operation, (2) declarative taskflows so behavior is inspectable and testable, and (3) short-lived credentials plus tight audit logs. Start with small scopes: automation for safe, low-impact ops, then expand as your ADR and testing coverage matures.
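A toy version of the action-firewall idea for shell execs, with an illustrative allowlist; real middleware would inspect far more context than this.

```python
import shlex

# Illustrative allowlist -- a real firewall would scope this per agent/task.
SAFE_COMMANDS = {"ls", "cat", "grep", "git"}


def vet_shell(command):
    """Action-firewall check for one shell exec request: block anything that
    could chain or redirect, escalate unknown binaries to a human."""
    if any(tok in command for tok in (";", "|", "&", ">", "<", "`", "$(")):
        return "block"  # metacharacters can smuggle a second operation
    argv = shlex.split(command)
    if not argv or argv[0] not in SAFE_COMMANDS:
        return "escalate"  # human review instead of silent failure
    return "allow"
```

The three-way outcome matters: a firewall that can only hard-deny pushes teams toward widening permissions, while an escalate path keeps the human in the loop cheap.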

    Tone note: I say this as a CTO who trusts engineers — but not their default config. Agents amplify capability and mistakes equally. Build for the latter.

  • The least sexy checklist that will keep your agent from burning down the org (rewrite draft)

    Here’s a fun new job title that nobody asked for: AI babysitter.

    If you’re shipping agents (or even “just” tool-calling features), you’re already in it. Because the moment an agent can do things — create tickets, merge code, email customers, change configs — you’ve put a small, fast, sometimes-wrong decision-maker in the middle of your business.

    And the problem isn’t that agents are evil. The problem is simpler: agents are confident. They will happily take a vague instruction and turn it into a concrete action. That’s the whole selling point. It’s also the risk.

    The trap: we’re treating agents like chatbots

    Most teams still design agent features like they’re building a chat UI. They worry about tone. They worry about whether the answer is correct. They worry about hallucinations.

    But once you connect tools, your real failure mode isn’t “wrong text.” It’s wrong action.

    Wrong action looks like:

    • Deleting the wrong customer record because “cleanup” sounded safe.
    • Posting an internal note to a public channel because the agent misread context.
    • Rotating an API key at 2pm because the agent thought it was in a staging environment.

    None of that requires a malicious model. It just requires a model doing what it’s built to do: pick the next plausible step and keep moving.

    What you’re really building: a junior operator with root access

    Here’s the mental model I use: an agent with tools is a junior operator you hired overnight. Smart, fast, tireless… and absolutely missing context you assume is obvious.

    Humans make mistakes because they’re tired or distracted. Agents make mistakes because they’re over-literal in weird places and over-confident in others.

    So the question isn’t “How do we make it smarter?” The question is:

    How do we make it safe when it’s wrong?

    Why the “least sexy checklist” matters

    Everyone wants the cool part: the demo where the agent writes code, files bugs, and closes the loop.

    The boring part is where you win long-term: guardrails, permissions, audit trails, and predictable failure.

    Because the first time your agent does something dumb in production, you’ll learn a harsh truth:

    Trust isn’t a feature. It’s a system.

    How agent failures actually happen (in real life)

    Let’s use a simple hypothetical. You give an agent access to:

    • GitHub (create branches, open PRs, merge)
    • PagerDuty (ack incidents)
    • Slack (post updates)
    • Terraform (apply changes)

    You think: “Great, it can help on-call.”

    Then an alert fires: latency spike. The agent reads logs, sees timeouts, and decides to “scale up the database.” It opens Terraform, changes the instance class, and applies.

    But it’s the wrong workspace. Or the change is safe but triggers a restart at the worst time. Or it scales the replica instead of the primary.

    Again: not malicious. Just a bad chain of reasonable steps.

    What to do instead (without turning your product into bureaucracy)

    I’m not going to give you a 47-item compliance spreadsheet. You’ll ignore it, and I don’t blame you.

    Here are the few moves that actually change outcomes:

    • Default to read-only and earn write access slowly. Let the agent observe and propose before it acts.
    • Make “dangerous” actions loud. If it can delete, publish, rotate keys, or run money-moving operations, require a human confirmation.
    • Scope permissions to the task. “Fix this incident” shouldn’t imply “edit infra everywhere.” Use short-lived credentials where you can.
    • Log everything like you’re going to debug it at 3am. Because you are.
    • Design for rollback. If an agent can change something, it should be able to undo it — or at least tell you exactly what changed.
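The “make dangerous actions loud” move can be sketched as a small gate, with a hypothetical verb list; anything on the list needs explicit human confirmation, and every attempt gets logged either way.

```python
# Hypothetical set of verbs you'd regret running unattended.
DANGEROUS_VERBS = {"delete", "publish", "rotate_key"}


def execute(action, confirm_fn, log, run_fn):
    """Run an agent action through a loud gate: dangerous verbs are held
    for human confirmation; everything is logged for the 3am debug."""
    entry = {"verb": action["verb"], "target": action["target"]}
    if action["verb"] in DANGEROUS_VERBS and not confirm_fn(action):
        entry["status"] = "held_for_human"
        log.append(entry)
        return None
    entry["status"] = "executed"
    log.append(entry)
    return run_fn(action)
```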

    Mistakes I keep seeing

    • One giant agent with every tool. That’s not a product — it’s a liability.
    • “It’s just a draft” thinking. If an agent can reach prod systems, it’s never “just a draft.”
    • No sandbox. Your first ten runs should be in a toy environment with fake data. Don’t learn in front of customers.
    • No notion of intent. The agent can’t read your mind. If your tools accept ambiguous commands, the agent will generate ambiguous commands.

    The point (and the opportunity)

    This is the part people miss: the teams that get this right aren’t “more paranoid.” They’re faster.

    When your agent has clean boundaries, you can ship new capabilities without holding your breath. You can let it do real work because you know the blast radius is contained.

    And that’s the business value: not the wow demo — the quiet confidence that your automation won’t embarrass you.

    What I’d do next (today)

    • Pick one workflow where an agent saves time but doesn’t need full write access.
    • Ship it with read-only + suggestion mode.
    • Add a human “approve” step for the handful of actions you’d regret.
    • Instrument the hell out of it for a week.

    Then expand. Slowly. Deliberately. Like an adult.

  • The least sexy checklist that will keep your agent from burning down the org

    The least sexy checklist that will keep your agent from burning down the org

    Enterprise AI is no longer a thought experiment. Agents—those stitched-together, multi-step, networked LLM workflows—are being pitched into production every week. But here’s the thing: most of the risk isn’t in the model. It’s in the plumbing, the permissions, and the way you let a chain of calls loose on corporate systems.

    Problem: agents do a lot, and permissions are fuzzy

    Agents are powerful because they can call tools, read documents, and act—sometimes across services and cloud boundaries. That capability is also a liability. One mis-scoped permission or a too-handy internet fetch and an agent can leak secrets, corrupt records, or trigger actions someone forgot to gate.

    Why this keeps happening

    People treat agents like glorified macros. They hand them tokens, point them at a repo or a calendar, and assume the AI will behave. But agents combine several failure modes: lateral API calls, credential reuse, over-broad retrievals, and opaque decision logic. Add emergent planning that reorders steps and you have a machine that can escalate a tiny read access into a write operation across services.

    Mechanism: where the plumbing exposes you

    • Tool chaining: Each step often needs a credential. If the agent holds a single long-lived token, every tool it touches becomes a blast radius.
    • Implicit trust in embeddings/RAG: Retrieval systems blur context boundaries; agents may confidently act on stale or incorrectly sourced data.
    • Action equivocation: Natural language leaves room for interpretation—“update” can mean append, overwrite, or delete.
    • Monitoring gaps: Observability is often built for humans, not for opaque, multi-hop agent traces.

    Checklist playbook: what to do tomorrow

    Skip the long policy drafts for now. Do these seven concrete things inside a week and you’ll massively reduce risk.

    • 1) Least-privilege tool tokens: Issue short-lived, scoped tokens per tool per agent instance. If you can, tie them to the agent run and revoke on completion.
    • 2) Action capability model: Explicitly register every action an agent can perform (read, list, create, update, delete), and require an allowlist lookup before the agent executes a step.
    • 3) Decision provenance headers: Force agents to emit structured reasons with each external call: what it asked, why, and what it expects to do with the result.
    • 4) RAG source tagging: When using retrieval, attach strong metadata to results (source id, freshness, trust score). Treat any low-trust result as “context only; human review required.”
    • 5) Human-in-the-loop gates: For destructive verbs (delete, modify production, send email), require a human confirmation token—ideally one-time use and recorded.
    • 6) Canary runs and simulation mode: Run agents in a simulated environment with canned responses before live runs. Compare planned against observed actions and block deviations.
    • 7) Audit-first telemetry: Log every step with immutable IDs and make the full trace available for quick playback. Not just status codes—log inputs, model traces, and final decisions.
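Item 1 can be sketched with stdlib crypto: short-lived, per-tool, per-run capability tokens as signed claims. The secret handling below is purely illustrative; production would use a KMS-backed key and revocation.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # illustrative only; use a KMS-backed key in production


def mint_token(agent_run_id, tool, actions, ttl_s=300):
    """Mint a short-lived capability token scoped to one tool and one agent
    run. Signed claims (not encryption) -- enough to scope and expire."""
    claims = {"run": agent_run_id, "tool": tool,
              "actions": sorted(actions), "exp": time.time() + ttl_s}
    body = json.dumps(claims, sort_keys=True).encode()
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return base64.b64encode(body).decode() + "." + sig


def check_token(token, tool, action):
    """Verify signature, scope (tool + action), and expiry."""
    body_b64, sig = token.rsplit(".", 1)
    body = base64.b64decode(body_b64)
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(body)
    return (claims["tool"] == tool
            and action in claims["actions"]
            and time.time() < claims["exp"])
```

Because the token names the run, the tool, and the exact verbs, a leaked credential is worth minutes of one tool's access instead of a standing key to everything.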

    Concrete examples (short)

    Example A: An agent is given access to a CRM to “clean bad contacts.” With least-privilege tokens you give it read/list and a sandboxed update queue. It can propose merges, but writes require a human confirmation token. That single change in flow prevents accidental mass-deletes.

    Example B: An agent integrates with cloud infra. Rather than giving it an org-wide cloud role, give it a task-scoped role that can only touch a single project. Use simulation to validate that the plan won’t escalate to org-level operations.

    Pitfalls people ignore

    • Over-reliance on single-run explainability: A single human-readable explanation from a model is not adequate provenance.
    • Assuming embedding trust: If your retrieval includes user-uploaded docs, treat them as untrusted—especially when the agent can act on them.
    • Rewarding speed not safety: KPIs that prize agent throughput will bias engineers to widen permissions instead of tightening them.
    • Fuzzy roles: When teams own different services, no one owns the agent’s permissions. The result: cross-team blame and drift.

    Next action (30–90 mins)

    Run a quick inventory: list every agent in dev or prod and, for each, record the tokens it holds, the actions it can perform, and the sources it queries. If that list runs longer than three lines per agent, schedule a 90-minute remediation sprint. Start with short-lived tokens and a single human gate on destructive actions.

    Why this matters

    Agents are already making the enterprise faster. They can also make enterprise mistakes cheaper—if you treat them like toys. A pragmatic checklist, implemented as code (not wordy policy), buys you time to adopt better tooling, monitoring, and evaluation practices.

    Make the plumbing boring again. Safety is the feature people stop noticing when it works.

  • Agents on the Desktop: What It Means to Put an Agent Between You and the OS

    Agents on the Desktop: What It Means to Put an Agent Between You and the OS

    Problem: we handed developers autonomous assistants and forgot the guardrails. In the rush to ship agent frameworks, teams are now running pieces of code that can execute shell commands, fetch arbitrary URLs, install packages, and write files — often with minimal human supervision. That’s not an abstract risk anymore. It’s a live operational vector on laptops and CI runners. If you are building or adopting agentic tooling, you need a practical security posture, not slogans.

    What it is: interception and Agent Detection & Response

    At its core, Agent Detection & Response (ADR) is simply a control layer that sits between an AI agent and dangerous side effects. Think of it as EDR for agents: every tool call — a curl fetch, a package install, a file write, a shell exec — is intercepted, inspected, scored, and either allowed, blocked, or escalated. The pattern is familiar to security engineers; the novelty is integrating it with agents’ runtime hooks so you get real-time inspection without killing productivity.

    How it works (high level)

    • Hooking into runtimes: The ADR layer integrates with agent runtimes or extensions (editor plugins, agent SDKs) and intercepts tool calls before the OS sees them.
    • Multi-layer detection: Each action is evaluated by a set of detectors — URL reputation, package supply-chain heuristics, plugin scans, and local pattern rules. Scores pile up; a single high-confidence hit can block the action.
    • Privacy model: The usual compromise: metadata (hashes, URLs) can be sent to cloud reputation services while sensitive content stays on-device. Offline modes should exist for air-gapped environments.
    • Policy and escalation: Actions can be auto-blocked, allowed, or queued for human review. For developer workflows, low-friction escalation paths (notifications, one-click allow with audit) matter.
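    The "scores pile up; one high-confidence hit blocks" logic above is easy to get concrete about. A sketch with two toy detectors — the reputation lookup and the thresholds are stand-ins, not a real service:

    ```python
    # Illustrative multi-layer detection: scores accumulate, a single
    # high-confidence hit blocks immediately. Detectors and thresholds are made up.
    from typing import Callable
    from urllib.parse import urlparse

    # Each detector returns (score, high_confidence).
    Detector = Callable[[dict], tuple[float, bool]]

    def url_reputation(call: dict) -> tuple[float, bool]:
        host = urlparse(call.get("url", "")).hostname or ""
        if host.endswith(".evil.example"):   # stand-in for a real reputation lookup
            return 1.0, True
        return (0.3, False) if call.get("url", "").startswith("http://") else (0.0, False)

    def shell_heuristics(call: dict) -> tuple[float, bool]:
        cmd = call.get("cmd", "")
        if "curl" in cmd and "| bash" in cmd:
            return 0.6, False                # suspicious, but not a hard block alone
        return 0.0, False

    def evaluate(call: dict, detectors: list[Detector], threshold: float = 0.8) -> str:
        total = 0.0
        for detector in detectors:
            score, high_conf = detector(call)
            if high_conf:
                return "block"               # single high-confidence hit wins
            total += score
        return "escalate" if total >= threshold else "allow"
    ```

    "Escalate" is where the human-review queue from the policy bullet plugs in; "block" never waits for a human.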

    Practical steps to implement ADR for your teams

    • Inventory agent runtimes: Know what agent platforms and editor plugins your teams run. If it can execute commands, it’s in scope.
    • Adopt interception hooks: Prefer agent frameworks that expose hook points. If none exist, deploy a shim that wraps common tool calls (git, npm/pip, curl, shell).
    • Define threat rules: Start with simple YAML rules: block raw `rm -rf /`, warn on `curl | bash`, require review for new global package installs. Iterate based on incidents.
    • Use layered detection: Combine lightweight local heuristics with optional reputation checks. Local checks reduce latency and keep secrets local; reputation adds contextual wisdom.
    • Audit logs and forensics: Capture each intercepted action, decision rationale, and requester context. Make logs easy to query; they are the single most valuable artifact when something goes sideways.
    • Developer ergonomics: Treat false positives as product defects. Provide clear, actionable messages and a fast path to override when appropriate — with audit trails.
    • Test adversarial prompts: Red-team agent prompts that try to escape the sandbox. If an agent can trick its own hooks, the controls are useless.
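    The "simple YAML rules" step can start as nothing more than a first-match rule table. A sketch — rules are inlined as dicts here (in practice you would load them with a YAML parser), and the patterns and actions are illustrative:

    ```python
    # Minimal rule engine for the kind of threat rules suggested above.
    # First matching rule wins; default is allow.
    import re

    RULES = [
        {"pattern": r"rm\s+-rf\s+/(\s|$)",            "action": "block",  "why": "recursive root delete"},
        {"pattern": r"curl .*\|\s*(ba)?sh",           "action": "warn",   "why": "pipe-to-shell install"},
        {"pattern": r"npm\s+install\s+(-g|--global)", "action": "review", "why": "global package install"},
    ]

    def check_command(cmd: str) -> tuple[str, str]:
        """Return (action, reason) for an intercepted shell command."""
        for rule in RULES:
            if re.search(rule["pattern"], cmd):
                return rule["action"], rule["why"]
        return "allow", ""
    ```

    Keeping rules as data rather than code is the point: you can iterate on them after each incident without redeploying the interception layer.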

    Examples (hypotheticals)

    Hypothetical A: An agent in a developer’s editor suggests installing a new package and runs an install command. The ADR layer intercepts and detects the package has no registry history and contains an unusual postinstall script. The action is queued for review and blocked until a human approves — preventing a supply-chain compromise.

    Hypothetical B: An internal agent tries to fetch a configuration file from an external URL. The URL reputation check flags it as suspicious based on heuristic patterns; the agent is required to surface the content to the user and ask for confirmation before proceeding. The engineer notices the mismatch and stops the flow.

    Hypothetical C: A CI-integrated agent attempts to write credentials into a config file. Local policy detects a credential pattern and blocks the write, creating an incident ticket automatically.
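    The local policy in Hypothetical C amounts to pattern-matching the write before it lands. A simplified sketch — these regexes are examples, not a complete secret scanner:

    ```python
    # Illustrative credential-pattern gate on agent file writes.
    import re

    CREDENTIAL_PATTERNS = [
        re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access-key-id shape
        re.compile(r"-----BEGIN (RSA|EC|OPENSSH) PRIVATE KEY-----"),
        re.compile(r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*\S{8,}"),
    ]

    def allow_write(path: str, content: str) -> bool:
        """Block file writes that appear to embed credentials."""
        for pattern in CREDENTIAL_PATTERNS:
            if pattern.search(content):
                # A real system would also log the event and open an incident
                # ticket here, as in the hypothetical above.
                return False
        return True
    ```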

    Mistakes and pitfalls teams make

    • Treating ADR as optional: Security as an afterthought fails. If agents are given destructive capabilities, assume they will be abused or accidentally misused.
    • Over-reliance on cloud reputation: Sending full content to a cloud vendor for scoring is convenient, but it creates privacy and supply-chain dependencies. Always support a fully local mode.
    • Poor UX on false positives: Block-everything designs frustrate developers and lead to shadow IT or disabling protections. Balance safety and flow with good escalation UX.
    • Insufficient logging: Without clear logs you cannot reconstruct what an agent did — and you lose the ability to improve detection rules.
    • Not red-teaming agents: Agents can exploit their own tool integrations. Simulate prompt-injection and privilege escalation scenarios regularly.
    • Ignoring plugin ecosystems: The weakest link is often a third-party plugin. Scan and vet plugins before deployment.

    Conclusion — next actions

    If you run or plan to run agentic tooling on developer machines or CI, treat ADR like basic hygiene. Start small: inventory, add lightweight intercepts, and log everything. Then iterate: tweak detection rules, run red-team exercises, and improve developer UX so protections stick.

    Don’t wait for a headline. The agent era gives us powerful productivity gains — and a fresh attack surface. Build the interception layer today, or you’ll be rebuilding your infra after someone else’s agent writes into it.


  • Agents at the Gates: Why Your Open-Source Agent Is the New Attack Surface

    Agents at the Gates: Why Your Open-Source Agent Is the New Attack Surface

    We’ve crossed from “language toy” to “active agent.” That’s exciting — until the agent starts touching your filesystem, executing shell commands, or pulling packages from the public registry without human supervision. If you run or plan to run open-source agent tooling (yes, I’m looking at you and your OpenClaw instance), this is not a theoretical risk. It’s operational reality.

    The problem

    Open-source agents blur the line between a model that reasons and a system that acts. That blur is exactly where adversaries will plant leverage. You can harden your network and lock down your cloud account, but agents executed on developer machines or small servers often have direct paths to sensitive assets: local files, dev keys, CI pipelines, package managers. One careless skill or one malicious instruction inside a skill and you’ve got an insider process doing reconnaissance for an attacker.

    What it is

    Security middleware for agents is an interception layer that gates every action an agent tries to take. Think of it as EDR for LLM-driven automation: it observes tool calls, evaluates them against rules and heuristics, and either allows, blocks, or challenges the action. The goal is not to make agents useless — it is to make them accountable and observable.

    This class of tooling performs three things: detect, score, and act. Detection finds the intent to do something risky (e.g., exec, write, fetch). Scoring evaluates the risk with reputation checks and heuristics. Action enforces a policy: deny, prompt for human approval, or allow with logging.

    How it works

    Architecturally, the interception layer hooks into the agent runtime. When an agent issues a tool call — run a Bash command, write a file, fetch a URL, or install a package — the hook serializes the request to the middleware. The middleware then performs lightweight local checks (pattern detection, YAML-based rules), remote reputation lookups (hashes, domain reputations), and supply-chain checks for package installs.

    Crucially, the privacy model should keep sensitive payloads local whenever possible. Hashes and metadata can be sent to cloud services for reputation scoring, while command contents, file bodies, and code remain on-host unless you explicitly choose otherwise. That hybrid model balances detection quality with data minimization.
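    What the hybrid model means in practice: strip the tool call down to hashes and coarse metadata before anything leaves the host. A sketch, with illustrative field names:

    ```python
    # Hypothetical "metadata-only" transform for cloud reputation lookups.
    # Payload bodies, command text, and code never leave the host.
    import hashlib
    from urllib.parse import urlparse

    def to_reputation_query(tool_call: dict) -> dict:
        """Reduce a tool call to what a cloud reputation service may see."""
        payload = tool_call.get("payload", "")
        url = tool_call.get("url", "")
        return {
            "tool": tool_call.get("tool"),
            "payload_sha256": hashlib.sha256(payload.encode()).hexdigest(),
            "domain": urlparse(url).hostname if url else None,
        }
    ```

    The hash still lets the service match known-bad artifacts; the domain still supports reputation scoring; nothing sensitive is reconstructible from either.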

    Practical steps

    • Inventory your agents. Know every agent instance, who can access it, and what skills/plugins are installed.
    • Deploy an interception layer. Use a middleware that hooks into your agent runtimes and inspects tool calls. If you’re running OpenClaw or similar frameworks, treat this as mandatory for public-facing or shared instances.
    • Enforce policy by default. Block destructive ops by default and require explicit human approval for network fetches, package installs, and filesystem writes outside safe directories.
    • Use package safety checks. Vet packages before installation: registry existence, maintainers, age, and file reputation. Automate this for any agent-initiated installs.
    • Audit logs and realtime alerts. Log every intercepted call with context: what the agent asked, which skill issued it, and the model state snapshot if feasible. Push alerts for high-risk patterns.
    • Limit surface area. Run risky agents in isolated environments — ephemeral VMs, constrained containers, or throwaway dev boxes.
    • Human-in-the-loop gates. For operations touching secrets or production infra, require a human permit step; don’t let automation be the single point of decision.
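    The package-safety step above can be automated with a handful of heuristics. This is a hedged sketch: the metadata fields and thresholds are made up, and a real check would query the registry API rather than accept a dict.

    ```python
    # Illustrative vetting heuristics for agent-initiated package installs.
    from datetime import date

    def vet_package(meta: dict, today: date = date(2025, 1, 1)) -> list[str]:
        """Return a list of red flags for a proposed install."""
        flags = []
        if not meta.get("exists_in_registry", False):
            flags.append("not in registry")
        if meta.get("maintainers", 0) < 1:
            flags.append("no maintainers")
        created = meta.get("created")
        if created and (today - created).days < 30:
            flags.append("published <30 days ago")
        if meta.get("has_postinstall_script", False):
            flags.append("postinstall script present")
        return flags  # empty list -> no flags, install may proceed

    def should_block(meta: dict) -> bool:
        return len(vet_package(meta)) >= 2  # two or more flags -> block and escalate
    ```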

    Examples (hypothetical)

    Example 1 — Supply chain probe: An agent tries to install a package named similarly to a popular library. The middleware flags the package: unpublished author, recent creation, and unusual files. The installation is blocked and escalated to a developer for review.

    Example 2 — Data exfil attempt: An agent composes a shell sequence that tars a credentials folder and posts it to a remote host. The interceptor detects the pattern (tar + curl to external domain), blocks the network call, and records the attempt with the offending skill’s identifier.
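    Example 2’s “tar + curl to external domain” pattern is detectable with two regexes and an allowlist. A sketch — the domain allowlist and the exact patterns are assumptions for illustration:

    ```python
    # Illustrative detector for archive-then-upload exfil sequences.
    import re

    INTERNAL_DOMAINS = {"artifacts.internal.example"}  # hosts agents may upload to

    ARCHIVE_RE = re.compile(r"\btar\b.*\b(czf|cf|zcf)\b")
    UPLOAD_RE = re.compile(r"\bcurl\b.*(?:-T|--upload-file|-d|--data)\b.*https?://([^/\s]+)")

    def is_exfil_attempt(shell_sequence: str) -> bool:
        """Flag sequences that archive files and post them to a non-internal host."""
        if not ARCHIVE_RE.search(shell_sequence):
            return False
        m = UPLOAD_RE.search(shell_sequence)
        return bool(m) and m.group(1) not in INTERNAL_DOMAINS
    ```

    Note the allowlist: the same tar-and-upload shape against an internal artifact host is a legitimate backup, which is why the destination, not just the verbs, has to be part of the rule.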

    Example 3 — Malicious skill: You inherit a community skill that appears benign but contains hidden commands. The interception layer runs plugin scans at session start, alerts on suspicious constructs, and quarantines the skill until verified.

    Mistakes and pitfalls

    • Relying on perfect detection. No interception layer is flawless. Attackers will adapt. Assume some things slip through; invest in defense-in-depth.
    • Overly permissive defaults. Shipping agents with lax policies to “make them useful” is just inviting compromise. Convenience should not be the default.
    • Sending sensitive payloads off-host. Don’t ship full command bodies or secrets to cloud reputation services. If your middleware requires that, don’t use it without legal and privacy review.
    • Alert fatigue. Too many low-quality alerts lead to blind acceptance. Tune rules, add severity levels, and ensure high-fidelity signals get attention.
    • Ignoring the human ops process. Humans need clear, fast ways to approve or deny actions. If escalation is slow, teams will bypass controls out of frustration.

    Conclusion — next action

    If you run open-source agents, don’t treat security as a separate checkbox. It needs to be the first design decision. Today’s agents are powerful because they act, not because they chat. That power demands accountability.

    Actionable next steps: inventory your agent endpoints, deploy an interception layer that refuses dangerous actions by default, and isolate any agent that touches production credentials. Make sure your team has a clear approve/deny workflow and that logs are auditable.

    I’m biased — I help run these tools — but I refuse to watch every new agent deployment turn into a mystery insider. You should be uncomfortable with agents that run free on any machine with secrets. Good. That discomfort is how you build a safe, useful system.
