5 mins read

Agents need built-in security, not bolt-on audits

Title: Agents need built-in security, not bolt-on audits

1) Problem
Organizations are racing to deploy agentic systems — assistants that act on our behalf, call tools, and change state in the world. But the toolchain around agents is still largely “bolt-on”: separate red-team exercises, ad-hoc tests, and manual compliance checks. That model doesn’t scale. When agents have real permissions (sending emails, executing code, accessing databases), delayed or fragmented security practices quickly become catastrophic.

2) Explanation (what it is)
By “built-in security” I mean evaluation, testing, and governance embedded directly in the agent platform and development lifecycle. Instead of running a vulnerability scan after you ship, the platform enforces tests during development, keeps full traceability of tool calls, and provides automated guardrails that are part of the runtime. The result: faster iterations with fewer surprises, and meaningful audit trails for operators and regulators.

3) Mechanism (how it works)
There are three core pieces to make security first-class:
– Continuous evaluation hooks: unit-test-like checks for prompt templates, tool wrappers, and decision policies that run on every commit or model change.
– Runtime enforcement: a policy layer that intercepts tool calls and enforces constraints (rate limits, data redaction, allowed endpoints), with fast, deterministic fallbacks when the agent is uncertain.
– Observability and traceability: immutable logs that show the prompt, model outputs, tool inputs/outputs, and the policy decisions that led to a particular action.

Architecturally, this is a mix of developer tooling and runtime plumbing. CI pipelines need test runners that can invoke the agent locally with mocked tools; the runtime must implement a policy decision point that can block or transform tool calls; and storage must capture the artifact chain for forensic review.

4) Steps (how to implement)
A practical rollout in an engineering org looks like this:
– Step 0: Inventory your agent surface. List all agents, capabilities, tool integrations, and privileges. Keep the list small and explicit.
– Step 1: Add evaluation suites. For each agent, create lightweight tests: safety unit tests (jailbreak attempts), correctness tests (task outputs for canonical inputs), and privacy tests (data-leak scenarios). Run them in CI on every change.
– Step 2: Wrap tools. Never let the LLM talk directly to infra. Introduce thin RPC wrappers with explicit schemas, argument validation, and permission checks. Instrument these wrappers to emit structured events.
– Step 3: Enforce policies at runtime. Deploy a policy gateway that validates every tool call against a policy (who, what, why). Provide a fallback behavior: deny, ask-for-human, or sanitize input.
– Step 4: Capture trace logs. Store prompts, model versions, tool inputs/outputs, and policy decisions in an append-only store with retention and export capabilities for audits.
– Step 5: Automate red-team tests. Integrate scripted adversarial prompts into CI and schedule periodic fuzzing runs. Surface failures as blocking or advisory depending on severity.
– Step 6: Governance hooks. Build simple approval flows for granting agents new privileges and require recorded rationale for any elevated access.

5) Examples (hypothetical)
– Hypothetical: A support agent that can close tickets and run small scripts in production. Without wrappers, a malformed prompt could trigger a script with destructive arguments. With the approach above, the agent’s “run_script” tool requires an immutable schema (script_name, args_allowed), and policy denies scripts that touch protected namespaces. A CI safety test ensures that common jailbreaks can’t escalate privileges.

– Hypothetical: An HR agent that summarizes candidate data. Privacy tests ensure the agent never transmits raw PII in outbound tool calls. Runtime policy strips or redacts fields before the external logging system sees them.

6) Mistakes / Pitfalls
– Treating evaluation as optional. Running a few manual red-team exercises isn’t the same as continuous automated checks. Humans are inconsistent; automation is repeatable.
– Over-restricting agents out of fear. If every tool call requires human approval, agents become useless. Design graduated responses: sandbox, sanitize, ask, deny — not only deny.
– Log overload without structure. Dumping gigabytes of text into a lake is useless. Capture structured events: tool_name, args_hash, model_id, decision, outcome.
– Blind trust in third-party tooling. Open-source evaluation tools are fantastic, but vendor acquisitions and changing licenses can shift risk. Keep your core test suites mirrored in your repo.
– Forgetting economics. Tests, fuzzers, and traces cost money. Prioritize high-risk agents and high-impact tools first.

7) Conclusion (what to do next)
If you run agents in production today: start with inventory and implement thin tool wrappers this week. Add one automated safety test per agent and wire it into CI; you’ll catch more regressions than ad-hoc reviews ever will. If you’re building an agent platform: bake policy enforcement, structured tracing, and CI-first evaluations into your architecture from day one — customers will demand it and regulators will likely require it.

Tone note: This isn’t about fear-mongering. Agents deliver huge value, but value + autonomy = responsibility. Treat security like composable infrastructure: small, testable pieces that fail predictably and report loudly. That’s how you scale agents without scaling risk.