The least sexy checklist that will keep your agent from burning down the org
Here’s a fun new job title that nobody asked for: AI babysitter.
If you’re shipping agents (or even “just” tool-calling features), you’re already in it. Because the moment an agent can do things — create tickets, merge code, email customers, change configs — you’ve put a small, fast, sometimes-wrong decision-maker in the middle of your business.
And the problem isn’t that agents are evil. The problem is simpler: agents are confident. They will happily take a vague instruction and turn it into a concrete action. That’s the whole selling point. It’s also the risk.
The trap: we’re treating agents like chatbots
Most teams still design agent features like they’re building a chat UI. They worry about tone. They worry about whether the answer is correct. They worry about hallucinations.
But once you connect tools, your real failure mode isn’t “wrong text.” It’s wrong action.
Wrong action looks like:
- Deleting the wrong customer record because “cleanup” sounded safe.
- Posting an internal note to a public channel because the agent misread context.
- Rotating an API key at 2pm because the agent thought it was in a staging environment.
None of that requires a malicious model. It just requires a model doing what it’s built to do: pick the next plausible step and keep moving.
What you’re really building: a junior operator with root access
Here’s the mental model I use: an agent with tools is a junior operator you hired overnight. Smart, fast, tireless… and absolutely missing context you assume is obvious.
Humans make mistakes because they’re tired or distracted. Agents make mistakes because they’re over-literal in weird places and over-confident in others.
So the question isn’t “How do we make it smarter?” The question is:
How do we make it safe when it’s wrong?
Why the “least sexy checklist” matters
Everyone wants the cool part: the demo where the agent writes code, files bugs, and closes the loop.
The boring part is where you win long-term: guardrails, permissions, audit trails, and predictable failure.
Because the first time your agent does something dumb in production, you’ll learn a harsh truth:
Trust isn’t a feature. It’s a system.
How agent failures actually happen (in real life)
Let’s use a simple hypothetical. You give an agent access to:
- GitHub (create branches, open PRs, merge)
- PagerDuty (ack incidents)
- Slack (post updates)
- Terraform (apply changes)
You think: “Great, it can help on-call.”
Then an alert fires: latency spike. The agent reads logs, sees timeouts, and decides to “scale up the database.” It edits the Terraform config, bumps the instance class, and applies.
But it’s the wrong workspace. Or the change is safe but triggers a restart at the worst time. Or it scales the replica instead of the primary.
Again: not malicious. Just a bad chain of reasonable steps.
What to do instead (without turning your product into bureaucracy)
I’m not going to give you a 47-item compliance spreadsheet. You’ll ignore it, and I don’t blame you.
Here are the few moves that actually change outcomes:
- Default to read-only and earn write access slowly. Let the agent observe and propose before it acts.
- Make “dangerous” actions loud. If it can delete, publish, rotate keys, or run money-moving operations, require human confirmation.
- Scope permissions to the task. “Fix this incident” shouldn’t imply “edit infra everywhere.” Use short-lived credentials where you can.
- Log everything like you’re going to debug it at 3am. Because you are.
- Design for rollback. If an agent can change something, it should be able to undo it — or at least tell you exactly what changed.
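The confirmation-gate idea above fits in a few lines. This is a sketch, not a real framework: the tool names, the `DANGEROUS` set, and the `confirm` callback are all hypothetical placeholders for whatever registry and approval flow you actually run.

```python
# Sketch of a confirmation gate for agent tool calls.
# Tool names and the DANGEROUS set are hypothetical placeholders.

DANGEROUS = {"delete_record", "publish_message", "rotate_key", "terraform_apply"}

TOOLS = {
    "get_logs": lambda service: f"logs for {service}",   # read-only: always allowed
    "rotate_key": lambda key_id: f"rotated {key_id}",    # dangerous: gated
}

def execute(tool_name, args, confirm):
    """Run a tool call, but route dangerous actions through a human.

    `confirm` is a callback that returns True only if a human approved.
    """
    if tool_name in DANGEROUS and not confirm(tool_name, args):
        return {"status": "blocked", "reason": "human approval required"}
    return {"status": "ok", "result": TOOLS[tool_name](**args)}
```

The point of the shape: read-only tools flow straight through, and the agent never sees a code path where a dangerous action silently succeeds without a human in the loop.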
Mistakes I keep seeing
- One giant agent with every tool. That’s not a product — it’s a liability.
- “It’s just a draft” thinking. If an agent can reach prod systems, it’s never “just a draft.”
- No sandbox. Your first ten runs should be in a toy environment with fake data. Don’t learn in front of customers.
- No notion of intent. The agent can’t read your mind. If your tools accept ambiguous commands, the agent will generate ambiguous commands.
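The “one giant agent with every tool” mistake has a cheap structural fix: hand each task its own tool list and default everything else to nothing. A minimal sketch, with task names and tool lists that are illustrative, not a real API:

```python
# Sketch: per-task tool scopes instead of one agent holding every tool.
# Task names and tool lists are illustrative placeholders.

TASK_SCOPES = {
    "triage_incident": ["get_logs", "get_metrics", "post_slack_update"],  # observe + narrate
    "fix_flaky_test":  ["read_repo", "open_pr"],                          # propose, don't merge
}

def tools_for(task):
    """Return only the tools this task may touch; unknown tasks get none."""
    return TASK_SCOPES.get(task, [])
```

Note what's absent: `terraform_apply` and `merge` don't appear in any scope, so “fix this incident” can never escalate into “edit infra everywhere.”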
The point (and the opportunity)
This is the part people miss: the teams that get this right aren’t “more paranoid.” They’re faster.
When your agent has clean boundaries, you can ship new capabilities without holding your breath. You can let it do real work because you know the blast radius is contained.
And that’s the business value: not the wow demo — the quiet confidence that your automation won’t embarrass you.
What I’d do next (today)
- Pick one workflow where an agent saves time but doesn’t need full write access.
- Ship it with read-only + suggestion mode.
- Add a human “approve” step for the handful of actions you’d regret.
- Instrument the hell out of it for a week.
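“Instrument the hell out of it” can be as simple as one structured JSON line per action, so the 3am human can reconstruct exactly what the agent did. A sketch, assuming an in-memory log and field names chosen for illustration:

```python
# Sketch: append one structured JSON line per agent action.
# Field names are illustrative; swap in your real logging pipeline.
import json
import time

def audit(log, tool, args, outcome):
    """Record a tool call as a self-describing JSON line."""
    log.append(json.dumps({
        "ts": time.time(),   # when it happened
        "tool": tool,        # which tool was called
        "args": args,        # the exact arguments, verbatim
        "outcome": outcome,  # "ok", "blocked", or an error string
    }))

log = []
audit(log, "get_logs", {"service": "api"}, "ok")
```

Logging the verbatim arguments matters more than logging the agent's explanation: the explanation is a story, the arguments are what actually hit your systems.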
Then expand. Slowly. Deliberately. Like an adult.