Most IT folks I know have written a hundred shell scripts. Cron jobs, Ansible playbooks, Python automation, you name it. Now vendors are pushing AI agents like they're the solution to everything. Here's the practical breakdown of when agents actually help and when they're just more complexity for the same result.
What an AI Agent Actually Is (Not the Marketing Version)
An AI agent is code that uses a language model to decide what actions to take, rather than following fixed logic. It sees a situation, reasons about it, and picks from a menu of tools. A script does X, then Y, then Z. An agent looks at the state and decides whether to do X, Y, Z, or something else entirely.
That flexibility is the whole point. It's also the source of most problems.
The vendors want you to think agents are magic. They're not. They're probabilistic systems that mostly work but occasionally do surprising things. For operations work, that's a feature or a bug depending on context.
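To make the contrast concrete, here is a minimal sketch of the two shapes. Everything in it is hypothetical: the tool names, the `state` dictionary, and `fake_model`, which stands in for a real LLM call so the example runs offline.

```python
from typing import Callable

# Hypothetical tools; in a real setup these would call your systems.
def restart_service(name: str) -> str:
    return f"restarted {name}"

def open_ticket(name: str) -> str:
    return f"ticket opened for {name}"

TOOLS: dict[str, Callable[[str], str]] = {
    "restart_service": restart_service,
    "open_ticket": open_ticket,
}

def run_script(service: str) -> list[str]:
    """Fixed logic: always the same steps in the same order."""
    return [restart_service(service), open_ticket(service)]

def fake_model(state: dict) -> str:
    """Stand-in for an LLM call; returns the name of one tool."""
    return "open_ticket" if state["restarts_today"] >= 3 else "restart_service"

def run_agent(service: str, state: dict,
              choose: Callable[[dict], str] = fake_model) -> str:
    """The model picks the tool; the code only executes the choice."""
    tool_name = choose(state)
    if tool_name not in TOOLS:  # a model can return a name that is not on the menu
        raise ValueError(f"model chose unknown tool: {tool_name!r}")
    return TOOLS[tool_name](service)
```

The only structural difference is who decides the next step. In `run_script` the order is written down; in `run_agent` it comes back from `choose`, which is why the check against `TOOLS` matters.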
The Real Difference: Flexibility vs. Reliability
Here's the core distinction:
A script is reliable in a narrow domain. Run it 100 times with the same inputs and you get the same outputs. When it breaks, you can trace exactly why. It's boring in the best way.
An agent is flexible but less predictable. It handles edge cases that would make a script choke, but it also handles some cases in ways you didn't anticipate. Sometimes that's great. Sometimes it's 3 AM and you're debugging why the agent decided to delete the wrong logs.
Use a script when you know the shape of the problem. Use an agent when the problem has too many variations to code against.
When a Script Wins Every Time
If your automation fits these patterns, don't overcomplicate it with an agent:
- Fixed input format: You're parsing logs from a known source, extracting specific fields, loading into a database. Script. Every time.
- Known failure modes: You can anticipate what goes wrong and handle it explicitly. If the API returns a 429, retry with backoff. If the file is missing, alert and exit. Scripts handle this well because you code the logic.
- Audit requirements: Operations often need a paper trail. "At 14:32, this script ran, hit this API, got this response." Agents are harder to audit because the decision logic lives in the model weights, not your code.
- High-frequency execution: If something runs every minute, the overhead of spinning up an LLM call each time isn't worth it. Scripts are cheap and fast.
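The "known failure modes" and "audit requirements" points can be sketched in a few lines. This is an illustration under assumptions, not a drop-in client: `fetch` is a hypothetical callable returning `(status, body)`, and `sleep` is injectable so the retry logic can be tested without waiting.

```python
import logging
import time
from typing import Callable

log = logging.getLogger("automation")

def fetch_with_backoff(
    fetch: Callable[[], tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(); on HTTP 429, wait base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        status, body = fetch()
        # One log line per attempt gives you the paper trail.
        log.info("attempt=%d status=%d", attempt + 1, status)
        if status == 200:
            return body
        if status == 429 and attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
            continue
        # Anything else, or a 429 on the last attempt: fail loudly.
        raise RuntimeError(f"giving up after {attempt + 1} attempts, status {status}")
    raise RuntimeError("max_attempts must be at least 1")
```

Every branch here is something you can read, test, and point to in a postmortem, which is the property the list above is describing.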
I wrote a Python script last year that monitors disk usage across 200 VMs, alerts when anything hits 85%, and auto-opens tickets. It's 180 lines. An agent would add latency, cost, and unpredictability for zero benefit.
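The core of a check like that is small. The sketch below is not the 180-line script described above, just an illustration of the threshold logic; the host names and the `usage_by_host` shape are made up, and how you collect usage from remote VMs is left out.

```python
import shutil

THRESHOLD = 0.85  # alert at 85% used, matching the figure in the text

def over_threshold(
    usage_by_host: dict[str, tuple[int, int]],
    threshold: float = THRESHOLD,
) -> list[str]:
    """Return hosts whose (total, used) bytes are at or above the threshold."""
    return sorted(
        host
        for host, (total, used) in usage_by_host.items()
        if used / total >= threshold
    )

def local_usage(path: str = "/") -> tuple[int, int]:
    """(total, used) bytes for a local path, via shutil.disk_usage."""
    usage = shutil.disk_usage(path)
    return usage.total, usage.used
```

There is no judgment call anywhere in this logic, which is why a model in the loop would only add latency and cost.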
When an Agent Makes Sense
That said, agents shine in specific scenarios:
- Unstructured or semi-structured data: You're dealing with customer emails, support tickets, or documentation that doesn't follow a schema. A script would need custom parsing for every variation. An agent can read intent and route appropriately.
- Multi-system orchestration with judgment: You need to check status across five different tools, make a decision based on overall health, and take action. The decision logic is "it depends" — exactly where agents excel.
- Exploratory or one-off tasks: You're investigating something novel and don't want to write a script from scratch. An agent can be a reasoning partner that helps you explore the problem space.
I experimented with an agent last quarter to help categorize incoming infrastructure alerts across our monitoring stack. We get alerts in different formats from Datadog, PagerDuty, and CloudWatch. Writing parsing logic for all the variations was painful. The agent handles the variance and routes alerts to the right on-call person based on content and severity.
Did it work? Mostly. More on the failure modes later.
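One pattern that helps with "mostly" is to treat the model's answer as untrusted input. The sketch below assumes the model is asked to return JSON with `team` and `severity` keys. The team names, severity levels, and the `classify` callable are all hypothetical, and nothing here reflects the real Datadog, PagerDuty, or CloudWatch payload formats.

```python
import json
from typing import Callable

# Hypothetical team names and severity levels, for illustration only.
TEAMS = ("database", "network", "platform")
SEVERITIES = ("low", "high", "critical")
FALLBACK = {"team": "human-triage", "severity": "high"}

def route_alert(raw_alert: str, classify: Callable[[str], str]) -> dict:
    """Ask the model for JSON, then accept it only if it passes validation."""
    try:
        decision = json.loads(classify(raw_alert))
    except (json.JSONDecodeError, TypeError):
        return dict(FALLBACK)  # unparseable answer goes to a person
    if not isinstance(decision, dict):
        return dict(FALLBACK)
    if decision.get("team") not in TEAMS or decision.get("severity") not in SEVERITIES:
        return dict(FALLBACK)  # unknown team or severity goes to a person
    return {"team": decision["team"], "severity": decision["severity"]}
```

The agent handles the variance in the input; the validation keeps the output inside a shape the rest of the pipeline can rely on.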
The Failure Modes Nobody Talks About
Here's what the vendor demos don't show you:
Latency under load: An agent that responds in 2 seconds when you're testing it might take 15 seconds when your monitoring system fires 50 alerts at once. You end up queuing requests and creating a bottleneck exactly when you need speed.
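One way to keep a burst from turning into an unbounded pile-up is to cap concurrency and give each call a deadline. This is a sketch using `asyncio`; `classify` is a hypothetical async wrapper around your model call, and the limits are placeholders to tune, not recommendations.

```python
import asyncio
from typing import Awaitable, Callable

async def classify_all(
    alerts: list[str],
    classify: Callable[[str], Awaitable[str]],
    max_in_flight: int = 5,
    timeout_s: float = 10.0,
    fallback: str = "human-triage",
) -> list[str]:
    """Classify alerts with a concurrency cap; slow calls fall back instead of blocking."""
    gate = asyncio.Semaphore(max_in_flight)

    async def one(alert: str) -> str:
        async with gate:  # at most max_in_flight model calls at once
            try:
                return await asyncio.wait_for(classify(alert), timeout=timeout_s)
            except asyncio.TimeoutError:
                return fallback  # a late answer is treated as no answer

    # gather preserves input order in its results
    return list(await asyncio.gather(*(one(a) for a in alerts)))
```

Note that the timeout here covers only the model call, not the time an alert spends waiting for a slot. If queue time matters, measure it separately.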
Cost accumulation: Those API calls add up. We ran our alert routing agent for a month, and the token bill came to more than it would have cost to have a junior engineer write proper parsing rules. For repetitive automation, scripts are cheaper.
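The arithmetic is worth doing before you build anything. The function below is a back-of-the-envelope sketch; all parameter names are my own, and you should plug in your provider's current per-token prices rather than trusting any number you see in a blog post.

```python
def monthly_token_cost(
    calls_per_day: float,
    input_tokens_per_call: float,
    output_tokens_per_call: float,
    usd_per_million_input: float,
    usd_per_million_output: float,
    days: int = 30,
) -> float:
    """Estimated spend in USD for `days` days at a steady call volume."""
    per_call = (
        input_tokens_per_call * usd_per_million_input
        + output_tokens_per_call * usd_per_million_output
    ) / 1_000_000
    return calls_per_day * days * per_call
```

With made-up placeholder prices of $1 per million input tokens and $5 per million output tokens, 10,000 calls a day at 1,500 input and 200 output tokens each works out to about $750 a month. Retries and long prompts push that up quickly.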
The "it worked last week" problem: Models change. An agent that performed reliably might degrade when the underlying model updates. You can't always pin a model the way you pin a Python package version: some providers offer dated snapshots, but those get retired on the provider's schedule, not yours. I saw this happen when an agent that had been classifying tickets reliably started returning worse results after a model refresh. Nobody noticed for two weeks because the degradation was gradual.
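A small labeled test set, re-run on a schedule, is one way to catch this kind of drift sooner than two weeks. The sketch below assumes you have a list of `(text, expected_label)` pairs and a `classify` callable wrapping the agent; the 5-point tolerance is an arbitrary placeholder.

```python
from typing import Callable

def accuracy(classify: Callable[[str], str], golden: list[tuple[str, str]]) -> float:
    """Fraction of labeled examples the classifier gets right."""
    if not golden:
        raise ValueError("golden set is empty")
    hits = sum(1 for text, expected in golden if classify(text) == expected)
    return hits / len(golden)

def has_regressed(current: float, baseline: float, tolerance: float = 0.05) -> bool:
    """True when accuracy dropped more than `tolerance` below the baseline."""
    return (baseline - current) > tolerance
```

Run it from cron, compare against the accuracy you recorded when the agent went live, and alert when `has_regressed` returns True. It is a script watching an agent, which is about the right division of labor.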
Debugging is harder: When a script fails, you read the error, fix the bug, and you're done. When an agent does something unexpected, you often have to add logging just to see what it was given and what it returned, then prompt-engineer around the failure. That's ongoing maintenance work, not a one-time fix.
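It helps to add that logging before the first incident rather than after. A minimal sketch, assuming you can identify the model and prompt version in use (the `model_id` and `prompt_version` values are whatever labels you choose, and `classify` is a hypothetical wrapper around the agent):

```python
import json
import time
from typing import Callable, TextIO

def logged_decision(
    classify: Callable[[str], str],
    text: str,
    sink: TextIO,
    model_id: str,
    prompt_version: str,
) -> str:
    """Run the classifier and append one JSON line describing the decision."""
    started = time.time()
    output = classify(text)
    record = {
        "ts": started,
        "model_id": model_id,
        "prompt_version": prompt_version,
        "input": text,
        "output": output,
        "latency_s": round(time.time() - started, 3),
    }
    sink.write(json.dumps(record) + "\n")
    return output
```

This does not explain why the model chose what it chose, but it does let you replay the exact input later and see whether a prompt or model change coincides with the bad decisions.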
Security surface area: Agents often need API access to multiple systems. Each connection is a potential attack vector. A compromised script is dangerous; a compromised agent with access to your monitoring, ticketing, and cloud console is catastrophic.
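One way to shrink that surface is to give the agent an explicit allowlist of tools instead of raw credentials. The registry below is a sketch; `ToolRegistry` and `get_alert` are hypothetical names, and real enforcement also needs scoped API keys behind each tool, which this does not show.

```python
from typing import Callable

class ToolRegistry:
    """Expose only explicitly registered tools to the agent."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., str]] = {}

    def register(self, name: str, fn: Callable[..., str]) -> None:
        self._tools[name] = fn

    def call(self, name: str, **kwargs) -> str:
        # Anything the model asks for that was never registered is refused.
        if name not in self._tools:
            raise PermissionError(f"tool not allowed: {name!r}")
        return self._tools[name](**kwargs)

def get_alert(alert_id: str) -> str:
    """Hypothetical read-only tool."""
    return f"alert {alert_id}: cpu high"

registry = ToolRegistry()
registry.register("get_alert", get_alert)
```

If the agent is compromised or simply confused, the damage is limited to what the registered tools can do, so keep that list short and read-only where you can.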
What I Would Do First
Before you spin up an AI agent for any automation task, do this:
- Write the script first. Even if you think an agent is the right answer, try solving it with a script. You'll learn the edge cases. You'll have something that works while you experiment.
- Calculate the cost. Token costs add up fast at scale. Run the numbers for your expected volume before committing to an agent architecture.
- Start with a bounded scope. Don't try to replace your entire monitoring workflow with an agent on day one. Pick one narrow task — alert routing, log categorization, ticket summarization. Prove it works. Then expand.
- Build in explicit human oversight. The best agent setups I've seen still route to a human for anything involving production changes. Use agents for classification, routing, and information synthesis. Keep humans in the loop for action.
- Plan for maintenance. Budget time to monitor performance, tune prompts, and handle the cases the agent gets wrong. An agent isn't "set and forget" — it's more like a colleague that needs coaching.
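The human-oversight step can be as simple as a gate between what the agent proposes and what actually runs. In this sketch the action names are hypothetical, and "needs_human" just means the request lands in a queue for someone to approve through whatever channel you already use.

```python
from dataclasses import dataclass, field

# Hypothetical action names: classification, routing, and summarization only.
AUTO_APPROVED = ("classify_alert", "summarize_ticket", "route_to_oncall")

@dataclass
class ApprovalGate:
    """Let low-risk actions through; queue everything else for a person."""
    pending: list[dict] = field(default_factory=list)

    def submit(self, action: str, params: dict) -> str:
        if action in AUTO_APPROVED:
            return "execute"
        # Unknown or state-changing actions wait for human sign-off.
        self.pending.append({"action": action, "params": params})
        return "needs_human"
```

The default matters: anything not on the list waits for a person, so a new tool has to be deliberately added before the agent can act on it unattended.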
The honest answer is that most IT automation still makes more sense as scripts. Agents have a real use case when the problem has too much variance for deterministic logic. But the hype is way ahead of the practical value for most operations work. Start simple. Add complexity only when the problem demands it.