Yesterday, I had the privilege of joining CDO Magazine at an executive dinner and roundtable in Chicago, where I served as a panelist. The room was full of executives from some of the largest enterprises in the Midwest. The conversations were candid, energetic, and informative. These are smart, experienced leaders who aren’t waiting to see if AI agents matter; they’re already acting.

One moment stood out above the rest.

A VP of Data and AI described how their team had used a low-code/no-code agent builder to deploy over 9,000 agents in just a few weeks. The room responded with impressed murmurs. And honestly, for a moment, so did I.

Then the question hit me: What are those 9,000 agents actually doing?

The enterprises that win with AI agents won’t be the ones who built the most. They’ll be the ones who built the right ones.

The impressive number to worry about

There’s nothing inherently wrong with deploying agents at scale. The ability to build and deploy agents quickly is genuinely valuable, and the low-code platforms making this possible are impressive engineering achievements. I should know. I’m building one myself.

But “agent” has become a term of art that enterprises are applying too broadly, and that misapplication has real consequences. I didn’t have the opportunity to ask what those 9,000 agents were actually doing, but based on the sheer number, my guess is that the answers would be familiar: routing documents, triggering notifications, summarizing reports, calling APIs on a fixed schedule, and extracting fields from structured forms.

These are automation workflows. They’re useful, but they aren’t agents.

The distinction matters because the architecture, economics, and governance of true agentic systems differ fundamentally from those of deterministic automation. Confusing the two leads enterprises to the wrong investments, wrong expectations, and real cost exposure.

What’s the actual cost?

A simple automation task running through an LLM at scale, for example 50,000 document-routing decisions per day, can cost 10 to 50 times more in token costs than a rule-based classification system that takes an afternoon to build. Multiplied across 9,000 “agents,” token costs can quietly grow into a material budget line item within months.
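The back-of-envelope arithmetic is worth making explicit. Every number below (tokens per decision, blended price per 1,000 tokens) is an illustrative assumption, not vendor pricing; the point is the shape of the math, not the exact figures.

```python
# Illustrative token-cost estimate for routing documents through an LLM.
# All constants are assumptions for the sake of the sketch.
DECISIONS_PER_DAY = 50_000
TOKENS_PER_DECISION = 1_500     # assumed prompt + completion tokens
PRICE_PER_1K_TOKENS = 0.002     # assumed blended $/1K tokens

llm_daily = DECISIONS_PER_DAY * TOKENS_PER_DECISION / 1_000 * PRICE_PER_1K_TOKENS
llm_monthly = llm_daily * 30

# A rule-based classifier has effectively zero marginal token cost;
# its real cost is the afternoon spent writing the rules.
print(f"LLM routing: ${llm_daily:,.0f}/day, ${llm_monthly:,.0f}/month")
```

Under these assumptions the LLM path runs $150 a day, roughly $4,500 a month, for a single routing workflow. The rule-based alternative amortizes to near zero after it is written.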

Defining the spectrum: from automation to agency

Not every AI-powered workflow needs to be an agent. The key is understanding what you actually have and whether it matches what you actually need. At Oracle, when we think about the AI Data Platform and how enterprises deploy agentic systems, we use a spectrum:

  • Level one: Scripted automation: deterministic, rule-based execution. The logic is fully predetermined. There’s no reasoning, tool selection, or recovery from ambiguity. A traditional workflow engine or RPA tool is often the right choice here, not an LLM.
  • Level two: LLM-augmented automation: a deterministic workflow with an LLM inserted at specific steps for natural language input and output or classification. Useful and cost-effective when scoped correctly. Still not an agent.
  • Level three: Reactive agents: systems that respond to variable inputs by selecting from a defined set of tools, handling ambiguity, and taking conditional paths. These have genuine agentic characteristics and reason within a bounded context.
  • Level four: Autonomous agents: systems that plan multi-step tasks, dynamically compose tool calls, maintain context across steps, recover from failures, and know when to escalate to a human. These are true agents. They’re also the hardest to build well, the most expensive to operate, and the most consequential when they make mistakes.
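The gap between the lower levels can be sketched in a few lines. This is a hypothetical example (the document types and queue names are invented): level one is pure table lookup with no model anywhere, and level two is the same deterministic pipeline with an LLM inserted at exactly one step.

```python
# Hypothetical sketch of levels one and two of the spectrum.

# Level one: scripted automation. Every rule is written down in advance.
ROUTING_RULES = {
    "invoice": "accounts-payable",
    "purchase_order": "procurement",
    "contract": "legal",
}

def route_document(doc_type: str) -> str:
    """Deterministic routing: no reasoning, no tool selection."""
    return ROUTING_RULES.get(doc_type, "manual-review")

# Level two: the same workflow, with an LLM inserted at one step
# (classifying free text into a doc_type). Routing stays deterministic.
def route_free_text(text: str, classify) -> str:
    doc_type = classify(text)        # the only model call in the pipeline
    return route_document(doc_type)  # everything downstream is rules
```

Note that neither function plans, selects tools, or recovers from ambiguity; calling either of these an “agent” is exactly the misapplication the spectrum is meant to expose.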

Most of what enterprises are calling “agents” today sits at levels one and two. That’s not a failure; these are legitimate use cases. The problem is when they are architected and priced as if they are at levels three and four, consuming LLM tokens for tasks that a SQL query or a Python function could handle for a fraction of the cost.

Why this happens and why it’s predictable

The organizational dynamics driving the 9,000-agent scenario are understandable. Low-code agent builders hide implementation complexity, which dramatically lowers the barrier to entry. Business teams can build what they call agents without engaging IT. Velocity is rewarded. Executive dashboards show deployment counts climbing.

But abstraction has a cost. When implementation details are hidden, architectural questions are hidden, too. Nobody asks whether the task requires reasoning or just execution, because the tool makes them look the same.

There’s also an organizational incentive problem. Teams that build agents are measured on deployment velocity and coverage, not on cost efficiency or architectural soundness. The team that pays the cloud bill (often a centralized IT or FinOps function) frequently has no visibility into what each agent does or why it needs an LLM at its core.

When the tool makes everything look like an agent problem, everything becomes an agent problem. Including things that should have been a database query.

This isn’t unique to AI. It mirrors what happened with microservices, with serverless functions, and before that with object-oriented design. Every powerful pattern, when made easy to apply, gets over-applied. The discipline comes later, usually after the cost surprises.

What a real enterprise agent looks like

At Oracle, we work with enterprises across financial services, healthcare, and other regulated industries that can’t afford to be imprecise about what their agents are doing. From that work, a clearer picture emerges of what genuine agentic capability looks like in production:

  • It handles ambiguous input without breaking. A real agent can receive an unstructured request like “find me everything relevant to the Q3 audit exception” and dynamically determine which data sources to query, which tools to invoke, and in what order, without a human scripting each step.
  • It maintains context across a multi-step workflow. Not just conversation memory, but task state: what has been done, what failed, what needs to be retried, and what the current best hypothesis is about the user’s intent.
  • It knows its own limits. Perhaps the most underappreciated characteristic of a mature agent is the ability to recognize when it cannot proceed reliably and escalate to a human with a clear, actionable summary of where it got stuck. In regulated environments, this is not optional. It is a compliance requirement.
  • It’s observable. Every action the agent takes, tool it calls, and decision branch it follows is logged in a form that a compliance team or auditor can review. A black box that produces good outputs isn’t enterprise-grade. The process must be auditable, not just the result.
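The observability requirement, in particular, has a concrete minimum shape: one structured, append-only record per tool call, decision branch, or escalation. A minimal sketch, assuming a JSON-lines log and invented field names:

```python
# Minimal sketch of an auditable agent action log (JSON lines).
# Field names and action types are illustrative assumptions.
import json
import time

def log_action(log_file, agent_id: str, action: str, detail: dict) -> None:
    """Append one reviewable record per tool call or decision branch."""
    record = {
        "ts": time.time(),       # when the action happened
        "agent": agent_id,       # which agent took it
        "action": action,        # e.g. "tool_call", "branch", "escalate"
        "detail": detail,        # inputs/outputs a reviewer would need
    }
    log_file.write(json.dumps(record) + "\n")
```

A production system would add trace IDs, retention policies, and tamper evidence, but even this skeleton makes the difference visible: the process is recorded, not just the result.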

There’s a significant gap between an LLM call wrapped in a workflow and this kind of production-grade agentic system. It requires investment in agent memory architecture, tool registry design, failure and recovery handling, human-in-the-loop integration, and observability infrastructure. These aren’t afterthoughts. They’re the foundation.

The question every enterprise architect should ask

Before you build the next agent, or the next 9,000, there’s one question worth asking:

Does this task require reasoning or just execution?

If the logic is deterministic, if a human could write down every decision rule in advance, then a rule-based system, a workflow engine, or a fine-tuned classifier is almost certainly the right tool. It will be faster, cheaper, and more predictable than an LLM-based agent.

If the task is genuinely ambiguous, requiring interpretation of unstructured inputs, dynamic selection from a broad tool set, recovery from unexpected states, or operation across contexts that cannot be fully anticipated, then a real agent architecture is warranted.

The answer to that question should drive the architecture, not the capabilities of the platform that happens to be in front of you.

What the best enterprise AI teams are getting right

The leaders I spoke with who seemed clearest about their direction shared a few common characteristics:

  • They define an internal taxonomy. Not just “agent” versus “automation,” but a practical framework their architects and business teams use to classify what they are building, with cost and governance implications attached to each tier.
  • They separate deployment velocity from architectural discipline. Business teams can move quickly in a sandbox environment. Production deployment goes through an architecture review that asks the hard questions about necessity, cost, observability, and failure handling.
  • They’re investing in the data foundation before the agent layer. The most common failure mode they described, and the one I see most often in platform work, is agents that are architecturally sound, but data-impoverished: they cannot do meaningful work because the underlying data is inconsistent, siloed, or inaccessible. The data platform isn’t a dependency of the agent strategy. It’s the prerequisite.
  • They treat human-in-the-loop as a feature, not a limitation. In regulated environments, the agent’s ability to escalate cleanly to a human, and for that human to review, correct, and resume the workflow, is a compliance capability. They design for it from the beginning rather than bolting it on later.

Quality of agency over quantity of agents

The 9,000-agent story is becoming more common, not less. The tools keep getting easier, deployment velocity keeps climbing, and agent count dashboards keep looking more impressive.

The enterprises that navigate this well will be the ones that resist the temptation to measure progress in agent counts. They’ll ask harder questions about what each agent is actually doing, what it’s costing, and whether it’s delivering value that a simpler system could not.

There’s a real and profound transformation underway in how enterprises use AI to automate and augment knowledge work. Agentic systems, built correctly on the right data foundations, with appropriate observability and human oversight, are going to change what’s possible for large organizations.

This transformation will be led by the organizations that build fewer agents that do more, not by those with the largest deployment count.
