From AI Hype to Enterprise Reality: The Dilemma Facing Today’s Technology Leaders

Over the last few months, I’ve had dozens of conversations with CIOs, CTOs, and CDOs across banking, government, telecom, healthcare, and other large enterprises.

Despite all the attention around generative AI and agentic AI, the sentiment among technology leaders is still surprisingly divided.

I generally see two very different camps.

One group, especially in highly regulated industries, remains deeply skeptical. They are cautious for good reason. Concerns around security, governance, compliance, and trust are still very real.

The other group wants to move faster, but they are overwhelmed. The pace of innovation is relentless. New models, tools, frameworks, and platforms keep appearing almost every week. For many leaders, the challenge is not whether AI matters. It is deciding where to begin, what to prioritize, and how to move without creating unnecessary risk or wasted investment.

Both perspectives are understandable. But they highlight the same underlying issue.

The hype around AI has not translated easily into enterprise adoption.

The missing piece for many organizations is a clear roadmap for how AI adoption should progress inside an enterprise environment.

The Skepticism: “Is Agentic AI Even Real?”

Many CTOs in highly regulated industries are questioning whether agentic AI is practical today.

Their concerns are legitimate. They see:

  • AI systems behaving like black boxes
  • Security risks in autonomous agents
  • Lack of explainability
  • Unclear governance models

For organizations operating under strict regulatory frameworks, this raises a simple but critical question: Is this technology mature enough to trust with critical business processes? Because of these risks, many leaders are not pursuing incremental improvements.
Instead, they are waiting for a large breakthrough use case that justifies the risk of adoption.

But this expectation can delay meaningful progress.

The Reality: Agentic AI Does Not Mean Full Autonomy

One of the biggest misconceptions about agentic AI is that it must behave like a fully autonomous system, similar to self-driving cars.

We are not there yet.

And more importantly:

Enterprise AI does not require full autonomy to create business value.

In practice, the most successful implementations today operate at different levels of autonomy.

Narrow-Scope Agents

These agents operate with:

  • defined tools
  • limited decision boundaries
  • structured workflows

Architecturally, they behave more like intelligent backend services.

This approach provides something enterprises care deeply about: behavioral consistency and predictability.

In many organizations, these types of agents already deliver meaningful benefits in areas such as:

  • workflow automation
  • engineering productivity
  • operational support

More Autonomous Agents

In low-risk domains such as knowledge management or internal assistance, agents can operate with greater autonomy.

Examples include:

  • research assistants
  • internal knowledge agents
  • productivity assistants

These systems tolerate more variability because the risk profile is lower.

One reason agentic AI has been over-hyped is that some vendors promote the idea of a single universal agent platform doing everything.

In reality, enterprise AI architectures will likely consist of multiple agents with varying levels of autonomy, each designed for specific use cases.

The Other Dilemma: “We Want to Start, But the Investment Looks Massive”

The second group of leaders I meet are enthusiastic about AI but overwhelmed by the perceived cost.

They see rapid advances in:

  • models
  • infrastructure
  • frameworks
  • tooling

And they worry that by the time they make a large investment, the technology may already be obsolete.

This fear often leads to analysis paralysis.

But the reality is much simpler.

You do not need massive upfront investment to begin the enterprise AI journey.

In fact, large “big bang” AI initiatives often fail.

The more practical approach is straightforward:

  1. Identify a real business problem
  2. Run a targeted experiment
  3. Deploy the solution in production
  4. Expand once the value is proven

AI adoption works best when it follows an iterative maturity journey, not a single transformation program.

Why Early Copilot Promises Didn’t Always Deliver

Another frustration I frequently hear relates to the early wave of AI copilots.

Many organizations expected dramatic productivity gains.

But the outcomes were mixed.

That is because most copilots focused on individual productivity, such as:

  • email summarization
  • document drafting
  • search assistance

While useful, these improvements do not necessarily translate into enterprise-level ROI.

The deeper productivity gains come from something else entirely:

the readiness of enterprise systems behind the AI.

Enterprise-level AI productivity requires:

  • integrated enterprise data
  • modern application architectures
  • strong security models
  • governance frameworks

What works for general productivity use cases does not automatically translate to enterprise environments.

In many cases, organizations rushed to become “GenAI ready” without first ensuring their enterprise platforms were AI ready.


A Practical Approach: Start Small, Scale Intelligently

The organizations that succeed with AI are not the ones chasing every new breakthrough.

They are the ones who:

  • start with real business problems
  • validate outcomes through experimentation
  • deploy incrementally
  • scale once value is proven

AI adoption is less about technology breakthroughs and more about organizational readiness and disciplined execution.

How IBM Can Help

At IBM, our Client Engineering teams work closely with organizations to navigate this journey.

Rather than starting with technology, we begin with business outcomes and real use cases aligned to your business transformation journey.

The Key Question for Every CIO and CDO

Every enterprise today is at a different stage of its AI adoption journey.

The real question is not whether AI will transform your organization.

It will.

The more important question is where you are today — and what capability you need to build next to move forward with confidence.

My take: the organizations that get this right will move beyond the hype and start realizing real value from AI.

Feel free to reach out if you would like to continue the discussion.

I Teamed Up With 3 Coding Agents for a Hackathon

Notes from a real parallel-coding experiment with Codex, Bob, and Claude: field notes from running AI coding agents in parallel.


Using a single AI coding agent feels productive — until the code grows and the workflow becomes serial: prompt → wait → review → fix → wait again.

In real teams, we parallelize work. One person builds, another reviews, another tests. I wanted to see if the same idea worked with AI. So I tried running multiple coding agents in parallel to see what changed — speed, failure modes, and the kind of coordination it would require.

The setup

I worked with three AI coding agents, each with a loosely defined role:

  • IBM Bob – vibe-coded the initial release and base structure
  • OpenAI Codex – implemented new functional features in parallel
  • Anthropic Claude – design review, validation, and functional reasoning

There was no automated orchestration. I manually coordinated tasks, shared context, merged changes, and resolved conflicts. Think of it as a small hackathon where the “developers” never sleep—but also never fully talk to each other.


First impression: parallel agents do feel faster

Early on, the speedup was obvious.

While Bob pushed the base forward, Codex worked on features, and Claude reviewed flows and edge cases. Progress felt continuous rather than blocked. There was always something moving.

But that speed introduced a new class of problems.


Behavior #1: Agents don’t like fixing other agents’ bugs

This surprised me.

When one agent introduced a bug, the next agent often did not fix it, even when explicitly asked. Instead, it would:

  • partially work around the issue,
  • refactor an adjacent area,
  • or introduce a new abstraction and focus on that problem instead.

Even after multiple nudges, the tendency was to solve a new problem of its own making rather than repair the original regression.

This felt uncomfortably familiar. Very human behavior.

Takeaway: AI agents are good at forward motion. Disciplined repair requires explicit constraints.


Behavior #2: “Works in dev mode” exists for agents too

Each agent claimed to test locally.

In practice, each had implicitly assumed:

  • different execution paths,
  • different entry conditions,
  • different interpretations of “done.”

You can get each agent to a working state without breaking the full codebase—but only if you force them to test against the same integration points.

Takeaway: parallel agents multiply “it works on my machine” scenarios.
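One pattern that helped was making "done" mean the same thing for every agent: a single smoke test, run against the same entry point, before any hand-back. A minimal sketch (the `handle_request` function, its payloads, and the field names are all illustrative, not from the actual project):

```python
# Hypothetical shared smoke test: every agent must run this exact script
# against the same integration point before handing work back.

def handle_request(payload: dict) -> dict:
    """Stand-in for the application's single integration point."""
    if "query" not in payload:
        return {"status": "error", "reason": "missing query"}
    return {"status": "ok", "answer": payload["query"].upper()}

def smoke_test() -> None:
    # Same entry conditions for every agent: one happy path, one failure path.
    ok = handle_request({"query": "ping"})
    assert ok["status"] == "ok", f"happy path broke: {ok}"

    err = handle_request({})
    assert err["status"] == "error", f"error path broke: {err}"

    print("smoke test passed")

if __name__ == "__main__":
    smoke_test()
```

Because every agent runs the identical script, "works in dev mode" divergences surface at hand-back time instead of at merge time.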


Behavior #3: Checkpoints are not optional

Agents don’t naturally leave a clean trail.

If I didn’t explicitly ask for:

  • a summary of changes,
  • files touched,
  • what was intentionally not changed,
  • how to test,

…reconstructing state later was painful.

I ended up enforcing a simple rule: every agent must produce a checkpoint before handing work back.

This wasn’t about memory—it was about engineering hygiene.

Takeaway: with multiple agents, your role shifts from coder to release manager.
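The checkpoint rule can even be enforced mechanically. A minimal sketch, assuming a checkpoint is just a structured note with four required fields (the field names here are my own convention, not a standard):

```python
# Minimal checkpoint gate: reject an agent's hand-back unless it includes
# the four pieces of information required by the rule above.

REQUIRED_FIELDS = ("summary", "files_touched", "not_changed", "how_to_test")

def validate_checkpoint(checkpoint: dict) -> list[str]:
    """Return the missing or empty fields; an empty list means acceptable."""
    return [f for f in REQUIRED_FIELDS if not checkpoint.get(f)]

# Example hand-back (contents are illustrative):
checkpoint = {
    "summary": "Added retry logic to the fetch layer.",
    "files_touched": ["fetch.py", "retry.py"],
    "not_changed": ["auth flow left untouched on purpose"],
    "how_to_test": "python -m pytest tests/test_fetch.py",
}

missing = validate_checkpoint(checkpoint)
assert not missing, f"hand-back rejected, missing: {missing}"
```

A gate this small is enough to turn "please summarize your changes" from a polite request into a merge precondition.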


Behavior #4: Agents get “tired” (or behave like it)

Over longer sessions, I noticed a pattern:

  • “Bug fixed.” (It wasn’t.)
  • “All tests passing.” (No tests were added.)
  • “Issue resolved.” (The behavior still reproduced.)

This felt like agent fatigue—not literal exhaustion, but premature convergence on “done” instead of last-mile verification.

Resetting context or switching agents often helped.

Takeaway: long-running sessions degrade. Fresh context restores accuracy.


Context windows become the real bottleneck

As the codebase grew, productivity dropped—not because the agents got worse, but because context management became harder.

What helped consistently:

  • modular code from the start,
  • a clear design/spec file,
  • explicit module boundaries,
  • small, testable units.

Once agents could work against well-defined modules instead of “the whole repo,” efficiency recovered.

Takeaway: architecture matters more with AI agents, not less.
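One way to give agents "well-defined modules" is to pin each boundary down in code, so an agent works against a narrow interface instead of the whole repo. A hypothetical sketch (the `Indexer` interface and its methods are illustrative):

```python
# Hypothetical module boundary: agents implement or consume this narrow
# interface rather than reading the entire codebase.

from typing import Protocol

class Indexer(Protocol):
    def add(self, doc_id: str, text: str) -> None: ...
    def search(self, query: str) -> list[str]: ...

class InMemoryIndexer:
    """Tiny reference implementation one agent can own end to end."""
    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text

    def search(self, query: str) -> list[str]:
        return [d for d, t in self._docs.items() if query.lower() in t.lower()]

def boundary_check(impl: Indexer) -> None:
    # Any agent's implementation must pass this, regardless of internals.
    impl.add("a", "Parallel agents need checkpoints")
    assert impl.search("checkpoints") == ["a"]

boundary_check(InMemoryIndexer())
```

With the contract fixed, an agent can rewrite everything behind `InMemoryIndexer` without blowing up its context window on the rest of the repository.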


Cost reality: Codex is powerful—and expensive

Even on a small project, Codex can burn money quickly, especially when paired with browser-based testing.

Left unchecked, it’s easy to spend ~$10/hour chasing the same mistake across multiple iterations. It’s capable, but it doesn’t always take the shortest correction path.

Takeaway: treat high-end coding agents like scarce compute, not infinite labor. IBM Bob did better on cost, probably because it inherently distributes workloads across multiple models.



In practice, shaping how the work is structured mattered just as much as which agents I used.

So, did this actually work?

Yes—but not magically.

Parallel agents increase throughput, but they also increase:

  • coordination overhead,
  • integration risk,
  • and the need for explicit engineering discipline.

You don’t escape software engineering.
You move up a level.

Instead of writing every line, you define rails, enforce checkpoints, validate truth, and merge reality. You need to be a disciplined software engineer to leverage these tools right now; without that discipline, you may end up sitting on a pile of garbage code.

AI coding agents are a great amplifier when teamed up with the right engineer.


What’s next

This was a small experiment. We’ve since run a team hackathon with ~200 engineers applying similar AI-assisted coding patterns at scale.

I’ll share our collective observations next.

One thing is already clear: learning to code with AI agents—not just using them—is becoming a core programming skill.