I Teamed Up With 3 Coding Agents for a Hackathon

Field notes from a real parallel-coding experiment with Codex, IBM Bob, and Claude


Using a single AI coding agent feels productive — until the code grows and the workflow becomes serial: prompt → wait → review → fix → wait again.

In real teams, we parallelize work. One person builds, another reviews, another tests. I wanted to see if the same idea worked with AI. So I tried running multiple coding agents in parallel to see what changed — speed, failure modes, and the kind of coordination it would require.

The setup

I worked with three AI coding agents, each with a loosely defined role:

  • IBM Bob – vibe-coded the initial release and base structure
  • OpenAI Codex – implemented new functional features in parallel
  • Anthropic Claude – design review, validation, and functional reasoning

There was no automated orchestration. I manually coordinated tasks, shared context, merged changes, and resolved conflicts. Think of it as a small hackathon where the “developers” never sleep—but also never fully talk to each other.


First impression: parallel agents do feel faster

Early on, the speedup was obvious.

While Bob pushed the base forward, Codex worked on features, and Claude reviewed flows and edge cases. Progress felt continuous rather than blocked. There was always something moving.

But that speed introduced a new class of problems.


Behavior #1: Agents don’t like fixing other agents’ bugs

This surprised me.

When one agent introduced a bug, the next agent often did not fix it, even when explicitly asked. Instead, it would:

  • partially work around the issue,
  • refactor an adjacent area,
  • or introduce a new abstraction and focus on that problem instead.

Even after multiple nudges, the tendency was to solve a new problem it created, not repair the original regression.

This felt uncomfortably familiar. Very human behavior.

Takeaway: AI agents are good at forward motion. Disciplined repair requires explicit constraints.


Behavior #2: “Works in dev mode” exists for agents too

Each agent claimed to test locally.

In practice, each had implicitly assumed:

  • different execution paths,
  • different entry conditions,
  • different interpretations of “done.”

You can get each agent to a working state without breaking the full codebase—but only if you force them to test against the same integration points.
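One way to enforce that shared definition of "done" is a single smoke test every agent must run against the same entry point before handing work back. Here's a minimal sketch; the `app` methods (`health`, `login`, `search`) are hypothetical placeholders for your project's real integration points, not anything the agents provide.

```python
# Shared smoke test: every agent runs this same script against the
# same entry points before handing work back. The method names below
# are illustrative placeholders for a project's real interfaces.

def run_smoke_checks(app) -> list[str]:
    """Run the agreed-on integration checks; return a list of failures."""
    failures = []
    checks = {
        "health endpoint responds": lambda: app.health() == "ok",
        "login flow returns a session": lambda: app.login("demo", "demo") is not None,
        "core query returns results": lambda: len(app.search("test")) > 0,
    }
    for name, check in checks.items():
        try:
            if not check():
                failures.append(name)
        except Exception as exc:
            # A crash in a check is itself a failure, not a skip.
            failures.append(f"{name} (raised {type(exc).__name__})")
    return failures
```

Because every agent runs the same checks, "works in dev mode" claims converge on one shared definition of working.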

Takeaway: parallel agents multiply “it works on my machine” scenarios.


Behavior #3: Checkpoints are not optional

Agents don’t naturally leave a clean trail.

If I didn’t explicitly ask for:

  • a summary of changes,
  • files touched,
  • what was intentionally not changed,
  • how to test,

…reconstructing state later was painful.

I ended up enforcing a simple rule: every agent must produce a checkpoint before handing work back.

This wasn’t about memory—it was about engineering hygiene.
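The checkpoint rule can be mechanized with a tiny gate, sketched below. The four section headings are my own convention for the checkpoint note, not any agent's built-in format.

```python
# A minimal checkpoint gate (a sketch of the rule, not a real tool):
# before an agent hands work back, its checkpoint note must cover four
# sections. The section names are my convention, not any agent's API.

REQUIRED_SECTIONS = (
    "## Summary of changes",
    "## Files touched",
    "## Intentionally not changed",
    "## How to test",
)

def missing_sections(checkpoint: str) -> list[str]:
    """Return the required sections the checkpoint note is missing."""
    return [s for s in REQUIRED_SECTIONS if s not in checkpoint]

def accept_handoff(checkpoint: str) -> bool:
    """Accept the handoff only when every required section is present."""
    return not missing_sections(checkpoint)
```

Rejecting a handoff with an explicit list of missing sections turns "please summarize" from a nudge into a hard requirement.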

Takeaway: with multiple agents, your role shifts from coder to release manager.


Behavior #4: Agents get “tired” (or behave like it)

Over longer sessions, I noticed a pattern:

  • “Bug fixed.” (It wasn’t.)
  • “All tests passing.” (No tests were added.)
  • “Issue resolved.” (The behavior still reproduced.)

This felt like agent fatigue—not literal exhaustion, but premature convergence on “done” instead of last-mile verification.

Resetting context or switching agents often helped.
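A cheap guard against premature "done" is to never accept a "fixed" claim without re-running the original reproduction. A sketch, where `repro` is any project-specific callable (hypothetical here) that returns `True` while the bug still reproduces:

```python
# Don't trust "bug fixed" claims: re-run the original reproduction
# first. `repro` is a project-specific callable (hypothetical here)
# that returns True while the bug still reproduces.

from typing import Callable

def verify_fix_claim(claim: str, repro: Callable[[], bool]) -> str:
    """Cross-check an agent's claim against the original reproduction."""
    still_broken = repro()
    if "fixed" in claim.lower() and still_broken:
        return "rejected: bug still reproduces"
    if still_broken:
        return "open: bug still reproduces"
    return "verified: reproduction no longer triggers"
```

The repro callable is written once, when the bug is first found, so later "all done" claims are checked against ground truth instead of the agent's self-report.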

Takeaway: long-running sessions degrade. Fresh context restores accuracy.


Context windows become the real bottleneck

As the codebase grew, productivity dropped—not because the agents got worse, but because context management became harder.

What helped consistently:

  • modular code from the start,
  • a clear design/spec file,
  • explicit module boundaries,
  • small, testable units.

Once agents could work against well-defined modules instead of “the whole repo,” efficiency recovered.
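One way to pin agents to modules is a small task manifest that records which paths each agent may touch. This is a sketch of the idea, with made-up agent names and paths, not a tool any of these agents ships with:

```python
# A sketch of the "work against well-defined modules" rule: a tiny
# manifest pins each agent to a module and its allowed paths, so no
# two agents collide on the same boundary. Names are illustrative only.

MANIFEST = {
    "bob":    {"module": "core",     "paths": ["core/"]},
    "codex":  {"module": "features", "paths": ["features/", "api/"]},
    "claude": {"module": "review",   "paths": []},  # read-only reviewer
}

def allowed_to_edit(agent: str, path: str) -> bool:
    """True if the agent's manifest entry covers this file path."""
    entry = MANIFEST.get(agent)
    if entry is None:
        return False
    return any(path.startswith(prefix) for prefix in entry["paths"])
```

Checking edits against the manifest at merge time catches boundary violations before they become merge conflicts.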

Takeaway: architecture matters more with AI agents, not less.


Cost reality: Codex is powerful—and expensive

Even on a small project, Codex can burn money quickly, especially when paired with browser-based testing.

Left unchecked, it’s easy to spend ~$10/hour chasing the same mistake across multiple iterations. It’s capable, but it doesn’t always take the shortest correction path.
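One mitigation is a per-session budget guard that stops the loop before it burns through iterations. A sketch; the dollar figures and per-token rate below are illustrative assumptions, not actual Codex pricing:

```python
# A per-session spend guard (a sketch; the dollar figures are
# illustrative assumptions, not actual pricing for any provider).

class BudgetGuard:
    """Stop an agent loop once estimated spend crosses a cap."""

    def __init__(self, cap_usd: float):
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def record(self, tokens: int, usd_per_1k_tokens: float = 0.01) -> None:
        """Accumulate estimated cost for a batch of tokens."""
        self.spent_usd += tokens / 1000 * usd_per_1k_tokens

    def should_stop(self) -> bool:
        """True once the session has hit its spending cap."""
        return self.spent_usd >= self.cap_usd
```

Even a crude cap like this forces a human decision point instead of letting an agent chase the same mistake indefinitely.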

Takeaway: treat high-end coding agents like scarce compute, not infinite labor. IBM Bob did better on cost, probably because it inherently distributes workloads across multiple models.



In practice, shaping how the work is structured mattered just as much as which agents I used.

So, did this actually work?

Yes—but not magically.

Parallel agents increase throughput, but they also increase:

  • coordination overhead,
  • integration risk,
  • and the need for explicit engineering discipline.

You don’t escape software engineering.
You move up a level.

Instead of writing every line, you define rails, enforce checkpoints, validate truth, and merge reality. As the saying goes, you need to be a disciplined software engineer to leverage these tools right now. If you lack discipline, you may end up sitting on a pile of garbage code.

AI coding agents are great amplifiers when teamed up with the right engineer.


What’s next

This was a small experiment. We’ve since run a team hackathon with ~200 engineers applying similar AI-assisted coding patterns at scale.

I’ll share our collective observations next.

One thing is already clear: learning to code with AI agents—not just using them—is becoming a core programming skill.

Agentic AI in APAC: Navigating the Path from Pilot to Production


This blog post is a follow-up to what I shared at our recent meetup organized by AI Verify, where over 100 members of our community joined us at IMDA to share and learn real-world stories about making Agentic AI reliable.

The Asia-Pacific region is witnessing a significant shift in how organizations approach artificial intelligence, moving beyond traditional AI implementations toward more autonomous, agentic systems. Based on our recent pilot programs with enterprise customers across APAC, we’re seeing distinct patterns emerge in adoption strategies, use cases, and implementation challenges.

Current Adoption Patterns: What Our Pilot Data Reveals

Our customer pilot programs have provided valuable insights into how organizations are actually utilizing agentic AI capabilities. The data reveals interesting trends in feature adoption:

Tool Usage stands at 45% adoption, indicating that organizations are primarily leveraging AI agents’ ability to interact with existing software tools and APIs. This suggests a pragmatic approach where companies are extending their current technology stack rather than replacing it entirely.

Multi-Agent Systems show a notable 70% adoption rate, demonstrating strong interest in deploying multiple specialized agents that can collaborate on complex tasks. This high adoption rate indicates that APAC organizations recognize the value of distributed AI capabilities.

Reflection capabilities show 15% adoption, suggesting that while organizations value AI systems that can self-evaluate and improve their responses, this remains a more advanced feature that requires additional organizational maturity.

Action-oriented implementations currently represent 5% of adoption, indicating that while there’s interest in AI systems that can take direct actions, most organizations are still in the monitoring and recommendation phase. This low adoption rate reflects the preference for human-in-the-loop approaches, where AI agents recommend actions but require human approval before execution, ensuring oversight and control over critical business decisions.

Top 5 Use Cases Driving APAC Adoption

1. Software Development Lifecycle (SDLC)

Organizations are implementing agentic AI to automate code review processes, generate test cases, and assist in deployment pipelines. The ability of AI agents to understand context across multiple development phases makes them particularly valuable for streamlining software delivery.

2. Deep Research and Analysis

Companies are deploying AI agents to conduct comprehensive market research, competitive analysis, and regulatory compliance reviews. These agents can process vast amounts of unstructured data and synthesize findings across multiple sources and languages—particularly valuable in APAC’s diverse regulatory landscape. For example, financial institutions are using AI agents to research source of wealth documentation and process commercial loan company profiles, automatically gathering and analyzing corporate filings, news articles, and regulatory records to build comprehensive risk assessments.

3. Manufacturing Process Automation

Manufacturing companies are using agentic AI to optimize production schedules, predict maintenance needs, and coordinate supply chain activities. AI agents can adapt to changing production requirements and coordinate across multiple systems in real-time. A notable application is in new product design research, where AI agents analyze market trends, competitor products, regulatory requirements, and technical specifications to provide comprehensive insights that inform product development decisions and accelerate time-to-market.

4. Sales Insights and Customer Experience

Organizations are implementing AI agents to analyze customer interactions, predict purchase behavior, and personalize engagement strategies. These systems can process customer data across multiple touchpoints and provide actionable insights for sales teams.

5. Procurement Process Automation

Companies are streamlining procurement workflows using AI agents that can evaluate suppliers, negotiate contracts, and manage purchase orders. These agents can adapt to changing market conditions and organizational requirements while maintaining compliance standards.

Three Distinct Adopter Profiles

Our experience across APAC markets has revealed three primary adoption patterns:

Early Adopters: The “Agentic” Pioneers

These organizations are enthusiastic about becoming “agentic” and focus on automating existing workflows. They’re willing to experiment with newer technologies and often serve as proof-of-concept environments for more advanced AI capabilities. Early adopters typically have strong technical teams and leadership buy-in for AI initiatives.

Stack Builders: Long-term Strategic Planners

Stack Builders approach agentic AI with enterprise-wide adoption in mind. They start with simple, well-defined use cases while building the infrastructure and organizational capabilities needed for broader deployment. These organizations prioritize scalability and integration with existing enterprise systems.

Pragmatic Adopters: Embedded Solution Seekers

Pragmatic adopters prefer implementing agentic AI through embedded applications in platforms they already use, such as Salesforce or Microsoft 365. They focus on immediate business value and prefer solutions that require minimal change to existing processes and user behavior.

Key Implementation Challenges

Despite growing interest, organizations face several significant hurdles in scaling agentic AI implementations:

Business Readiness for Dynamic Workflows

Traditional business processes are designed for predictability and control. Agentic AI introduces dynamic decision-making that can feel unpredictable to stakeholders. Organizations struggle with the cultural shift required to trust AI agents with important business decisions, particularly in risk-averse cultures common across many APAC markets.

Quantification of Business Outcomes

Measuring the ROI of agentic AI implementations remains challenging. Unlike traditional automation projects with clear metrics, agentic systems often provide value through improved decision quality, faster response times, and enhanced adaptability—benefits that are difficult to quantify using conventional business metrics.

Access to Source Systems

Many organizations have data and systems scattered across multiple platforms, often with limited API access or integration capabilities. Agentic AI requires comprehensive data access to function effectively, but legacy systems and data silos create significant technical barriers to implementation.

Cost of Manual Data for Evaluation

Evaluating agentic AI performance requires significant manual effort to create test datasets, validate outputs, and assess decision quality. Organizations underestimate the ongoing cost of maintaining evaluation frameworks, particularly when AI agents are deployed across multiple use cases with different success criteria.

Looking Forward: The Path to Maturity

The APAC market’s approach to agentic AI reflects broader regional characteristics: thoughtful adoption, emphasis on practical business outcomes, and careful risk management. Organizations that succeed in scaling agentic AI implementations will likely be those that address the fundamental challenges of trust, measurement, and integration while building organizational capabilities for managing dynamic AI systems.

Two critical factors will determine scalability success: agent observability and cost optimization. Agent observability—the ability to monitor, debug, and understand AI agent decision-making processes in real-time—is essential for building organizational trust and ensuring reliable performance at scale. Without clear visibility into how agents make decisions, organizations struggle to troubleshoot issues, optimize performance, and maintain compliance standards.

Equally important is managing the cost of the solution, which becomes a key barrier to scale. While pilot programs may absorb higher per-token costs, enterprise-wide deployment requires sustainable economic models. Organizations need to factor in not just the direct costs of AI infrastructure, but also the ongoing expenses of monitoring, evaluation, human oversight, and system integration.

As the technology matures and more organizations share their implementation experiences, we expect to see standardized evaluation frameworks, improved integration capabilities, and greater organizational comfort with AI-driven decision-making. The current pilot phase is laying the groundwork for more widespread adoption across the region.

Explaining how AI recommendations are made to end users


Explaining the “how and why” behind any product and activity has always been crucial. In the digital era, it has only become more prominent.
Facebook’s inability to explain the “how and why” of its data sharing, for example, led to a big trust deficit with its users. The “explainability” revolution started a while ago, as evidenced by the huge popularity of Jupyter/Zeppelin notebooks, data lineage in reporting, data governance projects in the enterprise, and roles such as chief data officer.
The revolution is now picking up pace as the adoption of machine learning and AI goes mainstream. With open-source ML libraries and tons of code available online, both a novice and a professional can create a model that may be as critical as predicting your illness. How do we differentiate between these models and trust their results?
Consider, for example, the healthcare recommendation engine on https://www.healthcare.com/.


By providing some basic inputs such as age and location, it recommends a healthcare plan personalized for you. Because there is no explanation, the top three recommendations all coming from the same provider raise doubts and questions. Is the recommendation engine, or the company, biased toward a specific provider? What were the criteria for the recommendation?

A black-box approach to AI is insensitive to the consumer, creates a lack of trust, and defeats the very purpose of leveraging AI to accelerate and improve customer experience.
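To make the contrast concrete, here is a minimal sketch of what a recommendation with an explanation could look like. The plans, weights, and scoring rules are invented for illustration; this is not healthcare.com's actual engine.

```python
# A toy recommender that returns reasons alongside each recommendation.
# Plans, weights, and scoring rules are invented for illustration only.

def recommend_with_reasons(age: int, plans: list[dict]) -> list[dict]:
    """Score plans and attach a human-readable reason to each pick."""
    results = []
    for plan in plans:
        reasons = []
        score = 0
        if plan["min_age"] <= age <= plan["max_age"]:
            score += 2
            reasons.append(
                f"covers your age band ({plan['min_age']}-{plan['max_age']})"
            )
        if plan["monthly_premium"] < 300:
            score += 1
            reasons.append(
                f"premium ${plan['monthly_premium']}/month is below the $300 threshold"
            )
        results.append({"plan": plan["name"], "score": score, "reasons": reasons})
    # Highest-scoring plan first, with its reasons attached.
    return sorted(results, key=lambda r: r["score"], reverse=True)
```

Even this toy version answers the consumer's two questions: why this plan, and by what criteria.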

“Explainability” is the next big thing.
Visit https://www.ibm.com/cloud/ai-openscale to experience what it takes to provide explainability for your recommendations.

Say hello to “Lisa,” the most impressive customer care officer. No AI can compete.


I recently called my bank and went through ten minutes of waiting on the phone, punching multiple keys, iterating through menus, and dialing again. Apparently I was too stupid to identify which category my request belonged in, and the customer care AI system was efficient enough to disconnect a client not fully oriented to the bank’s voice menu. After a few attempts, the AI-powered system gave up on my intelligence and connected me to Lisa.

What a relief: a truly advanced system that understood emotions, answered my open-ended questions, had no issue with my lingo, finished my request faster, and recommended a new product, which I gladly accepted because the interaction was fun. Truly impressive customer care. Lisa is not the next generation of humanoid but a human herself. Say hello to Lisa, the most exceptional customer care officer.

In a genuinely democratic world where a vocal few are rewriting history and being neutral is a sin, as evident from the recent US election (http://brilliantmaps.com/did-not-vote/), I want to ensure I am doing my job of setting the right priorities for myself and my fellow professionals, being an ML, AI, and big data evangelist myself. I see a lot of conferences where professionals and startups pride themselves on replacing normal human interaction with a robot or an AI-powered system. While this may sound cool, it doesn’t necessarily make business sense. Consider a BankA that replaces human interaction with an NLP-based engine to respond to customers; it may be saving millions of dollars. That saving is diverted toward improving brand recognition, loyalty, and reach to potential buyers. Now imagine another BankB that employs a large workforce, where each employee brings in new customers consistently and effortlessly through their network and relationships. The enhanced customer experience becomes a brand in itself, and customers remain loyal irrespective of promotional offerings from other banks. Happy employees and happy clients make the world a better place to live.

There are problems the human race has been struggling with for generations, such as poverty, food crises, natural disasters, drinking water availability, healthcare, and education. And we have new ones, such as cybersecurity, abuse of social media, fraud and terrorism, and efficient transportation in rural areas: a lot of “big questions” that big data can answer. As a professional, I prioritize and support projects that tackle these problems.

As IBM Watson Machine Learning, Microsoft Azure ML, and Amazon ML aim to simplify ML and empower more professionals, it’s time to emphasize the first and most important phase: the phase where you ask the question and specify what it is you want to learn from the data. In short: what question are we asking?