AI Agents Are Breaking the Foundations of Software Engineering

As enterprises accelerate their adoption of artificial intelligence, many believe they are already operating at the frontier of innovation. But according to Brian Peret, Director of CodeBoxx Academy, that perception is misleading.

What most organizations are deploying today—chatbots, copilots, and scripted automations—falls far short of what defines true AI autonomy. And as companies begin experimenting with agentic systems, a deeper issue is emerging: the foundations of modern software engineering were never designed to support them.

“It is understandable to confuse a generative model paired with automation scripts for a true agent, but they are fundamentally different beasts,” Peret explains.

That distinction is more than technical—it marks the beginning of a structural shift.

From Deterministic Code to Autonomous Behavior

Traditional software operates within a predictable framework: engineers write code, define rules, and validate outputs. Agentic systems, however, introduce a fundamentally different operating model.

“An agentic system operates through a continuous loop of reasoning, acting, and observing,” says Peret. “Instead of following a rigid script, you provide it with a destination and let it determine the best path forward.”

This shift redefines the role of software entirely. Instead of executing predefined instructions, systems are now making decisions in real time.

“Autonomy meaningfully begins at the stage of planning and reflection,” he adds. “This is where the AI stops being a tool that responds to a prompt and starts being a partner that manages a process.”

In practical terms, the transition to agentic AI is not about automating tasks—it is about delegating responsibility.
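The reason–act–observe loop Peret describes can be sketched in a few lines. This is a minimal illustration, not a real agent: the planner and executor here are hypothetical stubs standing in for an LLM call and tool invocations.

```python
# Minimal sketch of a reason-act-observe loop. The planner and executor
# are placeholder stubs; in a real system, reason() would be a model call
# and act() would invoke external tools.

def reason(goal, observations):
    """Stub planner: pick the next unfinished step toward the goal."""
    remaining = [s for s in goal if s not in observations]
    return remaining[0] if remaining else None

def act(step):
    """Stub executor: perform the chosen step and return a result."""
    return f"done:{step}"

def run_agent(goal, max_iters=10):
    observations = []
    for _ in range(max_iters):        # bounded loop as a basic guardrail
        step = reason(goal, observations)
        if step is None:              # planner judges the goal reached
            break
        act(step)
        observations.append(step)     # observe: feed outcomes back in
    return observations

print(run_agent(["gather_data", "analyze", "report"]))
```

The key contrast with a scripted pipeline is that the sequence of steps is chosen at runtime by the planner, not fixed in the code.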

The Collapse of Traditional Testing

As autonomy increases, long-standing development practices are starting to break down.

“Existing DevOps and CI/CD pipelines are built on the foundational assumption that code is deterministic and predictable,” Peret says.

That assumption no longer holds in systems that adapt dynamically to context and evolve their behavior over time.

“We are moving from a world of testing code to a world of monitoring behavior.”

In traditional environments, testing validates whether a system produces the expected output. But with agentic AI, outcomes are not always predefined—and may vary depending on how the system interprets its objective.

This creates a new kind of challenge: systems that are technically functional but strategically misaligned. “An agent might be perfectly functional from a technical standpoint while simultaneously making choices that deviate from business objectives or ethical guardrails.”
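The shift from testing outputs to monitoring behavior can be made concrete with a toy example. The policy names and trace schema below are invented for illustration; the point is that the check inspects what the agent *did*, not whether one expected value came back.

```python
# Hypothetical sketch: auditing an agent's behavior trace against policy
# checks instead of asserting a single expected output. All names and the
# trace schema are illustrative, not a real framework.

POLICIES = [
    ("no_external_email", lambda a: a["tool"] != "send_email" or a["internal"]),
    ("spend_under_limit", lambda a: a.get("cost", 0) <= 100),
]

def audit(trace):
    """Return the policy violations found in a list of agent actions."""
    violations = []
    for name, check in POLICIES:
        for action in trace:
            if not check(action):
                violations.append((name, action["tool"]))
    return violations

trace = [
    {"tool": "search", "internal": True, "cost": 0},
    {"tool": "send_email", "internal": False, "cost": 0},  # works, but off-policy
]
print(audit(trace))
```

Every action in this trace "succeeded" technically, yet the audit still flags one of them: exactly the functional-but-misaligned failure mode described above.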

When “Bugs” Are No Longer Bugs

One of the most significant implications of this shift is the redefinition of failure.

“In a traditional environment, a bug is a mistake in logic that can be traced and fixed,” Peret explains. “In agentic environments, a bug may be based on sound logic, yet is in misalignment with human values or business goals.”

This introduces a new category of risk—one that is not rooted in broken code, but in misaligned reasoning.

This challenge aligns with broader industry concerns around AI governance, as outlined in the NIST AI Risk Management Framework, which highlights the difficulty of ensuring AI systems operate in accordance with human values and organizational intent.

In this context, correctness is no longer binary. A system can be logically consistent and still produce outcomes that are undesirable, unpredictable, or even harmful.

Emergent Risk and System Instability

As organizations move toward multi-agent environments, complexity increases exponentially.

“Because agents are designed to be creative and resourceful in achieving their goals, they may find shortcuts that technically satisfy their instructions while violating unstated ethical or security boundaries,” Peret says.

Even more concerning is the interaction between agents operating with different objectives.

Interactions between multiple agents, he warns, can “create feedback loops that spiral out of control in milliseconds.”

These cascading effects introduce a level of systemic risk that traditional monitoring and intervention models are not equipped to handle. In many cases, failures can unfold faster than humans can detect—let alone respond to—them.
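A toy simulation shows how quickly such loops compound. Assume two agents whose objectives interact, each escalating in response to the other's last move; the numbers are purely illustrative, not a model of any real system.

```python
# Toy illustration of a runaway feedback loop between two agents: each
# reacts to the other's latest value by scaling it up. With any gain > 1,
# the values explode within a handful of iterations.

def simulate(steps=10, gain=1.5):
    value_a, value_b = 100.0, 100.0
    for _ in range(steps):
        value_a = value_b * gain   # agent A reacts to B's last move
        value_b = value_a * gain   # agent B reacts to A in the same tick
    return value_a, value_b

a, b = simulate()
print(a, b)  # both values grow by a factor of gain**2 per iteration
```

Ten iterations of this loop take microseconds on commodity hardware, which is the core of the intervention problem: the escalation is over before any human-in-the-loop process could engage.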

The Infrastructure Problem

Despite these challenges, most enterprises are attempting to scale agentic AI on top of legacy systems.

“Ultimately, most existing enterprise infrastructures are not yet ready for the leap to true agentic autonomy,” Peret says. “We are still using 20th-century tools to manage 21st-century intelligence.”

One of the most critical gaps is observability. “We need a semantic observability layer that records the agent’s reasoning steps alongside its technical actions,” he explains.

Without visibility into how decisions are made, organizations face significant limitations in auditing behavior, debugging failures, and meeting compliance requirements.
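What a semantic observability record might look like can be sketched as a structured log entry that pairs the agent's stated reasoning with each technical action. The schema is hypothetical, chosen only to show the idea of auditing the "why" next to the "what."

```python
# Sketch of a semantic observability record: each log entry captures the
# agent's reasoning alongside the technical action it took, so decisions
# can be audited after the fact. The field names are illustrative.

import json
import time

def log_step(trace, reasoning, action, params):
    trace.append({
        "ts": time.time(),
        "reasoning": reasoning,   # why the agent chose this step
        "action": action,         # what it actually did
        "params": params,         # with which arguments
    })

trace = []
log_step(trace, "User asked for Q3 numbers; need the finance DB",
         "query_db", {"table": "revenue", "quarter": "Q3"})
print(json.dumps(trace, indent=2))
```

Because the record is structured, compliance and debugging tooling can query it the same way they query conventional logs, but over reasoning steps rather than stack traces alone.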

The Dangerous Gap Between Perception and Reality

Perhaps the most pressing issue is not technological, but perceptual.

“Most companies are overestimating their readiness by confusing a powerful chatbot with a truly autonomous system,” Peret says.

In practice, many organizations are layering advanced AI capabilities onto infrastructures that were never designed to support them. “It is the tech equivalent of putting a jet engine on a horse-drawn carriage and expecting it to fly.”

Closing this gap will require more than incremental upgrades. It demands a fundamental rethinking of how software is built, tested, and governed.

Because as AI systems move from assisting work to executing it, the real challenge is no longer adoption. It is whether enterprises are prepared to manage intelligence that can act on its own.
