Loop Engineering: The 2026 Shift From Prompt Engineering

On June 7, 2026, a developer named Peter Steinberger posted a short message on X that boiled down to this: stop prompting your coding agents, and start designing the loops that prompt them. It crossed 6.5 million views in days. Within a week, Boris Cherny, who runs Claude Code at Anthropic, said the same thing in his own words: he doesn’t prompt Claude anymore, he writes loops that prompt Claude for him.

Here’s your monthly reminder that you shouldn’t be prompting coding agents anymore.

You should be designing loops that prompt your agents.
— Peter Steinberger 🦞 (@steipete) June 7, 2026

That’s loop engineering, and it’s a real shift, not a rebrand. If you’ve spent the last two years getting good at prompt engineering, the skill that mattered is changing under you. This article breaks down what loop engineering actually means, how it works mechanically inside tools like Claude Code, where the idea came from, and where it falls apart when people use it badly.

What is Loop Engineering?

Loop engineering is the practice of designing the system that prompts an AI agent, instead of typing each prompt yourself. You define a goal, give the agent a way to find work, act on it, verify the result, and decide what happens next, then let that system run the agent on its own.

Strip away the jargon and a loop is four moves (discover, plan, execute, verify) repeated until a condition is met. For two years, you were the loop. You sat between the agent’s steps, read the output, caught the mistake, and typed what came next. Loop engineering moves that job off you and onto a system you built.
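
The four moves can be sketched in a few lines of Python. This is a minimal illustration, not any tool's actual implementation: `run_loop` and the four callbacks are hypothetical names, and the "agent" is a plain function standing in for a model call.

```python
from typing import Callable, Optional


def run_loop(
    discover: Callable[[], Optional[str]],
    plan: Callable[[str], str],
    execute: Callable[[str], None],
    verify: Callable[[], bool],
    max_iterations: int = 20,
) -> bool:
    """Repeat discover -> plan -> execute until verify() holds.

    Returns True if the condition was met, False if the loop gave up.
    """
    for _ in range(max_iterations):
        if verify():          # stop as soon as the condition holds
            return True
        task = discover()     # find the next piece of work
        if task is None:      # nothing left to try, yet still not verified
            return False
        step = plan(task)     # decide how to approach it
        execute(step)         # act (a real loop would prompt the agent here)
    return verify()


# Toy stand-in for an agent: "fixing" a word means upper-casing it.
words = ["alpha", "beta", "gamma"]


def discover() -> Optional[str]:
    return next((w for w in words if not w.isupper()), None)


def plan(task: str) -> str:
    return task  # trivial plan: work on the word that was found


def execute(step: str) -> None:
    words[words.index(step)] = step.upper()


def verify() -> bool:
    return all(w.isupper() for w in words)


done = run_loop(discover, plan, execute, verify)
print(done, words)  # True ['ALPHA', 'BETA', 'GAMMA']
```

The human who used to sit between steps is replaced by `verify` and `max_iterations`: one decides when the work is finished, the other decides when to give up.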

The term has three names attached to it. Peter Steinberger, who built the OpenClaw agent framework, made the case that prompting is no longer the leverage point. Boris Cherny backed it from inside Anthropic, where his team builds Claude Code. And Addy Osmani, an engineering lead at Google, wrote the essay that gave the practice its name and structure days later. None of them invented the underlying pattern. What changed is that it finally got a name precise enough to organize a whole conversation around.

Loop engineering is the practice of designing a system, not a single prompt, that triggers an AI agent, lets it act, verifies the result against a defined condition, and repeats until that condition holds. The framing came from Peter Steinberger in June 2026 and was named and structured in an essay by Addy Osmani, building on practices already running inside Anthropic’s Claude Code team under Boris Cherny.

It’s worth being honest about scope here. This is, right now, almost entirely a software engineering concept. The examples that work cleanly all come from coding: migrating an API, fixing a test suite, clearing a backlog. Marketers and operators are starting to borrow pieces of it, mostly around things like SEO research loops or QA passes on marketing pages, but there’s no mature playbook yet for non-technical work. If you’re not writing code day to day, treat this as a look at where the leading edge of AI tooling is heading, not a workflow you can lift wholesale tomorrow.

How is Loop Engineering Different From Prompt Engineering?

A good prompt fixes one turn. It says nothing about what happens next, what counts as finished, or how the agent should recover if it gets something wrong. That’s the wall every team running agents in production eventually hits.

Prompt engineering optimizes the input you hand the model, one turn at a time. Loop engineering optimizes the system around the model: the trigger that starts it, the tools it can reach, the context it sees, the check that decides if it’s actually done, and the rule for when a human needs to step in. One shapes a sentence. The other shapes a process.

Take a task like migrating 400 call sites to a new API. You can’t script that with find-and-replace, because each call site needs slightly different judgment. But it’s a near-perfect fit for a loop, because the end state is mechanically checkable: the code compiles, and the tests pass. Each individual site needs intelligence to fix. The outcome needs none to verify. That’s the whole shift in one example: the contract moved from “define every step” to “define the finish line.”
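
A "finish line" of this kind can be as simple as a list of commands that must all exit with status 0. The sketch below assumes that shape. `finish_line_reached` is a hypothetical helper, and the two commands are placeholders that stand in for a real build step and a real test runner.

```python
import subprocess
import sys
from typing import Sequence


def finish_line_reached(checks: Sequence[Sequence[str]]) -> bool:
    """Return True only if every check command exits with status 0."""
    for command in checks:
        result = subprocess.run(command, capture_output=True)
        if result.returncode != 0:
            return False
    return True


# Placeholders for "the code compiles" and "the tests pass".
# In a real project these would be the build command and the test runner.
compiles = [sys.executable, "-c", "import sys; sys.exit(0)"]
tests_pass = [sys.executable, "-c", "import sys; sys.exit(1)"]  # simulated failure

print(finish_line_reached([compiles]))              # True
print(finish_line_reached([compiles, tests_pass]))  # False
```

Nothing in the check knows how any individual call site was fixed. It only knows whether the end state holds, which is what makes it safe to hand the per-site judgment to an agent.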

This is also why loop engineering and prompt engineering aren’t really competitors. A loop still runs on prompts internally: the agent gets prompted every single turn, just not by you typing it live. Prompt engineering becomes table stakes, a skill you need to write a decent condition and a decent system prompt. Loop engineering is the layer built on top of it.

Where Did Loop Engineering Actually Come From?

Loop engineering didn’t appear out of nowhere in June 2026. It’s the fourth step in a lineage that goes back to 2022, and knowing the lineage is what separates people who can actually use this from people repeating a buzzword.

It started with ReAct, a 2022 research paper from Yao and colleagues that formalized a simple pattern: the model reasons, calls a tool, reads the result, and repeats until it’s done. One model, one loop, a human watching the whole time.

AutoGPT picked that pattern up in 2023 and tried to remove the human. You gave it a goal and let it prompt itself. It became famous mostly for spinning forever and accomplishing nothing, which set back agent credibility for years. The idea was right. The infrastructure to verify progress wasn’t there yet.

Then in July 2025, a developer named Geoffrey Huntley published something called the Ralph technique, named after Ralph Wiggum from The Simpsons. Ralph is a coding agent inside a plain bash while-loop: feed it the same prompt against a written spec, let it pick one task, implement it, then start a completely fresh instance and feed the identical prompt again. Repeat until the work is done.

The non-obvious insight in Ralph isn’t the loop itself. It’s the context reset. A long agent session degrades as its context window fills up with old reasoning, dead ends, and stale file contents. Ralph starts clean every time and lets progress live on disk and in git, not in a growing conversation. Huntley’s own description was that it’s deterministically simple in an unpredictable world. It looks too dumb to work. It works.

The Ralph technique, published by Geoffrey Huntley in July 2025, runs a coding agent inside a plain bash loop, resetting context to a fixed set of anchor files on every iteration instead of letting a single long session degrade. It’s the direct ancestor of the native loop commands that shipped inside coding agents less than a year later.
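
The shape of that loop can be sketched as follows. The original is a bash while-loop around a real coding agent; this Python version is a stand-in written for illustration, with a hypothetical `todo.json` file playing the role of the on-disk spec and a stub function playing the role of the agent.

```python
import json
import tempfile
from pathlib import Path


def fresh_agent_run(prompt: str, workdir: Path) -> None:
    """Stand-in for one fresh agent instance with no memory of earlier runs.

    A real agent would receive the prompt; this stub only reads the files
    on disk, which is the point: no state is carried in conversation.
    """
    todo_path = workdir / "todo.json"
    todo = json.loads(todo_path.read_text())
    for task in todo:
        if not task["done"]:
            task["done"] = True   # "implement" exactly one task
            break
    todo_path.write_text(json.dumps(todo))


def ralph_loop(prompt: str, workdir: Path, max_iterations: int = 50) -> int:
    """Feed the same prompt to a fresh instance until every task is done.

    Returns the number of iterations used.
    """
    for iteration in range(1, max_iterations + 1):
        fresh_agent_run(prompt, workdir)            # identical prompt each time
        todo = json.loads((workdir / "todo.json").read_text())
        if all(task["done"] for task in todo):      # progress is read from disk
            return iteration
    raise RuntimeError("iteration limit reached before the spec was finished")


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    tasks = [{"name": n, "done": False} for n in ("parse", "validate", "export")]
    (workdir / "todo.json").write_text(json.dumps(tasks))
    iterations = ralph_loop("Pick one unfinished task and do it.", workdir)

print(iterations)  # 3
```

Each call to `fresh_agent_run` starts with nothing but the prompt and the working directory, so a dead end in one iteration cannot pollute the next.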

By May 2026, the pattern had moved from hand-rolled bash scripts into the products themselves. Claude Code shipped a native /goal command in version 2.1.139. OpenAI’s Codex CLI shipped its own version weeks later. That’s the line from ReAct to AutoGPT to Ralph to native tooling, traced over four years, and it’s the reason loop engineering landed when it did. The model finally got reliable enough, the context window finally got large enough, and the cost finally came down enough to make running a loop economically sane instead of a science experiment.

How Claude Code’s /goal and /loop Commands Work

This is where loop engineering stops being a philosophy and becomes something you can actually run, so it’s worth getting the mechanics right.

/goal sets a completion condition, and Claude keeps working across turns on its own instead of returning control to you after each one. Here’s the part most explainers skip: Claude is not the one deciding whether it’s done. After every turn, the condition and the conversation so far get sent to a separate, smaller, faster model, Haiku by default, which returns a plain yes-or-no decision with a short reason. A “no” sends Claude back to work with that reason as guidance. A “yes” clears the goal.

That separation matters. If the model grading the work were the same model that did the work, you’d be asking it to mark its own homework, and capable models are good at making partial progress look like full completion. Splitting the maker from the checker is the entire point.

A goal needs three things to actually work:

A measurable end state, not a process. “Refactor the code” gives the evaluator nothing to check. “Every function has a JSDoc comment and no function exceeds 40 lines” gives it something it can verify against the conversation.
A stated check. How will Claude prove it? “npm test exits 0” or “git status is clean” are checkable. “The code is better” is not.
A turn or time limit, especially for anything unattended. The evaluator only judges what Claude has already shown in the conversation. It doesn’t independently run commands or inspect files on its own, so a poorly worded condition can spin for a very long time before anyone notices.
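
The three ingredients, and the evaluator's transcript-only view, can be modeled in a short sketch. This is not Claude Code's implementation: `Goal`, `pursue`, and both stubs are hypothetical, and plain functions stand in for the worker model and the smaller evaluator model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Goal:
    condition: str   # measurable end state
    check: str       # how the agent proves it
    max_turns: int   # hard limit for unattended runs


Verdict = Tuple[bool, str]  # (met?, short reason)


def pursue(
    goal: Goal,
    worker: Callable[[str, List[str]], str],
    evaluator: Callable[[Goal, List[str]], Verdict],
) -> Tuple[bool, int]:
    """Run worker turns until the evaluator says yes or turns run out."""
    transcript: List[str] = []
    guidance = goal.condition
    for turn in range(1, goal.max_turns + 1):
        transcript.append(worker(guidance, transcript))
        met, reason = evaluator(goal, transcript)  # sees only the transcript
        if met:
            return True, turn
        guidance = reason                          # a "no" becomes guidance
    return False, goal.max_turns


# Stub worker: reports a failing run first, then a passing one.
def stub_worker(guidance: str, transcript: List[str]) -> str:
    return "npm test exited 1" if not transcript else "npm test exited 0"


# Stub evaluator: looks for the stated evidence in the latest message.
def stub_evaluator(goal: Goal, transcript: List[str]) -> Verdict:
    if "exited 0" in transcript[-1]:
        return True, "test run shows exit code 0"
    return False, "no passing test run shown yet"


goal = Goal("all tests in test/auth pass", "npm test exits 0", max_turns=5)
result = pursue(goal, stub_worker, stub_evaluator)
print(result)  # (True, 2)
```

Because the evaluator only ever reads `transcript`, a condition whose evidence never appears there can never be confirmed, which is why `max_turns` is part of the goal rather than an afterthought.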

Here’s roughly what setting one looks like in practice:

/goal all tests in test/auth pass and the lint step is clean

/loop is a different tool solving a different problem, and the two get confused constantly. /goal pushes one piece of work to a finish line: the next turn starts the moment the last one ends, and it stops when the evaluator confirms you’re done. /loop watches for a change on a schedule: the next turn starts when time elapses, and it stops when you tell it to. One is driven by progress. The other is driven by the clock.

Claude Code’s /goal command, shipped in version 2.1.139 in May 2026, sets a verifiable completion condition and has a separate small model, Haiku by default, judge after every turn whether the condition has been met, rather than letting the agent that did the work grade itself.

The practical rule is simple. If the job has a clear finish line but you don’t know how many tries it’ll take, that’s a goal: clean every row in a list until each has a verified email, work through a backlog until the queue is empty. If you’re watching for something external to change, that’s a loop: did the overnight export land, is the build still green. Point the wrong one at the wrong job and you either burn turns re-running blindly on a clock, or spin forever waiting on something the agent has no power to move.
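
The difference between the two drivers shows up clearly in code. The sketch below is an illustration of the distinction, not of how either command is implemented. `goal_driver` and `clock_driver` are hypothetical names, and the sleep function is injectable so the demo runs instantly.

```python
import time
from typing import Callable


def goal_driver(turn: Callable[[], object], done: Callable[[], bool], max_turns: int) -> int:
    """Progress-driven: the next turn starts as soon as the last one ends."""
    turns = 0
    while not done() and turns < max_turns:
        turn()
        turns += 1
    return turns


def clock_driver(
    turn: Callable[[], object],
    interval_seconds: float,
    should_stop: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Clock-driven: a turn runs each interval until someone says stop."""
    runs = 0
    while not should_stop():
        turn()
        runs += 1
        sleep(interval_seconds)
    return runs


# Progress-driven example: clear a queue until it is empty.
queue = ["row-1", "row-2", "row-3"]
turns_used = goal_driver(lambda: queue.pop(), lambda: not queue, max_turns=10)

# Clock-driven example: poll for an export, stopping after the third look.
# A no-op sleep replaces the real 60-second wait for the demo.
looks = []
runs = clock_driver(
    turn=lambda: looks.append("checked export"),
    interval_seconds=60,
    should_stop=lambda: len(looks) >= 3,
    sleep=lambda seconds: None,
)

print(turns_used, runs)  # 3 3
```

`goal_driver` has no notion of time and `clock_driver` has no notion of progress, which is exactly why pointing either at the wrong job wastes turns.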

Why the Verifier is the Real Bottleneck, Not the Model

Almost every explainer of loop engineering leads with the model. That’s the wrong place to look. In any loop, the verifier is the bottleneck, not the model doing the work.

The model is good at looking done. That’s not a flaw you can prompt away, it’s a structural property of how these systems generate confident-sounding output. Hand a loop ten files to refactor and walk away, and you’ll often come back to something that looks finished. Half the files got a one-line change and a comment claiming the job is handled. The hard part of the task is exactly the part the model learns to route around, because the hard part is where it might fail, and a confident summary reads like success either way.

This is why the separation between the agent that does the work and the agent or model that checks it isn’t a nice-to-have, it’s the entire mechanism that makes a loop trustworthy. Without a verifier with teeth, a loop will happily run a hundred iterations agreeing with itself.

It also explains why loops fail hardest on subjective goals. “Improve the user experience of this login page” or “write a strategy that will go viral” strip the system of a concrete exit condition. There’s no binary pass or fail to calculate against, so the model can’t determine a real stopping point. The loop either runs indefinitely, quietly converting your token budget into a large bill while making no measurable progress, or it declares victory on something that was never actually checked.

A loop’s reliability depends almost entirely on the quality of its verifier, not the underlying model’s capability. Conditions with no binary pass-or-fail check, like “improve the user experience,” strip a loop of a real stopping point and tend to either run indefinitely or falsely report completion.

The cleanest production version of this is the maker-checker split. One sub-agent, the maker, drafts the code or output. A separate, independent sub-agent, the checker, verifies it against tests, specs, and linters before the loop is allowed to call it done. A single model instance grading its own work has a built-in confirmation bias. It will not reliably catch its own bugs. Two independent agents, one building and one auditing, close that gap in a way no amount of clever prompting does on its own.
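
A minimal sketch of the split, under stated assumptions: both roles are stubbed with plain functions rather than real sub-agents, and `maker_checker`, `checker`, and the two checks are hypothetical. The key property is that the checker inspects the artifact itself and never reads the maker's claims about it.

```python
from typing import Callable, Dict, List

Check = Callable[[str], bool]


def checker(artifact: str, checks: Dict[str, Check]) -> List[str]:
    """Return the names of failed checks; an empty list means approved."""
    return [name for name, check in checks.items() if not check(artifact)]


def maker_checker(
    maker: Callable[[List[str]], str],
    checks: Dict[str, Check],
    max_rounds: int = 5,
) -> str:
    """Loop until the checker approves; the maker never grades itself."""
    failures: List[str] = []
    for _ in range(max_rounds):
        artifact = maker(failures)             # maker sees only what failed
        failures = checker(artifact, checks)   # checker sees only the artifact
        if not failures:
            return artifact
    raise RuntimeError(f"not approved after {max_rounds} rounds: {failures}")


# Stub maker: the first draft claims to be documented but is not.
def stub_maker(failures: List[str]) -> str:
    if "has_docstring" in failures:
        return 'def add(a, b):\n    """Add two numbers."""\n    return a + b\n'
    return "def add(a, b):\n    return a + b  # done, fully documented\n"


checks: Dict[str, Check] = {
    "has_docstring": lambda src: '"""' in src,
    "short_enough": lambda src: len(src.splitlines()) <= 40,
}

approved = maker_checker(stub_maker, checks)
print(approved)
```

The first draft's confident comment has no effect on the outcome, because the only thing that can end the loop is an empty failure list from the checker.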

What Loop Engineering Gets Wrong When You Use It Badly

Loop engineering amplifies judgment, both good and bad. The same loop, built by two different people, can produce opposite outcomes. One person uses it to move faster on work they already understand deeply. The other uses it to avoid understanding the work at all. The loop has no way to tell the difference. You do, and that’s exactly what makes loop design harder than prompt engineering.

A few failure modes show up across nearly every serious write-up on the topic in 2026:

Unsupervised goals on unverifiable tasks. As covered above, anything without a binary check is a bad fit for an autonomous loop. The model will not generate a stopping condition out of thin air for a fuzzy creative or strategic goal.

Token cost spirals. A loop is more expensive than a single prompt by design: it runs many turns instead of one. Without hard limits, that adds up fast. Uber reportedly capped engineers at $1,500 per person per tool per month for Claude Code and Cursor after the company burned through its annual AI budget in four months, according to reporting that circulated widely in the developer community in June 2026. The failure mode everyone in production fears, in one engineer’s words: without guardrails, you get infinite loops and billing surprises orders of magnitude over budget.

Comprehension debt. This is the quiet one. The faster a loop ships code or output you didn’t personally write, the bigger the gap grows between what exists in your system and what you actually understand about it. Addy Osmani, in the essay that named the practice, called the comfortable version of this “cognitive surrender”: the temptation to stop having an opinion and just take whatever the loop hands back. Designing the loop with judgment is the cure. Designing it to avoid thinking is the accelerant. Same action, opposite result.

Local minima. Even on deterministic, checkable tasks, unsupervised loops can get stuck oscillating. In AutoResearch experiments, agents facing genuinely hard optimization problems sometimes became conservative, adjusting a parameter by a fraction of a percent across dozens of cycles instead of trying a bold redesign, chasing nominal gains that never actually moved the needle.

The mitigation across all of these is consistent: a separate verifier sub-agent with real authority to say no, human review before anything irreversible ships, and hard turn or dollar limits on anything that runs unattended. None of that is exotic. It’s the same discipline that makes any automated system safe to leave running, applied to a system that happens to think in natural language.
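
The hard limits are the easiest part to make concrete. The sketch below covers a turn cap, a dollar cap, and a stall detector for the local-minimum case. `Guardrails` and `run_guarded` are hypothetical names, and each turn is assumed to report a score and its cost. A real loop would also stop on success, which is left out here to keep the focus on the guardrails.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Guardrails:
    max_turns: int
    max_dollars: float
    stall_turns: int        # stop after this many turns without real progress
    min_improvement: float  # smallest score gain that counts as progress


def run_guarded(turn: Callable[[], Tuple[float, float]], limits: Guardrails) -> str:
    """Run turns until a guardrail trips; return the reason for stopping.

    Each call to turn() returns (score, cost_in_dollars) for that turn.
    """
    spent = 0.0
    best: Optional[float] = None
    stalled = 0
    for _ in range(limits.max_turns):
        score, cost = turn()
        spent += cost
        if spent > limits.max_dollars:
            return "budget exceeded"
        if best is None or score - best >= limits.min_improvement:
            best, stalled = score, 0      # real progress resets the counter
        else:
            stalled += 1                  # a tiny gain counts as no progress
            if stalled >= limits.stall_turns:
                return "stalled"
    return "turn limit reached"


limits = Guardrails(max_turns=100, max_dollars=5.00, stall_turns=3, min_improvement=0.01)

# An agent inching forward by a fraction of a percent per cycle.
inching = iter([(0.500, 0.10), (0.501, 0.10), (0.502, 0.10), (0.503, 0.10)])
print(run_guarded(lambda: next(inching), limits))  # stalled

# An agent making progress, but at two dollars a turn.
pricey = iter([(0.1, 2.00), (0.2, 2.00), (0.3, 2.00)])
print(run_guarded(lambda: next(pricey), limits))   # budget exceeded
```

Returning the reason as a value, instead of just stopping, gives the human reviewer something specific to act on when the loop hands control back.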

Is Loop Engineering Just Hype?

There’s a real and fair skeptical case here, and it’s worth giving it room rather than dismissing it.

The sharpest version of the critique: a loop is just cron plus a decision-maker in the body, and cron was invented in 1975. If your entire definition of loop engineering is “a thing that runs on a timer,” that’s a fair hit. Several practitioners writing on this in June 2026 made exactly that point, and at least one well-known developer educator publicly pushed back on the framing as recycled automation with a new name.

The counter-argument is narrower but holds up. The pattern of an agent acting in a feedback loop isn’t new, it predates the term by years, going back to ReAct in 2022. What’s new is that it became economically and practically viable to run unattended in production. Three things shifted at once in 2025 and 2026: models got reliable enough that a loop converges in a handful of turns instead of spinning forty times before landing on a fix; context windows got large enough that a full codebase fits on every retry instead of forcing the agent to work half-blind; and the per-token cost of running multi-turn agentic work came down enough to make the math work for more use cases than just the highest-value engineering tasks.

So is it new? The underlying idea isn’t. The fact that it’s now buildable with a slash command instead of a maintained pile of bash scripts you alone understand, that part is genuinely new, and that’s what changed the conversation in June 2026. Whether loop engineering as a named discipline still matters in two years is an open question. Whether the underlying pattern, define a verifiable end state and let a system iterate against it, keeps mattering is not really in doubt at this point.

Conclusion

Loop engineering names something real: the leverage point in working with AI agents moved from the words you type to the system you design around the agent. The mechanics are concrete and learnable: a measurable goal, a stated check, a separate verifier, and a hard limit on how long an unattended loop is allowed to run. The failure modes are just as concrete: subjective goals with no real finish line, token spend with no ceiling, and the comprehension debt that builds up the moment you stop reading what the loop actually produced.

If you’re working with AI agents day to day, the shift worth internalizing isn’t “loops are the new prompts.” It’s that defining what “done” actually means, in a way a system can check without you, is now the scarce skill. Prompting got you the first answer. Loops get you the verified one, but only if you built the verifier with the same care you used to put into the prompt.

Frequently Asked Questions About Loop Engineering

What is loop engineering in simple terms?

Loop engineering is designing a system that prompts an AI agent for you, instead of typing each prompt yourself. You define a measurable goal, and the system runs the agent, checks its work, and repeats until that goal is met or a limit is hit.

Is loop engineering the same as prompt engineering?

No. Prompt engineering shapes a single instruction you write by hand. Loop engineering shapes the process around the agent (the trigger, the verification, and the stop condition) that decides what gets prompted next and when the work is actually finished.

Who coined the term loop engineering?

Peter Steinberger, the developer behind the OpenClaw framework, posted the framing on June 7, 2026. Addy Osmani, an engineering lead at Google, named and structured the practice in an essay days later, building on points from Steinberger and from Boris Cherny, who leads the Claude Code team at Anthropic.

What is the Ralph technique and how does it relate to loop engineering?

The Ralph technique, published by Geoffrey Huntley in July 2025, runs a coding agent inside a plain bash while-loop, resetting its context to a fixed set of files on every iteration. It’s considered the direct precursor to the native loop commands that later shipped inside tools like Claude Code.

What’s the difference between Claude Code’s /goal and /loop commands?

/goal pushes one piece of work to a finish line: the next turn starts as soon as the last one ends, and a separate evaluator model decides when the condition is met. /loop re-runs on a schedule and stops when you tell it to. Use /goal when the job has a clear “done.” Use /loop when you’re watching for something external to change.

Why does loop engineering fail on creative or strategic tasks?

Loops need a binary, checkable condition to know when to stop. Goals like “improve the user experience” or “write something that goes viral” have no concrete pass-or-fail test, so the system either runs indefinitely or falsely reports the work as complete.

Is loop engineering only for developers?

Right now, almost entirely, yes. Every clean, well-documented example involves code: migrating an API, fixing tests, clearing a backlog. There are early, scattered mentions of loops touching marketing pages or SEO research, but no established non-technical playbook exists yet.

Does loop engineering actually save money, or is it more expensive?

It’s more expensive per task by default, since it runs many turns instead of one. Whether that’s worth it depends entirely on whether you’ve capped token spend and set hard turn limits. Without guardrails, the cost can run far past expectations, as some companies reportedly learned the hard way in 2026.

Is loop engineering just automation with a new name?

Partly, and that critique has merit. The agent-acts-and-checks pattern predates the term by years. What’s genuinely new is that capable models, large context windows, and lower costs made it practical to ship as native product features in 2025 and 2026, rather than something only a small number of teams could hand-build and maintain.