In Part 1, we examined the problem: AI systems degrade over time, benchmark performance masks production failures, and even experienced professionals may work more slowly with AI assistance than without it. The trajectory is promising. The current reality is messy.
So how do you build systems that actually sustain effort over hours?
Three approaches have emerged from different communities—each attacking the same fundamental problem: how does an AI system maintain coherent progress when its working memory is limited?
Approach 1: Fresh-Start Cycling
In late 2025, a technique with the absurd name “Ralph Wiggum” went viral among practitioners. The name comes from a Simpsons character—deliberately silly, because the core idea is almost embarrassingly simple.
Let the AI work. When it starts degrading, stop it. Start fresh. Let it pick up where it left off.
That is it. The AI works on a task until its performance starts declining. Then it stops, saves its progress to a file, and exits. A new session begins with a clean slate. The AI reads what was accomplished, identifies what remains, and continues.
The philosophy: stop fighting the memory limitation. Work with it. Each work session operates independently. Progress lives in documents and records, not in the AI’s head.
What This Looks Like in Practice
A typical implementation runs three phases:
- Phase 1 (Requirements): Human and AI collaborate to identify what needs to be done. The output is a clear specification document.
- Phase 2 (Planning): AI analyzes the gap between the specification and current state. Output: a prioritized list of tasks. No actual work yet.
- Phase 3 (Execution): AI processes one task per session. Complete the task, verify it worked, document what was done, exit. New session. Repeat.
The key constraint: tasks must have clearly measurable completion. This approach works poorly for ambiguous requirements, judgment calls, or exploratory work without clear endpoints.
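To make the cycle concrete, here is a minimal sketch in Python. The progress file layout, the plan_tasks and verify helpers, and run_agent_session are all placeholders for whatever model API or CLI you actually use; nothing here is part of any official tooling.

```python
import json
from pathlib import Path

PROGRESS_FILE = Path("progress.json")  # illustrative: progress lives in a file, not in the AI's head


def run_agent_session(task: str, context: str) -> str:
    """Placeholder for one fresh AI session (a single API or CLI call).

    The session sees only the task and a short context summary, never the
    full conversation history from earlier cycles.
    """
    return f"[result of: {task}]"


def verify(task: str, result: str) -> bool:
    """Placeholder check that the task has a clearly measurable completion."""
    return bool(result)


def plan_tasks(spec: str) -> list[str]:
    """Stand-in for Phase 2: turn the specification into an ordered task list."""
    return [line.strip() for line in spec.splitlines() if line.strip()]


def run_one_cycle(spec: str) -> bool:
    """Phase 3: do exactly one task, record it, exit. Returns False when nothing remains."""
    progress = json.loads(PROGRESS_FILE.read_text()) if PROGRESS_FILE.exists() else []
    done = {entry["task"] for entry in progress}
    remaining = [t for t in plan_tasks(spec) if t not in done]
    if not remaining:
        return False  # specification satisfied; stop cycling

    task = remaining[0]
    summary = "\n".join(entry["task"] for entry in progress)  # what earlier sessions accomplished
    result = run_agent_session(task, context=summary)
    if verify(task, result):
        progress.append({"task": task, "result": result})
        PROGRESS_FILE.write_text(json.dumps(progress, indent=2))
    return True


if __name__ == "__main__":
    spec = "write schema\nimplement endpoint\nadd tests"
    while run_one_cycle(spec):
        pass  # each iteration is a brand-new session with a clean slate
```

The loop itself is trivial; the discipline is in keeping every task small enough that one session can finish and verify it before cycling.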
Results From Early Adopters
The numbers from practitioners are striking, though they come from self-reports rather than controlled studies.
One consultant delivered what would have been a $50,000 project for under $300 in AI costs—running automated sessions overnight. A startup team completed six major deliverables overnight with functioning outputs, verification, and documentation. One practitioner built an entire product over three months of automated sessions.
Typical costs range from $50-100 for substantial projects running 50+ work sessions. Each session runs 30-45 minutes before cycling.
Anthropic formalized this approach in December 2025, releasing official support. The pattern moved from workaround to endorsed methodology.
The Limitation
This approach is deterministic in an unpredictable world. As one practitioner puts it: “It’s better to fail predictably than succeed unpredictably.”
That is both the strength and the constraint. Fresh-start cycling works when you can define success clearly. It struggles when success is subjective, when quality is implicit, and when the “right” answer requires human judgment to recognize.
Approach 2: Selective Memory
Fresh-start cycling throws away everything between sessions. Every cycle begins completely fresh. What if you could selectively preserve the important parts?
Selective memory takes a different approach: extract and store the essential information, discard the rest. Instead of starting over entirely, the AI inherits a curated summary of what matters.
The Two-Role Pattern
A common implementation uses two specialized AI roles:
- Setup Role: Runs only at the beginning. Establishes context, identifies key information, creates initial reference documents.
- Working Role: Handles all subsequent sessions. Maintains continuity through three artifacts: a progress tracker showing completed and pending work, a checklist with items marked as done or remaining, and a change history showing what was modified and why.
The session startup is explicit: confirm current state, review progress documents, select highest-priority remaining work, verify baseline before new work.
The difference from fresh-start cycling: the compression step. The Working Role inherits a curated summary of relevant context. Research suggests this approach can allow AI to complete long task sequences using only 16% of the information it would otherwise need. An 84% reduction in overhead.
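Here is a toy sketch of that compression step, assuming a flat list of memory items tagged by kind. Real systems use retrieval and embeddings rather than keyword tags, but the shape of the idea is the same; every name below is illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    kind: str      # "decision", "open_item", "fact", or "chatter"
    text: str
    priority: int = 0


@dataclass
class ProjectMemory:
    items: list[MemoryItem] = field(default_factory=list)

    def remember(self, kind: str, text: str, priority: int = 0) -> None:
        self.items.append(MemoryItem(kind, text, priority))

    def curated_summary(self, budget: int = 10) -> str:
        """Compression step: keep decisions and open items, drop chatter, and cap
        the total so the next session inherits a small, curated context."""
        keep = [i for i in self.items if i.kind != "chatter"]
        keep.sort(key=lambda i: (i.kind != "open_item", -i.priority))
        return "\n".join(f"[{i.kind}] {i.text}" for i in keep[:budget])


# Session startup for the Working Role: read the summary instead of the full history.
memory = ProjectMemory()
memory.remember("decision", "Use PostgreSQL for persistence", priority=2)
memory.remember("chatter", "Discussed naming conventions at length")
memory.remember("open_item", "Migrate legacy user table", priority=3)
print(memory.curated_summary())
```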
Advanced Memory: Relationship Preservation
The state of the art in selective memory preserves not just facts, but relationships.
Think about how humans remember projects. We do not just recall isolated facts. We remember that this decision led to that consequence, that this person owns that responsibility, that this document relates to that requirement. The connections matter as much as the content.
Advanced AI memory systems now capture these relationships. When storing information, they extract not just what happened, but who was involved, what it connected to, and why it mattered. When retrieving information, they can reconstruct context by following these relationship threads.
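A small illustration of the idea, using an in-memory graph; production systems typically back this with a graph or vector store, and the entities and relations below are made up for the example.

```python
from collections import defaultdict, deque

# A tiny relationship-preserving store: facts are nodes, labeled edges record
# how they connect (which decision caused which consequence, who owns what).
edges: dict[str, list[tuple[str, str]]] = defaultdict(list)


def relate(subject: str, relation: str, obj: str) -> None:
    edges[subject].append((relation, obj))
    edges[obj].append((f"inverse:{relation}", subject))


def context_for(entity: str, hops: int = 2) -> list[str]:
    """Reconstruct context by walking relationship threads out from an entity."""
    seen, queue, facts = {entity}, deque([(entity, 0)]), []
    while queue:
        node, depth = queue.popleft()
        if depth == hops:
            continue
        for relation, other in edges[node]:
            facts.append(f"{node} --{relation}--> {other}")
            if other not in seen:
                seen.add(other)
                queue.append((other, depth + 1))
    return facts


relate("switch to async jobs", "caused", "queue backlog incident")
relate("queue backlog incident", "owned_by", "platform team")
print("\n".join(context_for("switch to async jobs")))
```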
Performance metrics from these systems: 26% improvement in quality assessments. 90%+ reduction in information overhead while maintaining coherence. Significantly better handling of tasks that span multiple sessions.
The Trade-off
Selective memory adds complexity. You need infrastructure for storage and retrieval. You need to decide what to keep and what to discard. You need to trust that the compression preserves what matters.
This is not a solved problem. Memory systems can lose critical details. Compression can introduce subtle distortions. The AI may retrieve the wrong context at the wrong time. The 84% reduction sounds impressive right up until the 16% you kept turns out to be missing something essential.
Approach 3: Team Coordination
What if the answer is not one AI with better memory, but many AI systems with clear roles?
Team coordination decomposes complex work into specialized roles coordinated by a central manager. Each role has bounded scope, limited information needs, and a specific job. The manager maintains the big picture and routes only relevant information to each worker.
The Pattern Behind the Scenes
Leading AI companies use this internally. The structure:
Coordinator: A capable AI system responsible for analyzing requests, planning approach, maintaining memory, and directing specialists.
Specialists: Focused AI systems operating in parallel for specific tasks.
The result: team-based systems outperform single AI systems by 90% on complex research tasks. Not a marginal improvement. A near-doubling of performance.
The key insight: information management explains 80% of performance differences in team-based AI. The specific tools and AI models matter less than how information flows between roles.
Two Coordination Patterns
Handoff Pattern: One AI system hands control to another mid-task. Each knows about the others and decides when to defer. The work thread continues, but responsibility transfers. Works well for sequential, staged workflows.
Manager Pattern: A central coordinator assigns work to specialists and collects results. Specialists return outputs; the coordinator retains control and makes decisions. No handoff of the main thread. Works well for parallel processing and result synthesis.
The choice depends on your workflow. Handoffs work well when tasks naturally decompose into stages. Manager patterns work well when you need parallel processing with centralized decision-making.
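A minimal sketch of the manager pattern, with a placeholder specialist function standing in for real model calls; the roles and the information each one receives are illustrative, not any vendor's API.

```python
from concurrent.futures import ThreadPoolExecutor


def specialist(role: str, subtask: str, context: str) -> str:
    """Placeholder for a focused AI call; each specialist sees only its own slice."""
    return f"{role} finished: {subtask} (given {len(context)} chars of context)"


def coordinator(request: str) -> str:
    # The coordinator keeps the big picture and decides what each role needs.
    plan = {
        "researcher": ("gather sources on " + request, "search guidelines"),
        "analyst": ("summarize findings about " + request, "analysis template"),
        "writer": ("draft a report on " + request, "style guide"),
    }
    with ThreadPoolExecutor() as pool:
        futures = {
            role: pool.submit(specialist, role, subtask, context)
            for role, (subtask, context) in plan.items()
        }
        results = {role: f.result() for role, f in futures.items()}

    # No handoff of the main thread: the coordinator retains control and synthesizes.
    return "\n".join(results[role] for role in plan)


print(coordinator("long-running AI systems"))
```

A handoff implementation would look different: instead of the coordinator collecting results, each role would decide when to pass the working thread, along with its state, to the next one.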
Industry Standardization
In 2025, the industry standardized how AI systems connect to each other and to external resources. Think of it like the standardization of electrical outlets—different manufacturers’ products can now work together.
One standard defines how AI connects to information sources and tools. Another defines how AI systems communicate with each other. Together, they enable building blocks that can be assembled in different configurations.
This matters because it enables modularity. A workflow built by one team can incorporate components built by another. Memory systems become interchangeable. Information sources become discoverable. The “AI ecosystem” is not marketing—it is a technical reality these standards make possible.
The Overhead
Team-based AI uses approximately 15x more resources than single-interaction AI. That is the cost of coordination. For simple tasks, this overhead swamps any benefit. For complex tasks, the improved reliability justifies the expense.
The failure modes are also more complex. Poor handoff design caused one e-commerce company to see 40% customer abandonment when AI transitions confused users. Cascading failures can propagate through AI networks. One 2025 industry analysis identified 14 unique failure patterns across system design, coordination breakdowns, and quality verification.
Choosing the Right Approach
Here is how I think about these options:
- Fresh-start cycling works when tasks have clearly measurable completion, you can tolerate predictable incremental progress, progress can be fully captured in documents and records, and you want simplicity over sophistication.
- Selective memory works when tasks require preserving relationships across sessions, you have infrastructure for storage and retrieval, efficiency matters at scale, and you can invest in building compression systems.
- Team coordination works when tasks naturally decompose into specialized subtasks, you need parallel processing, the coordination overhead (15x resources) is acceptable, and you can handle more complex failure modes.
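If it helps to make that checklist concrete, here is a rough rule-of-thumb router; the thresholds mirror the points above and are not a substitute for judgment.

```python
def recommend_approach(
    measurable_completion: bool,
    needs_relationships: bool,
    decomposes_into_roles: bool,
    overhead_budget: float,  # rough multiple of single-interaction cost you can absorb
) -> str:
    """Rule-of-thumb router reflecting the checklist above; tune to your own context."""
    if decomposes_into_roles and overhead_budget >= 15:
        return "team coordination"
    if needs_relationships:
        return "selective memory"
    if measurable_completion:
        return "fresh-start cycling"
    return "keep a human in the loop; none of these fit cleanly"


print(recommend_approach(True, False, False, overhead_budget=2))
```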
Most production systems will combine elements of all three. A team-based system where each specialist uses fresh-start cycling. A memory-augmented coordinator directing stateless workers. The approaches are complementary, not exclusive.
The common thread: all three approaches externalize information that the AI cannot reliably maintain internally. They differ in how much they externalize and how they manage retrieval.
In Part 3, we will examine what long-running AI means for organizations: how work changes, what governance is required, and where the realistic opportunities are in 2026.
References
Research Papers
- Building Production-Ready AI with Scalable Long-Term Memory — arxiv.org/abs/2504.19413
- Multi-Graph Based Memory Architecture for AI — arxiv.org/abs/2601.03236
- Measuring AI in Production — arxiv.org/abs/2512.04123
Industry Reports & Whitepapers
- Failure Modes in AI Systems — Microsoft
- Lessons from 2025 on AI and Trust — Google Cloud
- State of AI Engineering — LangChain
- Benchmark vs. Real-World Evaluation — METR
Technical Documentation
- How We Built Our Multi-Agent Research System — Anthropic
- Model Context Protocol Specification — modelcontextprotocol.io
- Fresh-Start Cycling Documentation (“Ralph Wiggum”) — Geoffrey Huntley (ghuntley.com/ralph/)
