RALPH Loops: How OpenClaw Builds Entire Applications Autonomously Using Two AI Agents

You start a coding session with Claude at 2pm. By 2:30pm, you’re making progress. By 4pm, the context window is full, the agent is confused, and you’re copy-pasting git diffs trying to get it back on track.

There’s a better way.

The Agent Babysitting Problem

Traditional AI coding workflows look like this:

  1. Start a conversation with your favorite coding agent
  2. Give it a task
  3. Watch it work (and occasionally hallucinate)
  4. Hit the context limit after an hour
  5. Start over, manually catching it up
  6. Repeat until exhausted

This works fine for small tasks. But building a real application? You’re stuck babysitting an AI, explaining what it just did, and constantly fighting context limits.

The fundamental issue: you’re treating the agent’s memory as critical state. When context fills up, you lose everything.

The Core Insight: Context Is a Cache, Not State

Here’s the key realization that makes autonomous coding possible:

If your agent can’t reconstruct its situation from files alone, your architecture has a single point of failure sitting in a context window.

Think about how real development teams work. When a new developer joins a project, they don’t get telepathic access to the previous developer’s brain. They read:

  • The codebase
  • Git history
  • Documentation
  • Test results

Everything they need is in files. Context is just a performance optimization—a cache to speed up understanding. It’s not state.

So why are we treating AI agent context as irreplaceable?

The RALPH Loop Architecture

The solution is elegant: run many short coding sprints instead of one long marathon.

A RALPH loop is a wrapper script that repeatedly launches a coding agent with the same instructions until the work is complete. Each iteration:

  1. Starts completely fresh (zero context)
  2. Reads the file system and git history
  3. Figures out what’s done and what’s left
  4. Makes progress
  5. Commits changes
  6. Exits

Then the loop starts over with a fresh agent.

No accumulated context to overflow. No long-session drift. No hallucination spiral carried from one iteration to the next. Just steady, reliable progress.

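# Simplified RALPH loop (sketch). The two helpers are placeholders to adapt
# to your own setup; they are not OpenClaw commands.
# Assumes PRD tasks are written as "- [ ]" checkboxes.
tasks_remaining() { grep -q '^[[:space:]]*- \[ ]' PRD.md; }
# AGENT_CMD: whatever command runs one agent iteration on your machine.
spawn_fresh_coding_agent() { ${AGENT_CMD:?set AGENT_CMD first}; }

while tasks_remaining; do
  # Fresh agent, zero context: it reads the files and git history to assess
  # what is done, then codes until its next checkpoint
  spawn_fresh_coding_agent

  # Commit and exit
  git add -A
  git commit -m "RALPH iteration checkpoint"

  # Loop continues with a new agent
done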

The Two-Agent Strategy

Here’s where it gets interesting. We’ve found the best results come from splitting planning and execution across different models:

Planning Agent: Anthropic Claude Sonnet

  • Role: Architect and reviewer
  • Tasks:
    • Writing PRDs and breaking down features
    • Designing system architecture
    • Reviewing output and defining acceptance criteria
    • Making strategic decisions
  • Why: Excels at reasoning, system design, and nuanced understanding
  • Cost: More expensive, but worth it for critical thinking

Execution Agent: OpenAI Codex

  • Role: Implementation specialist
  • Tasks:
    • Writing actual code
    • Implementing features from specs
    • Running tests and fixing bugs
    • Following TDD patterns
  • Why: Fast, cheap, optimized for code generation
  • Cost: Significantly cheaper for high-volume coding

Think of it like a real tech company: the architect doesn’t write every line of code, and the developers don’t redesign the system for every ticket. Each role plays to its strengths.

Writing PRDs That Actually Work

The agent needs crystal-clear acceptance criteria. We use PRDs (Product Requirements Documents) written as checklists. Here’s an example for a URL shortener project:

URL Shortener – Product Requirements Document

Overview

Build a production-ready URL shortening service with analytics.

Technical Stack

  • Backend: Node.js with TypeScript + Express
  • Database: SQLite (simple, file-based)
  • Frontend: React with Vite
  • Deployment: Railway

Tasks

Phase 1: Backend Foundation

  • Initialize Node.js project with TypeScript configuration
  • Set up Express server with CORS and basic middleware
  • Create SQLite database schema (urls, analytics tables)
  • Implement base62 encoding for short codes
  • Write tests for encoding/decoding logic

Phase 2: Core API

  • Build POST /api/shorten endpoint (create short URL)
  • Build GET /:code endpoint (redirect to long URL)
  • Add input validation (URL format, length limits)
  • Implement rate limiting (10 requests/minute per IP)
  • Add error handling and status codes
  • Write integration tests for all endpoints

Phase 3: Analytics

  • Track clicks (timestamp, IP, user agent)
  • Build GET /api/stats/:code endpoint
  • Aggregate daily/weekly statistics
  • Write tests for analytics logic

Phase 4: Frontend

  • Create React app with Vite
  • Build URL input form with validation
  • Display generated short URL with copy button
  • Show click statistics for recent URLs
  • Add error states and loading indicators

Phase 5: Deployment

  • Configure Railway deployment
  • Set up environment variables
  • Add health check endpoint
  • Test production deployment
  • Update README with deployment instructions

Acceptance Criteria

  • All tasks must be completed
  • All tests must pass
  • Code must be TypeScript with no “any” types
  • API must return proper HTTP status codes
  • Frontend must work in Chrome, Firefox, Safari
  • Deployment must be accessible via public URL

Done When

  • URL shortener works end-to-end
  • Analytics track and display correctly
  • All tests pass
  • Successfully deployed to Railway

The RALPH loop validates completion against the PRD itself: the work is done only when every task is checked off. Agent claims it’s done but 12/19 tasks remain? Restart immediately. No negotiating with a confused model.

This sounds rigid, but rigidity is what makes it work. A non-deterministic worker needs deterministic acceptance criteria.
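Here’s a rough sketch of what that completion check could look like in shell. It is not OpenClaw’s actual implementation: it assumes the PRD tasks are written as Markdown checkboxes (`- [ ]` open, `- [x]` done), and the function names `prd_remaining` and `prd_is_complete` are made up for illustration.

```shell
# Sketch only: assumes PRD tasks are Markdown checkboxes ("- [ ]" / "- [x]").
# Function names are illustrative, not part of OpenClaw.

# Print the number of unchecked tasks in the given PRD file.
prd_remaining() {
  local prd_file="$1"
  # grep -c prints 0 and exits 1 when nothing matches, so ignore the status.
  # In the pattern, "\[" is a literal bracket; a bare "]" needs no escape.
  grep -c '^[[:space:]]*- \[ ]' "$prd_file" || true
}

# Succeed only when no unchecked task is left, whatever the agent claims.
prd_is_complete() {
  [ "$(prd_remaining "$1")" -eq 0 ]
}
```

A wrapper could call `prd_is_complete PRD.md` after each iteration and relaunch the agent whenever it fails. The check reads only the file, so the agent’s own report never enters the decision.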

The Secret Weapon: Test-Driven Prompts

This is the single most impactful technique for reliable AI coding:

For any real logic, write the test first, then the implementation.

Instead of:

"Build user authentication with JWT tokens"

Write:

"Write a test suite for JWT authentication that verifies:
1. Valid tokens are accepted
2. Expired tokens are rejected (24-hour expiry)
3. Invalid signatures are rejected
4. Missing tokens return 401

Then implement the authentication middleware to make all tests pass."

Why This Works

Tests are deterministic acceptance criteria for a non-deterministic worker.

When the agent writes the test first:

  1. It crystallizes exactly what “correct” means
  2. Implementation has a clear target
  3. You get automated verification
  4. Failures surface before the commit, so fewer slip through to post-merge
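To make the test-first order concrete, here’s a small sketch for the base62 short-code task from the example PRD. It is written in shell to match the commands used elsewhere in this article (the PRD’s own stack is TypeScript). The alphabet order and the function names are assumptions for illustration.

```shell
# Sketch only: alphabet order and function names are assumptions.
ALPHABET=0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

# Step 1: the tests, written before any implementation exists.
test_base62_encode() {
  [ "$(base62_encode 0)" = 0 ] &&
  [ "$(base62_encode 61)" = Z ] &&
  [ "$(base62_encode 62)" = 10 ] &&
  [ "$(base62_encode 3843)" = ZZ ]
}

# Step 2: the implementation, written to make those tests pass.
base62_encode() {
  local n="$1" out="" digit
  if [ "$n" -eq 0 ]; then echo 0; return; fi
  while [ "$n" -gt 0 ]; do
    digit=$(( n % 62 ))
    # cut is 1-indexed, so digit 0 maps to character 1 of the alphabet
    out="$(printf '%s' "$ALPHABET" | cut -c "$(( digit + 1 ))")$out"
    n=$(( n / 62 ))
  done
  echo "$out"
}

test_base62_encode && echo "all base62 tests pass"
```

The test function pins down edge cases (zero, the last single-digit value, the first two-digit value) before a line of the encoder exists, which is the property a TDD prompt asks the agent to preserve.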

When to Use TDD Prompts

Use for:

  • Authentication/authorization logic
  • Data processing pipelines
  • API endpoint behavior
  • Business logic
  • Database operations
  • Any code where “working” has a clear definition

Skip for:

  • Config file updates
  • Copy/formatting changes
  • Simple UI tweaks
  • Documentation updates

For complex features, TDD prompts are mandatory, not optional.

Step-by-Step: Your First RALPH Loop

Let’s build a real project: a URL shortener service.

Step 1: Write the PRD

Create PRD.md with clear, checkable tasks (see example above).

Key principles:

  • Be specific (“Set up Express server” not “Set up backend”)
  • Make tasks testable (“Write tests for X” not “Make X work”)
  • Include acceptance criteria
  • Order tasks by dependency

Step 2: Initialize Git Repository

$ mkdir url-shortener
$ cd url-shortener

# Create PRD
$ cat > PRD.md << 'EOF'
[Paste your PRD here]
EOF

# Initialize git
$ git init
$ git add PRD.md
$ git commit -m "Initial commit: Add PRD"

Critical: Everything must be in git. The agent reconstructs state from commits.
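As a rough illustration of “reconstructs state from commits”, here’s the kind of snapshot a fresh iteration could gather before doing any work. The article doesn’t show what the agent actually reads, so the helper name `repo_snapshot` and the choice of what to include are assumptions.

```shell
# Sketch only: repo_snapshot is a hypothetical helper, not an OpenClaw command.

# Print recent history, uncommitted changes, and the PRD for a repository.
repo_snapshot() {
  local repo="${1:-.}"
  echo "== Recent commits =="
  git -C "$repo" log --oneline -n 10
  echo "== Uncommitted changes =="
  git -C "$repo" status --short
  echo "== PRD =="
  cat "$repo/PRD.md"
}
```

Everything printed here comes from disk, so a brand-new agent process sees the same picture the previous one left behind.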

Step 3: Start the RALPH Loop

# Install OpenClaw if you haven't
$ npm install -g openclaw

# Start RALPH loop
$ bash ralph-start.sh \
  --project url-shortener \
  --prd PRD.md \
  --repo .

This creates a tmux session where Codex runs autonomously, reading the PRD and making commits.

Step 4: Monitor Progress

# Watch in real-time
$ tmux attach -t ralph-url-shortener-[timestamp]

# Or check git commits
$ git log --oneline

# Or check PRD status
$ cat PRD.md  # Tasks are checked off as the agent progresses

Step 5: Let It Run

Here’s the hard part: step away.

Check in periodically, but resist the urge to micromanage. The loop is designed to:

  • Detect when it’s stuck (no commits for 2+ hours)
  • Auto-restart on failures (up to 3 times)
  • Update the PRD as tasks complete
  • Stop when all tasks are done

A full-stack app often completes overnight.

The Infrastructure Behind RALPH

OpenClaw provides built-in RALPH infrastructure:

Registry System

Tracks multiple concurrent RALPH projects:

# List all running projects
$ bash ralph-registry.sh list

# Check specific project status
$ bash ralph-registry.sh get-status url-shortener

Cleanup Automation

Detects and handles stuck sessions:

  • Finished but not marked: Auto-completes based on PRD
  • Orphaned sessions: Registry updated if tmux session died
  • Stuck (no commits 2+ hours): Marked as stuck/failed
  • Zero remaining tasks: Auto-marked complete, session killed

Cleanup runs automatically every hour via heartbeat.
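The cleanup rules above boil down to a small decision function. Here’s a minimal sketch, assuming the three inputs are already known. The name `classify_session`, the “orphaned” label, and the order of the checks are assumptions, not OpenClaw’s documented logic.

```shell
# Sketch only: classify_session and the "orphaned" label are illustrative;
# the order of the checks is an assumption.

STUCK_AFTER_SECONDS=7200   # 2 hours without a commit

# Args: remaining task count, "yes"/"no" for a live tmux session,
#       seconds since the last commit.
classify_session() {
  local remaining="$1" session_alive="$2" idle_seconds="$3"
  if [ "$remaining" -eq 0 ]; then
    echo complete      # zero remaining tasks: done, session can be killed
  elif [ "$session_alive" != yes ]; then
    echo orphaned      # tmux session died with tasks still open
  elif [ "$idle_seconds" -ge "$STUCK_AFTER_SECONDS" ]; then
    echo stuck         # alive, but no commits for 2+ hours
  else
    echo running
  fi
}
```

The second argument could come from `tmux has-session -t "$name"`, which exits non-zero when no session with that name exists.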

Auto-Restart

If the agent crashes or hits an error:

  1. Loop captures the failure
  2. Waits 30 seconds
  3. Starts fresh agent with same PRD
  4. Max 3 restarts (prevents infinite loops)
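Here’s a minimal sketch of that restart policy. It reads “max 3 restarts” as one initial attempt plus up to three retries, which is an interpretation. The names `run_with_restarts` and `RALPH_RESTART_DELAY` are hypothetical.

```shell
# Sketch only: run_with_restarts and RALPH_RESTART_DELAY are illustrative names.
# "Max 3 restarts" is read here as one initial attempt plus up to 3 retries.

run_with_restarts() {
  local max_restarts=3
  local delay="${RALPH_RESTART_DELAY:-30}"   # seconds to wait before a restart
  local restarts=0
  # "$@" is the command that launches one fresh agent iteration.
  until "$@"; do
    if [ "$restarts" -ge "$max_restarts" ]; then
      echo "giving up after $restarts restarts" >&2
      return 1
    fi
    restarts=$(( restarts + 1 ))
    echo "agent failed; restart $restarts/$max_restarts in ${delay}s" >&2
    sleep "$delay"
  done
}
```

The hard cap is the important part: a command that can never succeed stops after four attempts instead of looping forever.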

Stuck Detection

Monitors git commits:

  • No commits for 2 hours = likely stuck
  • Registry marks as stuck
  • Heartbeat reports to monitoring
  • You can intervene or let cleanup handle it
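Measuring “no commits for 2 hours” needs nothing more than the timestamp of the latest commit. Here’s a small sketch using `git log`. The helper names are made up for illustration and this is not necessarily how OpenClaw does it.

```shell
# Sketch only: helper names are illustrative, not OpenClaw commands.

# Print seconds since the last commit in a repo. The second argument
# ("now", in epoch seconds) is optional and defaults to the current time.
seconds_since_last_commit() {
  local repo="$1" now="${2:-$(date +%s)}"
  local last
  # %ct is the committer date as a UNIX timestamp
  last="$(git -C "$repo" log -1 --format=%ct)" || return 1
  echo $(( now - last ))
}

# Succeed when the idle time reaches the threshold (default 2 hours).
is_stuck() {
  local idle_seconds="$1" threshold="${2:-7200}"
  [ "$idle_seconds" -ge "$threshold" ]
}
```

For example, `is_stuck "$(seconds_since_last_commit .)"` succeeds once the current repository has gone two hours without a commit.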

Monitoring in Real-Time

Attach to Session

$ tmux attach -t ralph-url-shortener-[timestamp]

You’ll see Codex working in real-time: reading files, planning, writing code, running tests.

Git Commits

# Watch commits appear
$ watch -n 5 'git log --oneline -10'

Each commit = incremental progress.

PRD Updates

# Watch tasks get completed
$ watch -n 10 'cat PRD.md'

The agent updates the PRD as it completes tasks.

Registry Status

# High-level overview
$ bash ralph-registry.sh get-status url-shortener
# Output: running | stuck | complete

Best Practices

✅ Do This

Write Specific, Testable Tasks

  • Bad: “Build authentication”
  • Good: “Implement JWT token generation with 24-hour expiry”

Use Test-Driven Prompts for Logic

  • Include “write tests first” in your PRD
  • Specify what tests should verify
  • Make tests part of acceptance criteria

Let the Planning Agent Design, Execution Agent Code

  • Use Sonnet to break down features
  • Use Codex to implement them
  • Don’t mix roles

Keep Tasks Granular

  • Each task should take 10-30 minutes
  • Big tasks = agent gets lost
  • Small tasks = steady progress

Monitor First Few Iterations

  • Watch the first 2-3 cycles
  • Make sure agent understands the PRD
  • Adjust if needed, then let it run

❌ Don’t Do This

Don’t Micromanage

  • Resist the urge to intervene constantly
  • Trust the loop to recover from errors
  • Check in every few hours, not every few minutes

Don’t Write Vague PRDs

  • “Make it better” = agent paralysis
  • “Improve performance” = meaningless
  • Be specific about what and why

Don’t Use RALPH for Exploratory Work

  • RALPH needs clear goals
  • Use interactive sessions for discovery
  • Switch to RALPH once you know what to build

Don’t Skip Git

  • Every change must be committed
  • Agent relies on git history
  • No commits = no state reconstruction

Don’t Ignore Stuck Sessions

  • Check registry status daily
  • Investigate if stuck > 4 hours
  • Kill and restart if needed

Cost Considerations

Here’s the economic advantage of the two-agent split:

Planning with Claude Sonnet

  • When: PRD writing, architecture design, code review
  • Frequency: Once per project + occasional reviews
  • Cost: ~$0.50-2.00 per project (depending on size)
  • Value: High-quality system design

Execution with OpenAI Codex

  • When: All the actual coding
  • Frequency: Hundreds of iterations
  • Cost: ~$0.01-0.05 per iteration
  • Value: Fast, reliable code generation

Example: Full-Stack URL Shortener

  • Planning (Sonnet): $1.50 (write PRD, review architecture)
  • Execution (Codex): $3.00 (150 iterations @ $0.02 each)
  • Total: ~$4.50
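The arithmetic behind that total, as a quick sketch you can rerun with your own numbers. The figures are the estimates from this example, expressed in whole cents because shell arithmetic is integer-only. The function name `project_cost` is made up for illustration.

```shell
# Sketch only: figures are the estimates from this example, in whole cents.
# project_cost is an illustrative name.

# Args: planning cost (cents), iteration count, cost per iteration (cents).
project_cost() {
  local planning_cents="$1" iterations="$2" per_iteration_cents="$3"
  local total_cents=$(( planning_cents + iterations * per_iteration_cents ))
  printf '$%d.%02d\n' $(( total_cents / 100 )) $(( total_cents % 100 ))
}

project_cost 150 150 2   # $1.50 planning + 150 iterations at $0.02; prints $4.50
```

Planning is a fixed cost, so the iteration count and the per-iteration price are the two numbers that move the total.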

Compare to:

  • Your time: 6-8 hours @ $50/hr = $300-400
  • All Sonnet: $15-25 (overkill for implementation)
  • Babysitting one long session: frustration + context limit issues

The math is clear: on these estimates, the two-agent loop costs less than a third of an all-Sonnet run ($4.50 vs. $15-25) and under 2% of the $300-400 your own hours would.

Real Results

What can you actually build with RALPH loops?

We’ve Shipped

  • Complete REST APIs with auth
  • Full-stack web applications
  • Database migration tools
  • Chrome extensions
  • CLI tools with 1000+ lines
  • WordPress plugins
  • Discord bots

Typical Timeline

  • Small project (CLI tool, simple API): 2-4 hours
  • Medium project (full-stack app, auth, database): 8-12 hours
  • Large project (complex multi-service system): 24-48 hours

All while you sleep, work on other things, or just live your life.

Code Quality

Because we enforce:

  • Tests for all logic (TDD prompts)
  • TypeScript with strict mode
  • Linting and formatting
  • Git commits for every change

The output is often cleaner than hand-written code.

Autonomous Development Is Here

RALPH loops show that AI agents can ship entire projects without constant supervision.

The key insights:

  1. Context is disposable – Everything lives in files
  2. Split planning and execution – Use the right model for each job
  3. Clear specs with checkboxes – Deterministic criteria for non-deterministic workers
  4. Test-driven prompts – Crystallize correctness before implementation
  5. Many short sprints – Avoid context limits and drift

Getting Started

Your first autonomous build is an hour away:

  1. Install OpenClaw

    $ npm install -g openclaw
  2. Pick a small project

    • URL shortener, todo API, markdown blog
    • Something you could build in a weekend
    • 10-20 clear tasks
  3. Write a detailed PRD

    • Use clear task lists
    • Be specific about each task
    • Include test requirements
    • Define “done” clearly
  4. Start the loop

    $ bash ralph-start.sh --project my-first-ralph --prd PRD.md --repo .
  5. Watch the first hour

    • Make sure it understands the PRD
    • Check that commits are meaningful
    • Adjust if needed
  6. Let it run

    • Check back in 4-6 hours
    • Review the code
    • Test the result

What’s Next?

Once you’ve run your first RALPH loop, try:

  • Bigger projects – Multi-service applications
  • Production deployments – Include CI/CD in your PRD
  • Multiple concurrent loops – Build 3 projects simultaneously
  • Custom templates – Create PRD templates for common patterns

The future of development isn’t replacing developers with AI. It’s giving developers autonomous AI teammates that handle implementation while you focus on architecture and product decisions.

RALPH loops are that future, available today.


Resources

Try It Yourself

Ready to build your first autonomous project?

  1. Install OpenClaw
  2. Pick a small project
  3. Write a PRD
  4. Start a RALPH loop
  5. Come back in a few hours to a finished application

It’s that simple.

Have questions or want to share what you built? Join the OpenClaw Discord community or tag us on X @LearnOpenClaw.
