RALPH Loops: How OpenClaw Builds Entire Applications Autonomously Using Two AI Agents

You start a coding session with Claude at 2pm. By 2:30pm, you’re making progress. By 4pm, the context window is full, the agent is confused, and you’re copy-pasting git diffs trying to get it back on track.

There’s a better way.

The Agent Babysitting Problem

Traditional AI coding workflows look like this:

Start a conversation with your favorite coding agent
Give it a task
Watch it work (and occasionally hallucinate)
Hit the context limit after an hour
Start over, manually catching it up
Repeat until exhausted

This works fine for small tasks. But building a real application? You’re stuck babysitting an AI, explaining what it just did, and constantly fighting context limits.

The fundamental issue: you’re treating the agent’s memory as critical state. When context fills up, you lose everything.

The Core Insight: Context Is a Cache, Not State

Here’s the key realization that makes autonomous coding possible:

If your agent can’t reconstruct its situation from files alone, your architecture has a single point of failure sitting in a context window.

Think about how real development teams work. When a new developer joins a project, they don’t get telepathic access to the previous developer’s brain. They read:

The codebase
Git history
Documentation
Test results

Everything they need is in files. Context is just a performance optimization—a cache to speed up understanding. It’s not state.

So why are we treating AI agent context as irreplaceable?

The RALPH Loop Architecture

The solution is elegant: run many short coding sprints instead of one long marathon.

A RALPH loop is a wrapper script that repeatedly launches a coding agent with the same instructions until the work is complete. Each iteration:

Starts completely fresh (zero context)
Reads the file system and git history
Figures out what’s done and what’s left
Makes progress
Commits changes
Exits

Then the loop starts over with a fresh agent.

No context limits to hit, because no single agent runs long enough to fill its window. No drift. No hallucination spiral. Just steady, reliable progress.

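# Simplified RALPH loop (a sketch). Assumptions: PRD.md marks open tasks as
# "- [ ]" checkboxes, and the Codex CLI's non-interactive "codex exec" is the
# agent. Substitute your own agent command and its auto-approval flags.
max_iterations=${MAX_ITERATIONS:-200}   # safety cap, also an assumption
i=0
while grep -q '^[[:space:]]*- \[ ] ' PRD.md && (( i++ < max_iterations )); do
  # Fresh agent, zero context: a new process each time, so it must rebuild
  # its picture of the project from PRD.md, the code, and git history
  codex exec "Read PRD.md and the git log. Complete the next unchecked task, run the tests, tick it off in PRD.md, then stop." || true

  # Commit whatever progress was made (git commit fails harmlessly if nothing changed)
  git add -A
  git commit -q -m "RALPH iteration $i" || true

  # Loop continues with a brand-new agent
done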
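To make "reads the file system and git history" concrete, here is a minimal sketch of the starting point a fresh iteration can rebuild on its own. It assumes PRD.md lists tasks as Markdown checkboxes ("- [ ]" open, "- [x]" done); remaining_tasks, next_task, and ralph_briefing are hypothetical helper names, not part of OpenClaw.

```bash
# Hypothetical helpers: rebuild an iteration's starting point from files alone.
# Assumes PRD.md marks open tasks "- [ ]" and finished ones "- [x]".

# Number of unchecked tasks left in the PRD (prints 0 when none remain).
remaining_tasks() {
  grep -c '^[[:space:]]*- \[ ] ' "$1" || true
}

# Text of the first unchecked task (empty output if none remain).
next_task() {
  grep -m1 '^[[:space:]]*- \[ ] ' "$1" | sed 's/^[[:space:]]*- \[ ] //'
}

# The briefing a fresh agent starts from: open work plus recent history.
ralph_briefing() {
  local prd=$1 dir
  dir=$(dirname "$prd")
  echo "Remaining tasks: $(remaining_tasks "$prd")"
  echo "Next task: $(next_task "$prd")"
  # Git history is optional so the helper also works outside a repository.
  if git -C "$dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    echo "Recent commits:"
    git -C "$dir" log --oneline -5 2>/dev/null
  fi
}
```

Nothing here depends on what a previous agent remembered; delete every agent process and the same briefing comes back from the files.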

The Two-Agent Strategy

Here’s where it gets interesting. We’ve found the best results come from splitting planning and execution across different models:

Planning Agent: Anthropic Claude Sonnet

Role: Architect and reviewer
Tasks:
- Writing PRDs and breaking down features
- Designing system architecture
- Reviewing output and defining acceptance criteria
- Making strategic decisions
Why: Excels at reasoning, system design, and nuanced understanding
Cost: More expensive, but worth it for critical thinking

Execution Agent: OpenAI Codex

Role: Implementation specialist
Tasks:
- Writing actual code
- Implementing features from specs
- Running tests and fixing bugs
- Following TDD patterns
Why: Fast, cheap, optimized for code generation
Cost: Significantly cheaper for high-volume coding

Think of it like a real tech company: the architect doesn’t write every line of code, and the developers don’t redesign the system for every ticket. Each role plays to its strengths.
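Wired together with command-line tools, the split might look like this sketch. It assumes the Claude Code CLI in print mode ("claude -p" with "--model sonnet") as the planner and the Codex CLI's "codex exec" as the executor; OpenClaw's actual wiring may differ, and plan and execute_once are hypothetical names.

```bash
# Planner model alias; an assumption, override via the environment.
PLANNER_MODEL=${PLANNER_MODEL:-sonnet}

# Planning agent: one expensive call that turns an idea into a checkbox PRD.
# Output is captured first so a failed call never truncates an existing PRD.md.
plan() {
  local idea=$1 out
  out=$(claude -p --model "$PLANNER_MODEL" \
    "Write a PRD for: $idea. List every task as a Markdown checkbox (- [ ] ...), ordered by dependency, with testable acceptance criteria.") || return 1
  printf '%s\n' "$out" > PRD.md
}

# Execution agent: one cheap call per iteration that completes a single task.
execute_once() {
  codex exec "Read PRD.md and the git log. Implement the first unchecked task, writing its tests first, tick it off in PRD.md, then stop."
}
```

The planner runs once per project; execute_once is what a RALPH loop would call over and over.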

Writing PRDs That Actually Work

The agent needs crystal-clear acceptance criteria. We use PRDs (Product Requirements Documents) written as checklists. Here’s an example for a URL shortener project:

URL Shortener – Product Requirements Document

Overview

Build a production-ready URL shortening service with analytics.

Technical Stack

Backend: Node.js with TypeScript + Express
Database: SQLite (simple, file-based)
Frontend: React with Vite
Deployment: Railway

Tasks

Phase 1: Backend Foundation

- [ ] Initialize Node.js project with TypeScript configuration
- [ ] Set up Express server with CORS and basic middleware
- [ ] Create SQLite database schema (urls, analytics tables)
- [ ] Implement base62 encoding for short codes
- [ ] Write tests for encoding/decoding logic

Phase 2: Core API

- [ ] Build POST /api/shorten endpoint (create short URL)
- [ ] Build GET /:code endpoint (redirect to long URL)
- [ ] Add input validation (URL format, length limits)
- [ ] Implement rate limiting (10 requests/minute per IP)
- [ ] Add error handling and status codes
- [ ] Write integration tests for all endpoints

Phase 3: Analytics

- [ ] Track clicks (timestamp, IP, user agent)
- [ ] Build GET /api/stats/:code endpoint
- [ ] Aggregate daily/weekly statistics
- [ ] Write tests for analytics logic

Phase 4: Frontend

- [ ] Create React app with Vite
- [ ] Build URL input form with validation
- [ ] Display generated short URL with copy button
- [ ] Show click statistics for recent URLs
- [ ] Add error states and loading indicators

Phase 5: Deployment

- [ ] Configure Railway deployment
- [ ] Set up environment variables
- [ ] Add health check endpoint
- [ ] Test production deployment
- [ ] Update README with deployment instructions

Acceptance Criteria

All tasks must be completed
All tests must pass
Code must be TypeScript with no `any` types
API must return proper HTTP status codes
Frontend must work in Chrome, Firefox, Safari
Deployment must be accessible via public URL

Done When

URL shortener works end-to-end
Analytics track and display correctly
All tests pass
Successfully deployed to Railway

The RALPH loop validates completion against the PRD, not against the agent’s word: the work is done only when every task is checked off. Agent claims it’s done but 12/19 tasks remain? Restart immediately. No negotiating with a confused model.

This sounds rigid, but rigidity is what makes it work. A non-deterministic worker needs deterministic acceptance criteria.
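A minimal sketch of that check, assuming the PRD's tasks are Markdown checkboxes ("- [ ]" open, "- [x]" done). The point is that the loop inspects the file instead of trusting the agent's final message; prd_progress and prd_complete are hypothetical helper names.

```bash
# Print "done/total" for the PRD's checkbox tasks.
prd_progress() {
  local prd=$1 n_done n_total
  n_done=$(grep -c '^[[:space:]]*- \[[xX]] ' "$prd" || true)
  n_total=$(grep -c '^[[:space:]]*- \[[ xX]] ' "$prd" || true)
  echo "$n_done/$n_total"
}

# Succeed only if the PRD exists and no unchecked task is left,
# regardless of what the agent claimed in its output.
prd_complete() {
  [ -f "$1" ] && ! grep -q '^[[:space:]]*- \[ ] ' "$1"
}
```

An iteration that ends while prd_complete still fails is treated like any other unfinished iteration: a fresh agent starts on the remaining tasks.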

The Secret Weapon: Test-Driven Prompts

This is the single most impactful technique for reliable AI coding:

For any real logic, write the test first, then the implementation.

Instead of:

"Build user authentication with JWT tokens"

Write:

"Write a test suite for JWT authentication that verifies:
1. Valid tokens are accepted
2. Expired tokens are rejected (24-hour expiry)
3. Invalid signatures are rejected
4. Missing tokens return 401

Then implement the authentication middleware to make all tests pass."

Why This Works

Tests are deterministic acceptance criteria for a non-deterministic worker.

When the agent writes the test first:

It crystallizes exactly what “correct” means
Implementation has a clear target
You get automated verification
Post-merge failures drop dramatically
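One way to make the tests the acceptance gate is to refuse to commit an iteration unless the suite passes. A sketch, assuming "npm test" runs the suite (as it would for the Node.js/TypeScript stack in the example PRD); TEST_CMD and commit_if_green are hypothetical names.

```bash
# Hypothetical knob: the command that runs the project's test suite.
TEST_CMD=${TEST_CMD:-npm test}

# Commit an iteration only when the suite passes. On failure nothing is
# committed, so the next fresh agent inherits the failing tests as its target.
commit_if_green() {
  local msg=$1
  if bash -c "$TEST_CMD"; then
    git add -A && git commit -q -m "$msg"
  else
    echo "tests failed; not committing: $msg" >&2
    return 1
  fi
}
```

Usage at the end of an iteration: commit_if_green "Add JWT auth middleware".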

When to Use TDD Prompts

Use for:

Authentication/authorization logic
Data processing pipelines
API endpoint behavior
Business logic
Database operations
Any code where “working” has a clear definition

Skip for:

Config file updates
Copy/formatting changes
Simple UI tweaks
Documentation updates

For complex features, TDD prompts are mandatory, not optional.

Step-by-Step: Your First RALPH Loop

Let’s build a real project: a URL shortener service.

Step 1: Write the PRD

Create PRD.md with clear, checkable tasks (see example above).

Key principles:

Be specific (“Set up Express server” not “Set up backend”)
Make tasks testable (“Write tests for X” not “Make X work”)
Include acceptance criteria
Order tasks by dependency

Step 2: Initialize Git Repository

$ mkdir url-shortener
$ cd url-shortener
$ 
$ # Create PRD
$ cat > PRD.md << 'EOF'
[Paste your PRD here]
EOF
$ 
$ # Initialize git
$ git init
$ git add PRD.md
$ git commit -m "Initial commit: Add PRD"

Critical: Everything must be in git. The agent reconstructs state from commits.

Step 3: Start the RALPH Loop

$ # Install OpenClaw if you haven't
$ npm install -g openclaw
$ 
$ # Start RALPH loop
$ bash ralph-start.sh \
  --project url-shortener \
  --prd PRD.md \
  --repo .

This creates a tmux session where Codex runs autonomously, reading the PRD and making commits.
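For a sense of what that involves, here is a rough sketch of starting a loop in a detached tmux session named in the ralph-<project>-<timestamp> pattern. It is not OpenClaw's ralph-start.sh; "ralph-loop.sh" is a hypothetical script holding the loop itself, and start_ralph_session is a hypothetical name.

```bash
# Start the loop detached, so it keeps running after you close the terminal.
start_ralph_session() {
  local project=$1 repo=${2:-.}
  # Session name follows the ralph-<project>-<unix timestamp> pattern.
  local session="ralph-${project}-$(date +%s)"
  # -d: detached; -s: session name; -c: working directory for the loop.
  tmux new-session -d -s "$session" -c "$repo" "bash ralph-loop.sh"
  echo "$session"
}
```

The printed name is what you would pass to tmux attach -t.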

Step 4: Monitor Progress

$ # Watch in real time
$ tmux attach -t ralph-url-shortener-[timestamp]
$ 
$ # Or check git commits
$ git log --oneline
$ 
$ # Or check PRD status
$ cat PRD.md  # Tasks get checked off as the agent progresses

Step 5: Let It Run

Here’s the hard part: step away.

Check in periodically, but resist the urge to micromanage. The loop is designed to:

Detect when it’s stuck (no commits for 2+ hours)
Auto-restart on failures (up to 3 times)
Update the PRD as tasks complete
Stop when all tasks are done

A full-stack app often completes overnight.

The Infrastructure Behind RALPH

OpenClaw provides built-in RALPH infrastructure:

Registry System

Tracks multiple concurrent RALPH projects:

$ # List all running projects
$ bash ralph-registry.sh list
$ 
$ # Check specific project status
$ bash ralph-registry.sh get-status url-shortener
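A registry can be as simple as a text file. A minimal sketch, assuming one "name<TAB>status" line per project in a file of our choosing; this is not necessarily how ralph-registry.sh stores its data, and the function names are hypothetical.

```bash
# Hypothetical registry location; override via the environment.
REGISTRY=${REGISTRY:-$HOME/.ralph-registry.tsv}

# Record (or overwrite) a project's status.
registry_set() {
  local name=$1 status=$2 tmp
  touch "$REGISTRY"
  tmp=$(mktemp)
  # Drop any existing line for this project, then append the new one.
  awk -F'\t' -v n="$name" '$1 != n' "$REGISTRY" > "$tmp"
  printf '%s\t%s\n' "$name" "$status" >> "$tmp"
  mv "$tmp" "$REGISTRY"
}

# Print a project's status (running | stuck | complete); fail if unknown.
registry_get_status() {
  awk -F'\t' -v n="$1" '$1 == n { print $2; found = 1 } END { exit !found }' "$REGISTRY"
}

# List every registered project with its status.
registry_list() {
  cat "$REGISTRY"
}
```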

Cleanup Automation

Detects and handles stuck sessions:

Finished but not marked: Auto-completes based on PRD
Orphaned sessions: Registry updated if tmux session died
Stuck (no commits 2+ hours): Marked as stuck/failed
Zero remaining tasks: Auto-marked complete, session killed

Runs automatically every hour via heartbeat.
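The cleanup rules reduce to a small decision function. A sketch, assuming the three inputs (open PRD tasks, whether the tmux session is alive, seconds since the last commit) are gathered elsewhere; classify_session is a hypothetical name and the order of checks is our assumption.

```bash
# Inputs: open PRD tasks, whether the tmux session is alive ("yes"/"no"),
# and seconds since the last commit.
classify_session() {
  local remaining=$1 alive=$2 idle_secs=$3
  local stuck_after=${STUCK_AFTER:-7200}   # 2 hours without a commit
  if (( remaining == 0 )); then
    echo complete    # nothing left: mark complete and kill the session if alive
  elif [[ $alive != yes ]]; then
    echo orphaned    # session died with work left: update the registry
  elif (( idle_secs >= stuck_after )); then
    echo stuck       # alive but no commits for 2+ hours
  else
    echo running
  fi
}

# The inputs could be gathered with standard tools, for example:
#   alive=$(tmux has-session -t "$session" 2>/dev/null && echo yes || echo no)
#   idle=$(( $(date +%s) - $(git log -1 --format=%ct) ))
```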

Auto-Restart

If the agent crashes or hits an error:

Loop captures the failure
Waits 30 seconds
Starts fresh agent with same PRD
Max 3 restarts (prevents infinite loops)
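The restart policy fits in a few lines of Bash. run_with_restarts is a hypothetical helper that runs whatever command you pass; the 30-second delay and 3-restart cap mirror the numbers above (one initial attempt plus at most three restarts).

```bash
MAX_RESTARTS=${MAX_RESTARTS:-3}
RESTART_DELAY=${RESTART_DELAY:-30}   # seconds

# Run a command; on failure wait, then start it again from scratch,
# giving up once MAX_RESTARTS restarts have been used.
run_with_restarts() {
  local restarts=0
  until "$@"; do
    if (( restarts >= MAX_RESTARTS )); then
      echo "giving up after $restarts restarts" >&2
      return 1
    fi
    restarts=$((restarts + 1))
    echo "agent failed; restart $restarts/$MAX_RESTARTS in ${RESTART_DELAY}s" >&2
    sleep "$RESTART_DELAY"
  done
}
```

For example: run_with_restarts codex exec "Complete the next unchecked task in PRD.md". Each retry is a new process, so it starts with a clean context, exactly like a normal iteration.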

Stuck Detection

Monitors git commits:

No commits for 2 hours = likely stuck
Registry marks as stuck
Heartbeat reports to monitoring
You can intervene or let cleanup handle it
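Commit-based stuck detection needs only the timestamp of the latest commit. A sketch with hypothetical function names, using the 2-hour threshold above; the current time can be passed in explicitly, which keeps the check easy to test.

```bash
STUCK_AFTER=${STUCK_AFTER:-7200}   # 2 hours, in seconds

# Seconds since the most recent commit in the given repo.
seconds_since_last_commit() {
  local repo=${1:-.} now=${2:-$(date +%s)} last
  last=$(git -C "$repo" log -1 --format=%ct) || return 1
  echo $(( now - last ))
}

# Succeed if the repo has gone STUCK_AFTER seconds or longer without a commit.
is_stuck() {
  local idle
  idle=$(seconds_since_last_commit "$@") || return 1
  (( idle >= STUCK_AFTER ))
}
```

In a heartbeat job this would be called as is_stuck /path/to/repo, letting now default to the current time.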

Monitoring in Real-Time

Attach to Session

$ tmux attach -t ralph-url-shortener-[timestamp]

You’ll see Codex working in real time: reading files, planning, writing code, running tests.

Git Commits

$ # Watch commits appear
$ watch -n 5 'git log --oneline -10'

Each commit = incremental progress.

PRD Updates

$ # Watch tasks get completed
$ watch -n 10 'cat PRD.md'

The agent updates the PRD as it completes tasks.

Registry Status

$ # High-level overview
$ bash ralph-registry.sh get-status url-shortener
$ # Output: running | stuck | complete

Best Practices

✅ Do This

Write Specific, Testable Tasks

Bad: “Build authentication”
Good: “Implement JWT token generation with 24-hour expiry”

Use Test-Driven Prompts for Logic

Include “write tests first” in your PRD
Specify what tests should verify
Make tests part of acceptance criteria

Let the Planning Agent Design, Execution Agent Code

Use Sonnet to break down features
Use Codex to implement them
Don’t mix roles

Keep Tasks Granular

Each task should take 10-30 minutes
Big tasks = agent gets lost
Small tasks = steady progress

Monitor First Few Iterations

Watch the first 2-3 cycles
Make sure agent understands the PRD
Adjust if needed, then let it run

❌ Don’t Do This

Don’t Micromanage

Resist the urge to intervene constantly
Trust the loop to recover from errors
Check in every few hours, not every few minutes

Don’t Write Vague PRDs

“Make it better” = agent paralysis
“Improve performance” = meaningless
Be specific about what and why

Don’t Use RALPH for Exploratory Work

RALPH needs clear goals
Use interactive sessions for discovery
Switch to RALPH once you know what to build

Don’t Skip Git

Every change must be committed
Agent relies on git history
No commits = no state reconstruction

Don’t Ignore Stuck Sessions

Check registry status daily
Investigate if stuck > 4 hours
Kill and restart if needed

Cost Considerations

Here’s the economic advantage of the two-agent split:

Planning with Claude Sonnet

When: PRD writing, architecture design, code review
Frequency: Once per project + occasional reviews
Cost: ~$0.50-2.00 per project (depending on size)
Value: High-quality system design

Execution with OpenAI Codex

When: All the actual coding
Frequency: Hundreds of iterations
Cost: ~$0.01-0.05 per iteration
Value: Fast, reliable code generation

Example: Full-Stack URL Shortener

Planning (Sonnet): $1.50 (write PRD, review architecture)
Execution (Codex): $3.00 (150 iterations @ $0.02 each)
Total: ~$4.50
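Written as a formula, the example total is a one-off planning cost plus a per-iteration execution cost (the dollar figures are the rough estimates above, not published prices):

```latex
\begin{aligned}
C_{\text{total}} &= C_{\text{plan}} + n_{\text{iter}} \, c_{\text{iter}} \\
&= \$1.50 + 150 \times \$0.02 \\
&= \$1.50 + \$3.00 = \$4.50
\end{aligned}
```

Because execution cost grows linearly with the iteration count, doubling it to 300 iterations at the same rate adds another $3.00, for $7.50 in total.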

Compare to:

Your time: 6-8 hours @ $50/hr = $300-400
All Sonnet: $15-25 (overkill for implementation)
Babysitting one long session: frustration + context limit issues

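The math is clear: by these estimates, autonomous coding with cheap execution costs roughly 1-2% of doing the work by hand, and about a fifth to a third of running everything on Sonnet.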

Real Results

What can you actually build with RALPH loops?

We’ve Shipped

Complete REST APIs with auth
Full-stack web applications
Database migration tools
Chrome extensions
CLI tools with 1000+ lines
WordPress plugins
Discord bots

Typical Timeline

Small project (CLI tool, simple API): 2-4 hours
Medium project (full-stack app, auth, database): 8-12 hours
Large project (complex multi-service system): 24-48 hours

All while you sleep, work on other things, or just live your life.

Code Quality

Because we enforce:

Tests for all logic (TDD prompts)
TypeScript with strict mode
Linting and formatting
Git commits for every change

The output is often cleaner than hand-written code.

Autonomous Development Is Here

RALPH loops prove that AI agents can ship entire projects without constant supervision.

The key insights:

Context is disposable – Everything lives in files
Split planning and execution – Use the right model for each job
Clear specs with checkboxes – Deterministic criteria for non-deterministic workers
Test-driven prompts – Crystallize correctness before implementation
Many short sprints – Avoid context limits and drift

Getting Started

You can have your first autonomous build running within an hour:

Install OpenClaw
```
$ npm install -g openclaw
```
Pick a small project
- URL shortener, todo API, markdown blog
- Something you could build in a weekend
- 10-20 clear tasks
Write a detailed PRD
- Use clear task lists
- Be specific about each task
- Include test requirements
- Define “done” clearly

Start the loop

$ bash ralph-start.sh --project my-first-ralph --prd PRD.md --repo .

Watch the first hour
- Make sure it understands the PRD
- Check that commits are meaningful
- Adjust if needed
Let it run
- Check back in 4-6 hours
- Review the code
- Test the result

What’s Next?

Once you’ve run your first RALPH loop, try:

Bigger projects – Multi-service applications
Production deployments – Include CI/CD in your PRD
Multiple concurrent loops – Build 3 projects simultaneously
Custom templates – Create PRD templates for common patterns

The future of development isn’t replacing developers with AI. It’s giving developers autonomous AI teammates that handle implementation while you focus on architecture and product decisions.

RALPH loops are that future, available today.

Resources

OpenClaw Documentation
RALPH Loop Scripts (included with OpenClaw)
Example PRDs
OpenClaw Discord Community

Try It Yourself

Ready to build your first autonomous project?

Install OpenClaw
Pick a small project
Write a PRD
Start a RALPH loop
Come back in a few hours to a finished application

It’s that simple.

Have questions or want to share what you built? Join the OpenClaw Discord community or tag us on X @LearnOpenClaw.