From RALPH to Zenith: Designing harnesses for long-running agents
Read the technical report on GitHub.

TL;DR

01 Long-running agents often fail not because they cannot make progress, but because they stop before the task is truly complete.

02 We tested five harness designs across eight long-horizon tasks to isolate the control mechanisms that matter: repeated gap-finding, revisable planning, independent verification, adaptive orchestration, and stopping discipline.

03 RALPH is the strongest simple baseline because it forces each