A real-world OpenEnv environment where AI agents learn to fix broken Python code. Submit buggy code, get instant fixes, and watch tests pass in real time.
Pick a task, review the buggy code, write your fix, and submit. Watch tests execute in real time.
Select a task and click "Load Task" to begin.The agent receives buggy Python code and descriptions of the tests it must pass.
→The agent analyzes the code, identifies the bug, and submits corrected code via step().
Tests run in a sandboxed subprocess. Reward = fraction passing (0.0–1.0). Iterate until all pass.
6 real-world debugging challenges across 3 difficulty levels.
reset() → Observation | step(action) → StepResult | state() → State