Karpathy’s autoresearch: AI Agents Running Research
karpathy/autoresearch hit 89K GitHub stars in 2 weeks (June 2026). The idea: let an AI agent run ML experiments autonomously on a single GPU.
What it does
The repo contains a single CLAUDE.md (the prompt) that tells Claude Code:
- “You are a researcher at nanochat”
- “Read the codebase, propose experiments, run them, log results”
- “Iterate for hours, don’t ask me questions”
The agent then runs python -m experiments.X in a loop, modifying hyperparameters and tracking results.
The core pattern
# CLAUDE.md
You are running on a single H100. Time budget: 6 hours.
Each iteration:
1. Read current state of `experiments/`
2. Pick one untried hypothesis
3. Write a script that tests it
4. Run for max 30 minutes
5. Compare to baseline
6. Update `experiments/notes.md` with result
Do NOT:
- Modify model architecture
- Touch data pipeline
- Ask me anything
Adapt the pattern to your work
For backend services:
You are optimizing a Python FastAPI service.
- Time: 4 hours
- Each iteration: profile → pick bottleneck → fix → benchmark
- Update notes/endpoints.md with latency before/after
For data science:
You are improving churn prediction.
- Time: 6 hours
- Each iteration: hypothesis → feature engineering → train → evaluate
- Update notes/experiments.md with F1/AUC
Key takeaways
- The CLAUDE.md is the entire interface (no UI, no API)
- “Time budget” + “don’t ask” are the magic phrases
- The notes/ folder becomes a paper trail