How Karpathy's autoresearch repo (89K stars in 2 weeks) uses Claude Code agents to autonomously run ML experiments. The pattern, the prompt, and how to adapt it.

Karpathy’s autoresearch: AI Agents Running Research

karpathy/autoresearch hit 89K GitHub stars in 2 weeks (June 2026). The idea: let an AI agent run ML experiments autonomously on a single GPU.

What it does

The repo contains a single CLAUDE.md (the prompt) that tells Claude Code:

“You are a researcher at nanochat”
“Read the codebase, propose experiments, run them, log results”
“Iterate for hours, don’t ask me questions”

The agent then runs python -m experiments.X in a loop, where X is whichever experiment script it has just written, changing hyperparameters between runs and logging each result.

The core pattern

# CLAUDE.md
You are running on a single H100. Time budget: 6 hours.

Each iteration:
1. Read current state of `experiments/`
2. Pick one untried hypothesis
3. Write a script that tests it
4. Run for max 30 minutes
5. Compare to baseline
6. Update `experiments/notes.md` with result

Do NOT:
- Modify model architecture
- Touch data pipeline
- Ask me anything
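
The prompt leaves enforcement of the two time limits to the agent itself. If you would rather have a hard guard around it, the loop can be sketched in Python as below. This is a minimal sketch, not code from the repo: `run_with_budget`, its parameters, and the one-line note format are all assumptions made for illustration.

```python
import subprocess
import time
from pathlib import Path


def run_with_budget(commands, total_budget_s, per_run_timeout_s, notes_path):
    """Run commands in order until the total budget is spent, logging each outcome.

    `commands` is a list of argv lists (hypothetical experiment scripts).
    Returns a list of (argv, status) tuples for the runs that were attempted.
    """
    deadline = time.monotonic() + total_budget_s
    notes = Path(notes_path)
    notes.parent.mkdir(parents=True, exist_ok=True)
    results = []
    for argv in commands:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # overall time budget exhausted
        # A single run never gets more than the per-run cap or what is left overall.
        timeout = min(per_run_timeout_s, remaining)
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
            status = "ok" if proc.returncode == 0 else f"failed (exit {proc.returncode})"
        except subprocess.TimeoutExpired:
            status = "timed out"
        results.append((argv, status))
        # Append one line per run so the notes file doubles as a log.
        with notes.open("a", encoding="utf-8") as f:
            f.write(f"- `{' '.join(argv)}`: {status}\n")
    return results
```

With the numbers from the prompt above, a call would pass `total_budget_s=6 * 3600`, `per_run_timeout_s=30 * 60`, and `notes_path="experiments/notes.md"`.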

Adapt the pattern to your work

For backend services:

You are optimizing a Python FastAPI service.
- Time: 4 hours
- Each iteration: profile → pick bottleneck → fix → benchmark
- Update notes/endpoints.md with latency before/after
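
The "benchmark" step and the before/after note could look roughly like this. It is a sketch using only the standard library: `measure_latency_ms` and `format_note` are hypothetical helpers, the p95 is a simple nearest-rank approximation, and `fn` stands in for whatever call exercises your endpoint.

```python
import statistics
import time


def measure_latency_ms(fn, runs=50, warmup=5):
    """Call `fn` repeatedly and return (median_ms, approximate_p95_ms)."""
    for _ in range(warmup):
        fn()  # warm caches before timing
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95 = samples[min(len(samples) - 1, int(0.95 * len(samples)))]
    return statistics.median(samples), p95


def format_note(endpoint, before, after):
    """Render one before/after line; `before` and `after` are (median, p95) pairs.

    Assumes the "before" median is non-zero.
    """
    change = (after[0] - before[0]) / before[0] * 100.0
    return (f"- {endpoint}: median {before[0]:.2f} ms -> {after[0]:.2f} ms "
            f"({change:+.1f}%), p95 {before[1]:.2f} ms -> {after[1]:.2f} ms")
```

Measuring once before the fix and once after, with the same `runs` and `warmup`, gives the agent a like-for-like line to append to notes/endpoints.md.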

For data science:

You are improving churn prediction.
- Time: 6 hours
- Each iteration: hypothesis → feature engineering → train → evaluate
- Update notes/experiments.md with F1/AUC
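
For the "evaluate" step, both metrics can be computed without extra dependencies. The sketch below assumes binary labels (1 = churned) and a 0.5 decision threshold; `notes_row` and its table layout are illustrative, not a format the prompt prescribes. The pairwise AUC is quadratic in the number of examples, so on larger evaluation sets a library implementation such as scikit-learn's `f1_score` and `roc_auc_score` is the more practical choice.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, where 1 is the positive (churned) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def notes_row(name, y_true, scores, threshold=0.5):
    """Format one experiment as a table row: | name | F1 | AUC |."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return f"| {name} | {f1_score(y_true, y_pred):.3f} | {roc_auc(y_true, scores):.3f} |"
```

Logging the same row format for the baseline and for every experiment keeps notes/experiments.md directly comparable from one iteration to the next.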

Key takeaways

The CLAUDE.md is the entire interface (no UI, no API)
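An explicit time budget plus a “don’t ask me anything” rule is what lets the agent keep working unattended for hours
The notes file (experiments/notes.md in the original, a notes/ folder in the adaptations) becomes the paper trail of what was tried and what it changed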

