Look at a random AI-generated repo and you expect a toy — a 300-line scaffold, a single page, a quick demo. The Opus 4.7 corpus defies that expectation.
Size distribution across 7,810 analyzed repos
| Size bucket | Count | Share |
|---|---|---|
| Scaffold/demo (<500 LOC) | 483 | 6% |
| Small (500-2K LOC) | 1,095 | 14% |
| Medium (2K-10K LOC) | 2,748 | 35% |
| Large (10K-50K LOC) | 2,570 | 33% |
| Massive (>50K LOC) | 928 | 12% |
45% of Opus 4.7 repos exceed 10,000 lines of code, and 12% (928 repos) are over 50,000 lines. That is the size range of production codebases, not demos.
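The headline percentages can be re-derived from the bucket counts. The sketch below is only an arithmetic check: the counts are copied from the table, and taking each share against the sum of the bucket counts is an assumption about the denominator. The counts as listed sum to 7,824, slightly above the 7,810 in the heading, but the rounded shares come out the same either way.

```python
# Bucket counts copied from the size-distribution table.
bucket_counts = {
    "Scaffold/demo (<500 LOC)": 483,
    "Small (500-2K LOC)": 1095,
    "Medium (2K-10K LOC)": 2748,
    "Large (10K-50K LOC)": 2570,
    "Massive (>50K LOC)": 928,
}


def shares(counts):
    """Return each bucket's share of the total as a whole-number percentage."""
    total = sum(counts.values())  # assumption: denominator is the sum of the buckets
    return {name: round(100 * n / total) for name, n in counts.items()}


pct = shares(bucket_counts)

# Repos above 10,000 LOC are the two largest buckets combined.
over_10k = pct["Large (10K-50K LOC)"] + pct["Massive (>50K LOC)"]

for name, value in pct.items():
    print(f"{name}: {value}%")
print(f"Over 10K LOC: {over_10k}%")
```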
What does 50,000+ LOC look like?
Open a typical 50K+ LOC Opus 4.7 repo and you will usually find:
- Full-stack monorepo (apps/ + packages/)
- 10+ pnpm workspace packages
- Both frontend (Next.js) and backend (FastAPI or Next.js API routes)
- A dedicated `db` package with Prisma or Drizzle schema
- An `auth` package wrapping NextAuth or Better-Auth
- A `ui` package with shadcn-ified components
- Hundreds of React components
- Multi-stage Docker setup with docker-compose.yml
- Documentation split across README + CLAUDE.md + API docs
This is architecture, not autocomplete. Someone — either Opus 4.7 driven by a skilled prompter, or Opus 4.7 operating in an agent loop — made real decisions about module boundaries, packaging, and dependency layering.
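One way to spot this layout in a checked-out repo is to look for the structural markers listed above. The function below is a rough heuristic sketch, not the method used to build the corpus: `monorepo_markers` and its result keys are hypothetical names, and it assumes the conventional file names `pnpm-workspace.yaml`, `docker-compose.yml`, and `CLAUDE.md` sit at the repo root.

```python
from pathlib import Path


def monorepo_markers(repo_root):
    """Report which monorepo markers exist under repo_root (hypothetical helper)."""
    root = Path(repo_root)
    packages = root / "packages"

    # Names of the immediate subdirectories of packages/, if that directory exists.
    package_names = (
        sorted(p.name for p in packages.iterdir() if p.is_dir())
        if packages.is_dir()
        else []
    )

    return {
        "apps_dir": (root / "apps").is_dir(),
        "packages_dir": packages.is_dir(),
        "pnpm_workspace": (root / "pnpm-workspace.yaml").is_file(),
        "docker_compose": (root / "docker-compose.yml").is_file(),
        "claude_md": (root / "CLAUDE.md").is_file(),
        "package_count": len(package_names),
        # True when the db, auth, and ui packages are all present.
        "has_db_auth_ui": {"db", "auth", "ui"} <= set(package_names),
    }
```

Calling `monorepo_markers(".")` from a repo root returns a dictionary of booleans plus a package count. Treat a result as a hint rather than a classification, since repos can follow the same architecture under different directory names.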
The long tail
The top 10 Opus 4.7 repos by size:
| Repo | LOC |
|---|---|
| BOLDPreciousMetals/bold-ops-dashboard | 5,085,152 |
| EndUser123/why | 2,708,261 |
| Halildeu/platform-ssot | 2,467,407 |
| Eaglemamba/SterileGMP-Knowledge-Hub | 2,087,676 |
| brianonbased-dev/HoloScript | 1,830,422 |
| elizaOS/eliza | 1,769,862 |
| ether/ether.github.com | 1,299,135 |
| lee101/stock-prediction | 1,145,380 |
| OleBB/wave_project | 1,012,175 |
| hbfs-cloud/articles | 1,003,688 |
Every one of the top 10 is over a million lines of code. elizaOS/eliza is particularly notable — it’s a well-known open-source agent framework that also uses Opus 4.7 in its development loop.
Scaffolds are the minority
The “throwaway demo” category — under 500 LOC — accounts for just 6% of the corpus. That’s the opposite of what you’d expect from AI-generated code. It suggests:
- Opus 4.7 is used for sustained work, not just one-shot generation
- Agent loops expand codebases over time — CLAUDE.md persistence lets the model pick up where it left off
- “Build me X” scaffolds grow once someone hits “run” and the app works
Implication for training data
If you’re pulling training data from this corpus, don’t over-weight the small-size bucket. Opus 4.7’s actual output distribution is centered on the 2K-50K LOC band (68% of repos) — and that’s where the most production-representative patterns live.
The median Opus 4.7 repo is 7,722 lines. That’s the size of a small real product. Train for that.
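A minimal sketch of what weighting by size could look like, assuming you have a LOC count per candidate repo. The bucket names, the treatment of boundary values (lower bound inclusive, so exactly 50,000 lands in the top bucket), and the `sampling_weights` helper are illustrative assumptions. The target shares are the rounded percentages from the size table.

```python
import bisect
from collections import Counter

# Bucket boundaries from the size table; boundary handling is an assumption.
BOUNDS = [500, 2_000, 10_000, 50_000]
NAMES = ["scaffold", "small", "medium", "large", "massive"]

# Rounded corpus shares from the size table.
TARGET_SHARE = {
    "scaffold": 0.06,
    "small": 0.14,
    "medium": 0.35,
    "large": 0.33,
    "massive": 0.12,
}


def size_bucket(loc):
    """Map a line count to its size bucket."""
    return NAMES[bisect.bisect_right(BOUNDS, loc)]


def sampling_weights(locs):
    """Weight each repo so each bucket is drawn in proportion to its corpus share.

    A bucket's target share is split evenly among the pool's repos in that
    bucket, so an over-represented bucket gets a smaller weight per repo.
    """
    buckets = [size_bucket(loc) for loc in locs]
    counts = Counter(buckets)
    return [TARGET_SHARE[b] / counts[b] for b in buckets]


# Hypothetical candidate pool that over-represents scaffolds.
pool = [120, 300, 450, 7_722, 25_000]
weights = sampling_weights(pool)
```

The weights can be passed to `random.choices(pool, weights=weights, k=n)` to draw a sample. In this pool the three scaffolds share the 6% scaffold mass between them, while the single medium repo carries the full 35%.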
See also: The Top 30 Libraries Claude Opus 4.7 Actually Imports.