Look at a random AI-generated repo and you expect a toy — a 300-line scaffold, a single page, a quick demo. The Opus 4.7 corpus defies that expectation.

Size distribution across 7,810 analyzed repos

Size bucket	Count	Share
Scaffold/demo (<500 LOC)	483	6%
Small (500-2K LOC)	1,095	14%
Medium (2K-10K LOC)	2,748	35%
Large (10K-50K LOC)	2,570	33%
Massive (>50K LOC)	928	12%

45% of Opus 4.7 repos exceed 10,000 lines of code. 12% (928 repos) are over 50,000 lines, a size usually associated with multi-package production codebases rather than demos.
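
The shares and the 45% headline can be reproduced from the bucket counts with a few lines of arithmetic. The sketch below is only a sanity check, not the pipeline used for the analysis; the function names are made up for illustration. One caveat: the counts as printed add up to 7,824 rather than the stated 7,810, but the rounded shares come out the same with either denominator.

```python
# Bucket counts copied from the size-distribution table.
BUCKET_COUNTS = {
    "Scaffold/demo (<500 LOC)": 483,
    "Small (500-2K LOC)": 1_095,
    "Medium (2K-10K LOC)": 2_748,
    "Large (10K-50K LOC)": 2_570,
    "Massive (>50K LOC)": 928,
}

STATED_TOTAL = 7_810  # corpus size quoted in the section heading


def shares(counts: dict[str, int], total: int) -> dict[str, int]:
    """Return each bucket's share of `total` as a whole-number percentage."""
    return {name: round(100 * n / total) for name, n in counts.items()}


def share_over_10k(counts: dict[str, int], total: int) -> int:
    """Combined share of the Large and Massive buckets, as a whole percentage."""
    big = counts["Large (10K-50K LOC)"] + counts["Massive (>50K LOC)"]
    return round(100 * big / total)


if __name__ == "__main__":
    for name, pct in shares(BUCKET_COUNTS, STATED_TOTAL).items():
        print(f"{name}: {pct}%")
    print(f"Over 10K LOC: {share_over_10k(BUCKET_COUNTS, STATED_TOTAL)}%")
```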

What does 50,000+ LOC look like?

Pick a random Opus 4.7 repo over 50K LOC and you will typically find:

Full-stack monorepo (apps/ + packages/)
10+ pnpm workspace packages
Both frontend (Next.js) and backend (FastAPI or Next.js API routes)
A dedicated db package with Prisma or Drizzle schema
An auth package wrapping NextAuth or Better-Auth
A ui package with shadcn-ified components
Hundreds of React components
Multi-stage Docker setup with docker-compose.yml
Documentation split across README + CLAUDE.md + API docs
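
To make the checklist above concrete, here is a minimal sketch of how one might test a checked-out repo for a few of these traits. The marker names and the choice of paths (`pnpm-workspace.yaml`, a `package.json` in each `packages/` subdirectory) are illustrative assumptions, not the criteria used in this study.

```python
from pathlib import Path

# Hypothetical marker list, loosely based on the traits listed above.
# These paths are illustrative assumptions, not the study's criteria.
MARKERS = {
    "apps_dir": "apps",
    "packages_dir": "packages",
    "pnpm_workspace": "pnpm-workspace.yaml",
    "docker_compose": "docker-compose.yml",
    "claude_md": "CLAUDE.md",
}


def monorepo_markers(repo_root: Path) -> dict[str, bool]:
    """Report which of the illustrative marker paths exist under repo_root."""
    root = Path(repo_root)
    return {name: (root / rel).exists() for name, rel in MARKERS.items()}


def count_workspace_packages(repo_root: Path) -> int:
    """Count subdirectories of packages/ that contain a package.json."""
    packages = Path(repo_root) / "packages"
    if not packages.is_dir():
        return 0
    return sum(1 for p in packages.iterdir() if (p / "package.json").is_file())
```

A repo that reports all five markers and ten or more workspace packages would match the profile described here; a repo with none of them is more likely a single-app project.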

This is architecture, not autocomplete. Someone, whether a skilled prompter driving Opus 4.7 or Opus 4.7 operating in an agent loop, made real decisions about module boundaries, packaging, and dependency layering.

The long tail

The top 10 Opus 4.7 repos by size:

Repo	LOC
BOLDPreciousMetals/bold-ops-dashboard	5,085,152
EndUser123/why	2,708,261
Halildeu/platform-ssot	2,467,407
Eaglemamba/SterileGMP-Knowledge-Hub	2,087,676
brianonbased-dev/HoloScript	1,830,422
elizaOS/eliza	1,769,862
ether/ether.github.com	1,299,135
lee101/stock-prediction	1,145,380
OleBB/wave_project	1,012,175
hbfs-cloud/articles	1,003,688

Every one of the top 10 is over a million lines of code. elizaOS/eliza is particularly notable — it’s a well-known open-source agent framework that also uses Opus 4.7 in its development loop.

Scaffolds are the minority

The “throwaway demo” category — under 500 LOC — accounts for just 6% of the corpus. That’s the opposite of what you’d expect from AI-generated code. It suggests:

Opus 4.7 is used for sustained work, not just one-shot generation
Agent loops expand codebases over time — CLAUDE.md persistence lets the model pick up where it left off
“Build me X” scaffolds grow once someone hits “run” and the app works

Implication for training data

If you’re pulling training data from this corpus, don’t over-weight the small-size bucket. Opus 4.7’s actual output distribution is centered on the 2K-50K LOC band (68% of repos) — and that’s where the most production-representative patterns live.
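
One way to act on this is to sample repos so that each size bucket appears in roughly the proportion it has in the corpus. The sketch below is an assumption about how that could be done, not a description of any existing pipeline; `bucket_of` and `stratified_sample` are hypothetical names, and the handling of exact boundary values (500, 2K, 10K, 50K) is a guess, since the table does not specify it.

```python
import random
from bisect import bisect_right

# Bucket edges follow the size-distribution table. Which side a value of
# exactly 500, 2,000, 10,000 or 50,000 LOC falls on is an assumption.
BOUNDS = [500, 2_000, 10_000, 50_000]
LABELS = ["scaffold", "small", "medium", "large", "massive"]

# Target shares taken from the table's "Share" column.
TARGET_SHARE = {
    "scaffold": 0.06,
    "small": 0.14,
    "medium": 0.35,
    "large": 0.33,
    "massive": 0.12,
}


def bucket_of(loc: int) -> str:
    """Map a line count to its size bucket."""
    return LABELS[bisect_right(BOUNDS, loc)]


def stratified_sample(
    repos: list[tuple[str, int]], k: int, seed: int = 0
) -> list[str]:
    """Draw about k repo names so bucket proportions follow TARGET_SHARE."""
    rng = random.Random(seed)
    by_bucket: dict[str, list[str]] = {label: [] for label in LABELS}
    for name, loc in repos:
        by_bucket[bucket_of(loc)].append(name)

    sample: list[str] = []
    for label in LABELS:
        quota = round(k * TARGET_SHARE[label])
        pool = by_bucket[label]
        # Never ask for more repos than the bucket actually holds.
        sample.extend(rng.sample(pool, min(quota, len(pool))))
    return sample
```

With `k=100` the quotas work out to 6, 14, 35, 33 and 12 repos per bucket, so scaffolds stay at 6% of the sample instead of dominating it.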

The median Opus 4.7 repo is 7,722 lines. That’s the size of a small real product. Train for that.
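
As a rough cross-check, the bucket counts alone show which bucket the median repo must fall in. This sketch uses only the table's counts (the helper name `median_bucket` is made up for illustration); it cannot recover the exact 7,722 figure, only confirm that a median in the 2K-10K range is consistent with the distribution.

```python
# Bucket counts in ascending size order, copied from the distribution table.
BUCKETS = [
    ("Scaffold/demo (<500 LOC)", 483),
    ("Small (500-2K LOC)", 1_095),
    ("Medium (2K-10K LOC)", 2_748),
    ("Large (10K-50K LOC)", 2_570),
    ("Massive (>50K LOC)", 928),
]


def median_bucket(buckets: list[tuple[str, int]]) -> str:
    """Return the label of the bucket that contains the median repo."""
    total = sum(n for _, n in buckets)
    midpoint = total / 2
    running = 0
    for label, n in buckets:
        running += n
        # The first bucket whose cumulative count reaches the midpoint
        # holds the median.
        if running >= midpoint:
            return label
    raise ValueError("buckets must contain at least one repo")


print(median_bucket(BUCKETS))  # Medium (2K-10K LOC)
```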

The 45% Rule: Opus 4.7 Builds Production-Scale Code
