Look at a random AI-generated repo and you expect a toy — a 300-line scaffold, a single page, a quick demo. The Opus 4.7 corpus defies that expectation.

Size distribution across 7,810 analyzed repos

| Size bucket | Count | Share |
|---|---|---|
| Scaffold/demo (<500 LOC) | 483 | 6% |
| Small (500-2K LOC) | 1,095 | 14% |
| Medium (2K-10K LOC) | 2,748 | 35% |
| Large (10K-50K LOC) | 2,570 | 33% |
| Massive (>50K LOC) | 928 | 12% |

45% of Opus 4.7 repos exceed 10,000 lines of code (the Large and Massive buckets combined: 33% + 12%). The Massive bucket alone, nearly 1,000 repos, sits above 50,000 lines. That is the size of a production codebase, not a demo.
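The shares quoted above can be re-derived from the bucket counts. The sketch below is only a sanity check on the arithmetic, not the tooling used for the analysis; the dictionary and function names are made up for illustration. One thing it surfaces: the counts as listed sum to 7,824, slightly above the 7,810 in the heading, though the rounded percentages come out the same either way.

```python
# Bucket counts copied from the size table.
BUCKET_COUNTS = {
    "Scaffold/demo (<500 LOC)": 483,
    "Small (500-2K LOC)": 1095,
    "Medium (2K-10K LOC)": 2748,
    "Large (10K-50K LOC)": 2570,
    "Massive (>50K LOC)": 928,
}


def bucket_shares(counts):
    """Return each bucket's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


shares = bucket_shares(BUCKET_COUNTS)

# Repos above 10,000 LOC are the Large and Massive buckets combined.
over_10k = shares["Large (10K-50K LOC)"] + shares["Massive (>50K LOC)"]

for name, pct in shares.items():
    print(f"{name}: {pct}%")
print(f"Over 10K LOC: {over_10k}%")
```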

What does 50,000+ LOC look like?

Pick a random 50K-LOC Opus 4.7 repo and you typically find:

  • Full-stack monorepo (apps/ + packages/)
  • 10+ pnpm workspace packages
  • Both frontend (Next.js) and backend (FastAPI or Next.js API routes)
  • A dedicated db package with Prisma or Drizzle schema
  • An auth package wrapping NextAuth or Better-Auth
  • A ui package with shadcn-ified components
  • Hundreds of React components
  • Multi-stage Docker setup with docker-compose.yml
  • Documentation split across README + CLAUDE.md + API docs
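If you want to check a cloned repo against the layout above, a few filesystem probes go a long way. This is a rough sketch under stated assumptions: the helper names are hypothetical, `README.md` is an assumed filename for the README, and `pnpm-workspace.yaml` is the standard pnpm workspace manifest. The presence of these paths is only a hint that a repo follows the pattern, and this is not the method used to produce the numbers in this article.

```python
from pathlib import Path

# Marker paths drawn from the list above. README.md is an assumed filename;
# pnpm-workspace.yaml is the manifest pnpm uses to declare workspaces.
MARKERS = {
    "apps_dir": "apps",
    "packages_dir": "packages",
    "pnpm_workspace": "pnpm-workspace.yaml",
    "docker_compose": "docker-compose.yml",
    "claude_md": "CLAUDE.md",
    "readme": "README.md",
}


def monorepo_signals(root):
    """Report which marker paths exist directly under root (hypothetical helper)."""
    root = Path(root)
    return {name: (root / rel).exists() for name, rel in MARKERS.items()}


def count_workspace_packages(root):
    """Count immediate subdirectories of packages/ that contain a package.json."""
    packages = Path(root) / "packages"
    if not packages.is_dir():
        return 0
    return sum(1 for p in packages.iterdir() if (p / "package.json").is_file())
```

Calling `monorepo_signals(".")` from a repo root returns a dict of booleans, and `count_workspace_packages(".")` gives a quick read on whether the "10+ workspace packages" description applies.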

This is architecture, not autocomplete. Someone — either Opus 4.7 driven by a skilled prompter, or Opus 4.7 operating in an agent loop — made real decisions about module boundaries, packaging, and dependency layering.

The long tail

The top 10 Opus 4.7 repos by size:

| Repo | LOC |
|---|---|
| BOLDPreciousMetals/bold-ops-dashboard | 5,085,152 |
| EndUser123/why | 2,708,261 |
| Halildeu/platform-ssot | 2,467,407 |
| Eaglemamba/SterileGMP-Knowledge-Hub | 2,087,676 |
| brianonbased-dev/HoloScript | 1,830,422 |
| elizaOS/eliza | 1,769,862 |
| ether/ether.github.com | 1,299,135 |
| lee101/stock-prediction | 1,145,380 |
| OleBB/wave_project | 1,012,175 |
| hbfs-cloud/articles | 1,003,688 |

Every one of the top 10 is over a million lines of code. elizaOS/eliza is particularly notable — it’s a well-known open-source agent framework that also uses Opus 4.7 in its development loop.

Scaffolds are the minority

The “throwaway demo” category — under 500 LOC — accounts for just 6% of the corpus. That’s the opposite of what you’d expect from AI-generated code. It suggests:

  1. Opus 4.7 is used for sustained work, not just one-shot generation
  2. Agent loops expand codebases over time — CLAUDE.md persistence lets the model pick up where it left off
  3. “Build me X” scaffolds grow once someone hits “run” and the app works

Implication for training data

If you’re pulling training data from this corpus, don’t over-weight the small-size bucket. Opus 4.7’s actual output distribution is centered on the 2K-50K LOC band (68% of repos) — and that’s where the most production-representative patterns live.
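One way to avoid over-weighting any bucket is to draw a sample whose bucket proportions match the corpus shares. The sketch below assumes your repos are available as `(name, loc)` pairs; the short bucket keys, the function names, and the exact handling of boundary values (for example, whether a repo with exactly 10,000 LOC counts as Medium or Large) are assumptions, since the table does not spell them out.

```python
import random

# Shares (percent of repos) copied from the size table; keys are shorthand.
SHARES = {"scaffold": 6, "small": 14, "medium": 35, "large": 33, "massive": 12}


def bucket_for(loc):
    """Map a line count to a size bucket. Boundary handling is an assumption."""
    if loc < 500:
        return "scaffold"
    if loc < 2_000:
        return "small"
    if loc < 10_000:
        return "medium"
    if loc <= 50_000:
        return "large"
    return "massive"


def quotas(sample_size, shares=SHARES):
    """Per-bucket sample counts proportional to corpus share (rounded down)."""
    return {bucket: sample_size * pct // 100 for bucket, pct in shares.items()}


def stratified_sample(repos, sample_size, seed=0):
    """Pick repo names so that bucket proportions follow SHARES.

    repos is a list of (name, loc) pairs. A bucket with fewer repos than its
    quota contributes everything it has.
    """
    rng = random.Random(seed)
    by_bucket = {bucket: [] for bucket in SHARES}
    for name, loc in repos:
        by_bucket[bucket_for(loc)].append(name)
    picked = []
    for bucket, k in quotas(sample_size).items():
        pool = by_bucket[bucket]
        picked.extend(rng.sample(pool, min(k, len(pool))))
    return picked
```

With a target of 1,000 repos, `quotas(1000)` allots 350 + 330 = 680 slots to the 2K-50K band, matching its 68% share. Because quotas round down, sample sizes that are not multiples of 100 can come out a few repos short.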

The median Opus 4.7 repo is 7,722 lines of code. That’s the size of a small real product. Train for that.

