Count the distinct source languages in every Opus 4.7 repo. The distribution is telling.
Languages per repo
| Source languages in repo | Repos | Share |
|---|---|---|
| 1 | 2,378 | 30% |
| 2 | 2,213 | 28% |
| 3 | 1,262 | 16% |
| 4 | 586 | 7% |
| 5 | 278 | 4% |
| 6+ | 293 | 4% |
| Outliers (10-29) | 14 | <1% |
Only 30% of Opus 4.7 repos are single-language. The median repo spans 2 languages, the p75 repo spans 3, and a long tail (about 8% of repos) spans 5 or more.
For comparison, the generic GitHub corpus is roughly 45% single-language. Opus 4.7 is ~15 percentage points more polyglot.
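The median and p75 figures can be re-derived from the Repos column alone with a nearest-rank percentile over the histogram. This is a minimal sketch. Keying the 6+ bucket as 6 and assuming the 14 outliers are already inside it are illustrative choices. Neither changes the result, because both percentiles fall in the low buckets.

```python
from math import ceil

# Repo counts per "number of languages" bucket, taken from the table above.
# Assumption: "6+" is keyed as 6 and already includes the 10-29 outliers.
REPOS_BY_LANGUAGE_COUNT = {1: 2378, 2: 2213, 3: 1262, 4: 586, 5: 278, 6: 293}


def percentile_from_histogram(hist: dict[int, int], p: float) -> int:
    """Nearest-rank percentile (0 < p <= 100) of a {value: count} histogram."""
    total = sum(hist.values())
    rank = ceil(p / 100 * total)  # 1-based rank of the repo we are looking for
    seen = 0
    for value in sorted(hist):
        seen += hist[value]
        if seen >= rank:
            return value
    raise ValueError("empty histogram")


median = percentile_from_histogram(REPOS_BY_LANGUAGE_COUNT, 50)
p75 = percentile_from_histogram(REPOS_BY_LANGUAGE_COUNT, 75)
print(f"median={median}, p75={p75}")  # median=2, p75=3
```

Adding the outliers as a separate bucket (for example `{**REPOS_BY_LANGUAGE_COUNT, 10: 14}`) still yields a median of 2 and a p75 of 3.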
Why so polyglot?
Four structural reasons drive the multi-language pattern:
1. Full-stack by default
A Next.js repo already has TypeScript + CSS + HTML. Add a Python data-processing script and it’s four languages. The apps/web + apps/api monorepo pattern is often bilingual too, for example a TypeScript web app calling a Python API.
2. Config-heavy ecosystem
- YAML (13,204 files): GitHub Actions workflows, docker-compose files, CI configs
- JSON (42,583 files): package.json, tsconfig.json, ESLint configs
- TOML (183 files): pyproject.toml, Cargo.toml
Each of these is typed as its own language.
3. Docs as first-class content
110,506 Markdown files across the corpus. Every repo has docs, and the file typer counts Markdown as a “language” of its own.
4. Polyglot monorepos
454 Opus 4.7 repos are monorepos. A typical one contains:
- TypeScript (web + api + ui + types)
- Python (scripts + data pipeline)
- SQL (migrations)
- YAML (CI + compose)
- Markdown (docs)
- Shell (dev scripts)
That’s 6 languages before writing a line of Go or Rust.
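To make the counting concrete, here is a minimal extension-based file typer applied to a file list shaped like the monorepo above. The extension table and the example paths are hypothetical. A production typer such as GitHub Linguist uses a much larger mapping and also looks at filenames and file contents.

```python
from pathlib import PurePosixPath

# Hypothetical extension -> language table. Config and docs formats are
# deliberately typed as languages of their own, as the corpus typer does.
EXTENSION_TO_LANGUAGE = {
    ".ts": "TypeScript", ".tsx": "TypeScript",
    ".py": "Python",
    ".sql": "SQL",
    ".yml": "YAML", ".yaml": "YAML",
    ".json": "JSON",
    ".toml": "TOML",
    ".md": "Markdown",
    ".sh": "Shell",
    ".css": "CSS",
    ".html": "HTML",
}


def languages_in_repo(paths: list[str]) -> set[str]:
    """Return the distinct languages found in a list of repo-relative paths."""
    found: set[str] = set()
    for path in paths:
        suffix = PurePosixPath(path).suffix.lower()
        language = EXTENSION_TO_LANGUAGE.get(suffix)
        if language is not None:  # files with unknown extensions are ignored
            found.add(language)
    return found


# Hypothetical monorepo mirroring the list above.
monorepo = [
    "apps/web/src/page.tsx", "apps/api/src/server.ts",
    "packages/ui/Button.tsx", "packages/types/index.ts",
    "scripts/backfill.py", "pipeline/etl.py",
    "db/migrations/0001_init.sql",
    ".github/workflows/ci.yml", "docker-compose.yaml",
    "README.md", "docs/architecture.md",
    "scripts/dev.sh",
]
print(sorted(languages_in_repo(monorepo)))
# ['Markdown', 'Python', 'SQL', 'Shell', 'TypeScript', 'YAML']
```

The same function gives four languages for the Next.js example from reason 1 (`.tsx`, `.css`, `.html`, plus one `.py` script).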
The outliers
14 repos span 10+ source languages each. One spans 29 distinct languages. These are typically:
- Multi-language SDK projects (with code examples in every target language)
- Documentation sites showcasing code in multiple tongues
- Polyglot experimentation repos
- Large monorepos from elizaOS-style agent frameworks
What’s typically in a 3-language Opus 4.7 repo
A typical setup:
- TypeScript: the primary source language (all components, routes, API handlers)
- Python: a secondary scripting language (data processing, one-off migrations, ML preprocessing)
- SQL: Prisma or Drizzle migration files
Or if it’s a straight web app:
- TypeScript, JSON, Markdown
What’s in a 5-language Opus 4.7 repo
Typical 5-language mix:
- TypeScript (frontend + backend)
- Python (worker / data)
- SQL (migrations)
- YAML (CI)
- Markdown (docs)
Training implications
If you’re building a model that should match Opus 4.7’s distribution:
- Don’t train only on single-language files. Only 30% of Opus 4.7 repos are single-language, so most real output crosses language boundaries.
- TypeScript + Python cross-language reasoning is important — particularly when the frontend sends a request to a Python API.
- Config files matter: a model that generates package.json and pyproject.toml consistently is valuable because Opus 4.7 does that routinely.
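One way to act on the first recommendation is to build training samples per repo instead of per file. That way a TypeScript client and the Python endpoint it calls can share a context. This is a hypothetical packing scheme, not a described pipeline. The `(repo, path, text)` record shape, the `=== path ===` header, and the character budget are all assumptions.

```python
from collections import defaultdict


def pack_repo_samples(
    files: list[tuple[str, str, str]], max_chars: int = 8000
) -> list[str]:
    """Pack hypothetical (repo, path, text) records into per-repo samples.

    Files from the same repo are appended in input order, each behind a
    path header, so one sample can mix TypeScript, Python, SQL, config and
    docs. A new sample starts when the next file would exceed max_chars;
    a single file longer than max_chars still becomes its own sample.
    """
    by_repo: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for repo, path, text in files:
        by_repo[repo].append((path, text))

    samples: list[str] = []
    for repo_files in by_repo.values():
        current = ""
        for path, text in repo_files:
            chunk = f"=== {path} ===\n{text}\n"
            if current and len(current) + len(chunk) > max_chars:
                samples.append(current)  # flush before exceeding the budget
                current = ""
            current += chunk
        if current:
            samples.append(current)
    return samples


files = [
    ("acme/app", "apps/web/src/api.ts",
     "export const fetchUser = (id: string) => fetch(`/api/users/${id}`);"),
    ("acme/app", "apps/api/main.py",
     "@app.get('/api/users/{id}')\ndef get_user(id: str): ..."),
    ("acme/app", "db/migrations/0001_init.sql",
     "CREATE TABLE users (id TEXT PRIMARY KEY);"),
]
samples = pack_repo_samples(files)
print(len(samples))  # 1: the TS client, Python route and SQL schema share one sample
```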
Counter-intuition
You might expect “AI-generated code is simpler, therefore more monolingual.” The opposite is true. Opus 4.7 reaches for the right tool — TypeScript for UI, Python for data pipelines, SQL for schema — more readily than an average developer who stays in their home language.
Related: Opus 4.7 Language Mix: TypeScript + Python + the Long Tail.