Count the distinct source languages in every Opus 4.7 repo. The distribution is telling.

Languages per repo

Source languages in repo    Repos    Share
1                           2,378    30%
2                           2,213    28%
3                           1,262    16%
4                             586     7%
5                             278     4%
6+                            293     4%
Outliers (10-29)               14    <1%

Only 30% of Opus 4.7 repos are single-language. The median repo spans 2 languages, the p75 spans 3, and a meaningful long tail goes to 5+.
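The median and p75 figures can be checked against the table with a short sketch. It uses only the repo counts listed above, treats the "6+" row as a single bucket with key 6, and assumes the 14 outliers are already included in that row. The names `REPOS_BY_LANGUAGE_COUNT` and `percentile_bucket` are illustrative, not part of any published tooling. Note that the Share column appears to use a larger denominator than the sum of the listed counts; the sketch does not try to reconstruct it.

```python
# Repo counts per language-count bucket, copied from the table above.
# Key 6 stands for the "6+" row; the outlier row is assumed to be a
# subset of that bucket and is not counted a second time.
REPOS_BY_LANGUAGE_COUNT = {1: 2378, 2: 2213, 3: 1262, 4: 586, 5: 278, 6: 293}


def percentile_bucket(histogram, fraction):
    """Return the smallest bucket whose cumulative count reaches `fraction` of the total."""
    total = sum(histogram.values())
    threshold = fraction * total
    running = 0
    for bucket in sorted(histogram):
        running += histogram[bucket]
        if running >= threshold:
            return bucket
    return max(histogram)


median = percentile_bucket(REPOS_BY_LANGUAGE_COUNT, 0.50)
p75 = percentile_bucket(REPOS_BY_LANGUAGE_COUNT, 0.75)
print(median, p75)  # prints: 2 3
```

Under these assumptions the listed counts reproduce the stated median of 2 and p75 of 3.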

For comparison, the generic GitHub corpus is roughly 45% single-language, so the single-language share of Opus 4.7 repos is about 15 percentage points lower.

Why so polyglot?

Four structural reasons drive the multi-language pattern:

1. Full-stack by default

A Next.js repo already has TypeScript + CSS + HTML. Add a Python data-processing script and it’s four languages. The apps/web + apps/api monorepo pattern is inherently bilingual.

2. Config-heavy ecosystem

  • yaml (13,204 files): GitHub Actions, docker-compose, CI configs
  • json (42,583 files): package.json, tsconfig, eslint
  • toml (183 files): pyproject.toml, Cargo.toml

Each of these is typed as its own language.
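To make the effect concrete, here is a minimal sketch of an extension-based file-typer. The article does not show the real typer's rules, so the `EXTENSION_TO_LANGUAGE` map, the `languages_in_repo` helper, and the sample file list are all assumptions for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical extension map; the actual file-typer's rules are not shown
# in the article and may differ.
EXTENSION_TO_LANGUAGE = {
    ".ts": "typescript",
    ".tsx": "typescript",
    ".py": "python",
    ".yml": "yaml",
    ".yaml": "yaml",
    ".json": "json",
    ".toml": "toml",
}


def languages_in_repo(paths):
    """Return the set of languages detected from file extensions."""
    languages = set()
    for path in paths:
        suffix = PurePosixPath(path).suffix.lower()
        language = EXTENSION_TO_LANGUAGE.get(suffix)
        if language is not None:
            languages.add(language)
    return languages


# Illustrative file list: one TypeScript source file plus ordinary config.
repo_files = [
    "src/index.ts",
    "package.json",
    "tsconfig.json",
    ".github/workflows/ci.yml",
    "pyproject.toml",
]
print(sorted(languages_in_repo(repo_files)))
# prints: ['json', 'toml', 'typescript', 'yaml']
```

With a typer like this, a repo with a single TypeScript source file already registers four languages once its config files are counted.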

3. Docs as first-class content

The corpus contains 110,506 markdown files. Every repo has docs, and the file-typer counts Markdown as a language.

4. Polyglot monorepos

454 Opus 4.7 repos are monorepos. A typical one contains:
- TypeScript (web + api + ui + types)
- Python (scripts + data pipeline)
- SQL (migrations)
- YAML (CI + compose)
- Markdown (docs)
- Shell (dev scripts)

That’s 6 languages before writing a line of Go or Rust.
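The count above can be reproduced on a toy layout. The sketch below groups detected languages by top-level directory; the file paths, the extension map, and the `languages_by_top_level_dir` helper are hypothetical and chosen only to mirror the list.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical extension map and file layout, chosen to mirror the list above.
EXTENSION_TO_LANGUAGE = {
    ".ts": "TypeScript",
    ".tsx": "TypeScript",
    ".py": "Python",
    ".sql": "SQL",
    ".yml": "YAML",
    ".yaml": "YAML",
    ".md": "Markdown",
    ".sh": "Shell",
}

MONOREPO_FILES = [
    "apps/web/src/page.tsx",
    "apps/api/src/server.ts",
    "packages/ui/src/button.tsx",
    "packages/types/src/index.ts",
    "scripts/load_data.py",
    "scripts/dev.sh",
    "db/migrations/0001_init.sql",
    ".github/workflows/ci.yml",
    "docker-compose.yml",
    "docs/README.md",
]


def languages_by_top_level_dir(paths):
    """Map each top-level directory to the set of languages found under it."""
    grouped = defaultdict(set)
    for path in paths:
        parsed = PurePosixPath(path)
        language = EXTENSION_TO_LANGUAGE.get(parsed.suffix.lower())
        if language is None:
            continue  # unknown extensions are ignored in this sketch
        top = parsed.parts[0] if len(parsed.parts) > 1 else "(root)"
        grouped[top].add(language)
    return dict(grouped)


breakdown = languages_by_top_level_dir(MONOREPO_FILES)
all_languages = set().union(*breakdown.values())
print(len(all_languages))  # prints: 6
```

Grouping by directory also shows where the extra languages come from: in this layout `apps` and `packages` are pure TypeScript, while `scripts`, `db`, and the CI config contribute the rest.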

The outliers

14 repos span 10+ source languages each. One spans 29 distinct languages. These are typically:

  • Multi-language SDK projects (with code examples in every target language)
  • Documentation sites showcasing code in multiple languages
  • Polyglot experimentation repos
  • Large monorepos from elizaOS-style agent frameworks

What’s typically in a 3-language Opus 4.7 repo

A common setup:

  • TypeScript: the primary source language (all components, routes, API handlers)
  • Python: a secondary scripting language (data processing, one-off migrations, ML preprocessing)
  • SQL: Prisma or Drizzle migrations

Or if it’s a straight web app:
- TypeScript, JSON, Markdown

What’s in a 5-language Opus 4.7 repo

Typical 5-language mix:
- TypeScript (frontend + backend)
- Python (worker / data)
- SQL (migrations)
- YAML (CI)
- Markdown (docs)

Training implications

If you’re building a model that should match Opus 4.7’s distribution:

  1. Don’t train only on single-language files. More than half of Opus 4.7 repos span two or more languages.
  2. TypeScript + Python cross-language reasoning is important — particularly when the frontend sends a request to a Python API.
  3. Config files matter — a model that generates package.json and pyproject.toml consistently is valuable because Opus 4.7 does that routinely.
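One way to act on these points is to stratify a training sample by language count. The following is a sketch under stated assumptions, not a recommended pipeline: it takes the repo counts from the table earlier in the article, treats "6+" as a single bucket, and splits a hypothetical sample size proportionally. The `bucket_quotas` helper is illustrative.

```python
# Repo counts per language-count bucket, from the table earlier in the
# article (key 6 stands for the "6+" row).
REPOS_BY_LANGUAGE_COUNT = {1: 2378, 2: 2213, 3: 1262, 4: 586, 5: 278, 6: 293}


def bucket_quotas(histogram, sample_size):
    """Split `sample_size` across buckets in proportion to the histogram.

    Largest-remainder rounding keeps the quotas summing exactly to sample_size.
    """
    total = sum(histogram.values())
    exact = {bucket: sample_size * count / total for bucket, count in histogram.items()}
    quotas = {bucket: int(share) for bucket, share in exact.items()}
    leftover = sample_size - sum(quotas.values())
    # Hand the remaining slots to the buckets with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda b: exact[b] - quotas[b], reverse=True)
    for bucket in by_remainder[:leftover]:
        quotas[bucket] += 1
    return quotas


# Hypothetical sample size of 1,000 repos.
quotas = bucket_quotas(REPOS_BY_LANGUAGE_COUNT, 1000)
print(quotas)
```

A sampler would then draw that many repos from each bucket, so single-language repos stay a minority of the sample, as they are in the listed counts.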

Counter-intuition

You might expect “AI-generated code is simpler, therefore more monolingual.” The opposite is true. Opus 4.7 reaches for the right tool — TypeScript for UI, Python for data pipelines, SQL for schema — more readily than an average developer who stays in their home language.


Related: Opus 4.7 Language Mix: TypeScript + Python + the Long Tail.