Opus 4.7 Language Mix
One way to describe a coding model is to show its language fingerprint. Here’s Claude Opus 4.7’s.
Primary language (by repo)
| Language | Repos | Share of analyzed |
|---|---|---|
| JSON | 783 | 21% (mostly config) |
| TypeScript | 562 | 15% |
| Python | 533 | 14% |
| Markdown | 526 | 14% (static sites) |
| HTML | 370 | 10% |
| JavaScript | 232 | 6% |
| YAML | 126 | 3% |
| CSS | 96 | 3% |
| Rust | 80 | 2% |
| Go | 72 | 2% |
| Swift | 53 | 1% |
| C# | 41 | 1% |
| Shell | 32 | 1% |
| Dart | 27 | 1% |
| C | 25 | 1% |
(Primary language is the language with the most source-role files in the repo.)
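The definition above can be sketched in a few lines. This is a minimal illustration, not the pipeline behind the table: the `SOURCE_EXTENSIONS` map and the `primary_language` helper are hypothetical, and the real analysis may classify "source-role" files differently.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical extension-to-language map (an assumption for illustration).
SOURCE_EXTENSIONS = {
    ".ts": "TypeScript",
    ".tsx": "TypeScript",
    ".py": "Python",
    ".js": "JavaScript",
    ".go": "Go",
    ".rs": "Rust",
    ".md": "Markdown",
    ".json": "JSON",
}


def primary_language(paths):
    """Return the language with the most recognized files, or None if none match."""
    counts = Counter()
    for path in paths:
        ext = PurePosixPath(path).suffix.lower()
        language = SOURCE_EXTENSIONS.get(ext)
        if language is not None:
            counts[language] += 1
    if not counts:
        return None
    # most_common(1) returns the single highest-count (language, count) pair.
    return counts.most_common(1)[0][0]


repo_files = ["src/index.ts", "src/App.tsx", "scripts/build.py", "README.md", "package.json"]
print(primary_language(repo_files))  # TypeScript
```

Ties are broken by whichever language was seen first, which is an arbitrary choice in this sketch.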
File extension distribution (raw file count)
When we count files across all Opus 4.7 repos:
| Ext | Files |
|---|---|
| .md | 110,506 |
| .ts | 102,472 |
| .tsx | 68,708 |
| .py | 68,652 |
| .json | 42,583 |
| .js | 20,545 |
| .txt | 16,732 |
| .html | 15,875 |
| .java | 13,241 |
| .yaml | 13,204 |
| .go | 12,858 |
| .svg | 12,335 |
| (no ext) | 11,853 |
| .yml | 10,453 |
| .rs | 10,110 |
Markdown files (110K) outnumber the files of any single source extension (102K .ts, 69K .tsx, 69K .py). That’s the most striking number in the corpus: Opus 4.7 writes more Markdown files than files of any one source type, although .ts and .tsx together (171K) still exceed Markdown.
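A raw extension count like the one in the table can be produced with a simple tally. The sketch below assumes files are available as a flat list of paths; the `extension_counts` helper is hypothetical, and it treats dotfiles such as `.gitignore` as having no extension, which may differ from how the table was built.

```python
from collections import Counter
from pathlib import PurePosixPath


def extension_counts(paths):
    """Tally files by lowercased extension, bucketing extensionless files together."""
    counts = Counter()
    for path in paths:
        suffix = PurePosixPath(path).suffix.lower()
        # An empty suffix covers names like "Makefile" and dotfiles like ".gitignore".
        counts[suffix or "(no ext)"] += 1
    return counts


sample = ["README.md", "src/a.ts", "src/b.ts", "Makefile", ".gitignore"]
for ext, n in extension_counts(sample).most_common():
    print(f"{ext:<10}{n:>4}")
```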
What the markdown contains
Markdown files by typical role:
- README.md (7,905 repos)
- CLAUDE.md (1,665 files, in 1,324 repos)
- AGENTS.md (813)
- TODO.md (97)
- Changelog-style notes
- Per-component docs in apps with a docs folder
- Blog content in static sites
So the 110K markdown files aren’t dead weight — they’re a mix of project docs (mostly READMEs), agent self-prompts (CLAUDE.md), and content for static sites.
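The role breakdown above could be approximated with filename and directory rules. This is a rough sketch under stated assumptions: the `markdown_role` helper, the role labels, the `changelog.md` filename, and the rule that anything under a `docs` directory is component documentation are all illustrative and not taken from the analysis itself.

```python
from pathlib import PurePosixPath

# Filenames matched case-insensitively. "changelog.md" is an assumed name
# for the changelog-style notes mentioned in the list.
ROLE_BY_FILENAME = {
    "readme.md": "readme",
    "claude.md": "agent instructions",
    "agents.md": "agent instructions",
    "todo.md": "todo",
    "changelog.md": "changelog",
}


def markdown_role(path):
    """Return a coarse role for a Markdown file, or None for non-Markdown paths."""
    p = PurePosixPath(path)
    if p.suffix.lower() != ".md":
        return None
    role = ROLE_BY_FILENAME.get(p.name.lower())
    if role is not None:
        return role
    # Assumption: Markdown under any docs/ directory is component documentation.
    if "docs" in (part.lower() for part in p.parts[:-1]):
        return "docs"
    return "other"


for path in ["README.md", "pkg/CLAUDE.md", "docs/button.md", "notes/ideas.md"]:
    print(path, "->", markdown_role(path))
```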
TypeScript dominance in source
Between .ts (102K) and .tsx (69K), TypeScript accounts for 171K source files. Python is next at 69K. Everything else drops off sharply:
- Java is higher than expected (13K) — pockets of Spring Boot / Android appear
- Go (13K) and Rust (10K) are respectable minorities
- Swift is small: it falls outside the top 15 extensions listed here, though it is the primary language in 53 repos
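The per-language totals quoted above follow from grouping the extension table by language. A short sketch, using the file counts from the table and an assumed extension-to-language mapping:

```python
# File counts copied from the extension table above (source extensions only).
FILES_BY_EXTENSION = {
    ".ts": 102_472,
    ".tsx": 68_708,
    ".py": 68_652,
    ".js": 20_545,
    ".java": 13_241,
    ".go": 12_858,
    ".rs": 10_110,
}

# Assumed grouping: .ts and .tsx both count as TypeScript.
LANGUAGE_BY_EXTENSION = {
    ".ts": "TypeScript",
    ".tsx": "TypeScript",
    ".py": "Python",
    ".js": "JavaScript",
    ".java": "Java",
    ".go": "Go",
    ".rs": "Rust",
}

totals = {}
for ext, files in FILES_BY_EXTENSION.items():
    language = LANGUAGE_BY_EXTENSION[ext]
    totals[language] = totals.get(language, 0) + files

# Sort languages by file count, largest first.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
for language, files in ranked:
    print(f"{language:<12}{files:>9,}")
```

The first row printed is TypeScript at 171,180 files, which is the 171K figure in the text.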
Unusual languages we did find
- Elixir: jaman/ex_v_ex cracked the top 10 quality list
- V (vlang): 57 files
- Zig: 139 files
- Haskell: 47 files
- Scala: 32 files
- Erlang: 23 files
These counts are too small to support statistical conclusions, but they show the model can produce code in less common languages when asked.
Opus 4.7 vs community baseline
In the matched analyzed subset (~4K Opus 4.7 repos vs ~142K community):
| Metric | Opus 4.7 | Community |
|---|---|---|
| Avg LOC per repo | 32,713 | 27,569 |
| Avg files per repo | 184 | 144 |
Opus 4.7 repos are larger than the community average on both measures: about 19% more LOC and about 28% more files. This reflects the monorepo tendency noted in an earlier post: more packages per repo means more files.
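The relative gaps can be recomputed directly from the table. A small worked example (the `percent_larger` helper is just for illustration):

```python
def percent_larger(value, baseline):
    """Return how much larger value is than baseline, as a percentage."""
    return (value / baseline - 1) * 100


# Averages copied from the comparison table above.
loc_gap = percent_larger(32_713, 27_569)
file_gap = percent_larger(184, 144)

print(f"LOC per repo:   {loc_gap:.1f}% larger")   # 18.7% larger
print(f"Files per repo: {file_gap:.1f}% larger")  # 27.8% larger
```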
Takeaway for model operators
If you’re:
- Choosing data for fine-tuning: TypeScript + Python make up 70%+ of the source files in this corpus; don’t over-sample the long tail
- Picking benchmark languages: TS + Python + Go cover about 85% of the source files the model writes in this corpus
- Wondering about translation / transpilation: the model has adequate grounding for TS ↔ Python, weaker for TS ↔ Rust
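Coverage figures of this kind can be roughly reproduced from the extension table. The sketch below rests on an assumption: "source files" means only the source-code extensions in the top-15 list (.ts, .tsx, .py, .js, .java, .go, .rs), excluding Markdown, JSON, HTML, YAML, and other non-code files. A different denominator would give different percentages.

```python
# Per-language file counts derived from the extension table
# (TypeScript = .ts + .tsx). Which extensions count as "source" is an assumption.
SOURCE_FILES = {
    "TypeScript": 102_472 + 68_708,
    "Python": 68_652,
    "JavaScript": 20_545,
    "Java": 13_241,
    "Go": 12_858,
    "Rust": 10_110,
}


def coverage(languages):
    """Return the share of counted source files written in the given languages."""
    total = sum(SOURCE_FILES.values())
    covered = sum(SOURCE_FILES[name] for name in languages)
    return covered / total * 100


print(f"TS + Python:      {coverage(['TypeScript', 'Python']):.1f}%")        # 80.9%
print(f"TS + Python + Go: {coverage(['TypeScript', 'Python', 'Go']):.1f}%")  # 85.2%
```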
Updated every 30 min as new repos come in.