Opus 4.7 Language Mix

One way to describe a coding model is to show its language fingerprint. Here’s Claude Opus 4.7’s.

Primary language (by repo)

Language      Repos   Share of analyzed repos
TypeScript      562   15%
JSON            783   21% (mostly config)
Python          533   14%
Markdown        526   14% (static sites)
HTML            370   10%
JavaScript      232   6%
YAML            126   3%
CSS              96   3%
Rust             80   2%
Go               72   2%
Swift            53   1%
C#               41   1%
Shell            32   1%
Dart             27   1%
C                25   1%

(Primary language is the language with the most source-role files in the repo.)
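
The rule in the parenthetical can be sketched in a few lines of Python. This is a minimal sketch, not the pipeline behind these numbers: the `EXT_TO_LANGUAGE` map and the `primary_language` helper are hypothetical, and "source-role" is assumed here to mean any file whose extension appears in the map.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical extension-to-language map; the real analysis may use a different one.
EXT_TO_LANGUAGE = {
    ".ts": "TypeScript",
    ".tsx": "TypeScript",
    ".py": "Python",
    ".json": "JSON",
    ".md": "Markdown",
    ".html": "HTML",
    ".js": "JavaScript",
    ".rs": "Rust",
    ".go": "Go",
}


def primary_language(paths):
    """Return the language with the most mapped files, or None if no file maps."""
    counts = Counter()
    for path in paths:
        ext = PurePosixPath(path).suffix.lower()
        language = EXT_TO_LANGUAGE.get(ext)
        if language is not None:
            counts[language] += 1
    if not counts:
        return None
    # most_common(1) returns [(language, count)] for the highest count.
    return counts.most_common(1)[0][0]


# Hypothetical repo listing: two TypeScript files, one each of Python, Markdown, JSON.
repo_files = ["src/app.tsx", "src/api.ts", "scripts/seed.py", "README.md", "package.json"]
print(primary_language(repo_files))
```

Grouping `.ts` and `.tsx` under one language is a design choice in this sketch; counting them separately would change which language wins in mixed repos.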

File extension distribution (raw file count)

When we count files across all Opus 4.7 repos:

Ext         Files
.md         110,506
.ts         102,472
.tsx        68,708
.py         68,652
.json       42,583
.js         20,545
.txt        16,732
.html       15,875
.java       13,241
.yaml       13,204
.go         12,858
.svg        12,335
(no ext)    11,853
.yml        10,453
.rs         10,110

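Markdown (110K files) is the most common single extension, ahead of .ts (102K), .tsx (69K), and .py (69K). That’s the most striking number in the corpus: Opus 4.7 writes more markdown files than files of any other single extension, and only TypeScript’s combined .ts + .tsx total (171K) is larger.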
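
A raw extension count like the one above could be produced along these lines. This is a minimal sketch under stated assumptions: `extension_counts` is a hypothetical helper, the sample paths are made up, and extensions are lowercased before counting, which the original analysis may or may not do.

```python
from collections import Counter
import os


def extension_counts(paths):
    """Count files per lowercase extension; files without one go under '(no ext)'."""
    counts = Counter()
    for path in paths:
        _, ext = os.path.splitext(os.path.basename(path))
        counts[ext.lower() or "(no ext)"] += 1
    return counts


# Hypothetical file list standing in for a walk over all repos.
sample = ["README.md", "docs/guide.md", "src/index.ts", "src/App.tsx", "Makefile", "LICENSE"]
for ext, n in extension_counts(sample).most_common():
    print(f"{ext:10} {n}")
```

Note that `os.path.splitext` treats dotfiles such as `.gitignore` as having no extension, so they land in the `(no ext)` bucket.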

What the markdown contains

Markdown files by typical role:
- README.md (7,905 repos)
- CLAUDE.md (1,665 files, in 1,324 repos)
- AGENTS.md (813)
- TODO.md (97)
- Changelog-style notes
- Per-component docs in apps with a docs folder
- Blog content in static sites
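
Bucketing markdown files into roles like these can be done from the path alone. The sketch below is illustrative only: the `ROLE_BY_NAME` table, the `markdown_role` helper, and the rule that anything under a `docs` directory counts as docs are all assumptions, not the classification used for the counts above.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical role rules keyed on file name; the categories mirror the list above.
ROLE_BY_NAME = {
    "readme.md": "readme",
    "claude.md": "agent instructions",
    "agents.md": "agent instructions",
    "todo.md": "todo",
    "changelog.md": "changelog",
}


def markdown_role(path):
    """Bucket one markdown path by file name, then by whether it sits under docs/."""
    p = PurePosixPath(path)
    role = ROLE_BY_NAME.get(p.name.lower())
    if role:
        return role
    if "docs" in p.parts[:-1]:
        return "docs"
    return "other"


# Hypothetical paths for illustration.
paths = ["README.md", "CLAUDE.md", "packages/ui/docs/button.md", "content/posts/hello.md"]
print(Counter(markdown_role(p) for p in paths))
```

Blog content in static sites falls into `other` here; separating it out would need a site-specific rule such as a known content directory.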

So the 110K markdown files aren’t dead weight — they’re a mix of project docs (mostly READMEs), agent self-prompts (CLAUDE.md), and content for static sites.

TypeScript dominance in source

Between .ts (102K) and .tsx (69K), TypeScript accounts for 171K source files. Python is next at 69K. Everything else drops off sharply:

  • Java is higher than expected (13K) — pockets of Spring Boot / Android appear
  • Go (13K) and Rust (10K) are respectable minorities
  • Swift is small: no Swift extension makes the top 15 above, though 53 repos have Swift as their primary language

Unusual languages we did find

  • Elixir: jaman/ex_v_ex cracked the top 10 quality list
  • V (vlang) — 57 files
  • Zig — 139 files
  • Haskell — 47 files
  • Scala — 32 files
  • Erlang — 23 files

These counts are too small to draw statistical conclusions from, but they show the model does write code in niche languages when asked.

Opus 4.7 vs community baseline

In the matched analyzed subset (~4K Opus 4.7 repos vs ~142K community):

Metric                Opus 4.7   Community
Avg LOC per repo      32,713     27,569
Avg files per repo    184        144

Opus 4.7 repos are larger than the community average on both measures: about 19% more LOC (32,713 vs 27,569) and about 28% more files (184 vs 144). This reflects the monorepo tendency noted in an earlier post: more packages per repo means more files.
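
The relative gap on each metric follows directly from the table. A short worked calculation, where `percent_larger` is a helper name introduced here for illustration:

```python
def percent_larger(value, baseline):
    """Return how much larger value is than baseline, in percent."""
    return (value / baseline - 1) * 100


# Averages copied from the table above.
loc_gap = percent_larger(32_713, 27_569)
file_gap = percent_larger(184, 144)

print(f"LOC: {loc_gap:.1f}% larger, files: {file_gap:.1f}% larger")
# prints "LOC: 18.7% larger, files: 27.8% larger"
```

The file-count gap is noticeably wider than the LOC gap, which means the average file in an Opus 4.7 repo is somewhat shorter than the community average.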

Takeaway for model operators

If you’re:
- Choosing data for fine-tuning: TypeScript + Python account for 70%+ of the source files in this corpus; don’t over-sample the long tail
- Picking benchmark languages: TS + Python + Go cover about 85% of the source files in this corpus, which records what the model writes rather than what it saw in training
- Wondering about translation / transpilation: the model has adequate grounding for TS ↔ Python, weaker for TS ↔ Rust
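
Coverage figures of this kind can be checked against the extension table. The sketch below assumes that "source files" means only the seven programming-language extensions in the top 15 (.ts, .tsx, .py, .js, .java, .go, .rs); the long tail is not in the table, so shares over the full corpus will differ. The `coverage` helper is a name introduced here.

```python
# File counts copied from the extension table, grouped by language.
# Assumption: only these seven extensions count as source files.
source_files = {
    "TypeScript": 102_472 + 68_708,  # .ts + .tsx
    "Python": 68_652,
    "JavaScript": 20_545,
    "Java": 13_241,
    "Go": 12_858,
    "Rust": 10_110,
}


def coverage(languages):
    """Return the fraction of listed source files written in the given languages."""
    total = sum(source_files.values())
    return sum(source_files[lang] for lang in languages) / total


print(f"TS + Python:      {coverage(['TypeScript', 'Python']):.1%}")
print(f"TS + Python + Go: {coverage(['TypeScript', 'Python', 'Go']):.1%}")
```

Under that assumption TS + Python comes to roughly 81% and TS + Python + Go to roughly 85% of the listed source files.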


Updated every 30 min as new repos come in.