The Golden Set: Highest-Quality Opus 4.7 Repos by Archetype
We run quality scoring on every Opus 4.7 repo that has a completed analysis. The top picks per archetype give the clearest picture of what Opus 4.7 does at its best.
Top overall (by composite quality score)
| Repo | Score | Language | Type |
|---|---|---|---|
| Lambda-Biolab/dpette-usb-driver | 86.7 | Python | driver |
| ethanarnold/screenase | 85.3 | Python | CLI |
| chris2ao/unifi-mcp | 84.5 | Python | MCP server |
| mrcoggsworth/LogPose | 84.0 | Python | containerized |
| astroicers/CyPulse | 82.2 | Python | CLI |
| dynamik-dev/bully | 81.9 | Python | — |
| knorq-ai/xlsx-mcp-server | 81.2 | JSON+Python | MCP server |
| luthen-seas/nostr-mail-ts | 81.1 | TypeScript | — |
| DIG-Network/dig-epoch | 80.9 | Markdown | — |
| jaman/ex_v_ex | 80.7 | Elixir | — |
The top 10 skew heavily Python + CLI / MCP server. TypeScript does fine at scale but rarely reaches the 80+ tier. The elephant in the room: MCP servers cluster disproportionately at the top — two of the top ten are literally MCP servers, and the pattern repeats deeper in the list.
Top MCP servers specifically
When we filter for repos named *-mcp* or organized like MCP servers:
- chris2ao/unifi-mcp — UniFi controller MCP bridge, quality 84.5
- knorq-ai/xlsx-mcp-server — Excel/XLSX MCP exposer, quality 81.2
- 90+ others with mcp-* naming
The MCP-server archetype wins on quality because:
1. Scoped contract: MCP’s JSON-RPC over stdio is a rigid interface, which forces clean code
2. Small: most MCP servers are under 2K lines
3. Template bias: Opus 4.7 seems to have a strong internal template for “how an MCP server looks” and reproduces it consistently
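The "scoped contract" point is easy to make concrete. A minimal sketch of the JSON-RPC-over-stdio loop an MCP server implements (the handler table and the "ping" tool are invented for illustration; a real server exposes its tools per the MCP spec):

```python
import json
import sys

# Illustrative handler table -- "tools/list" is a real MCP method,
# but this tool and its description are made up for the sketch.
def list_tools(params):
    return {"tools": [{"name": "ping", "description": "health check"}]}

HANDLERS = {"tools/list": list_tools}

def handle(line: str) -> dict:
    """Dispatch one JSON-RPC request line and build the response."""
    req = json.loads(line)
    handler = HANDLERS.get(req.get("method"))
    if handler is None:
        return {"jsonrpc": "2.0", "id": req.get("id"),
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": req.get("id"),
            "result": handler(req.get("params", {}))}

def serve(stdin=sys.stdin, stdout=sys.stdout):
    """One JSON-RPC request per line in, one response per line out."""
    for line in stdin:
        if line.strip():
            stdout.write(json.dumps(handle(line)) + "\n")
            stdout.flush()
```

The rigidity is the point: every request either maps to a handler or gets a standard error object, so there is little room for the structure to sprawl.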
If you’re picking training exemplars from this corpus, MCP servers are the tightest, most uniform subset.
Top CLI tools
Command-line utilities also cluster at the top:
- ethanarnold/screenase — Python, 85.3
- astroicers/CyPulse — Python, 82.2
- A few dozen more in the 70s
Like MCPs, CLIs benefit from small scope + a hard interface (argparse → stdout). Opus 4.7 handles argparse idiomatically and catches the common pitfalls (help text, exit codes).
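The "argparse → stdout" shape these CLIs share can be sketched in a few lines (the tool name and flags here are hypothetical, not taken from any repo above):

```python
import argparse
import sys

def build_parser() -> argparse.ArgumentParser:
    # argparse generates the --help output from these strings for free.
    parser = argparse.ArgumentParser(
        prog="pulse",  # hypothetical tool, not one of the repos above
        description="Print the first events from a log file.")
    parser.add_argument("path", help="log file to read")
    parser.add_argument("-n", "--lines", type=int, default=10,
                        help="number of lines to show (default: 10)")
    return parser

def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    try:
        with open(args.path) as f:
            for line in list(f)[:args.lines]:
                print(line.rstrip())
    except OSError as exc:
        # Errors go to stderr and produce a nonzero exit code --
        # the pitfalls the grader checks for.
        print(f"pulse: {exc}", file=sys.stderr)
        return 1
    return 0
```

An entry point like `sys.exit(main())` under `if __name__ == "__main__":` turns the return value into the process exit code, which is the part weaker CLIs tend to skip.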
Top web apps
Web apps in the corpus are much more numerous (928 repos!) but score lower on average:
- Quality concentrates in the 60–72 band
- The top web app barely cracks 75
- Many are scaffolds rather than finished products
This isn’t Opus 4.7’s fault so much as a property of the archetype: web apps are big, multi-layered, and the grader can always find something missing. A quality score of 70 on a large web app is better than 82 on a 500-line CLI.
Top monorepos (by size)
Monorepos are our largest artifacts:
- elizaOS/eliza — 1.8M LOC, TypeScript, the well-known agent framework
- BOLDPreciousMetals-Master/bold-ops-dashboard — 5.1M LOC, Python ops dashboard
- EndUser123/why — 2.7M LOC, Python
- Halildeu/platform-ssot — 2.5M LOC
- brianonbased-dev/HoloScript — 1.8M LOC, TypeScript
These aren’t “highest quality” by score — they’re biggest. Useful as exemplars of Opus 4.7 at scale.
The practical use of a golden set
If you’re building:
- Training data: start with overall_top_50 as positive examples
- Reference docs: the MCP servers above are clean illustrations of idiomatic MCP code
- Demo projects: the CLIs are easy to show running
- Stack comparisons: elizaOS/eliza is a canonical TS agent framework
- Anti-pattern hunts: compare to the bottom 10 (not listed here yet, but coming)
Auto-refresh
The golden set is regenerated every 30 minutes. Check the live snapshot JSON (admin-only) for the current picks.
Scores come from the quality_latest materialized view in our Postgres instance. The view rolls up structure_score, code_quality_score, documentation_score, testing_score, practices_score, security_score, and dependency_score into a single overall number.
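The rollup itself isn't shown in this post; as a sketch under invented weights (the view's real weighting may differ), it amounts to a weighted average of the seven sub-scores:

```python
# Hypothetical weights -- the actual quality_latest view's weighting
# is not documented here; these sum to 1.0 for illustration only.
WEIGHTS = {
    "structure_score": 0.15,
    "code_quality_score": 0.25,
    "documentation_score": 0.10,
    "testing_score": 0.20,
    "practices_score": 0.10,
    "security_score": 0.10,
    "dependency_score": 0.10,
}

def composite(scores: dict) -> float:
    """Weighted average of the sub-scores, on the same 0-100 scale."""
    total = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    return round(total, 1)
```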