The Golden Set: Highest-Quality Opus 4.7 Repos by Archetype

We run quality scoring on every Opus 4.7 repo that has a completed analysis. The top picks per archetype give the clearest picture of what Opus 4.7 does at its best.

Top overall (by composite quality score)

Repo                                 Score  Language     Type
Lambda-Biolab/dpette-usb-driver      86.7   Python       driver
ethanarnold/screenase                85.3   Python       CLI
chris2ao/unifi-mcp                   84.5   Python       MCP server
mrcoggsworth/LogPose                 84.0   Python       containerized
astroicers/CyPulse                   82.2   Python       CLI
dynamik-dev/bully                    81.9   Python
knorq-ai/xlsx-mcp-server             81.2   JSON+Python  MCP server
luthen-seas/nostr-mail-ts            81.1   TypeScript
DIG-Network/dig-epoch                80.9   Markdown
jaman/ex_v_ex                        80.7   Elixir

The top 10 skew heavily toward Python and toward the CLI / MCP-server archetypes. TypeScript does fine at scale but rarely reaches the 80+ tier. The elephant in the room: MCP servers cluster disproportionately at the top. Two of the top ten are literally MCP servers, and the pattern repeats deeper in the list.

Top MCP servers specifically

When we filter for repos named *-mcp* or organized like MCP servers:

  • chris2ao/unifi-mcp — UniFi controller MCP bridge, quality 84.5
  • knorq-ai/xlsx-mcp-server — Excel/XLSX MCP exposer, quality 81.2
  • 90+ others with mcp-* naming
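
The filter itself is nothing more exotic than a glob match over repo names. A small sketch of the idea, where the in-memory repos list and its fields are placeholders standing in for the real corpus:

```python
from fnmatch import fnmatch

# Placeholder records; the real corpus lives elsewhere and is much larger.
repos = [
    {"name": "chris2ao/unifi-mcp", "quality": 84.5},
    {"name": "knorq-ai/xlsx-mcp-server", "quality": 81.2},
    {"name": "ethanarnold/screenase", "quality": 85.3},
]

# Match both the *-mcp* style and the mcp-* prefix style mentioned above.
mcp_like = [
    r for r in repos
    if fnmatch(r["name"].split("/")[-1], "*-mcp*")
    or fnmatch(r["name"].split("/")[-1], "mcp-*")
]

for repo in sorted(mcp_like, key=lambda r: r["quality"], reverse=True):
    print(f'{repo["name"]}: {repo["quality"]}')
```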

The MCP-server archetype wins on quality because:
1. Scoped contract: MCP’s JSON-RPC-over-stdio protocol is a rigid interface that forces clean boundaries (see the sketch after this list)
2. Small scope: most MCP servers come in under 2K lines
3. Template bias: Opus 4.7 seems to have a strong internal template for “how an MCP server looks” and reproduces it consistently
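
To make the first point concrete, here is a minimal sketch (not taken from any repo in the corpus) of the loop such a server runs: newline-delimited JSON-RPC 2.0 requests read from stdin, responses written to stdout. The echo tool and its schema are hypothetical, and a real server also handles the initialize handshake and notifications, omitted here.

```python
import json
import sys

# Hypothetical tool exposed by this sketch; real servers register many.
TOOLS = [{
    "name": "echo",
    "description": "Echo the input text back",
    "inputSchema": {"type": "object", "properties": {"text": {"type": "string"}}},
}]

def handle(request: dict) -> dict:
    """Dispatch a single JSON-RPC 2.0 request to a result payload."""
    method = request.get("method")
    if method == "tools/list":
        return {"tools": TOOLS}
    if method == "tools/call":
        args = request.get("params", {}).get("arguments", {})
        return {"content": [{"type": "text", "text": args.get("text", "")}]}
    raise ValueError(f"unsupported method: {method}")

def main() -> None:
    # One JSON-RPC message per line on stdin; one response per line on stdout.
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        request = json.loads(line)
        try:
            response = {"jsonrpc": "2.0", "id": request.get("id"),
                        "result": handle(request)}
        except ValueError as exc:
            response = {"jsonrpc": "2.0", "id": request.get("id"),
                        "error": {"code": -32601, "message": str(exc)}}
        sys.stdout.write(json.dumps(response) + "\n")
        sys.stdout.flush()

if __name__ == "__main__":
    main()
```

The rigidity is the point: one transport, one envelope, a small set of methods. That is a large part of why the generated servers look so uniform.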

If you’re picking training exemplars from this corpus, MCP servers are the tightest, most uniform subset.

Top CLI tools

Command-line utilities also cluster at the top:

  • ethanarnold/screenase — Python, 85.3
  • astroicers/CyPulse — Python, 82.2
  • A few dozen more in the 70s

Like MCP servers, CLIs benefit from small scope plus a hard interface (argparse in, stdout out). Opus 4.7 handles argparse idiomatically and avoids the common pitfalls (missing help text, wrong exit codes).
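
To illustrate the shape, a minimal sketch; the pulsecheck command, its flags, and its behaviour are hypothetical, not drawn from screenase or CyPulse. Help text falls out of argparse for free, and bad input is reported on stderr with a non-zero exit code.

```python
import argparse
import sys

def build_parser() -> argparse.ArgumentParser:
    # argparse generates --help output from these descriptions automatically.
    parser = argparse.ArgumentParser(
        prog="pulsecheck",
        description="Report whether a host responds within a timeout.")
    parser.add_argument("host", help="hostname or IP address to probe")
    parser.add_argument("--timeout", type=float, default=2.0,
                        help="seconds to wait before giving up (default: 2.0)")
    return parser

def main(argv: list[str] | None = None) -> int:
    args = build_parser().parse_args(argv)
    if args.timeout <= 0:
        # Diagnostics go to stderr; stdout stays reserved for results.
        print("timeout must be positive", file=sys.stderr)
        return 2
    # Placeholder for the real probe; an actual tool would do I/O here.
    print(f"{args.host}: ok (timeout={args.timeout}s)")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Returning an int from main and passing it to sys.exit keeps the exit-code behaviour easy to test.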

Top web apps

Web apps in the corpus are much more numerous (928 repos!) but score lower on average:

  • Quality concentrates in the 60–72 band
  • The top web app barely cracks 75
  • Many are scaffolds rather than finished products

This isn’t Opus 4.7’s fault so much as a property of the archetype: web apps are big and multi-layered, and the grader can always find something missing. A quality score of 70 on a large web app arguably reflects more capability than an 82 on a 500-line CLI.

Top monorepos (by size)

Monorepos are our largest artifacts:

  • elizaOS/eliza — 1.8M LOC, TypeScript, the well-known agent framework
  • BOLDPreciousMetals-Master/bold-ops-dashboard — 5.1M LOC, Python ops dashboard
  • EndUser123/why — 2.7M LOC, Python
  • Halildeu/platform-ssot — 2.5M LOC
  • brianonbased-dev/HoloScript — 1.8M LOC, TypeScript

These aren’t the highest quality by score; they’re the biggest. They’re useful as exemplars of Opus 4.7 operating at scale.

The practical uses of a golden set

If you’re building:

  • Training data: start with overall_top_50 as positive examples
  • Reference docs: the MCP servers above are clean illustrations of idiomatic MCP code
  • Demo projects: the CLIs are easy to show running
  • Stack comparisons: elizaOS/eliza is a canonical TS agent framework
  • Anti-pattern hunts: compare to the bottom 10 (not listed here yet, but coming)

Auto-refresh

The golden set is regenerated every 30 minutes. Check the live snapshot JSON (admin-only) for the current picks.


Scores come from the quality_latest materialized view in our Postgres instance. The view rolls up structure_score, code_quality_score, documentation_score, testing_score, practices_score, security_score, and dependency_score into a single overall number.
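
If you want the underlying numbers, here is a hedged sketch of reading that view with psycopg 3. The component column names come from the sentence above; the DSN, the repo_full_name column, and the overall_score column are assumptions about the schema.

```python
import psycopg  # psycopg 3

# Connection string and the repo_full_name / overall_score columns are
# assumptions about the schema; the component score columns are from the text.
DSN = "postgresql://localhost/opus_corpus"

QUERY = """
    SELECT repo_full_name,
           structure_score, code_quality_score, documentation_score,
           testing_score, practices_score, security_score, dependency_score,
           overall_score
      FROM quality_latest
     ORDER BY overall_score DESC
     LIMIT 10
"""

with psycopg.connect(DSN) as conn:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for row in cur.fetchall():
            repo, *components, overall = row
            print(f"{repo}: {overall:.1f} (components: {components})")
```

A materialized view only changes when it is explicitly refreshed, so the 30-minute regeneration mentioned above presumably pairs with a REFRESH MATERIALIZED VIEW quality_latest on the same schedule.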