Of 12,095 Opus 4.7 repos, only 2,302 (19%) contain an explicit LICENSE file.
That means 81% have no license file. With no license, the default under US copyright law is “all rights reserved”: beyond whatever the hosting platform’s terms of service allow, nobody else has clear legal permission to reuse, modify, or redistribute the code.
The contradiction
The same model that writes 66-line READMEs doesn’t write a 21-line MIT license file. The contradiction is sharp:
| Document | Share of Opus 4.7 repos that have it |
|---|---|
| README.md | 85% |
| .gitignore | 54% |
| package.json | 39% |
| tsconfig.json | 27% |
| CLAUDE.md | 18% |
| LICENSE | 19% |
CLAUDE.md (an agent-internal prompt file) appears in roughly the same share of repos as LICENSE (a standard open-source marker). In practice, the model gives its own operational context about as much weight as legal hygiene.
Why it happens
Three hypotheses:
1. Licenses aren’t in “build me X” prompts
When someone asks “build me a task manager,” they don’t say “and add a LICENSE.” The feature list doesn’t include legal. Opus 4.7 writes what’s asked.
2. Licensing needs a choice
Unlike README or tsconfig, a LICENSE file has a decision embedded: MIT vs Apache vs GPL vs proprietary. Making that choice requires understanding:
- Is this repo open-source or proprietary?
- Does the org have a default?
- Does any dependency require a specific license?
An AI without context on the user’s intent defaults to not making the choice — which leaves the file absent.
3. GitHub’s license prompt sits outside the workflow
When you create a repo in the GitHub UI, the form offers to add a LICENSE. If you create the repo locally with git init (as most Claude-assisted workflows do), that prompt never appears.
The consequences
Legal
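81% of the corpus cannot be reused by third parties without explicit permission from the author. That means legal risk for anyone who forks and modifies the code, grounds for excluding it from training data, and compliance headaches for downstream users.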
Ecosystem
When Opus 4.7 output becomes a dependency (npm package, PyPI module, etc.), unlicensed code can fail license-compliance scans and block any downstream build that enforces them.
Training data
For downstream model training, unlicensed code is a gray zone. Whether it can be used depends on the jurisdiction and on how copyright law is applied to model training, but the safest training corpora exclude unlicensed repos. That means 81% of this corpus is low-trust data for fine-tuning.
The fix is cheap
A post-generation step should:
- Detect a new repo (no existing LICENSE, has package.json or pyproject.toml)
- Prompt the user once: “MIT, Apache-2.0, or keep it private?”
- Write the canonical LICENSE text
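This could be a Claude Code plugin or a CLAUDE.md convention. If widely adopted, it could plausibly move the share of newly generated repos with a license from 19% toward ~90% within a few weeks, though that figure is an estimate rather than a measurement.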
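The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not an existing Claude Code plugin API: the function names (`needs_license`, `ask_choice`, `write_license`) and the `templates` mapping are hypothetical, and the canonical license texts are assumed to be supplied by the caller from a trusted source rather than typed out by hand.

```python
from pathlib import Path
from typing import Callable, Dict, Optional

# Filenames that count as "already licensed" (compared case-insensitively).
# LICENSE.TXT is an added assumption; the others are the variants named in this post.
LICENSE_NAMES = {"LICENSE", "LICENSE.MD", "LICENSE.TXT", "UNLICENSE", "COPYING"}
MANIFESTS = ("package.json", "pyproject.toml")
CHOICES = ("MIT", "Apache-2.0", "private")


def needs_license(repo: Path) -> bool:
    """Step 1: a project manifest is present but no license file is."""
    present = {p.name.upper() for p in repo.iterdir() if p.is_file()}
    has_manifest = any((repo / name).is_file() for name in MANIFESTS)
    return has_manifest and not (present & LICENSE_NAMES)


def ask_choice(ask: Callable[[str], str] = input, default: str = "MIT") -> str:
    """Step 2: prompt once; an empty or unrecognized answer falls back to the default."""
    answer = ask("MIT, Apache-2.0, or keep it private? ").strip()
    return answer if answer in CHOICES else default


def write_license(repo: Path, choice: str, templates: Dict[str, str]) -> Optional[Path]:
    """Step 3: write the chosen text to LICENSE; do nothing for a private repo."""
    if choice == "private":
        return None
    target = repo / "LICENSE"
    target.write_text(templates[choice], encoding="utf-8")
    return target
```

A caller would run `needs_license(repo)` after generation and, only if it returns `True`, call `ask_choice()` and pass the answer to `write_license`. Falling back to MIT on an empty answer mirrors the "default to MIT, let them override" recommendation; a more conservative tool might fall back to `"private"` instead.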
Licensing breakdown of the 19% that DO have one
We didn’t fully parse license types, but looking at filenames:
- LICENSE (plain): vast majority
- LICENSE.md: a significant subset
- UNLICENSE: rare but visible
- COPYING: a handful (GPL convention)
Without parsing contents, we can’t break out MIT vs Apache vs GPL specifically — that’s a future analysis step.
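As a starting point for that future step, license contents could be bucketed by marker phrases from the standard texts. The sketch below is a rough heuristic, not the method used for this analysis: the `MARKERS` table and `classify_license` are hypothetical names, the buckets are deliberately coarse (LGPL and AGPL texts also mention the GNU General Public License and land in the GPL bucket, and several permissive licenses share the MIT grant wording), and a real pipeline would more likely use a dedicated license-detection tool.

```python
from typing import List, Optional, Tuple

# (coarse label, phrases that must all appear in the whitespace-normalized text)
MARKERS: List[Tuple[str, Tuple[str, ...]]] = [
    ("Unlicense", ("free and unencumbered software released into the public domain",)),
    ("MIT-style", ("permission is hereby granted, free of charge",)),
    ("Apache-2.0", ("apache license", "version 2.0")),
    ("GPL-family", ("gnu general public license",)),
]


def classify_license(text: str) -> Optional[str]:
    """Return a coarse license bucket, or None if no marker phrase matches."""
    # Lowercase and collapse whitespace so phrases wrapped across lines still match.
    normalized = " ".join(text.lower().split())
    for label, phrases in MARKERS:
        if all(phrase in normalized for phrase in phrases):
            return label
    return None
```

Running this over the contents of each LICENSE, LICENSE.md, UNLICENSE, or COPYING file would give a first-pass breakdown; files that return `None` would need manual review.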
Recommendation
If you’re building an AI coder product:
Make license selection a first-class step in the project scaffold. Don’t rely on the user to remember. Default to MIT, let them override.
This single change would make the corpus more reusable, more training-friendly, and more legally sound.
Separately: a similar share of Opus 4.7 repos (18% in the table above) have a CLAUDE.md. See The CLAUDE.md Phenomenon.