Of 12,095 Opus 4.7 repos, only 2,302 (19%) contain an explicit LICENSE file.

That means 81% have no license. By default, no license = “all rights reserved” under US copyright law. Code without a license cannot be legally reused, forked, or redistributed.

The contradiction

The same model that writes 66-line READMEs doesn’t write a 21-line MIT license file. The contradiction is sharp:

Document        Share of Opus 4.7 repos that have it
README.md       85%
.gitignore      54%
package.json    39%
tsconfig.json   27%
LICENSE         19%
CLAUDE.md       18%

CLAUDE.md (an agent-internal prompt file) appears in roughly the same share of repos as LICENSE (a standard open-source marker). The model treats its own operational context as more important than legal hygiene.

Why it happens

Three hypotheses:

1. Licenses aren’t in “build me X” prompts

When someone asks “build me a task manager,” they don’t say “and add a LICENSE.” Legal hygiene isn’t on the feature list, and Opus 4.7 writes what’s asked.

2. Licensing needs a choice

Unlike README or tsconfig, a LICENSE file has a decision embedded: MIT vs Apache vs GPL vs proprietary. Making that choice requires understanding:
- Is this repo open-source or proprietary?
- Does the org have a default?
- Does any dependency require a specific license?

An AI without context on the user’s intent defaults to not making the choice — which leaves the file absent.

3. GitHub’s template prompt is one step beyond

When you create a repo in the GitHub UI, it asks “add a LICENSE?” If you create the repo via git init locally (which most Claude-assisted workflows do), that prompt never fires.

The consequences

81% of the corpus is unusable for third-party reuse without explicit permission from the author. That creates fork risk, training-data exclusion risk, and compliance headaches.

Ecosystem

When Opus 4.7 output becomes a dependency (npm package, PyPI module, etc.), unlicensed code can break downstream builds or compliance scans.
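As a concrete illustration, here is a minimal sketch of the kind of scan that trips on unlicensed dependencies. The function name is hypothetical; real tools (npm’s license-checker, FOSSA, and similar) also read LICENSE files and SPDX expressions rather than just the manifest field:

```python
import json
from pathlib import Path

def find_unlicensed_packages(node_modules: Path) -> list[str]:
    """Flag installed npm packages whose package.json declares no license."""
    flagged = []
    for manifest in node_modules.glob("*/package.json"):
        try:
            meta = json.loads(manifest.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, OSError):
            continue  # unreadable manifest: skip rather than guess
        license_field = meta.get("license")
        # npm uses "UNLICENSED" to mean "do not redistribute"
        if not license_field or license_field == "UNLICENSED":
            flagged.append(meta.get("name", manifest.parent.name))
    return flagged
```

A package flagged here is exactly the situation the paragraph above describes: the build works, but the compliance scan fails.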

Training data

For downstream model training, license-free code is a gray zone: the answer depends on jurisdiction and on evolving doctrine around training on copyrighted code, but the safest training corpora exclude unlicensed repos. By that standard, 81% of this corpus is low-trust data for fine-tuning.

The fix is cheap

A post-generation step should:

  1. Detect a new repo (no existing LICENSE, has package.json or pyproject.toml)
  2. Prompt the user once: “MIT, Apache-2.0, or keep it private?”
  3. Write the canonical LICENSE text

This could be a Claude Code plugin or a CLAUDE.md convention, and it could plausibly shift the distribution from 19% toward ~90% within a few weeks.
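The three steps above can be sketched as a small post-generation hook. Everything here is illustrative, not an existing Claude Code API — the function names are invented, and the canonical license bodies are elided as placeholders:

```python
from pathlib import Path

def needs_license(repo: Path) -> bool:
    """Step 1: a project manifest exists but no license file does."""
    has_manifest = (repo / "package.json").exists() or (repo / "pyproject.toml").exists()
    has_license = any((repo / n).exists()
                      for n in ("LICENSE", "LICENSE.md", "COPYING", "UNLICENSE"))
    return has_manifest and not has_license

def prompt_choice() -> str:
    """Step 2: ask the user once. (A real plugin would route this through the agent UI.)"""
    answer = input("License this repo? [MIT / Apache-2.0 / private] ").strip()
    return answer or "private"

def write_license(repo: Path, choice: str, holder: str, year: int) -> None:
    """Step 3: write the canonical text (bodies abbreviated here as placeholders)."""
    if choice == "private":
        return  # no LICENSE file means all rights reserved, which is the intent
    texts = {
        "MIT": f"MIT License\n\nCopyright (c) {year} {holder}\n\n<canonical MIT body>\n",
        "Apache-2.0": f"Copyright {year} {holder}\n\n<canonical Apache-2.0 body>\n",
    }
    (repo / "LICENSE").write_text(texts[choice], encoding="utf-8")
```

The whole hook is under thirty lines, which is the point: the fix costs almost nothing relative to the 81% gap it closes.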

Licensing breakdown of the 19% that DO have one

We didn’t parse license contents, but the filenames break down as:

  • LICENSE (plain) — vast majority
  • LICENSE.md — a significant subset
  • UNLICENSE — rare but visible
  • COPYING — a handful (GPL convention)

Without parsing contents, we can’t break out MIT vs Apache vs GPL specifically — that’s a future analysis step.
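A filename-only bucketing pass like the one above can be sketched in a few lines. This is a hypothetical helper for illustration, not the actual analysis code:

```python
def classify_license_filename(name: str) -> str:
    """Bucket a repo's license-ish filename into the categories listed above."""
    upper = name.upper()
    if upper == "UNLICENSE":
        return "UNLICENSE"
    if upper == "COPYING":
        return "COPYING (GPL convention)"
    if upper == "LICENSE.MD":
        return "LICENSE.md"
    if upper.startswith("LICENSE"):  # LICENSE, LICENSE.txt, etc.
        return "LICENSE (plain)"
    return "other"
```

The content-parsing step (MIT vs Apache vs GPL) would replace this filename heuristic entirely.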

Recommendation

If you’re building an AI coder product:

Make license selection a first-class step in the project scaffold. Don’t rely on the user to remember. Default to MIT, let them override.

This single change would make the corpus more reusable, more training-friendly, and more legally sound.


Separately: 18% of Opus 4.7 repos have a CLAUDE.md. See The CLAUDE.md Phenomenon.