Of 12,095 Opus 4.7 repos, only 2,302 (19%) contain an explicit LICENSE file.

That means 81% have no license. By default, no license = “all rights reserved” under US copyright law. Code without a license cannot be legally reused, forked, or redistributed.

The contradiction

The same model that writes 66-line READMEs doesn’t write a 21-line MIT license file. The contradiction is sharp:

Document        Share of Opus 4.7 repos that have it
README.md       85%
.gitignore      54%
package.json    39%
tsconfig.json   27%
LICENSE         19%
CLAUDE.md       18%

CLAUDE.md (an agent-internal prompt file) appears in roughly the same share of repos as LICENSE (a standard open-source marker). The model treats its own operational context as more important than legal hygiene.

Why it happens

Three hypotheses:

1. Licenses aren’t in “build me X” prompts

When someone asks “build me a task manager,” they don’t say “and add a LICENSE.” Legal hygiene isn’t on the feature list, and Opus 4.7 writes what’s asked.

2. Licensing needs a choice

Unlike README or tsconfig, a LICENSE file has a decision embedded: MIT vs Apache vs GPL vs proprietary. Making that choice requires understanding:
- Is this repo open-source or proprietary?
- Does the org have a default?
- Does any dependency require a specific license?

An AI without context on the user’s intent defaults to not making the choice — which leaves the file absent.

3. GitHub’s template prompt is one step beyond

When you create a repo in the GitHub UI, it asks “add a LICENSE?” If you create the repo via git init locally (which most Claude-assisted workflows do), that prompt never fires.

The consequences

81% of the corpus is unusable for third-party reuse without explicit permission from the author. That creates fork risk, training-data exclusion risk, and compliance headaches.

Ecosystem

When Opus 4.7 output becomes a dependency (npm package, PyPI module, etc.), unlicensed code can break downstream builds or compliance scans.
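As a concrete illustration, here is a minimal sketch of the kind of scan that trips on unlicensed dependencies. The function name is hypothetical; real tools (npm’s license-checker, FOSSA, and similar) also read LICENSE files and SPDX expressions rather than just the manifest field:

```python
import json
from pathlib import Path

def find_unlicensed_packages(node_modules: Path) -> list[str]:
    """Flag installed npm packages whose package.json declares no license."""
    flagged = []
    for manifest in node_modules.glob("*/package.json"):
        try:
            meta = json.loads(manifest.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, OSError):
            continue  # unreadable manifest: skip rather than guess
        license_field = meta.get("license")
        # npm uses "UNLICENSED" to mean "do not redistribute"
        if not license_field or license_field == "UNLICENSED":
            flagged.append(meta.get("name", manifest.parent.name))
    return flagged
```

A package flagged here is exactly the situation the paragraph above describes: the build works, but the compliance scan fails.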

Training data

For downstream model training, license-free code is a gray zone: the answer depends on jurisdiction and on evolving doctrine around training on copyrighted code, but the safest training corpora exclude unlicensed repos. By that standard, 81% of this corpus is low-trust data for fine-tuning.

The fix is cheap

A post-generation step should:

  1. Detect a new repo (no existing LICENSE, has package.json or pyproject.toml)
  2. Prompt the user once: “MIT, Apache-2.0, or keep it private?”
  3. Write the canonical LICENSE text

This could be a Claude Code plugin or a CLAUDE.md convention, and it could plausibly shift the distribution from 19% toward ~90% within a few weeks.
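The three steps above can be sketched as a small post-generation hook. Everything here is illustrative, not an existing Claude Code API — the function names are invented, and the canonical license bodies are elided as placeholders:

```python
from pathlib import Path

def needs_license(repo: Path) -> bool:
    """Step 1: a project manifest exists but no license file does."""
    has_manifest = (repo / "package.json").exists() or (repo / "pyproject.toml").exists()
    has_license = any((repo / n).exists()
                      for n in ("LICENSE", "LICENSE.md", "COPYING", "UNLICENSE"))
    return has_manifest and not has_license

def prompt_choice() -> str:
    """Step 2: ask the user once. (A real plugin would route this through the agent UI.)"""
    answer = input("License this repo? [MIT / Apache-2.0 / private] ").strip()
    return answer or "private"

def write_license(repo: Path, choice: str, holder: str, year: int) -> None:
    """Step 3: write the canonical text (bodies abbreviated here as placeholders)."""
    if choice == "private":
        return  # no LICENSE file means all rights reserved, which is the intent
    texts = {
        "MIT": f"MIT License\n\nCopyright (c) {year} {holder}\n\n<canonical MIT body>\n",
        "Apache-2.0": f"Copyright {year} {holder}\n\n<canonical Apache-2.0 body>\n",
    }
    (repo / "LICENSE").write_text(texts[choice], encoding="utf-8")
```

The whole hook is under thirty lines, which is the point: the fix costs almost nothing relative to the 81% gap it closes.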

Licensing breakdown of the 19% that DO have one

We didn’t parse license contents, but the filenames break down as:

  • LICENSE (plain) — vast majority
  • LICENSE.md — a significant subset
  • UNLICENSE — rare but visible
  • COPYING — a handful (GPL convention)

Without parsing contents, we can’t break out MIT vs Apache vs GPL specifically — that’s a future analysis step.
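A filename-only bucketing pass like the one above can be sketched in a few lines. This is a hypothetical helper for illustration, not the actual analysis code:

```python
def classify_license_filename(name: str) -> str:
    """Bucket a repo's license-ish filename into the categories listed above."""
    upper = name.upper()
    if upper == "UNLICENSE":
        return "UNLICENSE"
    if upper == "COPYING":
        return "COPYING (GPL convention)"
    if upper == "LICENSE.MD":
        return "LICENSE.md"
    if upper.startswith("LICENSE"):  # LICENSE, LICENSE.txt, etc.
        return "LICENSE (plain)"
    return "other"
```

The content-parsing step (MIT vs Apache vs GPL) would replace this filename heuristic entirely.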

Recommendation

If you’re building an AI coder product:

Make license selection a first-class step in the project scaffold. Don’t rely on the user to remember. Default to MIT, let them override.

This single change would make the corpus more reusable, more training-friendly, and more legally sound.


Separately: 18% of Opus 4.7 repos have a CLAUDE.md. See The CLAUDE.md Phenomenon.