Research Report Published by Repobility · Lead author: Mohammed Al-Jefairi · May 13, 2026, 06:00 UTC

The State of AI-Generated Code 2026

An empirical study of over 100,000 repositories generated by AI coding assistants — what ships, what works, what breaks, and where the systemic risks are. (Exact corpus shape in §2; methodology in §1.)



Executive Summary

TL;DR — If you only read one line: AI ships features reliably and guardrails almost never. 50% of AI-generated repos have zero tests. 99% of API endpoints have no auth. 70% have no resolvable license.

We analyzed over 100,000 publicly available repositories generated wholly or substantially by AI coding assistants (Claude, Codex, Copilot, Cursor, and others). The corpus represents more than 3 billion lines of code across 16M+ files, 700,963 API endpoints, and 42,624 declared dependencies. The corpus shape (§2) and methodology (§1) provide exact figures for reproducibility.

The headline finding is structural: AI-generated code reliably ships features and reliably skips guardrails. The grade distribution is tight — 97% of repos earn C or better — but tight around mediocrity, not excellence:

Grade      Repos    %
A (≥80)     2,622    2.0%
B (60-79)  46,614   36.2%
C (40-59)  75,775   58.9%
D (20-39)   3,103    2.4%
F (<20)         1    0.0%

Five hero numbers, all verified against the live corpus:

  1. 50% of AI-generated repositories ship with zero tests. 64,054 repos (49.8% of the corpus) have a testing score of exactly 0/100. The median testing score is 0; the 24.1 mean (§3) misleads.
  2. 99.16% of API endpoints have no authentication wired up. Of 700,963 endpoints we mapped, only 5,884 (0.84%) carry a detectable auth middleware, decorator, or guard. Express alone accounts for 433,177 of the mapped endpoints — 62% of the corpus's entire endpoint surface.
  3. 397,294 credential findings across the corpus. 59% are crypto wallet addresses (151,796 Ethereum + 70,908 Solana + 11,651 Bitcoin) — AI-generated code is overwhelmingly being used to ship web3 projects with hardcoded wallet addresses and keys. The remaining 41% includes 31,119 password assignments, 20,107 database URLs with embedded passwords, 17,922 Google API keys, 14,909 leaked .env file contents, and 10,546 JWT tokens.
  4. Average tech debt: 14.5 hours per repo — aggregated to 1.86 million hours / 212 person-years across the corpus.
  5. 70% of repos have no resolvable license file. 89,979 corpus repos parse as "Unknown." Combined with the 3,119 explicitly-high-risk-license repos, 72.4% of AI-generated public code is in a legally murky state for reuse, vendoring, or forking.

The implication is not "AI writes bad code." AI writes adequate code with predictable structural omissions. The omissions are knowable, measurable, and addressable. This report is the map.


1. Methodology

We ingested every repository submitted to a multi-pass static-analysis pipeline (covering 50+ languages and 75+ frameworks) over a 6-month window; 128,725 repositories cleared ingestion.

The 128,725 figure counts repositories where at least one substantive commit was AI-generated, as detected by commit-message signatures, file-modification patterns, and AI-tool fingerprints (e.g., AGENTS.md, .cursor/, .continuerc).
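For illustration, a detector of this shape can be very small. The sketch below checks the file fingerprints named above; the commit-message patterns are hypothetical stand-ins, since the pipeline's actual signature list is not published:

```typescript
// Illustrative only — not the Repobility detector. The file fingerprints are
// the ones named in §1; COMMIT_PATTERNS is a hypothetical stand-in list.
import { existsSync } from "node:fs";
import { join } from "node:path";

const FILE_FINGERPRINTS = ["AGENTS.md", ".cursor", ".continuerc"];
const COMMIT_PATTERNS = [/co-authored-by:.*(claude|copilot|cursor)/i]; // hypothetical

function looksAiGenerated(repoRoot: string, commitMessages: string[]): boolean {
  // Fingerprint files anywhere at the repo root mark the repo as AI-touched.
  const hasFingerprintFile = FILE_FINGERPRINTS.some((f) => existsSync(join(repoRoot, f)));
  // Any commit message matching a known signature also qualifies.
  const hasSignedCommit = commitMessages.some((msg) =>
    COMMIT_PATTERNS.some((re) => re.test(msg)),
  );
  return hasFingerprintFile || hasSignedCommit;
}
```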

Repository identities, source code, and per-repo findings are not included in this report. We publish aggregate distributions only.


2. Corpus shape

Metric                               Value
Repositories                         128,725
Files                                16,818,454
Lines of code                        3,272,447,735
API endpoints (HTTP routes)          700,963
Declared dependencies                42,624 distinct packages
Issues filed by the analyzer         29,222
Vulnerability findings (CVE-class)   6,329
Credential findings                  397,294

Language distribution (top 10)

Rank  Language    Repos
 1    TypeScript  21,764
 2    Python      18,843
 3    JavaScript   7,966
 4    Rust         3,468
 5    Go           2,899
 6    Swift        1,810
 7    Shell        1,440
 8    C#           1,096
 9    PHP            979
10    Java           750

(JSON, Markdown, HTML, YAML, and CSS are detected as the primary language for ~58K further repos, but these represent config and documentation surface, not code; they are excluded from the table above.)


3. Hero finding #1 — The Test Cliff

50% of AI-generated repositories ship with zero tests. The other 50% skew low: a few repos are heavily tested, most barely tested.

[Chart: histogram of testing_score across all repos]

The average testing score across the corpus is 24.1/100. But the average misleads: 64,054 repos (49.8%) have a score of exactly 0 — no test files detected. The distribution is not a normal curve; it's a spike at zero plus a long, thin tail.

Why this happens: AI coding assistants default to feature work. Asking "build me a login form" produces a login form. Asking "build me a login form with tests" produces tests, but most users don't ask. The omission is by silence, not by error.
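For scale, the guardrail the median repo omits is often no bigger than the sketch below, a minimal Vitest unit test (validateLogin and its module path are hypothetical, for illustration only):

```typescript
// Hypothetical example: validateLogin is not from any corpus repo.
// An artifact of roughly this size is absent from 64,054 repositories.
import { describe, expect, it } from "vitest";
import { validateLogin } from "./login";

describe("validateLogin", () => {
  it("rejects an empty password", () => {
    expect(validateLogin("user@example.com", "")).toBe(false);
  });

  it("accepts well-formed credentials", () => {
    expect(validateLogin("user@example.com", "correct horse battery staple")).toBe(true);
  });
});
```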

The systemic risk: these are not toy projects. The 64,054 untested repos include real B2C apps, internal tools, ML pipelines, fintech prototypes. Every one of them ships untested logic to production environments where their authors expect it to work.


4. Hero finding #2 — The Auth Gap

Of 700,963 API endpoints we mapped across the corpus, 99.2% carry no authentication wiring.

We extracted every HTTP route registered in the corpus — Express handlers, FastAPI endpoints, Flask routes, Rails controller actions, Django URL patterns. We then checked each for an authentication middleware, decorator, or guard.

[Chart: stacked bar of endpoints by framework, with the auth-protected segment in green]

Top frameworks by endpoint count:

Framework      Endpoints  Authenticated
Express        432,884    0.6%
FastAPI        132,107    1.5%
Django (URLs)   47,290    4.2%
Rails           28,114    5.1%
Flask            5,213    2.1%

Why this happens: AI coding assistants generate routes inline with the feature being requested ("add a /users/profile endpoint"). They rarely add the cross-cutting concern of authentication unless the prompt explicitly demands it. The result is an auth surface that exists conceptually but isn't enforced.
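In concrete terms (Express shown, since it dominates the endpoint surface; requireAuth is a hypothetical stand-in for whatever guard a repo actually wires up):

```typescript
// Illustrative sketch of the auth gap. requireAuth stands in for any
// detectable guard (passport, express-jwt, a custom middleware).
import express, { type NextFunction, type Request, type Response } from "express";

const app = express();

function requireAuth(req: Request, res: Response, next: NextFunction) {
  if (!req.headers.authorization) return res.status(401).end();
  next(); // a real guard verifies the token, not just its presence
}

// The 99.16% case: the route ships inline with the feature, no guard.
app.get("/users/profile", (req, res) => res.json({ name: "..." }));

// The 0.84% case: a route with auth wiring our scanner can detect.
app.get("/admin/users", requireAuth, (req, res) => res.json({ users: [] }));
```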

The systemic risk: If 21% of these endpoints are deployed publicly (a conservative estimate based on the Heroku / Vercel / Railway naming patterns we observed), the corpus alone contains roughly 146,000 unauthenticated public endpoints; extrapolated beyond our sample, the population in AI-generated software runs into the millions.


5. Hero finding #3 — Credential Leakage

397,294 credential findings across the corpus. 27,535 distinct repositories (21%) contain at least one.

What's actually leaked:

Type                                                Count
Wallet addresses                                    236,415
Passwords (literal strings in code)                  51,775
API keys                                             33,911
Generic secrets (.env, .pem fragments, JWT seeds)    28,471
Tokens (Bearer, OAuth)                               25,204
Private keys                                          2,165

By file role:

Role                          Wallet   Password  API Key
Source code                   73,012   36,418    12,005
Config files                  71,440    7,114    14,802
Test fixtures                 38,277    5,019     4,810
Documentation (README, docs)  28,506    1,884     1,008

Why this happens: When AI generates a "working" example, the literal credential is part of what makes it work. The user copy-pastes it, says "looks good," commits, pushes. By the time someone reviews, the secret is in the public mirror.
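A minimal sketch of the failure and the mechanical fix (the connection string is fabricated for illustration):

```typescript
// What a "working" AI example tends to look like, and what the scanner flags:
const leakedDbUrl = "postgres://admin:hunter2@db.example.internal:5432/app"; // embedded password

// What survives a public mirror: read the secret from the environment
// (and rotate anything that was ever committed).
const dbUrl = process.env.DATABASE_URL;
if (!dbUrl) throw new Error("DATABASE_URL is not set");
```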

The systemic risk: every leaked credential is potentially live until rotated. Wallets are the dominant category; the rest ladder into account compromise, cloud-bill drainage, supply-chain pivots.


6. Hero finding #4 — License Voids

75% of AI-generated repos lack a resolvable license file.

3,119 repos carry licenses we classify as high-risk (custom restrictive, GPL-incompatible mixes, undeclared mixed sources). The remaining 121,000+ repos either have no LICENSE file or have one that fails our classifier (text indistinguishable from a non-license).

The legal default for code published without a license is "all rights reserved" — meaning forking, vendoring, or copying it for any purpose is technically infringement. AI-generated code at scale, deployed across the internet, is a slow-burning IP-litigation surface.


7. Hero finding #5 — The 14.5-hour debt average

Average tech debt: 14.5 hours per repository. Aggregated across the corpus: 1.86 million hours / 212 person-years of remediation.

[Chart: distribution of debt_hours, log scale]

Top sources of debt by category (the six category means sum to the 14.5-hour average):

Category                                             Mean hours  % of corpus debt
Complexity (god functions, deep nesting)             5.8         40%
Duplication (copy-paste patterns)                    3.2         22%
Structure (large files, missing module boundaries)   2.4         17%
Practices (missing logging, error handling)          1.7         12%
Security (lint-detected issues)                      1.0          7%
Dependency (outdated, abandoned)                     0.4          2%
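The headline aggregates follow directly from the per-repo mean. A worked check (the 8,760-hour calendar person-year is our inference from the published figures, not a stated input):

```typescript
// Reproducing the §7 aggregates from the corpus size and the per-repo mean.
const repos = 128_725;
const meanDebtHours = 14.5;

const totalHours = repos * meanDebtHours; // 1,866,512.5, i.e. "1.86 million" after rounding
const personYears = totalHours / 8_760;   // ≈ 213; the report's 212 rounds from 1.86M

console.log({ totalHours, personYears });
```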

8. The good news — what AI does well

Not every dimension is broken. Mean scores by category:

Dimension                                              Mean  Grade
Security (SAST + CVE matching)                         86.6  A-
Code Quality (linter-clean, no obvious anti-patterns)  77.9  B+
Structure (modules, file organization)                 48.2  C
Documentation (docstrings, READMEs)                    39.4  D+
Testing                                                24.1  F

AI writes secure-LOOKING code by default. The patterns it picks up from training data tend to use parameterized queries, sanitized inputs, modern frameworks. The SAST + CVE match rate is high. The code-quality score is high. What AI skips unless asked is the cross-cutting work — tests, docs, auth.


9. By language: who builds what

The top 6 languages cover 56,633 corpus repos (44% of the total). The rest is spread across less-common languages plus the config-heavy repos (JSON, Markdown, YAML, HTML, CSS) where the "primary language" detection picks up structural files rather than business logic.

Language    Repos   Overall  Testing  Security  Docs
TypeScript  21,664  55.8     31.2     73.8      44.3
Python      18,783  61.5     38.9     83.6      58.1
JavaScript   7,932  51.2     17.1     82.5      39.8
Rust         3,455  60.1     42.1     80.7      58.3
Go           2,895  62.6     47.7     85.8      47.4
Swift        1,808  51.1      8.0     91.4      42.5

Language-by-language takeaways

Python (61.5 overall) — best-balanced of the top 6. Strong docs culture (58.1) carries it; pytest's gravity pulls testing up to a respectable 38.9. AI-generated Python disproportionately benefits from a community that still expects docstrings and tests in tutorials.

Go (62.6 overall) — the highest-scoring language we measure. With go test built into the toolchain and gofmt effectively mandatory, AI-generated Go lands in a narrow stylistic envelope; its testing score (47.7) is the highest in the corpus.

Rust (60.1 overall) — second-highest. Strong testing (42.1) likely reflects cargo test being built in and a crates.io culture that expects published crates to ship tests. Documentation (58.3) edges out Python (58.1) for the corpus's highest — cargo doc and Rust's docs culture pull docs density upward.

TypeScript (55.8 overall) — the volume leader. Mid-tier testing (31.2) and weak documentation (44.3) reflect TS's dual role as both serious application language and rapid prototyping language. Its security score (73.8) is the lowest of the top 6 — TypeScript repos in our corpus tend to be web apps with more attack surface.

JavaScript (51.2 overall) — the weakest of the top 6 by overall score, dragged down by a stark testing score of 17.1 (the lowest of any top-6 language except Swift). Modern JS in our corpus is mostly Node + Express, where the AI default is to skip Jest unless explicitly asked.

Swift (51.1 overall) — the most extreme split. Highest security score of any language we measure (91.4) — iOS APIs make insecure patterns hard to write — but lowest testing score (8.0). XCTest exists, but AI-generated iOS code rarely invokes it. Apple-platform AI-generated code ships secure-shaped but unverified.

The systemic pattern

Across all 6 languages, security score outperforms testing score by a wide margin (avg gap: 52 points). The pattern is consistent: AI generates secure-LOOKING patterns (parameterized queries, modern frameworks, sanitized inputs from training data) but generates tests roughly never (because users rarely ask for them).

The other consistent pattern: languages with strong native testing tooling test more. Go (go test) and Rust (cargo test) both score above 40 on testing despite small repo counts. JavaScript and Swift, where testing tooling is optional, sit at 17.1 and 8.0.

Implication for engineering leaders: if you're standardizing AI-coder usage across teams, the language choice meaningfully shapes the default-output testing density. Picking Go or Rust for new services raises the default testing score roughly 2.5-3× over picking JavaScript.


10. By framework: who builds with what

We detected at least one framework signature in 88,715 repos (69% of the corpus). The 12 frameworks with >500 repos:

Rank  Framework     Repos   % of corpus
 1    React         39,495  30.7%
 2    Next.js       23,793  18.5%
 3    Vite          21,703  16.9%
 4    Tailwind CSS  15,459  12.0%
 5    Vitest        13,538  10.5%
 6    pytest        12,206   9.5%
 7    FastAPI        9,170   7.1%
 8    Express        5,867   4.6%
 9    SQLAlchemy     4,285   3.3%
10    Jest           3,868   3.0%
11    Prisma         2,845   2.2%
12    esbuild        2,513   2.0%

(Repos can use multiple frameworks; percentages don't sum to 100%.)

The default modern stack

Four frameworks appear in roughly the same set of repos: React, Next.js, Vite, and Tailwind. AI's default output for "build me a web app" is overwhelmingly React + Next.js + Tailwind on the frontend. When AI picks a test framework at all, it picks Vitest 3.5× more often than Jest — the new default for JS testing is no longer Jest.

The Python stack

FastAPI (9,170) outpaces Flask (~5,000) — when AI generates a Python web app, it picks FastAPI by default. SQLAlchemy is the dominant ORM (4,285). pytest (12,206) outnumbers every other Python framework — the testing tooling is correctly chosen even when the tests aren't written.

The endpoint surface

[from §4: Hero finding #2]

Framework                    Endpoints  Share of corpus endpoints
Express                      433,177    61.8%
FastAPI                      131,812    18.8%
FastAPI/Flask (overlap)       54,231     7.7%
Express/Koa (overlap)         53,020     7.6%
Plain Python (no framework)   24,029     3.4%
Flask                          4,694     0.7%

62% of all AI-generated HTTP endpoints are Express — Node + Express is by far the dominant API surface. Express is minimally opinionated, auth middleware is a per-route afterthought, and the result is the 99% no-auth statistic.

Testing-framework gravity

Repos that use a test framework score dramatically higher on overall quality than repos that don't:

Framework                     Repos    Avg Overall  Avg Testing
pytest                        12,206   68.6         61.6
Vitest                        13,484   64.8         55.8
Jest                           3,841   62.5         53.5
(no test framework detected)  ~80,000  ~52          ~12

Adopting a test framework is the single strongest predictor of overall quality. pytest repos score 13 points higher than the corpus average; Vitest, 7 points; Jest, 5 points. The mechanism is simple: when the AI sees a pytest.ini or vitest.config.ts in the repo, it generates tests; when it doesn't, it skips them.

Implication for engineering leaders: the cheapest quality intervention is to commit an empty pytest.ini or vitest.config.ts in your starter templates. AI will pick it up and generate tests by default.
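A minimal version of that starter-template intervention, as a vitest.config.ts with ordinary defaults (nothing here is corpus-derived; the file's presence, more than its contents, is the signal):

```typescript
// vitest.config.ts — committed to the starter template so AI assistants
// see a test framework and default to generating tests.
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    environment: "node",           // or "jsdom" for browser-facing code
    include: ["src/**/*.test.ts"], // conventional test glob
  },
});
```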


11. What this means for engineering leaders

Three takeaways:

  1. Treat AI-generated code as feature-complete, guardrail-incomplete. The default output is missing tests, missing auth wiring, missing documentation, missing license clarity. Build your review process around those four gaps explicitly.
  2. Quality gates beat after-the-fact audits. Once a repo has 14.5 hours of accumulated debt, retrofitting tests / auth / docs is harder than gating new commits at the front door. CI quality gates that fail PRs below a threshold (e.g. "score must not regress") are roughly 40× cheaper than the same fix done in arrears (estimate from intervention cost models).
  3. The security surface is the API surface. With 99.2% of routes lacking auth wiring, every new AI-generated repo is a potential public endpoint with no gatekeeper. Enforce auth-required-by-default at the framework / starter-template level, not at the route level (a sketch follows this list).
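A sketch of what takeaway 3 looks like in practice: default-deny at the app level with explicit public opt-outs (Express shown; the allowlist and the presence-only check are illustrative simplifications):

```typescript
// Auth-required-by-default: one app-level gate instead of per-route wiring.
import express, { type NextFunction, type Request, type Response } from "express";

const app = express();
const PUBLIC_PATHS = new Set(["/healthz", "/login"]); // explicit opt-outs only

app.use((req: Request, res: Response, next: NextFunction) => {
  if (PUBLIC_PATHS.has(req.path)) return next();
  if (!req.headers.authorization) return res.status(401).end();
  next(); // verify the token here in a real deployment
});

// Every route the AI adds later inherits the gate by construction.
app.get("/users/profile", (req, res) => res.json({ ok: true }));
```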

Appendix A — How we measure

Methodology details, scoring formulae, edge cases. (~3 pages)

Appendix B — Reproducibility notes

The aggregate findings in this report are reproducible by anyone running the Repobility analysis pipeline against the same corpus. Repository identities, source code, and per-repo findings are not part of this report and are not for sale. Repobility licenses access to the analysis API for code your organization owns or has authorization to scan.

Appendix C — Citations

[Methodology references, comparable studies, reading list]


About Repobility

Repobility is the code-quality intelligence platform for AI-generated software. We analyze every commit your AI ships, surface ranked findings to your AI agent for fixing, and feed the false-positive corrections back to improve the platform.


Want the underlying data?

We don't sell datasets, dumps, or data-rooms. We license access to the analysis API for code your organization owns. For aggregate research collaborations: [email protected].


This report was generated 2026-05-01 from the live Repobility corpus snapshot. Stats may have shifted slightly since publication. The corresponding corpus-stats.json is available at repobility.com/research/state-2026-data.json.