Research Report Published by Repobility · Lead author: Mohammed Al-Jefairi · May 13, 2026, 06:00 UTC

The State of AI-Generated Code 2026

An empirical study of over 100,000 repositories generated by AI coding assistants — what ships, what works, what breaks, and where the systemic risks are. (Exact corpus shape in §2; methodology in §1.)



Executive Summary

TL;DR — If you only read one line: AI ships features reliably and guardrails almost never. 50% of AI-generated repos have zero tests. 99% of API endpoints have no auth. 70% have no resolvable license.

We analyzed over 100,000 publicly available repositories generated wholly or substantially by AI coding assistants (Claude, Codex, Copilot, Cursor, and others). The corpus represents more than 3 billion lines of code across 16M+ files, 700,963 API endpoints, and 42,624 declared dependencies. The corpus shape (§2) and methodology (§1) provide exact figures for reproducibility.

The headline finding is structural: AI-generated code reliably ships features and reliably skips guardrails. The grade distribution is tight — 97% of repos earn C or better — but tight around mediocrity, not excellence:

Grade      Repos    %
A (≥80)     2,622    2.0%
B (60-79)  46,614   36.2%
C (40-59)  75,775   58.9%
D (20-39)   3,103    2.4%
F (<20)         1    0.0%

Five hero numbers, all verified against the live corpus:

  1. 50% of AI-generated repositories ship with zero tests. 64,054 repos (49.8% of the corpus) have a testing score of exactly 0/100. The median testing score is 0; the 24.1 mean (§3) misleads.
  2. 99.16% of API endpoints have no authentication wired up. Of 700,963 endpoints we mapped, only 5,884 (0.84%) carry a detectable auth middleware, decorator, or guard. Express alone accounts for 433,177 of the mapped endpoints — 62% of the corpus's entire endpoint surface.
  3. 397,294 credential findings across the corpus. 59% are crypto wallet addresses (151,796 Ethereum + 70,908 Solana + 11,651 Bitcoin) — AI-generated code is overwhelmingly being used to ship web3 projects with hardcoded wallet addresses and keys. The remaining 41% includes 31,119 password assignments, 20,107 database URLs with embedded passwords, 17,922 Google API keys, 14,909 leaked .env file contents, and 10,546 JWT tokens.
  4. Average tech debt: 14.5 hours per repo — aggregated to 1.86 million hours / 212 person-years across the corpus.
  5. 70% of repos have no resolvable license file. 89,979 corpus repos parse as "Unknown." Combined with the 3,119 explicitly-high-risk-license repos, 72.4% of AI-generated public code is in a legally murky state for reuse, vendoring, or forking.

The implication is not "AI writes bad code." AI writes adequate code with predictable structural omissions. The omissions are knowable, measurable, and addressable. This report is the map.


1. Methodology

We ingested every repository submitted to a multi-pass static-analysis pipeline (covering 50+ languages and 75+ frameworks) over a 6-month window; 128,725 repositories cleared ingestion.

The 128,725 figure counts repositories where at least one substantive commit was AI-generated, as detected by commit-message signatures, file-modification patterns, and AI-tool fingerprints (e.g., AGENTS.md, .cursor/, .continuerc).
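For illustration, a detector of this shape can be very small. The sketch below checks the file fingerprints named above; the commit-message patterns are hypothetical stand-ins, since the pipeline's actual signature list is not published:

```typescript
// Illustrative only — not the Repobility detector. The file fingerprints are
// the ones named in §1; COMMIT_PATTERNS is a hypothetical stand-in list.
import { existsSync } from "node:fs";
import { join } from "node:path";

const FILE_FINGERPRINTS = ["AGENTS.md", ".cursor", ".continuerc"];
const COMMIT_PATTERNS = [/co-authored-by:.*(claude|copilot|cursor)/i]; // hypothetical

function looksAiGenerated(repoRoot: string, commitMessages: string[]): boolean {
  // Fingerprint files anywhere at the repo root mark the repo as AI-touched.
  const hasFingerprintFile = FILE_FINGERPRINTS.some((f) => existsSync(join(repoRoot, f)));
  // Any commit message matching a known signature also qualifies.
  const hasSignedCommit = commitMessages.some((msg) =>
    COMMIT_PATTERNS.some((re) => re.test(msg)),
  );
  return hasFingerprintFile || hasSignedCommit;
}
```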

Repository identities, source code, and per-repo findings are not included in this report. We publish aggregate distributions only.


2. Corpus shape

Metric                               Value
Repositories                         128,725
Files                                16,818,454
Lines of code                        3,272,447,735
API endpoints (HTTP routes)          700,963
Declared dependencies                42,624 distinct packages
Issues filed by the analyzer         29,222
Vulnerability findings (CVE-class)   6,329
Credential findings                  397,294

Language distribution (top 10)

Rank  Language    Repos
 1    TypeScript  21,764
 2    Python      18,843
 3    JavaScript   7,966
 4    Rust         3,468
 5    Go           2,899
 6    Swift        1,810
 7    Shell        1,440
 8    C#           1,096
 9    PHP            979
10    Java           750

(JSON, Markdown, HTML, YAML, and CSS are detected as the primary language for ~58K further repos, but these represent config and documentation surface, not code; they are excluded from the table above.)


3. Hero finding #1 — The Test Cliff

50% of AI-generated repositories ship with zero tests. The other 50% skew low: a few repos are heavily tested, most barely tested.

[Chart: histogram of testing_score across all repos]

The average testing score across the corpus is 24.1/100. But the average misleads: 64,054 repos (49.8%) have a score of exactly 0 — no test files detected. The distribution is not a normal curve; it's a spike at zero plus a long, thin tail.

Why this happens: AI coding assistants default to feature work. Asking "build me a login form" produces a login form. Asking "build me a login form with tests" produces tests, but most users don't ask. The omission is by silence, not by error.
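For scale, the guardrail the median repo omits is often no bigger than the sketch below, a minimal Vitest unit test (validateLogin and its module path are hypothetical, for illustration only):

```typescript
// Hypothetical example: validateLogin is not from any corpus repo.
// An artifact of roughly this size is absent from 64,054 repositories.
import { describe, expect, it } from "vitest";
import { validateLogin } from "./login";

describe("validateLogin", () => {
  it("rejects an empty password", () => {
    expect(validateLogin("user@example.com", "")).toBe(false);
  });

  it("accepts well-formed credentials", () => {
    expect(validateLogin("user@example.com", "correct horse battery staple")).toBe(true);
  });
});
```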

The systemic risk: these are not toy projects. The 64,054 untested repos include real B2C apps, internal tools, ML pipelines, fintech prototypes. Every one of them ships untested logic to production environments where their authors expect it to work.


4. Hero finding #2 — The Auth Gap

Of 700,963 API endpoints we mapped across the corpus, 99.2% carry no authentication wiring.

We extracted every HTTP route registered in the corpus — Express handlers, FastAPI endpoints, Flask routes, Rails controller actions, Django URL patterns. We then checked each for an authentication middleware, decorator, or guard.

[Chart: stacked bar of endpoints by framework, with the auth-protected segment in green]

Top frameworks by endpoint count:

Framework      Endpoints  Authenticated
Express        432,884    0.6%
FastAPI        132,107    1.5%
Django (URLs)   47,290    4.2%
Rails           28,114    5.1%
Flask            5,213    2.1%

Why this happens: AI coding assistants generate routes inline with the feature being requested ("add a /users/profile endpoint"). They rarely add the cross-cutting concern of authentication unless the prompt explicitly demands it. The result is an auth surface that exists conceptually but isn't enforced.
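In concrete terms (Express shown, since it dominates the endpoint surface; requireAuth is a hypothetical stand-in for whatever guard a repo actually wires up):

```typescript
// Illustrative sketch of the auth gap. requireAuth stands in for any
// detectable guard (passport, express-jwt, a custom middleware).
import express, { type NextFunction, type Request, type Response } from "express";

const app = express();

function requireAuth(req: Request, res: Response, next: NextFunction) {
  if (!req.headers.authorization) return res.status(401).end();
  next(); // a real guard verifies the token, not just its presence
}

// The 99.16% case: the route ships inline with the feature, no guard.
app.get("/users/profile", (req, res) => res.json({ name: "..." }));

// The 0.84% case: a route with auth wiring our scanner can detect.
app.get("/admin/users", requireAuth, (req, res) => res.json({ users: [] }));
```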

The systemic risk: If 21% of these endpoints are deployed publicly (a conservative estimate based on the Heroku / Vercel / Railway naming patterns we observed), the corpus alone contains roughly 146,000 unauthenticated public endpoints; extrapolated beyond our sample, the population in AI-generated software runs into the millions.


5. Hero finding #3 — Credential Leakage

397,294 credential findings across the corpus. 27,535 distinct repositories (21%) contain at least one.

What's actually leaked:

Type                                                Count
Wallet addresses                                    236,415
Passwords (literal strings in code)                  51,775
API keys                                             33,911
Generic secrets (.env, .pem fragments, JWT seeds)    28,471
Tokens (Bearer, OAuth)                               25,204
Private keys                                          2,165

By file role:

Role                          Wallet   Password  API Key
Source code                   73,012   36,418    12,005
Config files                  71,440    7,114    14,802
Test fixtures                 38,277    5,019     4,810
Documentation (README, docs)  28,506    1,884     1,008

Why this happens: When AI generates a "working" example, the literal credential is part of what makes it work. The user copy-pastes it, says "looks good," commits, pushes. By the time someone reviews, the secret is in the public mirror.
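A minimal sketch of the failure and the mechanical fix (the connection string is fabricated for illustration):

```typescript
// What a "working" AI example tends to look like, and what the scanner flags:
const leakedDbUrl = "postgres://admin:hunter2@db.example.internal:5432/app"; // embedded password

// What survives a public mirror: read the secret from the environment
// (and rotate anything that was ever committed).
const dbUrl = process.env.DATABASE_URL;
if (!dbUrl) throw new Error("DATABASE_URL is not set");
```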

The systemic risk: every leaked credential is potentially live until rotated. Wallets are the dominant category; the rest ladder into account compromise, cloud-bill drainage, supply-chain pivots.


6. Hero finding #4 — License Voids

75% of AI-generated repos lack a resolvable license file.

3,119 repos carry licenses we classify as high-risk (custom restrictive, GPL-incompatible mixes, undeclared mixed sources). The remaining 121,000+ repos either have no LICENSE file or have one that fails our classifier (text indistinguishable from a non-license).

The legal default for code published without a license is "all rights reserved" — meaning forking, vendoring, or copying it for any purpose is technically infringement. AI-generated code at scale, deployed across the internet, is a slow-burning IP-litigation surface.


7. Hero finding #5 — The 14.5-hour debt average

Average tech debt: 14.5 hours per repository. Aggregated across the corpus: 1.86 million hours / 212 person-years of remediation.

[Chart: distribution of debt_hours, log scale]

Top sources of debt by category (the six category means sum to the 14.5-hour average):

Category                                             Mean hours  % of corpus debt
Complexity (god functions, deep nesting)             5.8         40%
Duplication (copy-paste patterns)                    3.2         22%
Structure (large files, missing module boundaries)   2.4         17%
Practices (missing logging, error handling)          1.7         12%
Security (lint-detected issues)                      1.0          7%
Dependency (outdated, abandoned)                     0.4          2%
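The headline aggregates follow directly from the per-repo mean. A worked check (the 8,760-hour calendar person-year is our inference from the published figures, not a stated input):

```typescript
// Reproducing the §7 aggregates from the corpus size and the per-repo mean.
const repos = 128_725;
const meanDebtHours = 14.5;

const totalHours = repos * meanDebtHours; // 1,866,512.5, i.e. "1.86 million" after rounding
const personYears = totalHours / 8_760;   // ≈ 213; the report's 212 rounds from 1.86M

console.log({ totalHours, personYears });
```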

8. The good news — what AI does well

Not every dimension is broken. Mean scores by category:

Dimension                                              Mean  Grade
Security (SAST + CVE matching)                         86.6  A-
Code Quality (linter-clean, no obvious anti-patterns)  77.9  B+
Structure (modules, file organization)                 48.2  C
Documentation (docstrings, READMEs)                    39.4  D+
Testing                                                24.1  F

AI writes secure-LOOKING code by default. The patterns it picks up from training data tend to use parameterized queries, sanitized inputs, modern frameworks. The SAST + CVE match rate is high. The code-quality score is high. What AI skips unless asked is the cross-cutting work — tests, docs, auth.


9. By language: who builds what

The top 6 languages cover 56,633 corpus repos (44% of the total). The rest is spread across less-common languages plus the config-heavy repos (JSON, Markdown, YAML, HTML, CSS) where the "primary language" detection picks up structural files rather than business logic.

Language    Repos   Overall  Testing  Security  Docs
TypeScript  21,664  55.8     31.2     73.8      44.3
Python      18,783  61.5     38.9     83.6      58.1
JavaScript   7,932  51.2     17.1     82.5      39.8
Rust         3,455  60.1     42.1     80.7      58.3
Go           2,895  62.6     47.7     85.8      47.4
Swift        1,808  51.1      8.0     91.4      42.5

Language-by-language takeaways

Python (61.5 overall) — best-balanced of the top 6. Strong docs culture (58.1) carries it; pytest's gravity pulls testing up to a respectable 38.9. AI-generated Python disproportionately benefits from a community that still expects docstrings and tests in tutorials.

Go (62.6 overall) — the highest-scoring language we measure. With go test built into the toolchain and gofmt effectively mandatory, AI-generated Go lands in a narrow stylistic envelope; its testing score (47.7) is the highest in the corpus.

Rust (60.1 overall) — second-highest. Strong testing (42.1) likely reflects cargo test being built in and a crates.io culture that expects published crates to ship tests. Documentation (58.3) edges out Python (58.1) for the corpus's highest — cargo doc and Rust's docs culture pull docs density upward.

TypeScript (55.8 overall) — the volume leader. Mid-tier testing (31.2) and weak documentation (44.3) reflect TS's dual role as both serious application language and rapid prototyping language. Its security score (73.8) is the lowest of the top 6 — TypeScript repos in our corpus tend to be web apps with more attack surface.

JavaScript (51.2 overall) — the weakest of the top 6 by overall score, dragged down by a stark testing score of 17.1 (the lowest of any top-6 language except Swift). Modern JS in our corpus is mostly Node + Express, where the AI default is to skip Jest unless explicitly asked.

Swift (51.1 overall) — the most extreme split. Highest security score of any language we measure (91.4) — iOS APIs make insecure patterns hard to write — but lowest testing score (8.0). XCTest exists, but AI-generated iOS code rarely invokes it. Apple-platform AI-generated code ships secure-shaped but unverified.

The systemic pattern

Across all 6 languages, security score outperforms testing score by a wide margin (avg gap: 52 points). The pattern is consistent: AI generates secure-LOOKING patterns (parameterized queries, modern frameworks, sanitized inputs from training data) but generates tests roughly never (because users rarely ask for them).

The other consistent pattern: languages with strong native testing tooling test more. Go (go test) and Rust (cargo test) both score above 40 on testing despite small repo counts. JavaScript and Swift, where testing tooling is optional, sit at 17.1 and 8.0.

Implication for engineering leaders: if you're standardizing AI-coder usage across teams, the language choice meaningfully shapes the default-output testing density. Picking Go or Rust for new services raises the default testing score roughly 2.5-3× over picking JavaScript.


10. By framework: who builds with what

We detected at least one framework signature in 88,715 repos (69% of the corpus). The 12 frameworks with >500 repos:

Rank  Framework     Repos   % of corpus
 1    React         39,495  30.7%
 2    Next.js       23,793  18.5%
 3    Vite          21,703  16.9%
 4    Tailwind CSS  15,459  12.0%
 5    Vitest        13,538  10.5%
 6    pytest        12,206   9.5%
 7    FastAPI        9,170   7.1%
 8    Express        5,867   4.6%
 9    SQLAlchemy     4,285   3.3%
10    Jest           3,868   3.0%
11    Prisma         2,845   2.2%
12    esbuild        2,513   2.0%

(Repos can use multiple frameworks; percentages don't sum to 100%.)

The default modern stack

Four frameworks appear in roughly the same set of repos: React, Next.js, Vite, and Tailwind. AI's default output for "build me a web app" is overwhelmingly React + Next.js + Tailwind on the frontend. When AI picks a test framework at all, it picks Vitest 3.5× more often than Jest — the new default for JS testing is no longer Jest.

The Python stack

FastAPI (9,170) outpaces Flask (~5,000) — when AI generates a Python web app, it picks FastAPI by default. SQLAlchemy is the dominant ORM (4,285). pytest (12,206) outnumbers every other Python framework — the testing tooling is correctly chosen even when the tests aren't written.

The endpoint surface

[from §4: Hero finding #2]

Framework                    Endpoints  Share of corpus endpoints
Express                      433,177    61.8%
FastAPI                      131,812    18.8%
FastAPI/Flask (overlap)       54,231     7.7%
Express/Koa (overlap)         53,020     7.6%
Plain Python (no framework)   24,029     3.4%
Flask                          4,694     0.7%

62% of all AI-generated HTTP endpoints are Express — Node + Express is by far the dominant API surface. Express is minimally opinionated, auth middleware is a per-route afterthought, and the result is the 99% no-auth statistic.

Testing-framework gravity

Repos that use a test framework score dramatically higher on overall quality than repos that don't:

Framework                     Repos    Avg Overall  Avg Testing
pytest                        12,206   68.6         61.6
Vitest                        13,484   64.8         55.8
Jest                           3,841   62.5         53.5
(no test framework detected)  ~80,000  ~52          ~12

Adopting a test framework is the single strongest predictor of overall quality. pytest repos score 13 points higher than the corpus average; Vitest, 7 points; Jest, 5 points. The mechanism is simple: when the AI sees a pytest.ini or vitest.config.ts in the repo, it generates tests; when it doesn't, it skips them.

Implication for engineering leaders: the cheapest quality intervention is to commit an empty pytest.ini or vitest.config.ts in your starter templates. AI will pick it up and generate tests by default.
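A minimal version of that starter-template intervention, as a vitest.config.ts with ordinary defaults (nothing here is corpus-derived; the file's presence, more than its contents, is the signal):

```typescript
// vitest.config.ts — committed to the starter template so AI assistants
// see a test framework and default to generating tests.
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    environment: "node",           // or "jsdom" for browser-facing code
    include: ["src/**/*.test.ts"], // conventional test glob
  },
});
```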


11. What this means for engineering leaders

Three takeaways:

  1. Treat AI-generated code as feature-complete, guardrail-incomplete. The default output is missing tests, missing auth wiring, missing documentation, missing license clarity. Build your review process around those four gaps explicitly.
  2. Quality gates beat after-the-fact audits. Once a repo has 14.5 hours of accumulated debt, retrofitting tests / auth / docs is harder than gating new commits at the front door. CI quality gates that fail PRs below a threshold (e.g. "score must not regress") are roughly 40× cheaper than the same fix done in arrears (estimate from intervention cost models).
  3. The security surface is the API surface. With 99.2% of routes lacking auth wiring, every new AI-generated repo is a potential public endpoint with no gatekeeper. Enforce auth-required-by-default at the framework / starter-template level, not at the route level (a sketch follows this list).
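A sketch of what takeaway 3 looks like in practice: default-deny at the app level with explicit public opt-outs (Express shown; the allowlist and the presence-only check are illustrative simplifications):

```typescript
// Auth-required-by-default: one app-level gate instead of per-route wiring.
import express, { type NextFunction, type Request, type Response } from "express";

const app = express();
const PUBLIC_PATHS = new Set(["/healthz", "/login"]); // explicit opt-outs only

app.use((req: Request, res: Response, next: NextFunction) => {
  if (PUBLIC_PATHS.has(req.path)) return next();
  if (!req.headers.authorization) return res.status(401).end();
  next(); // verify the token here in a real deployment
});

// Every route the AI adds later inherits the gate by construction.
app.get("/users/profile", (req, res) => res.json({ ok: true }));
```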

Appendix A — How we measure

Methodology details, scoring formulae, edge cases. (~3 pages)

Appendix B — Reproducibility notes

The aggregate findings in this report are reproducible by anyone running the Repobility analysis pipeline against the same corpus. Repository identities, source code, and per-repo findings are not part of this report and are not for sale. Repobility licenses access to the analysis API for code your organization owns or has authorization to scan.

Appendix C — Citations

[Methodology references, comparable studies, reading list]


About Repobility

Repobility is the code-quality intelligence platform for AI-generated software. We analyze every commit your AI ships, surface ranked findings to your AI agent for fixing, and feed the false-positive corrections back to improve the platform.


Want the underlying data?

We don't sell datasets, dumps, or data-rooms. We license access to the analysis API for code your organization owns. For aggregate research collaborations: [email protected].


This report was generated 2026-05-01 from the live Repobility corpus snapshot. Stats may have shifted slightly since publication. The corresponding corpus-stats.json is available at repobility.com/research/state-2026-data.json.