unclecode/crawl4ai

Public scan: anyone with this URL can view this analysis.

207 of your 283 findings came from Repobility's proprietary detections. ✓ Repobility tags below mark them.

Scan timing: clone 8.73s · analysis 10.02s · 28.8 MB · GitHub API rate-limit (preflight)

https://github.com/unclecode/crawl4ai · scanned 2026-06-05 08:43 UTC (5 days, 20 hours ago) · 10 languages

861 raw signals (257 security + 604 graph) · 11/13 scanners ran · 82nd percentile · Python · large (100-500K LoC) · System graph score 62 (higher by 23)



Complete repo analysis

Last scanned 5 days, 20 hours ago · v2 · 407 actionable findings from 2 signal sources. 152 repeated signals grouped for readability. Security checks, system graph analysis, and verified AI-agent feedback are merged into one review queue.


100.0% cov · 278 findings

Score breakdown · 2026-05-18-v5

Component	Sub-score	Weight	Contribution
`structure_score`	60.0	0.15	9.00
`security_score`	100.0	0.25	25.00
`testing_score`	97.0	0.20	19.40
`documentation_score`	100.0	0.15	15.00
`practices_score`	83.0	0.15	12.45
`code_quality`	45.0	0.10	4.50
Overall		1.00	85.3
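
The Overall row appears to be the weighted sum of the sub-scores. A minimal Python sketch that reproduces the arithmetic from the table above; the `components` mapping is illustrative and not a Repobility data structure:

```python
# (sub_score, weight) per component, copied from the score breakdown table.
components = {
    "structure_score": (60.0, 0.15),
    "security_score": (100.0, 0.25),
    "testing_score": (97.0, 0.20),
    "documentation_score": (100.0, 0.15),
    "practices_score": (83.0, 0.15),
    "code_quality": (45.0, 0.10),
}


def overall_score(parts):
    """Return the weighted sum of (sub_score, weight) pairs."""
    return sum(score * weight for score, weight in parts.values())


total_weight = sum(weight for _, weight in components.values())
overall = overall_score(components)
print(f"weights sum to {total_weight:.2f}, overall = {overall:.2f}")
```

The contributions add up to 85.35, which the table shows as 85.3. Because `code_quality` carries the lowest weight (0.10), its sub-score of 45.0 costs the total only 5.5 points.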

security_score may be inflated — optional security scanners were skipped on this fast scan

Severity distribution

Active filters: excluding tests

Severity: Critical 18 · High 45 · Medium 27 · Low 192
9-Layer: Software · Security · Quality · Integrity · Frontend · Hardware · Data · Network · Cicd
Source: Security checks 129 · System graph 278 · Crowd 0
Layer: Software 58 · Quality 224 · Security 27 · Cicd 13 · Frontend 28 · Hardware 2 · Api 55


Bug-class explainers. Each card groups findings of the same shape — these are the patterns most likely to ship to prod and reappear in future scans unless you systematically fix the cause, not just the instance.

Duplicates & near-duplicates 2 findings

What it is: Same function copy-pasted into multiple modules with minor variations.

Why it matters: Each copy drifts independently — bug fixes apply to one, miss the others.

How AI causes it: AI completes the same pattern in each file rather than refactoring to a shared helper.

Fix approach: Extract the duplicated logic into the most general module both call sites already import. Add tests at the helper level.

2 matching findings on this repo

low Near-duplicate function bodies in 2 places repo-level
low Near-duplicate function bodies in 3 places repo-level
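
To find candidates for a shared helper before refactoring, one option is to fingerprint function bodies with the standard `ast` module. The sketch below is an assumption about how such a check could work, not Repobility's detector: it only catches bodies that are structurally identical after parsing (comments and whitespace are ignored), whereas real near-duplicate detection is fuzzier. The module names in the usage example are hypothetical.

```python
import ast
from collections import defaultdict


def duplicate_function_bodies(sources):
    """Group functions whose bodies parse to the same AST.

    `sources` maps a module name to its source text. Returns a list of
    groups, each a list of (module, function_name) pairs.
    """
    seen = defaultdict(list)
    for module, text in sources.items():
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # ast.dump omits line numbers by default, so position
                # in the file does not affect the fingerprint.
                fingerprint = "\n".join(ast.dump(stmt) for stmt in node.body)
                seen[fingerprint].append((module, node.name))
    return [places for places in seen.values() if len(places) > 1]


sources = {
    "a.py": "def clean(s):\n    return s.strip().lower()\n",
    "b.py": "def normalize(s):\n    # copied from a.py\n    return s.strip().lower()\n",
    "c.py": "def shout(s):\n    return s.upper()\n",
}
groups = duplicate_function_bodies(sources)
print(groups)
```

Each group returned is a set of call sites that could import one helper instead of carrying their own copy.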


Legacy markers 8 findings

What it is: TODO, FIXME, XXX, and HACK comments, which often indicate a known-broken path the author meant to fix.

Why it matters: Each marker is an unfinished thought. Production code shouldn't ship with debt that's documented but not tracked.

How AI causes it: AI mirrors the style of the codebase, so existing TODOs propagate into new code.

Fix approach: Convert each into a ticket. Delete the comment when the ticket lands. Use a pre-commit hook to block new TODOs without an issue link.
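
The hook idea can be sketched as a small checker that flags marker comments lacking an issue reference. This is a heuristic under stated assumptions: the accepted reference formats (`#123` or a URL ending in `/issues/<n>`) are illustrative choices, and the regex treats the first `#` on a line as the comment start, so a `#` inside a string literal can confuse it.

```python
import re

# A comment containing one of the marker words.
MARKER = re.compile(r"#.*\b(TODO|FIXME|XXX|HACK)\b")
# Assumed issue reference formats: "#123" or ".../issues/123".
ISSUE_REF = re.compile(r"#\d+|https?://\S+/issues/\d+")


def untracked_markers(text):
    """Return (line_number, line) for marker comments with no issue reference."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = MARKER.search(line)
        # Skip the comment's own '#' so it is not mistaken for "#123".
        if match and not ISSUE_REF.search(line[match.start() + 1:]):
            hits.append((lineno, line.strip()))
    return hits


sample = (
    "x = 1  # TODO: tidy this up\n"
    "y = 2  # FIXME(#42): off by one\n"
    "z = 3  # plain comment\n"
)
print(untracked_markers(sample))
```

A hook script would run `untracked_markers` over each staged file and exit non-zero when the result is non-empty.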

8 matching findings on this repo

low Old/deprecated-named symbol `extract_xml_data_legacy` in tests/regression/test_…
low Old/deprecated-named symbol `extract_xml_data_legacy` in crawl4ai/utils.py:1680
medium Network/subprocess call without timeout or try/except — crawl4ai/legacy/docs_ma…
high Blocking `requests.get(...)` inside `async def fetch_docs` — crawl4ai/legacy/do… crawl4ai/legacy/docs_manager.py:41
high Blocking `requests.get(...)` inside `async def fetch_docs` — crawl4ai/legacy/do… crawl4ai/legacy/docs_manager.py:49
info Commented-code block (12 lines) in crawl4ai/legacy/crawler_strategy.py:111
medium Network/subprocess call without timeout or try/except — crawl4ai/legacy/crawler…
low Stub function `crawl` (body is just `pass`/`return`) — crawl4ai/legacy/crawler_…
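
Two of the findings above flag a blocking `requests.get(...)` inside `async def fetch_docs`. One common remedy is to run the blocking call in a worker thread and bound the wait. A minimal sketch, assuming Python 3.9+ for `asyncio.to_thread`; `blocking_fetch` is a hypothetical stand-in for the real call and this is not the code in `docs_manager.py`:

```python
import asyncio
import time


def blocking_fetch(url):
    """Hypothetical stand-in for a blocking call such as requests.get(url, timeout=10)."""
    time.sleep(0.05)  # simulate network latency
    return f"body of {url}"


async def fetch_docs(urls, timeout=5.0):
    """Fetch all URLs concurrently without blocking the event loop."""

    async def fetch_one(url):
        # to_thread moves the blocking call off the event loop thread.
        # wait_for bounds how long we wait; it does not stop the worker
        # thread, so the blocking call should carry its own timeout too.
        return await asyncio.wait_for(asyncio.to_thread(blocking_fetch, url), timeout)

    return await asyncio.gather(*(fetch_one(url) for url in urls))


results = asyncio.run(fetch_docs(["a", "b"]))
print(results)
```

Switching to a natively asynchronous HTTP client is the other usual option; the thread approach has the advantage of leaving the existing call in place.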


Commented-out code 48 findings

What it is: Lines of source that were intentionally disabled but never deleted.

Why it matters: Git already remembers history — commented code rots, becomes wrong, and adds noise to diffs.

How AI causes it: AI sometimes comments out broken code instead of fixing it. Reviewers approve out of inertia.

Fix approach: Delete. Trust `git log`. If you really need to remember, save it in a notes file under `docs/`.

12 matching findings on this repo

info Commented-code block (5 lines) in tests/test_issue_1370_1818_1762_1509.py:198
info Commented-code block (5 lines) in tests/test_source_sibling_selector.py:134
info Commented-code block (5 lines) in tests/test_main.py:260
info Commented-code block (6 lines) in tests/test_docker.py:67
info Commented-code block (10 lines) in tests/docker/test_hooks_utility.py:180
info Commented-code block (6 lines) in tests/docker/test_docker.py:121
info Commented-code block (6 lines) in tests/docker/test_serialization.py:116
info Commented-code block (6 lines) in tests/docker/simple_api_test.py:160
info Commented-code block (5 lines) in tests/profiler/test_create_profile.py:27
info Commented-code block (6 lines) in tests/adaptive/test_embedding_strategy.py:604
info Commented-code block (13 lines) in tests/general/test_async_crawler_strategy.py…
info Commented-code block (5 lines) in tests/browser/test_builtin_browser.py:768
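
To locate such blocks locally before deleting them, a rough heuristic is to look for runs of consecutive comment lines that parse as Python once the `#` is stripped. This is a sketch of one possible approach, not the scanner's actual rule; the 5-line threshold is an assumption chosen to match the smallest block listed above.

```python
import ast
import textwrap


def _parses_as_code(comment_bodies):
    """True if the comment text, with '#' stripped, is valid Python syntax."""
    try:
        ast.parse(textwrap.dedent("\n".join(comment_bodies)))
    except SyntaxError:
        return False
    return True


def commented_code_blocks(source, min_lines=5):
    """Return (start_line, length) for comment runs that look like code."""
    blocks = []
    run = []  # (line_number, text after '#') for the current comment run
    # The trailing empty string is a sentinel that flushes a final run.
    for lineno, line in enumerate(source.splitlines() + [""], start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            run.append((lineno, stripped[1:]))
            continue
        if len(run) >= min_lines and _parses_as_code([body for _, body in run]):
            blocks.append((run[0][0], len(run)))
        run = []
    return blocks


disabled = "# x = 1\n# y = 2\n# if x:\n#     y = 3\n# print(y)\nz = 0\n# A note about z.\n"
print(commented_code_blocks(disabled))
```

Ordinary prose comments usually fail to parse and are skipped, though short runs of single-word comments can slip through, so review each hit before deleting.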


Config drift 13 findings

What it is: Settings duplicated across env files, Docker compose, K8s, and code defaults, all with slightly different values.

Why it matters: Production behaviour depends on whichever copy your loader reads first. The result is subtle bugs in staging that don't reproduce in dev.

How AI causes it: AI writes new config from memory rather than reading the existing source.

Fix approach: Pick one source of truth (env vars + a settings module). Have every other place import from there. Lint for duplicates in CI.

12 matching findings on this repo

high [MINED112] FastAPI DELETE /models/{model_name} has no auth: Handler `delete_mod… docs/examples/website-to-api/api_server.py:341
high [MINED112] FastAPI POST /models has no auth: Handler `save_model_config` is reg… docs/examples/website-to-api/api_server.py:320
high [MINED106] Phantom test coverage: test_multi_config: Test function `test_multi_… tests/test_multi_config.py:13
high [MINED106] Phantom test coverage: test_webhook_config_model: Test function `tes… test_webhook_implementation.py:93
low File has no detected symbols: tests/memory/test_docker_config_gen.py
low File has no detected symbols: crawl4ai/config.py
low File has no detected symbols: crawl4ai/html2text/config.py
low Very large file: crawl4ai/async_configs.py (2343 lines)
low Old/deprecated-named symbol `test_get_defaults_returns_copy` in tests/test_conf…
low Old/deprecated-named symbol `test_get_defaults_returns_copy` in tests/regressio…
info Commented-code block (7 lines) in tests/async/test_evaluation_scraping_methods_…
info Commented-code block (10 lines) in crawl4ai/async_configs.py:221
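
The "one source of truth" approach can be sketched as a single settings module that owns the defaults and reads overrides from environment variables. Every field name and `APP_*` variable below is hypothetical and chosen for illustration; none of them come from crawl4ai.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings; defaults live here and nowhere else."""

    request_timeout: float = 30.0
    max_concurrency: int = 5
    headless: bool = True


def load_settings(env=None):
    """Build Settings from environment variables, falling back to the defaults."""
    env = os.environ if env is None else env
    defaults = Settings()
    return Settings(
        request_timeout=float(env.get("APP_REQUEST_TIMEOUT", defaults.request_timeout)),
        max_concurrency=int(env.get("APP_MAX_CONCURRENCY", defaults.max_concurrency)),
        headless=str(env.get("APP_HEADLESS", defaults.headless)).lower()
        in ("1", "true", "yes"),
    )


settings = load_settings({"APP_REQUEST_TIMEOUT": "10", "APP_HEADLESS": "false"})
print(settings)
```

Other modules import `load_settings` rather than redefining values, and Docker or Kubernetes files set only the environment variables, so a default changes in exactly one place.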



For AI agents + API integrations


API access

This page is publicly accessible at: https://repobility.com/scan/dfd63be9-051b-41fa-be97-0e1a8a59c2d1/

To check status programmatically (no auth required):

curl -s https://repobility.com/api/v1/public/scan/dfd63be9-051b-41fa-be97-0e1a8a59c2d1/
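
The same check can be made from Python with only the standard library. This is a sketch under one assumption the page does not state: that the endpoint returns a JSON body. The function names are illustrative.

```python
import json
import urllib.request

API_ROOT = "https://repobility.com/api/v1/public/scan/"


def scan_status_url(scan_token):
    """Build the public status URL for a scan token."""
    return f"{API_ROOT}{scan_token}/"


def fetch_scan_status(scan_token, timeout=10.0):
    """GET the status endpoint and decode the body as JSON (assumed format)."""
    with urllib.request.urlopen(scan_status_url(scan_token), timeout=timeout) as resp:
        return json.load(resp)


url = scan_status_url("dfd63be9-051b-41fa-be97-0e1a8a59c2d1")
print(url)
```

Calling `fetch_scan_status` with the same token performs the request; the explicit `timeout` keeps a stalled connection from hanging the caller.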

Important — please don't re-submit the same URL repeatedly. The submission endpoint is idempotent: re-submitting the same git URL returns this same scan_token, not a new one. To re-scan this repo, sign up free and use the dashboard.