LiveBench/LiveBench

Public scan: anyone with this URL can view this analysis.

127 of your 352 findings came from Repobility's proprietary detections. ✓ Repobility tags below mark them.

Scan timing: clone 3.28s · analysis 9.69s · 9.2 MB · GitHub API rate-limit (preflight)

https://github.com/LiveBench/LiveBench · scanned 2026-06-05 21:06 UTC (4 days, 12 hours ago) · 10 languages

475 raw signals (295 security + 180 graph) · 32nd percentile · Python · medium (20-100K LoC) · System graph score 85 (lower by 43)

Complete repo analysis

Last scanned 4 days, 12 hours ago · v2 · 232 actionable findings from 2 signal sources. 153 repeated signals grouped for readability. Security checks, system graph analysis, and verified AI-agent feedback are merged into one review queue.

100.0% cov · 76 findings

Score breakdown · 2026-05-18-v5

Component	Sub-score	Weight	Contribution
`structure_score`	60.0	0.15	9.00
`security_score`	30.0	0.25	7.50
`testing_score`	20.0	0.20	4.00
`documentation_score`	81.0	0.15	12.15
`practices_score`	40.0	0.15	6.00
`code_quality`	27.3	0.10	2.73
Overall		1.00	41.4
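The table's numbers are consistent with the overall score being a plain weighted sum of the sub-scores. The sketch below reproduces that arithmetic under that assumption; the report does not state the formula, and the names `COMPONENTS` and `overall_score` are illustrative.

```python
# Sub-scores and weights copied from the score breakdown table above.
COMPONENTS = {
    "structure_score": (60.0, 0.15),
    "security_score": (30.0, 0.25),
    "testing_score": (20.0, 0.20),
    "documentation_score": (81.0, 0.15),
    "practices_score": (40.0, 0.15),
    "code_quality": (27.3, 0.10),
}


def overall_score(components):
    """Return the weighted sum of (sub_score, weight) pairs."""
    return sum(score * weight for score, weight in components.values())


total_weight = sum(weight for _, weight in COMPONENTS.values())
overall = overall_score(COMPONENTS)
print(f"weights sum to {total_weight:.2f}, overall = {overall:.1f}")
```

Because `security_score` carries the largest weight (0.25), raising it moves the overall score more per point than any other component.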

Severity distribution

Active filters: excluding tests

Severity: Critical 5 · High 90 · Medium 33 · Low 72

9-Layer: Software · Security · Quality · Integrity · Frontend · Hardware · Data · Network · Cicd

Source: Security checks 156 · System graph 76 · Crowd 0

Layer: Software 146 · Quality 68 · Security 15 · Api 1 · Frontend 1 · Cicd 1

Top 10 actions, ranked by impact × ease. Severity drives impact; tag-based fix-clarity drives ease.

Insecure pattern 'exec_used' in livebench/code_runner/eval/__init__.py:158

GapSeverity.HIGH Layer.SECURITY score 0.413

Why: high severity · OWASP-class risk

livebench/code_runner/eval/__init__.py:158

Insecure pattern 'exec_used' in livebench/lcb_runner/evaluation/testing_util.py:160

GapSeverity.HIGH Layer.SECURITY score 0.413

Why: high severity · OWASP-class risk

livebench/lcb_runner/evaluation/testing_util.py:160
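The report flags `exec` but does not suggest a fix, and a benchmark harness usually has to execute generated code somehow. One common mitigation is to run that code in a separate interpreter process with a time limit instead of calling `exec` in-process. The sketch below illustrates the idea only; `run_untrusted` is a hypothetical helper and nothing here reflects how LiveBench actually runs code.

```python
import subprocess
import sys


def run_untrusted(code: str, timeout_s: float = 5.0) -> subprocess.CompletedProcess:
    """Run `code` in a child interpreter instead of calling exec() in-process.

    This isolates interpreter state and bounds wall-clock time. It is NOT a
    security sandbox: the child still has the parent's OS permissions.
    """
    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode (ignores env vars, user site)
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired when exceeded
    )


result = run_untrusted("print(1 + 1)")
print(result.returncode, result.stdout.strip())
```

For code that is genuinely untrusted, a container or OS-level sandbox around the child process would still be needed; the subprocess boundary alone only prevents the code from mutating the harness's own interpreter state.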

Insecure pattern 'subprocess_shell_true' in livebench/agentic_code_runner/minisweagent/environments/local.py:25

GapSeverity.MEDIUM Layer.SECURITY score 0.248

Why: medium severity · OWASP-class risk

livebench/agentic_code_runner/minisweagent/environments/local.py:25

Insecure pattern 'subprocess_shell_true' in livebench/agentic_code_runner/minisweagent/environments/docker.py:106

GapSeverity.MEDIUM Layer.SECURITY score 0.248

Why: medium severity · OWASP-class risk

livebench/agentic_code_runner/minisweagent/environments/docker.py:106

Insecure pattern 'subprocess_shell_true' in livebench/agentic_code_runner/minisweagent/environments/extra/swerex_docker.py:33

GapSeverity.MEDIUM Layer.SECURITY score 0.248

Why: medium severity · OWASP-class risk

livebench/agentic_code_runner/minisweagent/environments/extra/swerex_docker.py:33
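For the three `subprocess_shell_true` findings, the usual remedy is to pass an argument list so that no shell parses the command. The sketch below shows the difference with a hypothetical helper, `echo_argument`. It is an assumption that this applies cleanly here: an agent environment may use `shell=True` deliberately because it is meant to run arbitrary shell commands.

```python
import shlex
import subprocess
import sys


def echo_argument(value: str, timeout_s: float = 10.0) -> str:
    """Pass `value` to a child process as one argv entry; no shell is involved."""
    completed = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout.strip()


# Shell metacharacters arrive as literal text because no shell parses them.
payload = "hello; echo INJECTED"
echoed = echo_argument(payload)
print(echoed)

# If a shell is genuinely required, quote each interpolated value first.
quoted = shlex.quote(payload)
print(quoted)
```

With `shell=True`, the same payload interpolated into a command string would run `echo INJECTED` as a second command; with an argument list it is just data.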

No CI/CD pipelines detected

GapSeverity.MEDIUM Layer.CICD score 0.225

Why: medium severity

Very low test-to-source ratio

GapSeverity.MEDIUM Layer.QUALITY score 0.225

Why: medium severity

Network/subprocess call without timeout or try/except — livebench/scripts/check_question_variance.py:93

GapSeverity.MEDIUM Layer.QUALITY score 0.225

Why: medium severity · crashes are likely without this

Fix: In livebench/scripts/check_question_variance.py:93, pass a `timeout=N` argument and wrap the call in `try/except` to prevent hung calls.
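The suggested fix can be sketched as follows for a `subprocess` call. The helper name `run_with_timeout` and the choice to return `None` on failure are assumptions for illustration; the flagged line may be a network call instead, in which case the same pattern applies with that library's own `timeout` parameter.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_s=30.0):
    """Run argv; return stdout on success, or None if it hangs, fails, or cannot start."""
    try:
        completed = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s, check=True
        )
    except subprocess.TimeoutExpired:
        # The child exceeded timeout_s; subprocess.run kills it before raising.
        return None
    except (subprocess.CalledProcessError, OSError):
        # Non-zero exit status, or the executable could not be started.
        return None
    return completed.stdout


ok = run_with_timeout([sys.executable, "-c", "print('done')"])
hung = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.5
)
print(repr(ok), hung)
```

A real script would probably log the failure reason before returning rather than swallow it silently.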

Network/subprocess call without timeout or try/except — livebench/scripts/check_grading_flakiness.py:148

GapSeverity.MEDIUM Layer.QUALITY score 0.225

Why: medium severity · crashes are likely without this

Fix: In livebench/scripts/check_grading_flakiness.py:148, pass a `timeout=N` argument and wrap the call in `try/except` to prevent hung calls.

Network/subprocess call without timeout or try/except — livebench/agentic_code_runner/minisweagent/environments/docker.py:106

GapSeverity.MEDIUM Layer.QUALITY score 0.225

Why: medium severity · crashes are likely without this

Fix: In livebench/agentic_code_runner/minisweagent/environments/docker.py:106, pass a `timeout=N` argument and wrap the call in `try/except` to prevent hung calls.

For AI agents + API integrations

API access

This page is publicly accessible at: https://repobility.com/scan/285d8c54-1310-4654-8c87-9d14ef632d84/

To check status programmatically (no auth required):

curl -s https://repobility.com/api/v1/public/scan/285d8c54-1310-4654-8c87-9d14ef632d84/
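The same status check can be sketched in Python with the standard library. The URL pattern is taken from the curl command above; treating the response body as JSON is an assumption, since this page does not document the response format, and the function names are illustrative.

```python
import json
import urllib.request

# Base path taken from the curl command above.
API_BASE = "https://repobility.com/api/v1/public/scan"


def scan_status_url(scan_token: str) -> str:
    """Build the public status URL for a scan token."""
    return f"{API_BASE}/{scan_token}/"


def fetch_scan_status(scan_token: str, timeout_s: float = 10.0):
    """GET the status endpoint and decode the body.

    Assumes the endpoint returns JSON, which this page does not confirm.
    """
    url = scan_status_url(scan_token)
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return json.load(response)
```

Calling `fetch_scan_status("285d8c54-1310-4654-8c87-9d14ef632d84")` would issue one GET request to the status URL for this scan; the explicit `timeout_s` keeps the call from hanging if the service is slow.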

Important — please don't re-submit the same URL repeatedly. The submission endpoint is idempotent: re-submitting the same git URL returns this same scan_token, not a new one. To re-scan this repo, sign up free and use the dashboard.