LiveBench/LiveBench

Public scan: anyone with this URL can view this analysis.

127 of the 352 findings came from Repobility's proprietary detections; each of them carries a ✓ Repobility tag.

Scan timing: clone 3.28s · analysis 9.69s · 9.2 MB · GitHub API rate-limit (preflight)

https://github.com/LiveBench/LiveBench · scanned 2026-06-05 21:06 UTC (4 days, 12 hours ago) · 10 languages

475 raw signals (295 security + 180 graph) · 32nd percentile · Python · medium (20-100K LoC) · System graph score 85 (lower by 43)



Complete repo analysis

Last scanned 4 days, 12 hours ago · v2 · 232 actionable findings from 2 signal sources, with 153 repeated signals grouped for readability. Security checks, system graph analysis, and verified AI-agent feedback are merged into one review queue.

JSON

100.0% cov · 76 findings

Score breakdown (2026-05-18-v5)

Component	Sub-score	Weight	Contribution
`structure_score`	60.0	0.15	9.00
`security_score`	30.0	0.25	7.50
`testing_score`	20.0	0.20	4.00
`documentation_score`	81.0	0.15	12.15
`practices_score`	40.0	0.15	6.00
`code_quality`	27.3	0.10	2.73
Overall		1.00	41.4
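The overall score is consistent with a plain weighted sum: each sub-score is multiplied by its weight, the weights add up to 1.00, and the contributions total 41.38, shown as 41.4. A minimal sketch reproducing that arithmetic, assuming the weighted sum is the whole formula (the component names and numbers come from the table; the function name is illustrative):

```python
# Sub-scores and weights copied from the score breakdown table above.
COMPONENTS: dict[str, tuple[float, float]] = {
    "structure_score": (60.0, 0.15),
    "security_score": (30.0, 0.25),
    "testing_score": (20.0, 0.20),
    "documentation_score": (81.0, 0.15),
    "practices_score": (40.0, 0.15),
    "code_quality": (27.3, 0.10),
}


def overall_score(components: dict[str, tuple[float, float]]) -> float:
    """Weighted sum of sub-scores (assumed scoring formula), rounded to one decimal."""
    total_weight = sum(weight for _, weight in components.values())
    # The table's weights add up to 1.00; allow for float rounding noise.
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total_weight}, expected 1.0")
    return round(sum(sub * weight for sub, weight in components.values()), 1)


print(overall_score(COMPONENTS))  # 41.4
```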

Severity distribution (active filter: test files excluded)

Severity: Critical 5 · High 90 · Medium 33 · Low 72

9-Layer: Software · Security · Quality · Integrity · Frontend · Hardware · Data · Network · Cicd

Source: Security checks 156 · System graph 76 · Crowd 0

Layer: Software 146 · Quality 68 · Security 15 · Api 1 · Frontend 1 · Cicd 1

The tables below list the 2958 nodes from the latest scan, grouped by kind. Each node is a unit the engine identified (file, function, endpoint, table, and so on). Most users won't need this view; it is primarily for debugging the engine's graph extraction or for AI agents that want to enumerate the project structure.
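To make the grouping concrete, the sketch below treats each table row as a record, groups records by kind, and tallies statuses (for example, to pick out nodes flagged `dead` or `warning`). The `Node` record and the helper names are hypothetical: they mirror the four table columns plus a kind, not Repobility's JSON schema, and the sample rows are copied from the tables on this page.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Node(NamedTuple):
    """One table row (hypothetical shape: kind plus Label/Layer/Status/Path)."""
    kind: str
    label: str
    layer: str
    status: str
    path: str


# A few rows copied from the node tables on this page.
NODES = [
    Node("function", "run_model", "software", "healthy", "livebench/run_livebench.py:365"),
    Node("function", "play_a_match_wrapper", "software", "dead", "livebench/gen_ground_truth_judgment.py:507"),
    Node("function", "unsafe_execute", "software", "dead", "livebench/code_runner/eval/__init__.py:112"),
    Node("class", "MatchSingle", "software", "healthy", "livebench/common.py:80"),
    Node("file", "download_questions.py", "software", "warning", "livebench/download_questions.py"),
    Node("gpu", "gpu (detected)", "hardware", "healthy", "livebench/model/model_configs/nvidia.yml"),
]


def group_by_kind(nodes: list[Node]) -> dict[str, list[Node]]:
    """Bucket nodes by kind, preserving the order in which kinds first appear."""
    groups: defaultdict[str, list[Node]] = defaultdict(list)
    for node in nodes:
        groups[node.kind].append(node)
    return dict(groups)


def status_counts(nodes: list[Node]) -> Counter:
    """Count how many nodes carry each status (healthy, dead, warning, ...)."""
    return Counter(node.status for node in nodes)


def not_healthy(nodes: list[Node]) -> list[str]:
    """Labels of nodes whose status is anything other than 'healthy'."""
    return [node.label for node in nodes if node.status != "healthy"]


for kind, members in group_by_kind(NODES).items():
    print(kind, len(members), dict(status_counts(members)))
```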

Kind: function (50+)

Label	Layer	Status	Path
`detect_active_venv`	software	healthy	`livebench/run_livebench.py:28`
`from_args`	software	healthy	`livebench/run_livebench.py:72`
`run_command`	software	healthy	`livebench/run_livebench.py:114`
`setup_tmux_session`	software	healthy	`livebench/run_livebench.py:128`
`build_run_command`	software	healthy	`livebench/run_livebench.py:207`
`build_run_command_from_params`	software	healthy	`livebench/run_livebench.py:331`
`run_model`	software	healthy	`livebench/run_livebench.py:365`
`run_sequential`	software	healthy	`livebench/run_livebench.py:374`
`run_parallel`	software	healthy	`livebench/run_livebench.py:398`
`run_single`	software	healthy	`livebench/run_livebench.py:422`
`main`	software	healthy	`livebench/run_livebench.py:460`
`get_answer`	software	healthy	`livebench/gen_api_answer.py:33`
`setup_model`	software	healthy	`livebench/gen_api_answer.py:164`
`run_questions`	software	healthy	`livebench/gen_api_answer.py:209`
`reorg_output_file`	software	healthy	`livebench/gen_ground_truth_judgment.py:56`
`play_a_match_gt`	software	healthy	`livebench/gen_ground_truth_judgment.py:75`
`gen_judgments`	software	healthy	`livebench/gen_ground_truth_judgment.py:242`
`play_a_match_wrapper`	software	dead	`livebench/gen_ground_truth_judgment.py:507`
`calculate_usage`	software	healthy	`livebench/show_livebench_result.py:24`
`display_result_single`	software	dead	`livebench/show_livebench_result.py:226`
`check_agentic_coding_requirements`	software	healthy	`livebench/common.py:31`
`get_categories_tasks`	software	healthy	`livebench/common.py:93`
`get_hf_dataset`	software	healthy	`livebench/common.py:139`
`get_tasks_from_hf_category`	software	healthy	`livebench/common.py:145`
`load_answers_judgments`	software	healthy	`livebench/common.py:150`
`load_questions`	software	healthy	`livebench/common.py:177`
`load_questions_jsonl`	software	healthy	`livebench/common.py:238`
`load_test_cases_jsonl`	software	healthy	`livebench/common.py:274`
`load_model_answers`	software	healthy	`livebench/common.py:304`
`reorg_answer_file`	software	healthy	`livebench/common.py:335`
`make_match_single`	software	healthy	`livebench/common.py:351`
`normalize_game_key_single`	software	healthy	`livebench/common.py:380`
`normalize_game_key_dict`	software	dead	`livebench/common.py:395`
`load_single_model_judgments`	software	dead	`livebench/common.py:404`
`check_data`	software	healthy	`livebench/common.py:429`
`get_model_list`	software	healthy	`livebench/common.py:440`
`filter_questions`	software	healthy	`livebench/common.py:447`
`compatible_eval_result`	software	dead	`livebench/code_runner/eval/__init__.py:51`
`estimate_pass_at_k`	software	healthy	`livebench/code_runner/eval/__init__.py:61`
`estimator`	software	healthy	`livebench/code_runner/eval/__init__.py:70`
`is_floats`	software	dead	`livebench/code_runner/eval/__init__.py:101`
`unsafe_execute`	software	dead	`livebench/code_runner/eval/__init__.py:112`
`untrusted_check`	software	healthy	`livebench/code_runner/eval/__init__.py:200`
`evaluate_files`	software	dead	`livebench/code_runner/eval/__init__.py:254`
`trusted_exec`	software	healthy	`livebench/code_runner/eval/__init__.py:274`
`trusted_check_exec`	software	dead	`livebench/code_runner/eval/__init__.py:341`
`trusted_check`	software	dead	`livebench/code_runner/eval/__init__.py:351`
`swallow_subprocess_output`	software	healthy	`livebench/code_runner/eval/utils.py:38`
`_popen_patch`	software	healthy	`livebench/code_runner/eval/utils.py:43`
`_run_patch`	software	healthy	`livebench/code_runner/eval/utils.py:53`

Showing first 50 of this kind. Full payload available via the JSON button at the top of the page.

Kind: class (50+)

Label	Layer	Status	Path
`LiveBenchParams`	software	healthy	`livebench/run_livebench.py:37`
`MatchSingle`	software	healthy	`livebench/common.py:80`
`SafePopen`	software	healthy	`livebench/code_runner/eval/utils.py:176`
`TimeoutException`	software	healthy	`livebench/code_runner/eval/utils.py:252`
`WriteOnlyStringIO`	software	healthy	`livebench/code_runner/eval/utils.py:256`
`redirect_stdin`	software	healthy	`livebench/code_runner/eval/utils.py:273`
`Model`	software	healthy	`livebench/agentic_code_runner/minisweagent/__init__.py:41`
`Environment`	software	healthy	`livebench/agentic_code_runner/minisweagent/__init__.py:53`
`Agent`	software	healthy	`livebench/agentic_code_runner/minisweagent/__init__.py:63`
`LitellmModelConfig`	software	healthy	`livebench/agentic_code_runner/minisweagent/models/litellm_m…`
`LitellmModel`	software	healthy	`livebench/agentic_code_runner/minisweagent/models/litellm_m…`
`GlobalModelStats`	software	healthy	`livebench/agentic_code_runner/minisweagent/models/__init__.…`
`DeterministicModelConfig`	software	healthy	`livebench/agentic_code_runner/minisweagent/models/test_mode…`
`DeterministicModel`	software	healthy	`livebench/agentic_code_runner/minisweagent/models/test_mode…`
`ReplayAgent`	software	healthy	`livebench/agentic_code_runner/minisweagent/agents/replay.py…`
`InteractiveAgentConfig`	software	healthy	`livebench/agentic_code_runner/minisweagent/agents/interacti…`
`InteractiveAgent`	software	healthy	`livebench/agentic_code_runner/minisweagent/agents/interacti…`
`AgentConfig`	software	healthy	`livebench/agentic_code_runner/minisweagent/agents/default.p…`
`NonTerminatingException`	software	healthy	`livebench/agentic_code_runner/minisweagent/agents/default.p…`
`FormatError`	software	healthy	`livebench/agentic_code_runner/minisweagent/agents/default.p…`
`ExecutionTimeoutError`	software	healthy	`livebench/agentic_code_runner/minisweagent/agents/default.p…`
`TerminatingException`	software	healthy	`livebench/agentic_code_runner/minisweagent/agents/default.p…`
`Submitted`	software	healthy	`livebench/agentic_code_runner/minisweagent/agents/default.p…`
`LimitsExceeded`	software	healthy	`livebench/agentic_code_runner/minisweagent/agents/default.p…`
`DefaultAgent`	software	healthy	`livebench/agentic_code_runner/minisweagent/agents/default.p…`
`ProgressTrackingAgent`	software	healthy	`livebench/agentic_code_runner/minisweagent/run/run_batch.py…`
`StepTimeElapsedColumn`	software	healthy	`livebench/agentic_code_runner/minisweagent/run/batch_progre…`
`RunBatchProgressManager`	software	healthy	`livebench/agentic_code_runner/minisweagent/run/batch_progre…`
`LocalEnvironmentConfig`	software	healthy	`livebench/agentic_code_runner/minisweagent/environments/loc…`
`LocalEnvironment`	software	healthy	`livebench/agentic_code_runner/minisweagent/environments/loc…`
`DockerEnvironmentConfig`	software	healthy	`livebench/agentic_code_runner/minisweagent/environments/doc…`
`DockerEnvironment`	software	healthy	`livebench/agentic_code_runner/minisweagent/environments/doc…`
`SwerexDockerEnvironmentConfig`	software	healthy	`livebench/agentic_code_runner/minisweagent/environments/ext…`
`SwerexDockerEnvironment`	software	healthy	`livebench/agentic_code_runner/minisweagent/environments/ext…`
`ArgumentParser`	software	healthy	`livebench/agentic_code_runner/eval/utils/args_util.py:25`
`RepoCommits`	software	healthy	`livebench/agentic_code_runner/eval/harness/run_evaluation.p…`
`Patch`	software	healthy	`livebench/agentic_code_runner/eval/harness/run_evaluation.p…`
`CliArgs`	software	healthy	`livebench/agentic_code_runner/eval/harness/run_evaluation.p…`
`Instance`	software	healthy	`livebench/agentic_code_runner/eval/harness/instance.py:20`
`Repository`	software	healthy	`livebench/agentic_code_runner/eval/harness/pull_request.py:…`
`PullRequestBase`	software	healthy	`livebench/agentic_code_runner/eval/harness/pull_request.py:…`
`ResolvedIssue`	software	healthy	`livebench/agentic_code_runner/eval/harness/pull_request.py:…`
`Base`	software	healthy	`livebench/agentic_code_runner/eval/harness/pull_request.py:…`
`PullRequest`	software	healthy	`livebench/agentic_code_runner/eval/harness/pull_request.py:…`
`CliArgs`	software	healthy	`livebench/agentic_code_runner/eval/harness/gen_report.py:130`
`File`	software	healthy	`livebench/agentic_code_runner/eval/harness/image.py:23`
`Config`	software	healthy	`livebench/agentic_code_runner/eval/harness/image.py:30`
`Image`	software	healthy	`livebench/agentic_code_runner/eval/harness/image.py:36`
`SWEImageDefault`	software	healthy	`livebench/agentic_code_runner/eval/harness/image.py:116`
`TestStatus`	software	healthy	`livebench/agentic_code_runner/eval/harness/test_result.py:22`

Showing first 50 of this kind. Full payload available via the JSON button at the top of the page.

Kind: file (50+)

Label	Layer	Status	Path
`README-es_CO.md`	software	healthy	`README-es_CO.md`
`README.md`	software	healthy	`README.md`
`pyproject.toml`	software	healthy	`pyproject.toml`
`changelog.md`	software	healthy	`changelog.md`
`MAINTENANCE_PLAN.md`	software	healthy	`docs/MAINTENANCE_PLAN.md`
`CONTRIBUTING.md`	software	healthy	`docs/CONTRIBUTING.md`
`AUTHOR_RESPONSIBILITY.md`	software	healthy	`docs/AUTHOR_RESPONSIBILITY.md`
`DATASHEET.md`	software	healthy	`docs/DATASHEET.md`
`CODE_OF_CONDUCT.md`	software	healthy	`docs/CODE_OF_CONDUCT.md`
`__init__.py`	software	healthy	`livebench/__init__.py`
`run_livebench.py`	software	healthy	`livebench/run_livebench.py`
`gen_api_answer.py`	software	healthy	`livebench/gen_api_answer.py`
`gen_ground_truth_judgment.py`	software	healthy	`livebench/gen_ground_truth_judgment.py`
`download_questions.py`	software	warning	`livebench/download_questions.py`
`download_leaderboard.py`	software	warning	`livebench/download_leaderboard.py`
`.env.example`	software	healthy	`livebench/.env.example`
`show_livebench_result.py`	software	healthy	`livebench/show_livebench_result.py`
`common.py`	software	healthy	`livebench/common.py`
`__init__.py`	software	healthy	`livebench/code_runner/eval/__init__.py`
`README.md`	software	healthy	`livebench/code_runner/eval/README.md`
`utils.py`	software	healthy	`livebench/code_runner/eval/utils.py`
`edit_questions.py`	software	healthy	`livebench/scripts/edit_questions.py`
`find_hardest_question.py`	software	healthy	`livebench/scripts/find_hardest_question.py`
`calc_token_offset.py`	software	healthy	`livebench/scripts/calc_token_offset.py`
`spend_report.py`	software	healthy	`livebench/scripts/spend_report.py`
`check_coding_questions.py`	software	healthy	`livebench/scripts/check_coding_questions.py`
`question_to_csv.py`	software	healthy	`livebench/scripts/question_to_csv.py`
`answer_csv_to_jsonl.py`	software	healthy	`livebench/scripts/answer_csv_to_jsonl.py`
`check_question_variance.py`	software	healthy	`livebench/scripts/check_question_variance.py`
`fail_missing_questions.py`	software	healthy	`livebench/scripts/fail_missing_questions.py`
`rename_model.py`	software	healthy	`livebench/scripts/rename_model.py`
`inspect_agentic_traj.py`	software	healthy	`livebench/scripts/inspect_agentic_traj.py`
`inspect_model_answers.py`	software	healthy	`livebench/scripts/inspect_model_answers.py`
`calc_attribute_stats.py`	software	healthy	`livebench/scripts/calc_attribute_stats.py`
`test_prompts.py`	software	healthy	`livebench/scripts/test_prompts.py`
`check_question_difficulties.py`	software	healthy	`livebench/scripts/check_question_difficulties.py`
`apply_edits_to_code_completion.py`	software	healthy	`livebench/scripts/apply_edits_to_code_completion.py`
`rerun_many.py`	software	healthy	`livebench/scripts/rerun_many.py`
`replay_agent_trajectory.py`	software	healthy	`livebench/scripts/replay_agent_trajectory.py`
`compare_score_tables.py`	software	healthy	`livebench/scripts/compare_score_tables.py`
`error_check.py`	software	healthy	`livebench/scripts/error_check.py`
`rerun_failed_questions.py`	software	healthy	`livebench/scripts/rerun_failed_questions.py`
`find_differential_question.py`	software	healthy	`livebench/scripts/find_differential_question.py`
`rescue_agentic_submission.py`	software	healthy	`livebench/scripts/rescue_agentic_submission.py`
`check_grading_flakiness.py`	software	healthy	`livebench/scripts/check_grading_flakiness.py`
`syntax_error_finder.py`	software	healthy	`livebench/scripts/syntax_error_finder.py`
`__init__.py`	software	healthy	`livebench/agentic_code_runner/minisweagent/__init__.py`
`run_inference.py`	software	healthy	`livebench/agentic_code_runner/minisweagent/run_inference.py`
`litellm_model.py`	software	healthy	`livebench/agentic_code_runner/minisweagent/models/litellm_m…`
`__init__.py`	software	healthy	`livebench/agentic_code_runner/minisweagent/models/__init__.…`

Showing first 50 of this kind. Full payload available via the JSON button at the top of the page.

Kind: directory (50+)

Label	Layer	Status	Path
`docs`	software	healthy	`docs`
`livebench`	software	healthy	`livebench`
`code_runner`	software	healthy	`livebench/code_runner`
`eval`	software	healthy	`livebench/code_runner/eval`
`scripts`	software	healthy	`livebench/scripts`
`agentic_code_runner`	software	healthy	`livebench/agentic_code_runner`
`minisweagent`	software	healthy	`livebench/agentic_code_runner/minisweagent`
`models`	software	healthy	`livebench/agentic_code_runner/minisweagent/models`
`config`	software	healthy	`livebench/agentic_code_runner/minisweagent/config`
`utils`	software	healthy	`livebench/agentic_code_runner/minisweagent/utils`
`agents`	software	healthy	`livebench/agentic_code_runner/minisweagent/agents`
`run`	software	healthy	`livebench/agentic_code_runner/minisweagent/run`
`utils`	software	healthy	`livebench/agentic_code_runner/minisweagent/run/utils`
`environments`	software	healthy	`livebench/agentic_code_runner/minisweagent/environments`
`extra`	software	healthy	`livebench/agentic_code_runner/minisweagent/environments/ext…`
`eval`	software	healthy	`livebench/agentic_code_runner/eval`
`utils`	software	healthy	`livebench/agentic_code_runner/eval/utils`
`harness`	software	healthy	`livebench/agentic_code_runner/eval/harness`
`repos`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos`
`c`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/c`
`php`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/c/php`
`mruby`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/c/mruby`
`facebook`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/c/facebook`
`jqlang`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/c/jqlang`
`libsdlorg`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/c/libsdlorg`
`redis`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/c/redis`
`OpenMathLib`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/c/OpenMath…`
`valkey_io`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/c/valkey_io`
`fluent`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/c/fluent`
`libgit2`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/c/libgit2`
`ponylang`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/c/ponylang`
`rust`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/rust`
`bee_san`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/rust/bee_s…`
`helix_editor`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/rust/helix…`
`sharkdp`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/rust/shark…`
`BurntSushi`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/rust/Burnt…`
`nushell`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/rust/nushe…`
`alacritty`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/rust/alacr…`
`clap_rs`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/rust/clap_…`
`rusqlite`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/rust/rusql…`
`rust_lang`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/rust/rust_…`
`fish_shell`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/rust/fish_…`
`rayon_rs`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/rust/rayon…`
`serde_rs`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/rust/serde…`
`tokio_rs`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/rust/tokio…`
`javascript`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/javascript`
`google`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/javascript…`
`vuejs`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/javascript…`
`iamkun`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/javascript…`
`expressjs`	software	healthy	`livebench/agentic_code_runner/eval/harness/repos/javascript…`

Showing first 50 of this kind. Full payload available via the JSON button at the top of the page.

Kind: auth (3)

Label	Layer	Status	Path
`auth::livebench/agentic_code_runner/eval/harness/repos/java…`	security	healthy	`livebench/agentic_code_runner/eval/harness/repos/java/__ini…`
`auth::livebench/agentic_code_runner/eval/harness/repos/java…`	security	healthy	`livebench/agentic_code_runner/eval/harness/repos/java/keycl…`
`auth::livebench/agentic_code_runner/eval/harness/repos/java…`	security	healthy	`livebench/agentic_code_runner/eval/harness/repos/java/keycl…`

Kind: database (2)

Label	Layer	Status	Path
`redis`	data	healthy	`livebench/agentic_code_runner/eval/harness/repos/c/__init__…`
`sqlite`	data	healthy	`livebench/agentic_code_runner/eval/harness/repos/cpp/cgal/c…`

Kind: repo (1)

Label	Layer	Status	Path
`repobility-clone-ngqtj7lm`	software	healthy	`/tmp/repobility-clone-ngqtj7lm`

Kind: load_balancer (1)

Label	Layer	Status	Path
`caddy`	network	healthy	`livebench/agentic_code_runner/eval/harness/repos/golang/cad…`

Kind: vps (1)

Label	Layer	Status	Path
`vps::aws`	hardware	healthy	`livebench/model/completions.py`

Kind: gpu (1)

Label	Layer	Status	Path
`gpu (detected)`	hardware	healthy	`livebench/model/model_configs/nvidia.yml`

For AI agents + API integrations

API access

This page is publicly accessible at: https://repobility.com/scan/285d8c54-1310-4654-8c87-9d14ef632d84/

To check status programmatically (no auth required):

curl -s https://repobility.com/api/v1/public/scan/285d8c54-1310-4654-8c87-9d14ef632d84/
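The same check can be made from Python using only the standard library. This sketch mirrors the curl call above; the helper names are illustrative, and because the response fields are not documented on this page, it only decodes the JSON body and leaves interpretation to the caller.

```python
import json
import urllib.request

# Base URL taken from the curl example above; no authentication is required.
STATUS_BASE = "https://repobility.com/api/v1/public/scan/"


def status_url(scan_token: str) -> str:
    """Build the public status URL for a scan token (trailing slash as in the curl call)."""
    return f"{STATUS_BASE}{scan_token}/"


def fetch_scan_status(scan_token: str, timeout: float = 10.0) -> object:
    """GET the public status endpoint and return the decoded JSON body as-is."""
    with urllib.request.urlopen(status_url(scan_token), timeout=timeout) as resp:
        return json.load(resp)


# Usage (performs a network request):
# status = fetch_scan_status("285d8c54-1310-4654-8c87-9d14ef632d84")
```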

Important: please don't re-submit the same URL repeatedly. The submission endpoint is idempotent: re-submitting the same git URL returns this same scan_token rather than a new one. To re-scan this repo, sign up for a free account and use the dashboard.
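Since a repeat submission only returns the existing scan_token, a client gains nothing by submitting twice. A minimal sketch of a client-side guard, assuming a hypothetical `submit` callable that posts a git URL and returns its scan_token (the submission endpoint itself is not documented on this page). URLs are normalized first so trivial variants, such as a trailing slash or `.git` suffix, share one cache entry; lowercasing assumes case-insensitive repo paths, as on GitHub.

```python
from typing import Callable


def normalize_git_url(url: str) -> str:
    """Canonicalize trivial variants of a git URL: whitespace, trailing slash, .git, case."""
    url = url.strip().rstrip("/")
    if url.endswith(".git"):
        url = url[: -len(".git")]
    # Assumes case-insensitive owner/repo paths (true for GitHub URLs).
    return url.lower()


class ScanClient:
    """Caches scan tokens per normalized URL so each repo is submitted at most once."""

    def __init__(self, submit: Callable[[str], str]) -> None:
        self._submit = submit  # hypothetical: posts a git URL, returns its scan_token
        self._tokens: dict[str, str] = {}

    def scan_token_for(self, git_url: str) -> str:
        key = normalize_git_url(git_url)
        if key not in self._tokens:
            self._tokens[key] = self._submit(git_url)
        return self._tokens[key]
```

The cache only suppresses redundant requests within one process; it does not trigger a re-scan, which still goes through the dashboard.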

Node kinds: function 50+ · class 50+ · file 50+ · directory 50+ · auth 3 · database 2 · repo 1 · load_balancer 1 · vps 1 · gpu 1