127 of your 352 findings came from Repobility's proprietary detections. ✓ Repobility tags below mark them.

Scan timing: clone 3.28s · analysis 9.69s · 9.2 MB · GitHub API rate-limit (preflight)

LiveBench/LiveBench

https://github.com/LiveBench/LiveBench · scanned 2026-06-05 21:06 UTC (4 days, 11 hours ago) · 10 languages

475 raw signals (295 security + 180 graph) · 32nd percentile · Python · medium (20-100K LoC) · System graph score 85 (lower by 43)


Complete repo analysis

Last scanned 4 days, 11 hours ago · v2 · 232 actionable findings from 2 signal sources. 153 repeated signals grouped for readability. Security checks, system graph analysis, and verified AI-agent feedback are merged into one review queue.

Score breakdown · 2026-05-18-v5
Component            Sub-score  Weight  Contribution
structure_score           60.0    0.15          9.00
security_score            30.0    0.25          7.50
testing_score             20.0    0.20          4.00
documentation_score       81.0    0.15         12.15
practices_score           40.0    0.15          6.00
code_quality              27.3    0.10          2.73
Overall                           1.00          41.4
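The overall score appears to be the weighted sum of the six sub-scores. The sketch below recomputes it from the numbers in the table; the variable names are illustrative and not part of Repobility's output.

```python
# Sub-scores and weights copied from the score breakdown table.
components = {
    "structure_score": (60.0, 0.15),
    "security_score": (30.0, 0.25),
    "testing_score": (20.0, 0.20),
    "documentation_score": (81.0, 0.15),
    "practices_score": (40.0, 0.15),
    "code_quality": (27.3, 0.10),
}

# Each contribution is sub-score * weight; the overall score is their sum.
contributions = {name: score * weight for name, (score, weight) in components.items()}
total_weight = sum(weight for _, weight in components.values())
overall = sum(contributions.values())

print(round(overall, 1))
```

The weights sum to 1.00, so the result stays on the same 0-100 scale as the sub-scores.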
Active filters: excluding tests
Scan summary Quality grade D (41/100). Dimensions: security 30, maintainability 60. 295 findings (123 security). 61,565 lines analyzed.

Showing 197 of 232 actionable findings. 385 raw detector signals were grouped into reader-sized issues.

critical Security checks software dependencies conf 0.88 django: GHSA-frmv-pr5f-9mcr
Django vulnerable to SQL injection via _connector keyword argument in QuerySet and Q objects.
livebench/code_runner/requirements_eval.txt
critical Security checks software dependencies conf 0.88 django: GHSA-pv4p-cwwg-4rph
Django SQL injection vulnerability
livebench/code_runner/requirements_eval.txt
critical Security checks software dependencies conf 0.88 keras: GHSA-x4wf-678h-2pmq
Keras code injection vulnerability
livebench/code_runner/requirements_eval.txt
high Security checks quality Quality conf 1.00 ✓ Repobility 6 occurrences Missing import: `stat` used but not imported
The file uses `stat.something(...)` but never imports `stat`. This raises NameError at runtime the first time the line executes.
5 files, 6 locations
livebench/process_results/math/AMPS_Hard/utils.py:49, 98 (2 hits)
livebench/code_runner/eval/__init__.py:240
livebench/if_runner/instruction_following_eval/instructions.py:162
livebench/process_results/math/olympiad/utils.py:63
livebench/process_results/util.py:7
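A minimal sketch of the fix for this class of finding, assuming the flagged lines really do refer to the standard-library `stat` module (the report does not show the code itself). The helper `is_regular_file` is hypothetical and only illustrates the pattern.

```python
import os
import stat  # without this import, stat.S_ISREG below raises NameError


def is_regular_file(path: str) -> bool:
    """Return True if `path` is a regular file (hypothetical helper)."""
    mode = os.stat(path).st_mode
    return stat.S_ISREG(mode)


# A directory is not a regular file.
print(is_regular_file(os.getcwd()))
```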
critical Security checks software dependencies conf 0.88 nltk: GHSA-7p94-766c-hgjp
NLTK has a Zip Slip Vulnerability
livebench/code_runner/requirements_eval.txt
critical Security checks software dependencies conf 0.88 tensorflow: GHSA-gw97-ff7c-9v96
TensorFlow has a heap out-of-buffer read vulnerability in the QuantizeAndDequantize operation
livebench/code_runner/requirements_eval.txt
low Security checks quality Quality conf 1.00 ✓ Repobility 3 occurrences [MINED006] Overcatch Baseexception: except BaseException: ... — prevents Ctrl+C and SystemExit from working.
Review and fix per the pattern semantics. See CWE-705 for context.
3 files, 3 locations
livebench/agentic_code_runner/minisweagent/agents/interactive.py:73
livebench/agentic_code_runner/minisweagent/run/run_batch.py:209
livebench/agentic_code_runner/minisweagent/run_inference.py:189
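As an illustration of the MINED006 pattern, the sketch below catches `Exception` instead of `BaseException`, so that `KeyboardInterrupt` and `SystemExit` still propagate. The function names are hypothetical and not taken from the flagged files.

```python
def run_step(step):
    """Run one unit of work, recovering from ordinary errors only."""
    try:
        return step()
    except Exception as exc:  # KeyboardInterrupt and SystemExit are not caught here
        return f"failed: {exc}"


def failing_step():
    raise ValueError("bad input")


def interrupted_step():
    raise KeyboardInterrupt


print(run_step(failing_step))  # the ValueError is handled

try:
    run_step(interrupted_step)
    interrupt_propagated = False
except KeyboardInterrupt:
    interrupt_propagated = True  # Ctrl+C still reaches the caller
```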
low Security checks quality Quality conf 1.00 ✓ Repobility 3 occurrences [MINED012] Curl Pipe Bash: curl ... | sh / bash — runs unverified network code.
Review and fix per the pattern semantics. See CWE-494 / A08:2021 for context.
3 files, 3 locations
livebench/agentic_code_runner/eval/harness/repos/javascript/axios/axios.py:59
livebench/agentic_code_runner/eval/harness/repos/javascript/sveltejs/svelte.py:51
livebench/agentic_code_runner/eval/harness/repos/typescript/ant_design/ant_design.py:52
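One common mitigation for `curl ... | sh` is to download the script first, compare it against a pinned checksum, and only then run it. The sketch below shows the verification step only; the installer bytes and the pinned digest are stand-ins, not values from the flagged harness files.

```python
import hashlib
import hmac


def verify_sha256(payload: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 digest of `payload` matches the pinned hex digest."""
    actual = hashlib.sha256(payload).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_hex.lower())


# Stand-in for a downloaded installer script and its published checksum.
installer = b"#!/bin/sh\necho installing\n"
pinned_digest = hashlib.sha256(installer).hexdigest()

print(verify_sha256(installer, pinned_digest))
print(verify_sha256(installer + b"rm -rf /\n", pinned_digest))
```

In a real pipeline the pinned digest would come from a trusted source that is separate from the download URL.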
high Security checks quality Quality conf 1.00 ✓ Repobility [MINED034] Python Subprocess Shell True: subprocess(..., shell=True) enables command injection.
Review and fix per the pattern semantics. See CWE-78 for context.
livebench/agentic_code_runner/minisweagent/environments/local.py:23
high Security checks quality Quality conf 1.00 ✓ Repobility [MINED034] Python Subprocess Shell True: subprocess(..., shell=True) enables command injection.
Review and fix per the pattern semantics. See CWE-78 for context.
livebench/agentic_code_runner/minisweagent/environments/docker.py:106
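Where a shell is not actually required, the usual alternative to `shell=True` is to pass an argument list, so that untrusted text is never parsed by a shell. This is a sketch under that assumption; `run_argv` is a hypothetical helper, and the flagged environments may need a shell by design.

```python
import subprocess
import sys


def run_argv(argv: list[str], timeout: float = 30.0) -> str:
    """Run a command given as an argument list (no shell) and return its stdout."""
    completed = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout, check=True
    )
    return completed.stdout


# The metacharacters below are passed through as plain data, not interpreted.
untrusted = "hello; echo injected"
output = run_argv([sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted])
print(output.strip())
```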
high Security checks security path traversal conf 0.80 3 occurrences [SEC013] Path Traversal — User Input in File Path: User-controlled input used in file path without sanitization. Allows reading arbitrary files.
Use os.path.realpath() and verify the path starts with your expected base directory. Use secure_filename() for uploads.
3 files, 3 locations
livebench/if_runner/ifbench/evaluation_lib.py:45
livebench/if_runner/instruction_following_eval/evaluation_main.py:191
livebench/scripts/answer_csv_to_jsonl.py:11
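The remediation advice above (resolve the path, then check that it stays under the expected base directory) can be sketched as follows. `resolve_inside` and the `data` directory are hypothetical names used for illustration.

```python
import os


def resolve_inside(base_dir: str, user_path: str) -> str:
    """Resolve `user_path` under `base_dir`, rejecting paths that escape it."""
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, user_path))
    # commonpath equals the base only when the candidate lies inside it.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate


base_dir = os.path.join(os.getcwd(), "data")  # hypothetical base directory
print(resolve_inside(base_dir, "answers/run1.jsonl"))

try:
    resolve_inside(base_dir, "../outside.txt")
    escaped = True
except ValueError:
    escaped = False  # traversal attempt was rejected
```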
low Security checks security Injection conf 1.00 [SEC103] LDAP injection — non-constant search filter: User input concatenated into an LDAP search filter. Attackers inject `*)(uid=*` style payloads to bypass auth or enumerate accounts.
Escape with javax.naming.ldap.Rdn.escapeValue or equivalent. For python-ldap, use ldap.filter.escape_filter_chars. Better: use parameterized search APIs (Spring LdapTemplate filter encoders).
livebench/process_results/reasoning/logic_with_navigation/utils.py:28
low Security checks security Injection conf 1.00 [SEC103] LDAP injection — non-constant search filter: User input concatenated into an LDAP search filter. Attackers inject `*)(uid=*` style payloads to bypass auth or enumerate accounts.
Escape with javax.naming.ldap.Rdn.escapeValue or equivalent. For python-ldap, use ldap.filter.escape_filter_chars. Better: use parameterized search APIs (Spring LdapTemplate filter encoders).
livebench/agentic_code_runner/eval/harness/repos/c/mruby/mruby.py:423
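For reference, the escaping that the advice above refers to can be sketched with the standard library alone, following the RFC 4515 escape rules. This assumes the flagged lines really build LDAP filters, which the report does not show; `escape_ldap_filter_value` is a hypothetical stand-in for `ldap.filter.escape_filter_chars`.

```python
# Characters that RFC 4515 requires to be escaped inside a filter value.
_LDAP_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_ldap_filter_value(value: str) -> str:
    """Escape special characters so `value` is treated as a literal in a filter."""
    return "".join(_LDAP_FILTER_ESCAPES.get(ch, ch) for ch in value)


payload = "*)(uid=*"
search_filter = f"(uid={escape_ldap_filter_value(payload)})"
print(search_filter)
```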
high Security checks quality Quality conf 1.00 ✓ Repobility 25 occurrences `self.model` used but never assigned in __init__
Method `add_message` of class `InteractiveAgent` reads `self.model`, but no assignment to it exists in __init__ (and no class-level fallback). This raises AttributeError the first time the method runs against an instance.
6 files, 25 locations
livebench/agentic_code_runner/minisweagent/run/batch_progress.py:111, 140, 155, 174, 175, 176, 178, 181, +1 more (9 hits)
livebench/agentic_code_runner/minisweagent/agents/interactive.py:47, 57, 58, 63, 75, 86, 87 (7 hits)
livebench/code_runner/eval/utils.py:191, 192, 195, 196 (4 hits)
livebench/agentic_code_runner/minisweagent/agents/replay.py:62, 78, 79 (3 hits)
livebench/agentic_code_runner/minisweagent/environments/docker.py:110
livebench/agentic_code_runner/minisweagent/run/run_batch.py:48
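A minimal reproduction of this pattern with hypothetical classes (not the flagged ones). Note that an attribute assigned by a parent class's `__init__` or by a dataclass field would not fail this way; the report does not show whether that applies to the listed locations.

```python
class BrokenAgent:
    def __init__(self):
        self.messages = []

    def add_message(self, text: str) -> None:
        # self.model was never assigned, so this line raises AttributeError.
        self.messages.append((self.model, text))


class FixedAgent:
    def __init__(self, model: str):
        self.model = model  # assigned before any method reads it
        self.messages = []

    def add_message(self, text: str) -> None:
        self.messages.append((self.model, text))


agent = FixedAgent("example-model")
agent.add_message("hello")

try:
    BrokenAgent().add_message("hello")
    broken_failed = False
except AttributeError:
    broken_failed = True
```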
high Security checks software dependencies conf 0.88 cryptography: GHSA-3ww4-gg4f-jr7f
Python Cryptography package vulnerable to Bleichenbacher timing oracle attack
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 cryptography: GHSA-r6ph-v2qm-q3c2
cryptography Vulnerable to a Subgroup Attack Due to Missing Subgroup Validation for SECT Curves
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 cryptography: GHSA-x4qr-2fvf-3mr5
Vulnerable OpenSSL included in cryptography wheels
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 cryptography: PYSEC-2023-11
cryptography is a package designed to expose cryptographic primitives and recipes to Python developers. In affected versions `Cipher.update_into` would accept Python objects which implement the buffer protocol, but provide only immutable buffers. This would allow immutable objects (such as `bytes`)…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 cryptography: PYSEC-2023-254
cryptography is a package designed to expose cryptographic primitives and recipes to Python developers. Calling `load_pem_pkcs7_certificates` or `load_der_pkcs7_certificates` could lead to a NULL-pointer dereference and segfault. Exploitation of this vulnerability poses a serious risk of Denial of …
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 cryptography: PYSEC-2024-225
cryptography is a package designed to expose cryptographic primitives and recipes to Python developers. Starting in version 38.0.0 and prior to version 42.0.4, if `pkcs12.serialize_key_and_certificates` is called with both a certificate whose public key did not match the provided private key and an…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 cryptography: PYSEC-2026-35
cryptography is a package designed to expose cryptographic primitives and recipes to Python developers. Prior to version 46.0.6, DNS name constraints were only validated against SANs within child certificates, and not the "peer name" presented during each validation. Consequently, cryptography woul…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: GHSA-8p8v-wh79-9r56
Django vulnerable to Uncontrolled Resource Consumption
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2024-102
An issue was discovered in Django 5.1 before 5.1.1, 5.0 before 5.0.9, and 4.2 before 4.2.16. The urlize() and urlizetrunc() template filters are subject to a potential denial-of-service attack via very large inputs with a specific sequence of characters.
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2024-156
An issue was discovered in Django 5.1 before 5.1.4, 5.0 before 5.0.10, and 4.2 before 4.2.17. The strip_tags() method and striptags template filter are subject to a potential denial-of-service attack via certain inputs containing large sequences of nested incomplete HTML entities.
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2024-157
An issue was discovered in Django 5.1 before 5.1.4, 5.0 before 5.0.10, and 4.2 before 4.2.17. Direct usage of the django.db.models.fields.json.HasKey lookup, when an Oracle database is used, is subject to SQL injection if untrusted data is used as an lhs value. (Applications that use the jsonfield.…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2024-28
An issue was discovered in Django 3.2 before 3.2.24, 4.2 before 4.2.10, and Django 5.0 before 5.0.2. The intcomma template filter was subject to a potential denial-of-service attack when used with very long strings.
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2024-47
In Django 3.2 before 3.2.25, 4.2 before 4.2.11, and 5.0 before 5.0.3, the django.utils.text.Truncator.words() method (with html=True) and the truncatewords_html template filter are subject to a potential regular expression denial-of-service attack via a crafted string. NOTE: this issue exists becau…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2024-56
An issue was discovered in Django 4.2 before 4.2.14 and 5.0 before 5.0.7. urlize and urlizetrunc were subject to a potential denial of service attack via certain inputs with a very large number of brackets.
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2024-57
An issue was discovered in Django 5.0 before 5.0.7 and 4.2 before 4.2.14. The django.contrib.auth.backends.ModelBackend.authenticate() method allows remote attackers to enumerate users via a timing attack involving login requests for users with an unusable password.
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2024-58
An issue was discovered in Django 5.0 before 5.0.7 and 4.2 before 4.2.14. Derived classes of the django.core.files.storage.Storage base class, when they override generate_filename() without replicating the file-path validations from the parent class, potentially allow directory traversal via certai…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2024-59
An issue was discovered in Django 5.0 before 5.0.7 and 4.2 before 4.2.14. get_supported_language_variant() was subject to a potential denial-of-service attack when used with very long strings containing specific characters.
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2024-67
An issue was discovered in Django 5.0 before 5.0.8 and 4.2 before 4.2.15. The floatformat template filter is subject to significant memory consumption when given a string representation of a number in scientific notation with a large exponent.
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2024-68
An issue was discovered in Django 5.0 before 5.0.8 and 4.2 before 4.2.15. The urlize() and urlizetrunc() template filters are subject to a potential denial-of-service attack via very large inputs with a specific sequence of characters.
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2024-69
An issue was discovered in Django 5.0 before 5.0.8 and 4.2 before 4.2.15. The urlize and urlizetrunc template filters, and the AdminURLFieldWidget widget, are subject to a potential denial-of-service attack via certain inputs with a very large number of Unicode characters.
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2025-1
An issue was discovered in Django 5.1 before 5.1.5, 5.0 before 5.0.11, and 4.2 before 4.2.18. Lack of upper-bound limit enforcement in strings passed when performing IPv6 validation could lead to a potential denial-of-service attack. The undocumented and private functions clean_ipv6_address and is_…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2025-104
An issue was discovered in 5.2 before 5.2.9, 5.1 before 5.1.15, and 4.2 before 4.2.27. `FilteredRelation` is subject to SQL injection in column aliases, using a suitably crafted dictionary, with dictionary expansion, as the `**kwargs` passed to `QuerySet.annotate()` or `QuerySet.alias()` on Postgre…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2025-105
An issue was discovered in Django 4.2 before 4.2.24, 5.1 before 5.1.12, and 5.2 before 5.2.6. FilteredRelation is subject to SQL injection in column aliases, using a suitably crafted dictionary, with dictionary expansion, as the **kwargs passed to QuerySet.annotate() or QuerySet.alias().
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2025-106
An issue was discovered in Django 4.2 before 4.2.25, 5.1 before 5.1.13, and 5.2 before 5.2.7. QuerySet.annotate(), QuerySet.alias(), QuerySet.aggregate(), and QuerySet.extra() are subject to SQL injection in column aliases, when using a suitably crafted dictionary, with dictionary expansion, as the…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2025-107
An issue was discovered in 5.1 before 5.1.14, 4.2 before 4.2.26, and 5.2 before 5.2.8. NFKC normalization in Python is slow on Windows. As a consequence, `django.http.HttpResponseRedirect`, `django.http.HttpResponsePermanentRedirect`, and the shortcut `django.shortcuts.redirect` were subject to a …
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2025-109
An issue was discovered in 5.2 before 5.2.9, 5.1 before 5.1.15, and 4.2 before 4.2.27. Algorithmic complexity in `django.core.serializers.xml_serializer.getInnerText()` allows a remote attacker to cause a potential denial-of-service attack triggering CPU and memory exhaustion via specially crafted …
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2025-13
An issue was discovered in Django 5.1 before 5.1.7, 5.0 before 5.0.13, and 4.2 before 4.2.20. The django.utils.text.wrap() method and wordwrap template filter are subject to a potential denial-of-service attack when used with very long strings.
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2025-37
An issue was discovered in Django 4.2 before 4.2.21, 5.1 before 5.1.9, and 5.2 before 5.2.1. The django.utils.html.strip_tags() function is vulnerable to a potential denial-of-service (slow performance) when processing inputs containing large sequences of incomplete HTML tags. The template filter s…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2025-47
An issue was discovered in Django 5.2 before 5.2.2, 5.1 before 5.1.10, and 4.2 before 4.2.22. Internal HTTP response logging does not escape request.path, which allows remote attackers to potentially manipulate log output via crafted URLs. This may lead to log injection or forgery when logs are vie…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2026-42
An issue was discovered in 6.0 before 6.0.2, 5.2 before 5.2.11, and 4.2 before 4.2.28. The `django.contrib.auth.handlers.modwsgi.check_password()` function for authentication via `mod_wsgi` allows remote attackers to enumerate users via a timing attack. Earlier, unsupported Django series (such as 5…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2026-43
An issue was discovered in 6.0 before 6.0.2, 5.2 before 5.2.11, and 4.2 before 4.2.28. `ASGIRequest` allows a remote attacker to cause a potential denial-of-service via a crafted request with multiple duplicate headers. Earlier, unsupported Django series (such as 5.0.x, 4.1.x, and 3.2.x) were not e…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2026-44
An issue was discovered in 6.0 before 6.0.2, 5.2 before 5.2.11, and 4.2 before 4.2.28. Raster lookups on ``RasterField`` (only implemented on PostGIS) allows remote attackers to inject SQL via the band index parameter. Earlier, unsupported Django series (such as 5.0.x, 4.1.x, and 3.2.x) were not ev…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2026-45
An issue was discovered in 6.0 before 6.0.2, 5.2 before 5.2.11, and 4.2 before 4.2.28. `django.utils.text.Truncator.chars()` and `Truncator.words()` methods (with `html=True`) and the `truncatechars_html` and `truncatewords_html` template filters allow a remote attacker to cause a potential denial-…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2026-46
An issue was discovered in 6.0 before 6.0.2, 5.2 before 5.2.11, and 4.2 before 4.2.28. `FilteredRelation` is subject to SQL injection in column aliases via control characters, using a suitably crafted dictionary, with dictionary expansion, as the `**kwargs` passed to `QuerySet` methods `annotate()`…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2026-47
An issue was discovered in 6.0 before 6.0.2, 5.2 before 5.2.11, and 4.2 before 4.2.28. `.QuerySet.order_by()` is subject to SQL injection in column aliases containing periods when the same alias is, using a suitably crafted dictionary, with dictionary expansion, used in `FilteredRelation`. Earlier,…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2026-48
An issue was discovered in 6.0 before 6.0.4, 5.2 before 5.2.13, and 4.2 before 4.2.30. `MultiPartParser` allows remote attackers to degrade performance by submitting multipart uploads with `Content-Transfer-Encoding: base64` including excessive whitespace. Earlier, unsupported Django series (such a…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2026-49
An issue was discovered in 6.0 before 6.0.4, 5.2 before 5.2.13, and 4.2 before 4.2.30. ASGI requests with a missing or understated `Content-Length` header could bypass the `DATA_UPLOAD_MAX_MEMORY_SIZE` limit when reading `HttpRequest.body`, allowing remote attackers to load an unbounded request bod…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2026-51
An issue was discovered in 6.0 before 6.0.4, 5.2 before 5.2.13, and 4.2 before 4.2.30. `ASGIRequest` allows a remote attacker to spoof headers by exploiting an ambiguous mapping of two header variants (with hyphens or with underscores) to a single version with underscores. Earlier, unsupported Djan…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2026-52
An issue was discovered in 6.0 before 6.0.4, 5.2 before 5.2.13, and 4.2 before 4.2.30. Add permissions on inline model instances were not validated on submission of forged `POST` data in `GenericInlineModelAdmin`. Earlier, unsupported Django series (such as 5.0.x, 4.1.x, and 3.2.x) were not evaluat…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 django: PYSEC-2026-53
An issue was discovered in 6.0 before 6.0.4, 5.2 before 5.2.13, and 4.2 before 4.2.30. Admin changelist forms using `ModelAdmin.list_editable` incorrectly allowed new instances to be created via forged `POST` data. Earlier, unsupported Django series (such as 5.0.x, 4.1.x, and 3.2.x) were not evalua…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 geopandas: PYSEC-2026-62
SQL injection vulnerability in geopandas before v.1.1.2 allows an attacker to obtain sensitive information via the `to_postgis()` function being used to write GeoDataFrames to a PostgreSQL database.
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 keras: GHSA-36fq-jgmw-4r9c
Keras is vulnerable to Deserialization of Untrusted Data
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 keras: GHSA-4f3f-g24h-fr8m
Keras has an untrusted deserialization vulnerability
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 keras: GHSA-hjqc-jx6g-rwp9
Keras Directory Traversal Vulnerability
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 keras: PYSEC-2025-121
An issue in keras 3.7.0 allows attackers to write arbitrary files to the user's machine via downloading a crafted tar file through the get_file function.
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 lxml: PYSEC-2026-87
lxml is a library for processing XML and HTML in the Python language. Prior to 6.1.0, using either of the two parsers in the default configuration (with resolve_entities=True) allows untrusted XML input to read local files. Setting the resolve_entities option explicitly to resolve_entities='interna…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 nltk: GHSA-469j-vmhf-r6v7
NLTK has a Downloader Path Traversal Vulnerability (AFO) - Arbitrary File Overwrite
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 nltk: GHSA-jm6w-m3j8-898g
Unauthenticated remote shutdown in nltk.app.wordnet_app
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 nltk: PYSEC-2024-167
NLTK through 3.8.1 allows remote code execution if untrusted packages have pickled Python code, and the integrated data package download functionality is used. This affects, for example, averaged_perceptron_tagger and punkt.
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 nltk: PYSEC-2026-97
A vulnerability in the `filestring()` function of the `nltk.util` module in nltk version 3.9.2 allows arbitrary file read due to improper validation of input paths. The function directly opens files specified by user input without sanitization, enabling attackers to access sensitive system files by…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 nltk: PYSEC-2026-98
A vulnerability in NLTK versions up to and including 3.9.2 allows arbitrary file read via path traversal in multiple CorpusReader classes, including WordListCorpusReader, TaggedCorpusReader, and BracketParseCorpusReader. These classes fail to properly sanitize or validate file paths, enabling attac…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 nltk: PYSEC-2026-99
NLTK versions <=3.9.2 are vulnerable to arbitrary code execution due to improper input validation in the StanfordSegmenter module. The module dynamically loads external Java .jar files without verification or sandboxing. An attacker can supply or replace the JAR file, enabling the execution of arbi…
livebench/code_runner/requirements_eval.txt
high Security checks quality Quality conf 1.00 ✓ Repobility 24 occurrences Phantom test coverage: test_patch_run
Test function `test_patch_run` runs code but contains no assert / expect / should call — it passes regardless of behaviour. Adds line coverage without verifying anything.
12 files, 12 locations
livebench/agentic_code_runner/eval/harness/instance.py:56
livebench/agentic_code_runner/eval/harness/repos/c/OpenMathLib/OpenBLAS.py:229
livebench/agentic_code_runner/eval/harness/repos/c/facebook/zstd.py:230
livebench/agentic_code_runner/eval/harness/repos/c/fluent/fluentbit.py:282
livebench/agentic_code_runner/eval/harness/repos/c/jqlang/jq.py:237
livebench/agentic_code_runner/eval/harness/repos/c/libgit2/libgit2.py:402
livebench/agentic_code_runner/eval/harness/repos/c/libsdlorg/SDL.py:229
livebench/agentic_code_runner/eval/harness/repos/c/mruby/mruby.py:368
high Security checks quality Quality conf 1.00 ✓ Repobility Phantom test coverage: test_patch_run_log
Test function `test_patch_run_log` runs code but contains no assert / expect / should call — it passes regardless of behaviour. Adds line coverage without verifying anything.
livebench/agentic_code_runner/eval/harness/report.py:216
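The difference between a phantom test and a verifying test can be illustrated as follows. All names here are hypothetical; whether the flagged `test_patch_run` functions are tests at all, rather than helpers whose names happen to start with `test_`, cannot be determined from the report.

```python
def normalize_status(raw: str) -> str:
    """Hypothetical function under test."""
    return raw.strip().lower()


def test_normalize_status_phantom():
    # Runs the code but asserts nothing, so it passes whatever is returned.
    normalize_status("  PASS ")


def test_normalize_status_checked():
    # Fails if the behaviour changes.
    assert normalize_status("  PASS ") == "pass"


test_normalize_status_phantom()
test_normalize_status_checked()
```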
high Security checks software dependencies conf 0.88 pillow: GHSA-cfh3-3jmp-rvhc
Pillow affected by out-of-bounds write when loading PSD images
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 pillow: GHSA-pwv6-vv43-88gr
Pillow has an OOB Write with Invalid PSD Tile Extents (Integer Overflow)
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 pillow: GHSA-whj4-6x5x-4v2j
FITS GZIP decompression bomb in Pillow
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 pillow: PYSEC-2026-165
Pillow is a Python imaging library. Prior to version 12.2.0, if a font advances for each glyph by an exceeding large amount, when Pillow keeps track of the current position, it may lead to an integer overflow. This issue has been patched in version 12.2.0.
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 pycryptodome: GHSA-j225-cvw7-qrx7
PyCryptodome and pycryptodomex side-channel leakage for OAEP decryption
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 scikit-learn: PYSEC-2024-110
A sensitive data leakage vulnerability was identified in scikit-learn's TfidfVectorizer, specifically in versions up to and including 1.4.1.post1, which was fixed in version 1.5.0. The vulnerability arises from the unexpected storage of all tokens present in the training data within the `stop_words…
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 scipy: PYSEC-2023-102
A refcounting issue which leads to potential memory leak was discovered in scipy commit 8627df31ab in Py_FindObjects() function.
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 scipy: PYSEC-2023-114
** DISPUTED ** A use-after-free issue was discovered in Py_FindObjects() function in SciPy versions prior to 1.8.0. NOTE: the vendor and discoverer indicate that this is not a security issue.
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 tensorflow: GHSA-49rq-hwc3-x77w
TensorFlow has Null Pointer Error in QuantizedMatMulWithBiasAndDequantize
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 tensorflow: GHSA-558h-mq8x-7q9g
TensorFlow has Null Pointer Error in SparseSparseMaximum
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 tensorflow: GHSA-5w96-866f-6rm8
TensorFlow has Floating Point Exception in TFLite in conv kernel
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 tensorflow: GHSA-647v-r7qq-24fh
TensorFlow has Floating Point Exception in TensorListSplit with XLA
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 tensorflow: GHSA-64jg-wjww-7c5w
TensorFlow has Null Pointer Error in TensorArrayConcatV2
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 tensorflow: GHSA-68v3-g9cm-rmm6
TensorFlow vulnerable to Out-of-Bounds Read in GRUBlockCellGrad
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 tensorflow: GHSA-6hg6-5c2q-7rcr
TensorFlow has Heap-buffer-overflow in AvgPoolGrad
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 tensorflow: GHSA-6wfh-89q8-44jq
TensorFlow has null dereference on ParallelConcat with XLA
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 tensorflow: GHSA-7jvm-xxmr-v5cw
TensorFlow vulnerable to integer overflow in EditDistance
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 tensorflow: GHSA-7x4v-9gxg-9hwj
TensorFlow has Segfault in Bincount with XLA
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 tensorflow: GHSA-93vr-9q9m-pj8p
TensorFlow vulnerable to Out-of-Bounds Read in DynamicStitch
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 tensorflow: GHSA-94mm-g2mv-8p7r
TensorFlow has Null Pointer Error in LookupTableImportV2
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 tensorflow: GHSA-f49c-87jh-g47q
TensorFlow has double free in Fractional(Max/Avg)Pool
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 tensorflow: GHSA-f637-vh3r-vfh2
TensorFlow has Floating Point Exception in AudioSpectrogram
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 tensorflow: GHSA-gf97-q72m-7579
TensorFlow has Null Pointer Error in RandomShuffle with XLA enable
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 tensorflow: GHSA-gjh7-xx4r-x345
TensorFlow has segfault in array_ops.upper_bound
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 tensorflow: GHSA-j5w9-hmfh-4cr6
TensorFlow has segmentation fault in tfg-translate
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 tensorflow: GHSA-qjqc-vqcf-5qvj
TensorFlow vulnerable to seg fault in `tf.raw_ops.Print`
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 tensorflow: GHSA-rcf8-g8jv-vg6p
TensorFlow has Floating Point Exception in AvgPoolGrad with XLA
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.88 werkzeug: GHSA-2g68-c3qc-8985
Werkzeug debugger vulnerable to remote execution when interacting with attacker controlled domain
livebench/code_runner/requirements_eval.txt
high System graph security security conf 1.00 Insecure pattern 'exec_used' in livebench/code_runner/eval/__init__.py:158
Found a known-risky pattern (exec_used). Review and replace if possible.
livebench/code_runner/eval/__init__.py:158 Exec used
low Security checks quality Error handling conf 1.00 [ERR001] Silent Exception Swallowing: Silently swallowing all exceptions hides bugs. Even in cleanup code, log at DEBUG level.
Log the error: `except Exception: logger.debug('cleanup failed', exc_info=True)`. Or handle specific exception types.
livebench/process_results/math/integrals_with_game/utils.py:122
low Security checks quality Error handling conf 1.00 [ERR001] Silent Exception Swallowing: Silently swallowing all exceptions hides bugs. Even in cleanup code, log at DEBUG level.
Log the error: `except Exception: logger.debug('cleanup failed', exc_info=True)`. Or handle specific exception types.
livebench/process_results/data_analysis/tablereformat/utils.py:15
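The ERR001 advice above can be sketched as follows. This is a minimal illustration, not the code at the flagged locations: `remove_quietly` is a hypothetical helper, and it assumes the swallowed exception comes from best-effort file cleanup.

```python
import logging
import os

logger = logging.getLogger(__name__)


def remove_quietly(path: str) -> bool:
    """Best-effort delete. Returns True only if the file was removed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # Expected case: there is nothing to clean up.
        return False
    except OSError:
        # Unexpected, but cleanup should not abort the caller.
        # The traceback is kept at DEBUG level instead of being dropped.
        logger.debug("cleanup failed for %s", path, exc_info=True)
        return False
```

Catching `FileNotFoundError` before the broader `OSError` keeps the expected case silent while still leaving a trace for anything unusual.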
low Security checks security Injection conf 0.50 3 occurrences [SEC005] Command Injection Risk: Unsafe shell execution or eval of user input.
Use subprocess with shell=False and a list of args. Never eval user input.
3 files, 3 locations
livebench/agentic_code_runner/minisweagent/environments/docker.py:106
livebench/agentic_code_runner/minisweagent/environments/local.py:23
livebench/code_runner/eval/utils.py:201
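A minimal sketch of the SEC005 recommendation, assuming the command can be expressed as an argument list. `run_argv` and the `hostile` value are illustrative names, not taken from the flagged files. Whether the flagged call sites can drop the shell depends on whether they rely on shell features, which the scan output does not show.

```python
import subprocess
import sys


def run_argv(argv: list[str], timeout: float = 30.0) -> str:
    """Run a command given as an argument list, so no shell parses the input."""
    completed = subprocess.run(
        argv,
        shell=False,          # the default, spelled out for emphasis
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout


# A value containing shell metacharacters is passed through as plain data.
hostile = "hello; echo injected"
output = run_argv([sys.executable, "-c", "import sys; print(sys.argv[1])", hostile])
```

Because the argument list goes straight to the child process, the `;` in `hostile` is never interpreted as a command separator.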
medium Security checks quality Quality conf 1.00 3 occurrences [SEC123] Production stack trace / debug output exposed: Debug mode left on in production exposes stack traces, environment variables, and framework internals. An interactive debugger (for example the Werkzeug debugger console) can also allow remote code execution.
Set DEBUG=False / APP_DEBUG=false in production. Provide a generic 500 handler that logs to backend but returns a sanitized page to clients.
3 files, 3 locations
livebench/lcb_runner/evaluation/compute_code_generation_metrics.py:29
livebench/scripts/check_grading_flakiness.py:111
livebench/scripts/edit_questions.py:138
low Security checks quality Quality conf 1.00 [SEC136] AI-typical over-broad exception handler swallowing all errors: Catch-all exception block that silently returns success or no-ops. AI agents reach for this pattern when a flaky test or an unfamiliar API throws — wrap, swallow, return success. Real bugs are masked, observability is destroyed, and callers think the operation worked. CWE-396 (improperly-generalized exception). Distinct from intentional fallback because there's no log line and the success value is fabricated.
Catch the specific exception type, log at error level with full exception info, and return a failure-shaped result. If the operation is genuinely best-effort, log at warning and document why in a comment so the next reader (or scanner) knows.
livebench/scripts/check_grading_flakiness.py:43
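The SEC136 remediation might look like the sketch below. `ParseResult` and `parse_json_line` are hypothetical names chosen for illustration, and JSON parsing stands in for whatever operation the flagged handler wraps.

```python
import json
import logging
from dataclasses import dataclass
from typing import Any, Optional

logger = logging.getLogger(__name__)


@dataclass
class ParseResult:
    """Failure-shaped result: callers must check `ok` before using `value`."""
    ok: bool
    value: Optional[Any] = None
    error: Optional[str] = None


def parse_json_line(line: str) -> ParseResult:
    try:
        return ParseResult(ok=True, value=json.loads(line))
    except json.JSONDecodeError as exc:
        # Specific exception type, logged at error level with the traceback.
        logger.error("could not parse line", exc_info=True)
        return ParseResult(ok=False, error=str(exc))
```

The caller can no longer mistake a failure for success, because the failure path returns `ok=False` rather than a fabricated value.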
low Security checks quality Error handling conf 0.55 ✓ Repobility 25 occurrences Broad exception handler needs review
This handler catches Exception/BaseException. It is actionable when it swallows errors without logging, re-raising, or returning a structured error. Handlers that intentionally convert exceptions into typed error results should not be treated as high risk.
12 files, 19 locations
livebench/scripts/inspect_agentic_traj.py:141, 144, 181 (3 hits)
livebench/code_runner/eval/__init__.py:182, 346 (2 hits)
livebench/model/completions.py:231, 524 (2 hits)
livebench/scripts/check_grading_flakiness.py:45, 56 (2 hits)
livebench/scripts/edit_questions.py:144, 184 (2 hits)
livebench/scripts/replay_agent_trajectory.py:79, 373 (2 hits)
livebench/agentic_code_runner/minisweagent/run_inference.py:233
livebench/code_runner/eval/utils.py:236
Error handling · quality
medium Security checks software dependencies conf 0.88 cryptography: GHSA-39hc-v87j-747x
Vulnerable OpenSSL included in cryptography wheels
livebench/code_runner/requirements_eval.txt
medium Security checks software dependencies conf 0.88 cryptography: GHSA-9v9h-cgj8-h64p
Null pointer dereference in PKCS12 parsing
livebench/code_runner/requirements_eval.txt
medium Security checks software dependencies conf 0.88 cryptography: GHSA-h4gh-qq45-vh27
pyca/cryptography has a vulnerable OpenSSL included in cryptography wheels
livebench/code_runner/requirements_eval.txt
medium Security checks software dependencies conf 0.88 django: GHSA-rrqc-c2jx-6jgv
Django allows enumeration of user e-mail addresses
livebench/code_runner/requirements_eval.txt
medium Security checks software dependencies conf 0.88 django: GHSA-vm8q-m57g-pff3
Regular expression denial-of-service in Django
livebench/code_runner/requirements_eval.txt
medium Security checks software dependencies conf 0.88 keras: GHSA-mq84-hjqx-cwf2
Keras is vulnerable to arbitrary local file loading and Server-Side Request Forgery
livebench/code_runner/requirements_eval.txt
medium Security checks quality Quality conf 1.00 ✓ Repobility 3 occurrences Mutable default argument in `from_reports` (list)
`def from_reports(... = []/{}/set())` — Python's default value is constructed ONCE at function definition time and shared across all calls. Mutating it in one call mutates it for every future call too.
3 files, 3 locations
livebench/agentic_code_runner/eval/harness/report.py:303
livebench/lcb_runner/evaluation/compute_code_generation_metrics.py:157
livebench/lcb_runner/evaluation/pass_k_utils.py:26
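The shared-default behavior described above can be reproduced in a few lines. The function names here are illustrative and unrelated to `from_reports`. The second function shows the usual `None` sentinel fix.

```python
def append_shared(item, bucket=[]):
    # The list is created once, when the function is defined,
    # so every call without `bucket` mutates the same object.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # A None sentinel gives each call its own new list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


append_shared("a")
shared = append_shared("b")   # the earlier "a" is still in the list

append_fresh("a")
fresh = append_fresh("b")     # only "b"
```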
medium Security checks software dependencies conf 0.88 nltk: GHSA-gfwx-w7gr-fvh7
Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting') in nltk
livebench/code_runner/requirements_eval.txt
medium Security checks software dependencies conf 0.88 nltk: GHSA-rf74-v2fm-23pw
Natural Language Toolkit (NLTK) has unbounded recursion in JSONTaggedDecoder.decode_obj() may cause DoS
livebench/code_runner/requirements_eval.txt
medium Security checks software dependencies conf 0.88 numpy: GHSA-fpfv-jqm9-f5jm
Incorrect Comparison in NumPy
livebench/code_runner/requirements_eval.txt
medium Security checks software dependencies conf 0.88 pillow: GHSA-r73j-pqj5-w3x7
Pillow has a PDF Parsing Trailer Infinite Loop (DoS)
livebench/code_runner/requirements_eval.txt
medium Security checks software dependencies conf 0.88 pytest: GHSA-6w46-j5rx-g56g
pytest has vulnerable tmpdir handling
livebench/code_runner/requirements_eval.txt
high Security checks software dependencies conf 0.70 4 occurrences Remote install command pipes network code directly to a shell
Agent helper projects often publish one-line installers. `curl | sh` style commands are convenient, but they bypass review unless the script is pinned, signed, or checksum-verified.
4 files, 4 locations
livebench/agentic_code_runner/eval/harness/repos/javascript/Automattic/mongoose.py:98
livebench/agentic_code_runner/eval/harness/repos/javascript/axios/axios.py:60
livebench/agentic_code_runner/eval/harness/repos/javascript/sveltejs/svelte.py:52
livebench/agentic_code_runner/eval/harness/repos/typescript/ant_design/ant_design.py:58
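One of the mitigations named above is checksum verification. A minimal sketch follows, assuming the installer script has already been downloaded into memory and that a SHA-256 digest was recorded ahead of time. `verify_sha256` and the sample script bytes are hypothetical.

```python
import hashlib
import hmac


def verify_sha256(payload: bytes, expected_hex: str) -> bool:
    """Return True only if the payload matches the pinned SHA-256 digest."""
    actual = hashlib.sha256(payload).hexdigest()
    return hmac.compare_digest(actual, expected_hex.lower())


# Stand-in for a downloaded installer. In practice the pinned digest is
# recorded in the repo beforehand, not computed from the download itself.
script = b"#!/bin/sh\necho installing\n"
pinned = hashlib.sha256(script).hexdigest()

ok = verify_sha256(script, pinned)
tampered = verify_sha256(script + b"rm -rf /tmp/x\n", pinned)
```

Only after the check passes would the script be handed to a shell. A mismatch should abort the install.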
medium Security checks software dependencies conf 0.88 requests: GHSA-9hjg-9r4m-mvj7
Requests vulnerable to .netrc credentials leak via malicious URLs
livebench/code_runner/requirements_eval.txt
medium Security checks software dependencies conf 0.88 requests: GHSA-9wx4-h78v-vm56
Requests `Session` object does not verify requests after making first request with verify=False
livebench/code_runner/requirements_eval.txt
medium Security checks software dependencies conf 0.88 requests: GHSA-gc5v-m9x4-r6x2
Requests has Insecure Temp File Reuse in its extract_zipped_paths() utility function
livebench/code_runner/requirements_eval.txt
medium Security checks software dependencies conf 0.90 ✓ Repobility 4 occurrences requirements.txt: `absl-py` has no version pin
Unpinned pip requirement means every fresh install may resolve a different version. Newer releases can introduce malicious code (typosquats, account compromises). Reproducible installs need exact pins.
lines 1, 2, 3, 4
livebench/if_runner/instruction_following_eval/requirements.txt:1, 2, 3, 4 (4 hits)
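A check for this finding could be approximated as below. This is a simplified sketch, not the scanner's logic: it treats only `name==version` as pinned and ignores URL requirements and environment markers. Apart from `absl-py`, the sample entries and versions are made up for illustration.

```python
import re

# A line counts as pinned if it is `name` (optionally with extras) followed by `==version`.
_PIN = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that lack an exact `==` pin."""
    missing = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blanks, comments, and pip options such as -r or -e
        if not _PIN.match(line):
            missing.append(line)
    return missing


sample = "absl-py\nlangdetect==1.0.9\nnltk>=3.8  # a range, not an exact pin\n"
result = unpinned_requirements(sample)
```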
medium Security checks software dependencies conf 0.88 tensorflow: GHSA-fqm2-gh8w-gr68
TensorFlow vulnerable to segfault when opening multiframe gif
livebench/code_runner/requirements_eval.txt
medium Security checks software dependencies conf 0.88 tensorflow: GHSA-fxgc-95xx-grvq
TensorFlow Denial of Service vulnerability
livebench/code_runner/requirements_eval.txt
medium Security checks software dependencies conf 0.88 werkzeug: GHSA-29vq-49wr-vm6x
Werkzeug safe_join() allows Windows special device names
livebench/code_runner/requirements_eval.txt
medium Security checks software dependencies conf 0.88 werkzeug: GHSA-87hc-h4r5-73f7
Werkzeug safe_join() allows Windows special device names with compound extensions
livebench/code_runner/requirements_eval.txt
medium Security checks software dependencies conf 0.88 werkzeug: GHSA-f9vj-2wh5-fj8j
Werkzeug safe_join not safe on Windows
livebench/code_runner/requirements_eval.txt
medium Security checks software dependencies conf 0.88 werkzeug: GHSA-hgf8-39gv-g3f2
Werkzeug safe_join() allows Windows special device names
livebench/code_runner/requirements_eval.txt
medium Security checks software dependencies conf 0.88 werkzeug: GHSA-q34m-jh98-gwm2
Werkzeug possible resource exhaustion when parsing file data in forms
livebench/code_runner/requirements_eval.txt
medium System graph security security conf 1.00 Insecure pattern 'subprocess_shell_true' in livebench/agentic_code_runner/minisweagent/environments/docker.py:106
Found a known-risky pattern (subprocess_shell_true). Review and replace if possible.
livebench/agentic_code_runner/minisweagent/environments/docker.py:106 Subprocess shell true
medium System graph security security conf 1.00 Insecure pattern 'subprocess_shell_true' in livebench/agentic_code_runner/minisweagent/environments/extra/swerex_docker.py:33
Found a known-risky pattern (subprocess_shell_true). Review and replace if possible.
livebench/agentic_code_runner/minisweagent/environments/extra/swerex_docker.py:33 Subprocess shell true
medium System graph security security conf 1.00 Insecure pattern 'subprocess_shell_true' in livebench/agentic_code_runner/minisweagent/environments/local.py:25
Found a known-risky pattern (subprocess_shell_true). Review and replace if possible.
livebench/agentic_code_runner/minisweagent/environments/local.py:25 Subprocess shell true
medium System graph quality Integrity conf 1.00 Network/subprocess call without timeout or try/except — livebench/agentic_code_runner/eval/utils/git_util.py:33
`subprocess.run(...)` here lacks both a `timeout=` arg and an enclosing try/except. This is exactly the class of bug that took down our git-clone earlier (HTTP/2 stream cancel surfaced as a fatal). Add a `timeout=` and wrap in try/except, or use a wrapper that retries.
runtime safety · Robustness
medium System graph quality Integrity conf 1.00 Network/subprocess call without timeout or try/except — livebench/agentic_code_runner/minisweagent/environments/docker.py:106
`subprocess.Popen(...)` here has no enclosing try/except and no visible timeout. `Popen` itself takes no `timeout=` argument, so the limit belongs on the later `communicate()` or `wait()` call. This is exactly the class of bug that took down our git-clone earlier (HTTP/2 stream cancel surfaced as a fatal). Add the timeout and wrap in try/except, or use a wrapper that retries.
runtime safety · Robustness
medium System graph quality Integrity conf 1.00 Network/subprocess call without timeout or try/except — livebench/scripts/check_grading_flakiness.py:148
`subprocess.run(...)` here lacks both a `timeout=` arg and an enclosing try/except. This is exactly the class of bug that took down our git-clone earlier (HTTP/2 stream cancel surfaced as a fatal). Add a `timeout=` and wrap in try/except, or use a wrapper that retries.
runtime safety · Robustness
medium System graph quality Integrity conf 1.00 Network/subprocess call without timeout or try/except — livebench/scripts/check_question_variance.py:93
`subprocess.run(...)` here lacks both a `timeout=` arg and an enclosing try/except. This is exactly the class of bug that took down our git-clone earlier (HTTP/2 stream cancel surfaced as a fatal). Add a `timeout=` and wrap in try/except, or use a wrapper that retries.
runtime safety · Robustness
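The fix suggested in the four findings above can be sketched as two small wrappers. `run_bounded` and `popen_bounded` are hypothetical helpers, and returning `None` on failure is an assumption made for brevity; real callers may prefer to log or re-raise.

```python
import subprocess
import sys
from typing import Optional


def run_bounded(argv: list[str], timeout: float) -> Optional[subprocess.CompletedProcess]:
    """subprocess.run with a time limit; returns None on timeout or failure."""
    try:
        return subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout, check=True
        )
    except subprocess.TimeoutExpired:
        return None   # run() has already killed the child at this point
    except (subprocess.CalledProcessError, OSError):
        return None   # non-zero exit, or the executable could not be started


def popen_bounded(argv: list[str], timeout: float) -> Optional[str]:
    """With Popen, the time limit is applied on communicate(), not the constructor."""
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    try:
        out, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()   # reap the killed child
        return None
    return out


fast = run_bounded([sys.executable, "-c", "print('ok')"], timeout=30)
slow = run_bounded([sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5)
```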
medium System graph cicd CI/CD security conf 1.00 No CI/CD pipelines detected
No GitHub Actions, GitLab CI, or CircleCI configs found. Without CI you can't gate deploys on tests/lints.
CI/CD security · Coverage
medium System graph quality Tests conf 1.00 Very low test-to-source ratio
8 test file(s) for 334 source file(s) (ratio 0.02). Consider adding integration or unit tests for critical paths.
Coverage
low Security checks software dependencies conf 0.88 cryptography: GHSA-5cpq-8wj7-hf2v
Vulnerable OpenSSL included in cryptography wheels
livebench/code_runner/requirements_eval.txt
low Security checks software dependencies conf 0.88 cryptography: GHSA-jm77-qphf-c4w8
pyca/cryptography's wheels include vulnerable OpenSSL
livebench/code_runner/requirements_eval.txt
low Security checks software dependencies conf 0.88 cryptography: GHSA-v8gr-m533-ghj9
Vulnerable OpenSSL included in cryptography wheels
livebench/code_runner/requirements_eval.txt
low Security checks software dependencies conf 0.88 django: GHSA-mjgh-79qc-68w3
Django has a Race Condition vulnerability
livebench/code_runner/requirements_eval.txt
low Security checks software dependencies conf 0.88 django: GHSA-q95w-c7qg-hrff
Django vulnerable to partial directory traversal via archives
livebench/code_runner/requirements_eval.txt
low Security checks quality Quality conf 0.60 30 occurrences Duplicated implementation block across source files
Duplicate implementation blocks are maintenance debt. Keep them visible, but they are not a high-severity defect unless the duplicated logic is security-sensitive or drifting.
12 files, 30 locations
livebench/agentic_code_runner/eval/harness/repos/c/valkey_io/valkey.py:7, 18, 25, 98, 175 (5 hits)
livebench/agentic_code_runner/eval/harness/repos/c/ponylang/ponyc.py:7, 18, 25, 342 (4 hits)
livebench/agentic_code_runner/eval/harness/repos/c/redis/redis.py:1, 18, 175, 211 (4 hits)
livebench/agentic_code_runner/eval/harness/repos/c/libgit2/libgit2.py:7, 18, 72 (3 hits)
livebench/agentic_code_runner/eval/harness/repos/c/mruby/mruby.py:7, 125, 174 (3 hits)
livebench/agentic_code_runner/eval/harness/repos/c/jqlang/jq.py:1, 18 (2 hits)
livebench/agentic_code_runner/eval/harness/repos/c/libsdlorg/SDL.py:7, 87 (2 hits)
livebench/agentic_code_runner/eval/harness/repos/c/php/phpsrc.py:7, 88 (2 hits)
duplication · quality
low Security checks software dependencies conf 0.88 flask: GHSA-68rp-wp8r-4726
Flask session does not add `Vary: Cookie` header when accessed in some ways
livebench/code_runner/requirements_eval.txt
low System graph quality Integrity conf 1.00 13 env vars used in code but missing from .env.example
Drift between code and config docs. The first few: `BIGCODEBENCH_TIMEOUT_PER_TASK`, `LITELLM_MODEL_REGISTRY_PATH`, `LIVEBENCH_API_KEY`, `MSWEA_CONFIG_DIR`, `MSWEA_DOCKER_EXECUTABLE`, `MSWEA_GLOBAL_CALL_LIMIT`, `MSWEA_GLOBAL_CONFIG_DIR`, `MSWEA_GLOBAL_COST_LIMIT` + 5 more. Add them (with a placehold…
config drift
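A drift check of this kind could be approximated as below. The regex only recognizes literal names in `os.getenv`, `os.environ.get`, and `os.environ[...]`, which is an assumption about how the variables are read. The source snippet is invented for illustration, with variable names taken from the finding.

```python
import re

# Matches os.getenv("X"), os.environ.get("X"), and os.environ["X"] with literal names.
_ENV_USE = re.compile(
    r"os\.(?:getenv\(|environ\.get\(|environ\[)\s*['\"]([A-Za-z_][A-Za-z0-9_]*)['\"]"
)


def env_vars_in_source(source: str) -> set[str]:
    return set(_ENV_USE.findall(source))


def env_vars_in_example(example: str) -> set[str]:
    """Collect NAME from NAME=value lines, ignoring comments and blanks."""
    names = set()
    for line in example.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            names.add(line.split("=", 1)[0].strip())
    return names


source = (
    "import os\n"
    'key = os.environ["LIVEBENCH_API_KEY"]\n'
    'limit = os.getenv("MSWEA_GLOBAL_CALL_LIMIT", "0")\n'
)
example = "# sample .env.example\nLIVEBENCH_API_KEY=\n"

missing = env_vars_in_source(source) - env_vars_in_example(example)
```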
low System graph software Dead code candidate conf 1.00 File has no detected symbols: livebench/agentic_code_runner/eval/harness/constant.py
Source file with no class/function declarations — possible config, dead code, or scratch file.
low System graph software Dead code candidate conf 1.00 File has no detected symbols: livebench/download_leaderboard.py
Source file with no class/function declarations — possible config, dead code, or scratch file.
low System graph software Dead code candidate conf 1.00 File has no detected symbols: livebench/download_questions.py
Source file with no class/function declarations — possible config, dead code, or scratch file.
low System graph software Dead code candidate conf 1.00 File has no detected symbols: livebench/if_runner/instruction_following_eval/json_formatter.py
Source file with no class/function declarations — possible config, dead code, or scratch file.
low System graph security security conf 1.00 Insecure pattern 'debug_true' in livebench/lcb_runner/evaluation/compute_code_generation_metrics.py:29
Found a known-risky pattern (debug_true). Review and replace if possible.
livebench/lcb_runner/evaluation/compute_code_generation_metrics.py:29 Debug true
low System graph security security conf 1.00 Insecure pattern 'debug_true' in livebench/scripts/check_grading_flakiness.py:111
Found a known-risky pattern (debug_true). Review and replace if possible.
livebench/scripts/check_grading_flakiness.py:111 Debug true
low System graph security security conf 1.00 Insecure pattern 'debug_true' in livebench/scripts/edit_questions.py:138
Found a known-risky pattern (debug_true). Review and replace if possible.
livebench/scripts/edit_questions.py:138 Debug true
low System graph quality Integrity conf 1.00 Near-duplicate function bodies in 10 places
Functions with the same first-5-line body hash: livebench/agentic_code_runner/eval/harness/run_evaluation.py:from_dict, livebench/agentic_code_runner/eval/harness/pull_request.py:from_dict, livebench/agentic_code_runner/eval/harness/pull_request.py:from_dict, livebench/agentic_code_runner/eval/harn…
duplicates · duplication
low System graph quality Integrity conf 1.00 Near-duplicate function bodies in 14 places
Functions with the same first-5-line body hash: livebench/agentic_code_runner/eval/harness/run_evaluation.py:dict, livebench/agentic_code_runner/eval/harness/pull_request.py:dict, livebench/agentic_code_runner/eval/harness/pull_request.py:dict, livebench/agentic_code_runner/eval/harness/pull_reques…
duplicates · duplication
low System graph quality Integrity conf 1.00 12 occurrences Near-duplicate function bodies in 2 places
Functions with the same first-5-line body hash: livebench/code_runner/eval/__init__.py:trusted_check_exec, livebench/code_runner/eval/__init__.py:trusted_check. This is *the* AI-coder failure mode (4× more duplication in vibe-coded repos; see https://jw.hn/ai-code-hygiene). Consolidate or document…
12 occurrences
repo-level (12 hits)
duplicates · duplication
low System graph quality Integrity conf 1.00 3 occurrences Near-duplicate function bodies in 3 places
Functions with the same first-5-line body hash: livebench/agentic_code_runner/eval/harness/run_evaluation.py:json, livebench/agentic_code_runner/eval/harness/gen_report.py:json, livebench/agentic_code_runner/eval/harness/build_dataset.py:json. This is *the* AI-coder failure mode (4× more duplicatio…
3 occurrences
repo-level (3 hits)
duplicates · duplication
low System graph quality Integrity conf 1.00 2 occurrences Near-duplicate function bodies in 4 places
Functions with the same first-5-line body hash: livebench/agentic_code_runner/eval/harness/run_evaluation.py:run_mode_image, livebench/agentic_code_runner/eval/harness/run_evaluation.py:run, livebench/agentic_code_runner/eval/harness/build_dataset.py:run_mode_image, livebench/agentic_code_runner/ev…
2 occurrences
repo-level (2 hits)
duplicates · duplication
low System graph quality Integrity conf 1.00 Near-duplicate function bodies in 9 places
Functions with the same first-5-line body hash: livebench/agentic_code_runner/eval/harness/run_evaluation.py:from_json, livebench/agentic_code_runner/eval/harness/pull_request.py:from_json, livebench/agentic_code_runner/eval/harness/pull_request.py:from_json, livebench/agentic_code_runner/eval/harn…
duplicates · duplication
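The findings above group functions by a "first-5-line body hash". The scan output does not describe the exact procedure, so the following is only a guess at the general idea: hash the first few whitespace-stripped lines of each function body and report digests shared by more than one function. All names in the sample are hypothetical.

```python
import ast
import hashlib
from collections import defaultdict


def body_prefix_groups(source: str, lines: int = 5) -> dict[str, list[str]]:
    """Group function names by a hash of the first `lines` lines of their bodies."""
    src_lines = source.splitlines()
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.body:
            start = node.body[0].lineno - 1              # 0-based first body line
            end = min(node.end_lineno, start + lines)    # do not run past the function
            prefix = "\n".join(l.strip() for l in src_lines[start:end])
            groups[hashlib.sha256(prefix.encode()).hexdigest()].append(node.name)
    # Keep only digests shared by two or more functions.
    return {d: names for d, names in groups.items() if len(names) > 1}


sample = '''
def to_json(self):
    import json
    return json.dumps(self.__dict__)

def dumps(self):
    import json
    return json.dumps(self.__dict__)

def other(self):
    return 1
'''
duplicates = list(body_prefix_groups(sample).values())
```

A prefix hash is cheap but coarse: short generated methods such as `from_dict` or `json` can collide even when the duplication is harmless, so grouped hits are review prompts rather than confirmed defects.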
low System graph quality Integrity conf 1.00 Old/deprecated-named symbol `connections_process_results_old` in livebench/process_results/writing/connections/utils.py:15
Names with suffixes like `_old`, `_v1`, `_deprecated` usually indicate replaced-but-not-removed code (typical AI-coder leftover). Confirm and delete, or rename if it's the active version.
old marker · Dead code
low System graph quality Integrity conf 1.00 Old/deprecated-named symbol `message_copy` in livebench/agentic_code_runner/minisweagent/models/litellm_model.py:420
Names with suffixes like `_old`, `_v1`, `_deprecated` usually indicate replaced-but-not-removed code (typical AI-coder leftover). Confirm and delete, or rename if it's the active version.
old marker · Dead code
low System graph quality Integrity conf 1.00 Old/deprecated-named symbol `read_df_func_v2` in livebench/process_results/data_analysis/tablereformat/utils.py:41
Names with suffixes like `_old`, `_v1`, `_deprecated` usually indicate replaced-but-not-removed code (typical AI-coder leftover). Confirm and delete, or rename if it's the active version.
old marker · Dead code
low System graph quality Integrity conf 1.00 Old/deprecated-named symbol `trajectory_copy` in livebench/scripts/replay_agent_trajectory.py:311
Names with suffixes like `_old`, `_v1`, `_deprecated` usually indicate replaced-but-not-removed code (typical AI-coder leftover). Confirm and delete, or rename if it's the active version.
old marker · Dead code
low System graph quality Integrity conf 1.00 Old/deprecated-named symbol `two_score_pattern_backup` in livebench/common.py:59
Names with suffixes like `_old`, `_v1`, `_deprecated` usually indicate replaced-but-not-removed code (typical AI-coder leftover). Confirm and delete, or rename if it's the active version.
old marker · Dead code
low System graph quality Integrity conf 1.00 Old/deprecated-named symbol `web_of_lies_v2` in livebench/gen_api_answer.py:342
Names with suffixes like `_old`, `_v1`, `_deprecated` usually indicate replaced-but-not-removed code (typical AI-coder leftover). Confirm and delete, or rename if it's the active version.
old marker · Dead code
low System graph quality Integrity conf 1.00 Old/deprecated-named symbol `web_of_lies_v2` in livebench/gen_ground_truth_judgment.py:22
Names with suffixes like `_old`, `_v1`, `_deprecated` usually indicate replaced-but-not-removed code (typical AI-coder leftover). Confirm and delete, or rename if it's the active version.
old marker · Dead code
low System graph quality Integrity conf 1.00 Old/deprecated-named symbol `zebra_puzzle_process_results_old` in livebench/process_results/reasoning/zebra_puzzle/utils.py:5
Names with suffixes like `_old`, `_v1`, `_deprecated` usually indicate replaced-but-not-removed code (typical AI-coder leftover). Confirm and delete, or rename if it's the active version.
old marker · Dead code
low System graph software Dead code conf 1.00 Possibly dead Python function: build_image
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/agentic_code_runner/eval/harness/run_evaluation.py:566
low System graph software Dead code conf 1.00 Possibly dead Python function: compatible_eval_result
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/code_runner/eval/__init__.py:51
low System graph software Dead code conf 1.00 Possibly dead Python function: display_result_single
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/show_livebench_result.py:226
low System graph software Dead code conf 1.00 Possibly dead Python function: evaluate_files
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/code_runner/eval/__init__.py:254
low System graph software Dead code conf 1.00 Possibly dead Python function: inner_wrapper
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/agentic_code_runner/eval/harness/instance.py:33
low System graph software Dead code conf 1.00 Possibly dead Python function: is_floats
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/code_runner/eval/__init__.py:101
low System graph software Dead code conf 1.00 Possibly dead Python function: load_single_model_judgments
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/common.py:404
low System graph software Dead code conf 1.00 Possibly dead Python function: normalize_game_key_dict
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/common.py:395
low System graph software Dead code conf 1.00 Possibly dead Python function: play_a_match_wrapper
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/gen_ground_truth_judgment.py:507
low System graph software Dead code conf 1.00 Possibly dead Python function: print_report
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/agentic_code_runner/minisweagent/run/batch_progress.py:183
low System graph software Dead code conf 1.00 Possibly dead Python function: process_instance
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/agentic_code_runner/minisweagent/run/run_batch.py:100
low System graph software Dead code conf 1.00 Possibly dead Python function: process_jsonl_file
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/scripts/syntax_error_finder.py:119
low System graph software Dead code conf 1.00 Possibly dead Python function: readable
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/code_runner/eval/utils.py:268
low System graph software Dead code conf 1.00 Possibly dead Python function: remove_readonly
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/agentic_code_runner/eval/utils/fs_utils.py:36
low System graph software Dead code conf 1.00 Possibly dead Python function: run_commands_for_model
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/scripts/rerun_failed_questions.py:77
low System graph software Dead code conf 1.00 Possibly dead Python function: run_instance
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/agentic_code_runner/eval/harness/run_evaluation.py:725
low System graph software Dead code conf 1.00 Possibly dead Python function: run_iteration
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/scripts/check_question_variance.py:59
low System graph software Dead code conf 1.00 Possibly dead Python function: safe_exec
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/code_runner/eval/utils.py:204
low System graph software Dead code conf 1.00 Possibly dead Python function: safe_killpg
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/code_runner/eval/utils.py:146
low System graph software Dead code conf 1.00 Possibly dead Python function: safe_os_popen
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/code_runner/eval/utils.py:198
low System graph software Dead code conf 1.00 Possibly dead Python function: safe_subprocess_call
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/code_runner/eval/utils.py:158
low System graph software Dead code conf 1.00 Possibly dead Python function: safe_subprocess_check_output
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/code_runner/eval/utils.py:164
low System graph software Dead code conf 1.00 Possibly dead Python function: safe_subprocess_run
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/code_runner/eval/utils.py:170
low System graph software Dead code conf 1.00 Possibly dead Python function: safe_system
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/code_runner/eval/utils.py:152
low System graph software Dead code conf 1.00 Possibly dead Python function: trusted_check
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/code_runner/eval/__init__.py:351
low System graph software Dead code conf 1.00 Possibly dead Python function: trusted_check_exec
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/code_runner/eval/__init__.py:341
low System graph software Dead code conf 1.00 Possibly dead Python function: unsafe_execute
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/code_runner/eval/__init__.py:112
low System graph software Dead code conf 1.00 Possibly dead Python function: update_preds_file
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
livebench/agentic_code_runner/minisweagent/run/run_batch.py:75
low System graph quality Integrity conf 1.00 Stub function `get_instruction_args` (body is just `pass`/`return`) — livebench/if_runner/ifbench/instructions.py:227
Likely an AI scaffold that was never filled in. Remove or implement.
Empty handler · Dead code
low System graph quality Integrity conf 1.00 Stub function `get_instruction_args` (body is just `pass`/`return`) — livebench/if_runner/instruction_following_eval/instructions.py:1301
Likely an AI scaffold that was never filled in. Remove or implement.
Empty handler · Dead code
low System graph quality Complexity conf 1.00 Very large file: livebench/if_runner/ifbench/instructions.py (2252 lines)
Files with >800 lines often hide complexity hotspots and discourage tests.
low System graph quality Complexity conf 1.00 Very large file: livebench/if_runner/instruction_following_eval/instructions.py (1570 lines)
Files with >800 lines often hide complexity hotspots and discourage tests.
For AI agents + API integrations
API access

This page is publicly accessible at: https://repobility.com/scan/285d8c54-1310-4654-8c87-9d14ef632d84/

To check status programmatically (no auth required):

curl -s https://repobility.com/api/v1/public/scan/285d8c54-1310-4654-8c87-9d14ef632d84/
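The same status check could be made from Python with the standard library. This sketch assumes the endpoint returns JSON, which the page does not state; `status_url` and `fetch_status` are hypothetical helper names.

```python
import json
import urllib.request

BASE = "https://repobility.com/api/v1/public/scan/"


def status_url(scan_token: str) -> str:
    """Build the public status URL for a scan token."""
    return f"{BASE}{scan_token}/"


def fetch_status(scan_token: str, timeout: float = 10.0):
    """Fetch the scan status. Assumes a JSON body (not confirmed by the page)."""
    with urllib.request.urlopen(status_url(scan_token), timeout=timeout) as resp:
        return json.load(resp)


# Example call (performs a network request):
# fetch_status("285d8c54-1310-4654-8c87-9d14ef632d84")
url = status_url("285d8c54-1310-4654-8c87-9d14ef632d84")
```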

Important — please don't re-submit the same URL repeatedly. The submission endpoint is idempotent: re-submitting the same git URL returns this same scan_token, not a new one. To re-scan this repo, sign up free and use the dashboard.