apache/arrow


263 of your 585 findings came from Repobility's proprietary detections. ✓ Repobility tags below mark them.

Scan timing: clone 16.91s · analysis 40.92s · 63.7 MB · GitHub API rate-limit (preflight)

https://github.com/apache/arrow · scanned 2026-06-05 21:28 UTC (4 days, 12 hours ago) · 10 languages

1477 raw signals (493 security + 984 graph) · 11/13 scanners ran · 92nd percentile · C · huge (>500K LoC) · System graph score 64 (higher by 14)



Complete repo analysis

Last scanned 4 days, 12 hours ago · v2 · 462 actionable findings from 2 signal sources. 522 repeated signals grouped for readability. Security checks, system graph analysis, and verified AI-agent feedback are merged into one review queue.


100.0% coverage · 361 findings

Score breakdown · 2026-05-18-v5

Component	Sub-score	Weight	Contribution
`structure_score`	40.0	0.15	6.00
`security_score`	85.0	0.25	21.25
`testing_score`	87.0	0.20	17.40
`documentation_score`	99.0	0.15	14.85
`practices_score`	86.0	0.15	12.90
`code_quality`	60.0	0.10	6.00
Overall		1.00	78.4
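The Overall row can be reproduced as a weighted sum of the sub-scores. The sketch below copies the sub-scores and weights from the table; the names `COMPONENTS` and `overall_score` are illustrative and not part of any Repobility API.

```python
# Sub-scores and weights copied from the score breakdown table above.
COMPONENTS = {
    "structure_score": (40.0, 0.15),
    "security_score": (85.0, 0.25),
    "testing_score": (87.0, 0.20),
    "documentation_score": (99.0, 0.15),
    "practices_score": (86.0, 0.15),
    "code_quality": (60.0, 0.10),
}


def overall_score(components):
    """Return the weighted sum of (sub_score, weight) pairs."""
    return sum(score * weight for score, weight in components.values())


if __name__ == "__main__":
    total_weight = sum(weight for _, weight in COMPONENTS.values())
    print(f"weights sum to {total_weight:.2f}")
    print(f"overall = {overall_score(COMPONENTS):.1f}")
```

Because `security_score` carries the largest weight (0.25), the caveat about skipped security scanners affects the overall figure more than a comparable change in any other component would.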

security_score may be inflated — optional security scanners were skipped on this fast scan

Severity distribution

Active filters: excluding tests

Severity: Critical 9 · High 19 · Medium 48 · Low 111
9-Layer: Software · Security · Quality · Integrity · Frontend · Hardware · Data · Network · Cicd
Source: Security checks 101 · System graph 361 · Crowd 0
Layer: Software 53 · Quality 338 · Cicd 18 · Security 13 · Frontend 2 · Network 3 · Hardware 35

Bug-class explainers. Each card groups findings of the same shape — these are the patterns most likely to ship to prod and reappear in future scans unless you systematically fix the cause, not just the instance.

Duplicates & near-duplicates 6 findings

What it is: Same function copy-pasted into multiple modules with minor variations.

Why it matters: Each copy drifts independently — bug fixes apply to one, miss the others.

How AI causes it: AI completes the same pattern in each file rather than refactoring to a shared helper.

Fix approach: Extract the duplicated logic into the most general module both call sites already import. Add tests at the helper level.

6 matching findings on this repo

low Near-duplicate function bodies in 2 places repo-level
low Near-duplicate function bodies in 3 places repo-level
low Near-duplicate function bodies in 6 places
low Near-duplicate function bodies in 22 places
low Near-duplicate function bodies in 12 places
low Near-duplicate function bodies in 7 places
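As a rough way to locate candidates for the shared helper described in the fix approach, near-duplicate function bodies can be compared structurally. This is a minimal sketch for Python sources only, using the standard `ast` and `difflib` modules; it is not Repobility's detection method, and the function names and the 0.9 threshold are assumptions chosen for illustration.

```python
import ast
import difflib
from itertools import combinations


def function_bodies(source):
    """Map each top-level function name to a structural dump of its body."""
    tree = ast.parse(source)
    bodies = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            # ast.dump ignores comments and formatting, so only structure is compared.
            bodies[node.name] = "\n".join(ast.dump(stmt) for stmt in node.body)
    return bodies


def near_duplicates(sources, threshold=0.9):
    """Yield (name_a, name_b, ratio) for function pairs at or above the threshold."""
    named = []
    for label, source in sources.items():
        for name, body in function_bodies(source).items():
            named.append((f"{label}:{name}", body))
    for (a, body_a), (b, body_b) in combinations(named, 2):
        matcher = difflib.SequenceMatcher(None, body_a, body_b, autojunk=False)
        ratio = matcher.ratio()
        if ratio >= threshold:
            yield a, b, ratio
```

Pairs reported by `near_duplicates` are only candidates; whether they should be merged into one helper still depends on whether the call sites share an importable module.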


Commented-out code 215 findings

What it is: Lines of source that were intentionally disabled but never deleted.

Why it matters: Git already remembers history — commented code rots, becomes wrong, and adds noise to diffs.

How AI causes it: AI sometimes comments out broken code instead of fixing it. Reviewers approve out of inertia.

Fix approach: Delete. Trust `git log`. If you really need to remember, save it in a notes file under `docs/`.

12 matching findings on this repo

info Commented-code block (7 lines) in cmake-format.py:1
info Commented-code block (7 lines) in docs/source/conf.py:3
info Commented-code block (7 lines) in docs/source/_static/versionwarning.js:1
info Commented-code block (7 lines) in docs/source/python/conftest.py:1
info Commented-code block (7 lines) in r/inst/demo_flight_server.py:1
info Commented-code block (6 lines) in dev/test_merge_arrow_pr.py:4
info Commented-code block (6 lines) in dev/merge_arrow_pr.py:3
info Commented-code block (7 lines) in dev/release/check-rat-report.py:3
info Commented-code block (6 lines) in dev/release/download_rc_binaries.py:3
info Commented-code block (7 lines) in dev/release/utils-update-docs-versions.py:1
info Commented-code block (7 lines) in dev/archery/conftest.py:1
info Commented-code block (7 lines) in dev/archery/setup.py:2
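Before deleting, it can help to separate disabled code from ordinary prose comments. The sketch below is one possible heuristic for Python files, assuming that a run of comment lines counts as commented-out code only if it parses as Python and contains a real statement. It is not the scanner's own rule, and `commented_code_blocks` is a hypothetical helper name.

```python
import ast


def _looks_like_code(text):
    """Return True if text parses as Python and is more than bare expressions."""
    try:
        tree = ast.parse(text)
    except SyntaxError:
        return False
    # A lone word such as "TODO" parses as an expression but is prose.
    return any(not isinstance(node, ast.Expr) for node in tree.body)


def commented_code_blocks(source, min_lines=2):
    """Return (start_line, line_count) for comment runs that parse as Python."""
    blocks, run, start = [], [], 0
    # The trailing sentinel line flushes a comment run that ends the file.
    for lineno, line in enumerate(source.splitlines() + [""], start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            if not run:
                start = lineno
            # Drop the "#" and one following space, keeping inner indentation.
            run.append(stripped[1:].removeprefix(" "))
            continue
        if len(run) >= min_lines and _looks_like_code("\n".join(run)):
            blocks.append((start, len(run)))
        run = []
    return blocks
```

A heuristic like this will miss disabled code that no longer parses, so its output is a starting list for review rather than a complete inventory.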


Config drift 3 findings

What it is: Settings duplicated across env files, Docker compose, K8s, and code defaults, all with slightly different values.

Why it matters: Production behaviour depends on whichever copy your loader reads first. The result is subtle bugs in staging that don't reproduce in dev.

How AI causes it: AI writes new config from memory rather than reading the existing source.

Fix approach: Pick one source of truth (env vars + a settings module). Have every other place import from there. Lint for duplicates in CI.

3 matching findings on this repo

high [MINED131] pre-commit hook `https://github.com/hadolint/hadolint` pinned to mut… .pre-commit-config.yaml:38
high [MINED106] Phantom test coverage: test_config: Test function `test_config` runs… dev/archery/archery/crossbow/tests/test_core.py:25
high [MINED106] Phantom test coverage: test_config_validation: Test function `test_c… dev/archery/archery/docker/tests/test_docker.py:229
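The "lint for duplicates in CI" step from the fix approach could look roughly like the following. This is a sketch under stated assumptions: settings live in simple `KEY=VALUE` files, and the file names used in the usage note are hypothetical. YAML, Docker Compose, and Kubernetes manifests would need their own parsers.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def find_drift(sources):
    """Return {key: {source_name: value}} for keys whose values disagree."""
    seen = {}
    for name, text in sources.items():
        for key, value in parse_env(text).items():
            seen.setdefault(key, {})[name] = value
    return {key: by_source for key, by_source in seen.items()
            if len(set(by_source.values())) > 1}
```

For example, calling `find_drift` with the contents of a hypothetical `.env` and `staging.env` returns only the keys whose values differ between the two, which a CI job can treat as a failure when the result is non-empty.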



For AI agents + API integrations


API access

This page is publicly accessible at: https://repobility.com/scan/22ac6ece-0010-4fbd-8208-25335a665c2d/

To check status programmatically (no auth required):

curl -s https://repobility.com/api/v1/public/scan/22ac6ece-0010-4fbd-8208-25335a665c2d/
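The same status check can be sketched in Python with only the standard library. The URL pattern is taken from the curl command above; that the endpoint returns JSON is an assumption, since the response format is not shown on this page, and the function names are illustrative.

```python
import json
import urllib.request

# Base path taken from the curl example; everything else is illustrative.
BASE_URL = "https://repobility.com/api/v1/public/scan"


def status_url(scan_token):
    """Build the public status URL for a scan token."""
    return f"{BASE_URL}/{scan_token}/"


def fetch_status(scan_token, timeout=10):
    """GET the status endpoint and decode the body as JSON (assumed format)."""
    with urllib.request.urlopen(status_url(scan_token), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_status("22ac6ece-0010-4fbd-8208-25335a665c2d")` performs the same unauthenticated GET as the curl command. If polling, reuse the existing scan token rather than re-submitting the repository URL.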

Important — please don't re-submit the same URL repeatedly. The submission endpoint is idempotent: re-submitting the same git URL returns this same scan_token, not a new one. To re-scan this repo, sign up free and use the dashboard.