Public scan — anyone with this URL can view this analysis. Sign up to track your own repos privately, run scheduled re-scans, and get AI fix prompts via your dashboard.

scikit-learn/scikit-learn

https://github.com/scikit-learn/scikit-learn · scanned 2026-05-15 09:54 UTC (2 weeks, 6 days ago) · 10 languages

548 findings (26 legacy + 522 scanner) 56th percentile · Python · large (100-500K LoC)

UNIFIED Repobility · multi-layer engine · AI coders

Complete repo analysis

Last scanned 2 weeks, 6 days ago · v1 · 20 findings from 1 source. Findings combine the legacy security pipeline AND the multi-layer engine (atlas, wiring, flows, ranked) AND verified AI agent contributions.

JSON
Score breakdown â 2026-05-14-v3
Component Sub-score Weight Contribution
structure_score 60.0 0.15 9.00
security_score 90.0 0.25 22.50
testing_score 95.0 0.20 19.00
documentation_score 65.0 0.15 9.75
practices_score 65.0 0.15 9.75
code_quality 50.0 0.10 5.00
Overall 1.00 75.0
Severity distribution — click a segment to filter
Active filters: excluding tests × Reset all
Scan summary Repository scanned at 74.0/100 with 88.9% coverage. It contains 12288 nodes across 0 cross-layer flows, written primarily in mixed languages. Engine surfaced 0 findings. Risk profile is low: 0 critical, 0 high, 0 medium. Recommended next step: open the software layer findings first — that's where the highest-impact wins live.

Showing 19 of 20 findings. Click TP / FP to vote on a finding's accuracy — votes adjust the confidence weighting and improve detection across the platform.

low Legacy security deserialization conf 1.00 [SEC007] Unsafe Deserialization: Unsafe deserialization can execute arbitrary code.
Use yaml.safe_load() instead of yaml.load(). Avoid pickle for untrusted data.
sklearn/utils/estimator_checks.py:2713 deserializationlegacy
low Legacy security deserialization conf 1.00 [SEC007] Unsafe Deserialization: Unsafe deserialization can execute arbitrary code.
Use yaml.safe_load() instead of yaml.load(). Avoid pickle for untrusted data.
asv_benchmarks/benchmarks/common.py:176 deserializationlegacy
low Legacy security deserialization conf 1.00 [SEC007] Unsafe Deserialization: Unsafe deserialization can execute arbitrary code.
Use yaml.safe_load() instead of yaml.load(). Avoid pickle for untrusted data.
benchmarks/bench_plot_randomized_svd.py:137 deserializationlegacy
medium Legacy security path_traversal conf 1.00 [SEC012] ZipSlip — Archive Path Traversal: Archive extraction without path validation allows writing files outside the target directory.
Validate extracted paths with os.path.realpath() and ensure they stay within the target directory.
examples/applications/plot_out_of_core_classification.py:175 path_traversallegacy
medium Legacy security path_traversal conf 1.00 [SEC012] ZipSlip — Archive Path Traversal: Archive extraction without path validation allows writing files outside the target directory.
Validate extracted paths with os.path.realpath() and ensure they stay within the target directory.
sklearn/utils/fixes.py:348 path_traversallegacy
high Legacy quality quality conf 0.86 Duplicated implementation block across source files
Duplicated blocks are a common artifact when generated code is pasted or recreated instead of reused. They increase maintenance cost because every future bug fix must be found in multiple locations.
sklearn/externals/array_api_compat/numpy/_aliases.py:56 qualitylegacy
high Legacy quality quality conf 0.86 Duplicated implementation block across source files
Duplicated blocks are a common artifact when generated code is pasted or recreated instead of reused. They increase maintenance cost because every future bug fix must be found in multiple locations.
sklearn/externals/array_api_compat/numpy/_aliases.py:52 qualitylegacy
high Legacy quality quality conf 0.86 Duplicated implementation block across source files
Duplicated blocks are a common artifact when generated code is pasted or recreated instead of reused. They increase maintenance cost because every future bug fix must be found in multiple locations.
sklearn/externals/array_api_compat/dask/array/_info.py:48 qualitylegacy
high Legacy quality quality conf 0.86 Duplicated implementation block across source files
Duplicated blocks are a common artifact when generated code is pasted or recreated instead of reused. They increase maintenance cost because every future bug fix must be found in multiple locations.
sklearn/externals/array_api_compat/dask/array/_aliases.py:106 qualitylegacy
high Legacy quality quality conf 0.86 Duplicated implementation block across source files
Duplicated blocks are a common artifact when generated code is pasted or recreated instead of reused. They increase maintenance cost because every future bug fix must be found in multiple locations.
sklearn/ensemble/_voting.py:279 qualitylegacy
high Legacy quality quality conf 0.86 Duplicated implementation block across source files
Duplicated blocks are a common artifact when generated code is pasted or recreated instead of reused. They increase maintenance cost because every future bug fix must be found in multiple locations.
sklearn/ensemble/_stacking.py:501 qualitylegacy
high Legacy quality quality conf 0.86 Duplicated implementation block across source files
Duplicated blocks are a common artifact when generated code is pasted or recreated instead of reused. They increase maintenance cost because every future bug fix must be found in multiple locations.
sklearn/decomposition/_fastica.py:567 qualitylegacy
high Legacy quality quality conf 0.86 Duplicated implementation block across source files
Duplicated blocks are a common artifact when generated code is pasted or recreated instead of reused. They increase maintenance cost because every future bug fix must be found in multiple locations.
sklearn/datasets/_rcv1.py:111 qualitylegacy
high Legacy quality quality conf 0.86 Duplicated implementation block across source files
Duplicated blocks are a common artifact when generated code is pasted or recreated instead of reused. They increase maintenance cost because every future bug fix must be found in multiple locations.
sklearn/covariance/_shrunk_covariance.py:121 qualitylegacy
high Legacy quality quality conf 0.86 Duplicated implementation block across source files
Duplicated blocks are a common artifact when generated code is pasted or recreated instead of reused. They increase maintenance cost because every future bug fix must be found in multiple locations.
sklearn/covariance/_shrunk_covariance.py:120 qualitylegacy
high Legacy quality quality conf 0.86 Duplicated implementation block across source files
Duplicated blocks are a common artifact when generated code is pasted or recreated instead of reused. They increase maintenance cost because every future bug fix must be found in multiple locations.
sklearn/covariance/_robust_covariance.py:541 qualitylegacy
medium Legacy quality quality conf 0.78 Suspicious implementation file appears unreferenced
A file created as a fixed/new/final/copy variant is not referenced by imports or path-like strings in the rest of the repository. This is a strong sign that an agent produced code beside the active application path.
maint_tools/sort_whats_new.py:1 qualitylegacy
low Legacy quality documentation No LICENSE file
Add a LICENSE file to your repository. Use choosealicense.com to pick the right license (MIT for permissive, Apache 2.0 for patent protection, GPL for copyleft).
documentationlegacy
high Legacy quality quality conf 0.62 Source file name looks like an AI patch artifact
Files named as final, fixed, copy, new, or backup are often temporary patch artifacts. They may be legitimate, but they deserve review before becoming production surface area.
maint_tools/sort_whats_new.py:1 qualitylegacy
For AI agents: Voting guide (TP/FP) MCP manifest Stdio wrapper SARIF Integrate Findings queue Vote TP/FP on findings to calibrate the engine.
For AI agents + API integrations
Email me when this repo regresses
Free. We re-scan periodically; new criticals → your inbox. No signup required for the scan itself.
API access

This page is publicly accessible at: https://repobility.com/scan/a5f73a3d-9c26-4983-8ec3-040adfc69698/

To check status programmatically (no auth required):

curl -s https://repobility.com/api/v1/public/scan/a5f73a3d-9c26-4983-8ec3-040adfc69698/

Important — please don't re-submit the same URL repeatedly. The submission endpoint is idempotent: re-submitting the same git URL returns this same scan_token, not a new one. To re-scan this repo, sign up free and use the dashboard.