119 of the 142 findings came from Repobility's proprietary detections and are marked ✓ Repobility below.

Scan timing: clone 2.68s · analysis 9.78s · 8.7 MB · GitHub API rate-limit (preflight)

skrub-data/skrub

https://github.com/skrub-data/skrub · scanned 2026-06-05 14:27 UTC (5 days, 4 hours ago) · 10 languages

387 raw signals (139 security + 248 graph) · 80th percentile · Python · medium (20-100K LoC) · System graph score 92 (lower by 13)


Complete repo analysis

Last scanned 5 days, 4 hours ago · v2 · 161 actionable findings from 2 signal sources. 102 repeated signals were grouped for readability. Security checks, system graph analysis, and verified AI-agent feedback are merged into one review queue.

Score breakdown (2026-05-18-v5)
Component | Sub-score | Weight | Contribution
structure_score | 60.0 | 0.15 | 9.00
security_score | 95.2 | 0.25 | 23.80
testing_score | 97.0 | 0.20 | 19.40
documentation_score | 81.0 | 0.15 | 12.15
practices_score | 70.0 | 0.15 | 10.50
code_quality | 43.1 | 0.10 | 4.31
Overall | | 1.00 | 79.2
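Each Contribution in the score breakdown equals Sub-score × Weight, the weights sum to 1.00, and the contributions add up to 79.16, which rounds to the Overall value of 79.2. The sketch below reproduces that arithmetic under the assumption that the overall score is a plain weighted sum. It is not Repobility's scoring code.

```python
import math

# (sub-score, weight) pairs copied from the score breakdown table.
COMPONENTS = {
    "structure_score": (60.0, 0.15),
    "security_score": (95.2, 0.25),
    "testing_score": (97.0, 0.20),
    "documentation_score": (81.0, 0.15),
    "practices_score": (70.0, 0.15),
    "code_quality": (43.1, 0.10),
}


def contributions(components):
    """Return each component's contribution (sub-score x weight), rounded to 2 dp."""
    return {name: round(score * weight, 2) for name, (score, weight) in components.items()}


def overall_score(components):
    """Weighted sum of the sub-scores, rounded to one decimal place."""
    total_weight = sum(weight for _, weight in components.values())
    if not math.isclose(total_weight, 1.0):
        raise ValueError(f"weights sum to {total_weight}, expected 1.0")
    return round(sum(score * weight for score, weight in components.values()), 1)


print(contributions(COMPONENTS)["security_score"])  # 23.8
print(overall_score(COMPONENTS))  # 79.2
```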
Active filter: excluding tests
Scan summary: quality grade B+ (79/100). Dimension scores: security 95, maintainability 60. 139 findings (35 security). 58,205 lines analyzed.

Showing 79 of 161 actionable findings. 263 raw detector signals were grouped into reader-sized issues.

critical · Security checks · Quality · conf 1.00 · ✓ Repobility · [MINED030] Python Pickle Loads: pickle.loads() can execute arbitrary code via __reduce__.
Check where the unpickled bytes come from, and fix the call if they can ever be untrusted. See CWE-502 for context.
doc/tutorials/1110_data_ops_intro.py:173
critical · Security checks · Quality · conf 1.00 · [SEC081] Python: pickle.loads / marshal.loads on untrusted data: pickle.load(s) and marshal.load(s) execute arbitrary code on untrusted input. Ported from dlint DUO103 / DUO120 (BSD-3).
Use json, msgpack, or protobuf for untrusted data. If pickle is required, sign the payload with HMAC.
doc/tutorials/1110_data_ops_intro.py:173
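SEC081 suggests signing the payload with HMAC if pickle has to stay. The sketch below shows that approach using only the standard library. `SECRET_KEY`, `dumps_signed`, and `loads_signed` are hypothetical names, and the key handling is an assumption. Signing only proves the bytes came from someone holding the key. It does not make unpickling attacker-supplied data safe, so it helps only when the producer is trusted.

```python
import hashlib
import hmac
import pickle

# Hypothetical key for illustration only; load a real key from a secret store.
SECRET_KEY = b"example-key-do-not-use"
TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes


def dumps_signed(obj, key=SECRET_KEY):
    """Pickle `obj` and prepend an HMAC-SHA256 tag computed over the pickled bytes."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def loads_signed(blob, key=SECRET_KEY):
    """Verify the tag before unpickling; raise ValueError on any mismatch."""
    tag, payload = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC check failed; refusing to unpickle")
    return pickle.loads(payload)


blob = dumps_signed({"n_rows": 3})
print(loads_signed(blob))  # {'n_rows': 3}
```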
high · Security checks · Quality · conf 1.00 · ✓ Repobility · 4 occurrences · Missing import: `string` used but not imported
The file calls `string.<name>(...)` but never imports `string`. If `string` is not otherwise bound in that scope (for example, as a parameter or local variable), this raises NameError the first time the line executes.
4 files, 4 locations
skrub/_fast_hash.py:81
skrub/_reporting/_serve.py:66
skrub/_reporting/_table_report.py:444
skrub/_string_distances.py:78
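Before adding imports, check how `string` is bound at each location. A static scan that does not track local names can flag a parameter or local variable that happens to be called `string`. The sketch below contrasts the two cases. The function names `normalize` and `lowercase_alphabet` are hypothetical and are not taken from skrub.

```python
def normalize(string):
    # `string` is a parameter (a str), so `string.lower()` is str.lower;
    # no `import string` is needed and no NameError can occur here.
    return string.lower().strip()


def lowercase_alphabet():
    # Here `string` must be the standard-library module, so it has to be imported.
    import string
    return string.ascii_lowercase


print(normalize("  ABC "))  # abc
print(lowercase_alphabet())  # abcdefghijklmnopqrstuvwxyz
```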
low · Security checks · Quality · conf 1.00 · ✓ Repobility · [MINED012] Curl Pipe Bash: curl ... | sh / bash runs unverified network code.
Download the script first and verify it (pinned version, signature, or checksum) before running it. See CWE-494 and OWASP A08:2021 for context.
build_tools/circle/build_doc.sh:101
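high · Security checks · Quality · conf 1.00 · ✓ Repobility · 25 occurrences · `self.fit_transform` used but never assigned in __init__
Method `fit` of class `ApplyToSubFrame` reads `self.fit_transform`, but no assignment to it exists in __init__ (and no class-level fallback was detected). This would raise AttributeError the first time the method runs against an instance. However, if `fit_transform` is defined as a method on the class or on one of its base classes, the finding does not apply: Python resolves `self.<name>` through the class as well as the instance, so methods never need an __init__ assignment.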
2 files, 25 locations
skrub/_check_input.py:109, 114, 116, 120, 121, 122, 123, 124, +8 more (16 hits)
skrub/_apply_to_sub_frame.py:165, 188, 189, 196, 218, 219, 220 (9 hits)
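Attribute lookup on an instance falls back to its class, so a method such as `fit_transform` never has to be assigned in `__init__`. If `ApplyToSubFrame` and the classes in `_check_input.py` define or inherit `fit_transform` as a method, these hits are likely false positives. The sketch below shows the pattern using a hypothetical `Doubler` class, which is not skrub code.

```python
class Doubler:
    """Hypothetical transformer used only to illustrate attribute lookup."""

    def __init__(self, copy=True):
        # __init__ stores parameters only; no method is assigned here.
        self.copy = copy

    def fit_transform(self, X):
        self.n_seen_ = len(X)
        return [2 * x for x in X]

    def fit(self, X):
        # `fit_transform` is not in the instance __dict__, so Python finds it
        # on the class (Doubler.fit_transform) and binds it to `self`.
        self.fit_transform(X)
        return self


est = Doubler().fit([1, 2, 3])
print(est.n_seen_)  # 3
print("fit_transform" in vars(est))  # False
```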
medium · Security checks · CI/CD security · conf 0.90 · ✓ Repobility · 12 occurrences · GitHub Action is tag-pinned rather than SHA-pinned
Action `prefix-dev/setup-pixi` is pinned to the mutable tag `@v0.9.6`. Pin external actions to a reviewed full commit SHA when the workflow is security-sensitive.
5 files, 12 locations
.github/workflows/testing.yml:28, 40, 64 (4 hits)
.github/workflows/test-javascript.yml:16, 29 (3 hits)
.github/workflows/update_pixi_lock_files.yml:25, 38 (3 hits)
.github/workflows/check_stub_files_diff.yaml:18
.github/workflows/run-code-format-checks.yaml:18
Tags: CI/CD security, Supply chain, GitHub Actions
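A small scanner can list every remaining tag- or branch-pinned action before you switch to SHAs. The sketch below assumes workflows use the common single-line `uses: owner/repo@ref` form. It is a regex heuristic, not a YAML parser, and the 40-character SHA in the sample is a placeholder, not a real commit.

```python
import re

# Matches `uses: owner/repo@ref` (optionally quoted, optionally as a list item).
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*[\"']?([^\s@#\"']+)@([^\s#\"']+)", re.MULTILINE)
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text):
    """Return (action, ref) pairs whose ref is not a full 40-character commit SHA."""
    results = []
    for action, ref in USES_RE.findall(workflow_text):
        if action.startswith("docker://"):
            continue  # container images are out of scope for this check
        if not FULL_SHA_RE.match(ref):
            results.append((action, ref))
    return results


sample = """
steps:
  - uses: actions/checkout@v6
  - uses: prefix-dev/setup-pixi@v0.9.6
  - uses: actions/setup-python@0123456789abcdef0123456789abcdef01234567
"""
print(unpinned_actions(sample))
# [('actions/checkout', 'v6'), ('prefix-dev/setup-pixi', 'v0.9.6')]
```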
low · Security checks · CI/CD security · conf 0.90 · ✓ Repobility · 10 occurrences · GitHub Action is tag-pinned rather than SHA-pinned
Action `actions/checkout` is pinned to the mutable tag `@v6`. Pin external actions to a reviewed full commit SHA when the workflow is security-sensitive.
7 files, 10 locations
.github/workflows/testing.yml:27, 50, 63 (3 hits)
.github/workflows/welcome_action.yaml:20 (2 hits)
.github/workflows/changelog.yml:21
.github/workflows/check_stub_files_diff.yaml:17
.github/workflows/run-code-format-checks.yaml:17
.github/workflows/test-javascript.yml:15
.github/workflows/update_pixi_lock_files.yml:24
Tags: CI/CD security, Supply chain, GitHub Actions
high · Security checks · CI/CD security · conf 0.90 · ✓ Repobility · GitHub Action is tag-pinned rather than SHA-pinned
Action `larsoner/circleci-artifacts-redirector-action` is pinned to the mutable branch `@master`. Pin external actions to a reviewed full commit SHA when the workflow is security-sensitive.
.github/workflows/main.yml:18
Tags: CI/CD security, Supply chain, GitHub Actions
high · Security checks · Software dependencies · conf 0.90 · ✓ Repobility · 3 occurrences · pre-commit hook `https://github.com/pre-commit/pre-commit-hooks` pinned to mutable rev `v6.0.0`
`.pre-commit-config.yaml` references `https://github.com/pre-commit/pre-commit-hooks` at `rev: v6.0.0`. Because `v6.0.0` is a version tag rather than a commit SHA, the repository owner can move it to new code, and `pre-commit install --install-hooks` will fetch that code on developers' machines that install the hooks afresh. Running `pre-commit autoupdate --freeze` rewrites each `rev` to a commit SHA.
.pre-commit-config.yaml:2, 8, 15 (3 hits)
high · System graph · CI/CD security · conf 1.00 · GitHub Action tracks a moving branch
larsoner/circleci-artifacts-redirector-action@master can move without a code change in this repo. Pin third-party actions to a reviewed 40-character commit SHA.
.github/workflows/main.yml:18
Tags: CI/CD security, Supply chain, GitHub Actions
high · System graph · Security · conf 1.00 · Insecure pattern 'eval_used' in skrub/_data_ops/_evaluation.py:368
Found a known-risky pattern (eval_used). If this is a call to the built-in `eval()`, check whether its input can be influenced by untrusted data, and replace it if possible.
skrub/_data_ops/_evaluation.py:368
Tags: Eval used
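The pattern name alone does not confirm that line 368 calls the built-in `eval()`, because a method with `eval` in its name could also match. If it does, and the evaluated text only ever needs to hold Python literals, the standard library's `ast.literal_eval` is a safer replacement. The sketch below is general; the helper names are hypothetical and nothing is taken from skrub's `_evaluation.py`.

```python
import ast


def parse_literal(text):
    """Evaluate `text` only if it is a Python literal (numbers, strings, bytes,
    tuples, lists, dicts, sets, booleans, None); anything else raises."""
    return ast.literal_eval(text)


def is_literal(text):
    """Return True if `text` parses as a literal, False for code such as calls."""
    try:
        ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        return False
    return True


print(parse_literal("{'alpha': [0.1, 1.0]}"))  # {'alpha': [0.1, 1.0]}
print(is_literal("__import__('os').getcwd()"))  # False
```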
low · Security checks · Security · Deserialization · conf 1.00 · [SEC007] Unsafe Deserialization: Unsafe deserialization can execute arbitrary code.
Use yaml.safe_load() instead of yaml.load(). Avoid pickle for untrusted data.
skrub/_data_ops/_utils.py:56
low · Security checks · Security · Deserialization · conf 1.00 · [SEC007] Unsafe Deserialization: Unsafe deserialization can execute arbitrary code.
Use yaml.safe_load() instead of yaml.load(). Avoid pickle for untrusted data.
doc/tutorials/1110_data_ops_intro.py:173
low · Security checks · Quality · Error handling · conf 0.55 · ✓ Repobility · 18 occurrences · Broad exception handler needs review
This handler catches Exception/BaseException. It is actionable when it swallows errors without logging, re-raising, or returning a structured error. Handlers that intentionally convert exceptions into typed error results should not be treated as high risk.
9 files, 18 locations
skrub/_data_ops/_data_ops.py:250, 410, 745, 1386, 2017 (5 hits)
doc/sphinxext/github_link.py:54, 59, 67 (3 hits)
skrub/_data_ops/_inspection.py:70, 156 (2 hits)
skrub/_data_ops/_utils.py:85, 161 (2 hits)
skrub/_interpolation_joiner.py:426, 442 (2 hits)
skrub/_data_ops/_optuna.py:272
skrub/_single_column_transformer.py:345
skrub/datasets/_utils.py:351
Tags: Error handling, quality
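The description separates handlers that silently swallow errors from handlers that log, re-raise, or return a structured result. The sketch below shows two acceptable shapes. The function names are hypothetical and are not taken from the flagged skrub code.

```python
import logging

logger = logging.getLogger(__name__)


def lookup_option(options, key, default=None):
    """Catch only the expected failure and log it, instead of `except Exception`."""
    try:
        return options[key]
    except KeyError:
        logger.warning("option %r not set; using %r", key, default)
        return default


def run_step(step, data):
    """A broad handler is acceptable when it adds context and re-raises."""
    try:
        return step(data)
    except Exception as exc:
        name = getattr(step, "__name__", repr(step))
        # `from exc` keeps the original exception as __cause__ for debugging.
        raise RuntimeError(f"step {name!r} failed") from exc
```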
medium · Security checks · Quality · conf 1.00 · ✓ Repobility · Mutable default argument in `repr_args` (dict)
`def repr_args(... = []/{}/set())`: Python constructs a default value once, when the function is defined, and shares it across all calls. If one call mutates it, every later call sees the change. This matters only if the default object actually gets mutated, either by the function itself or by a caller it is returned to.
skrub/_utils.py:194
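The sketch below shows the classic failure and the usual `None`-sentinel fix, using hypothetical `add_option_*` functions. Whether `repr_args` is actually affected depends on whether it mutates its dict default, which this sketch cannot show.

```python
def add_option_bad(key, value, options={}):
    # The default dict is created once, when `def` runs, and reused by every call.
    options[key] = value
    return options


def add_option_good(key, value, options=None):
    # A None sentinel gives each call its own fresh dict.
    if options is None:
        options = {}
    options[key] = value
    return options


print(add_option_bad("a", 1))  # {'a': 1}
print(add_option_bad("b", 2))  # {'a': 1, 'b': 2}  (state leaked from the first call)
print(add_option_good("a", 1))  # {'a': 1}
print(add_option_good("b", 2))  # {'b': 2}
```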
high · Security checks · Software dependencies · conf 0.70 · Remote install command pipes network code directly to a shell
Many tools publish one-line installers. `curl | sh` style commands are convenient, but they bypass review unless the script is pinned, signed, or checksum-verified.
build_tools/circle/build_doc.sh:101
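A safer alternative to `curl ... | sh` is to download the script to a file, compare its SHA-256 digest with a value pinned in the repository, and run it only if the digests match. The sketch below covers the comparison step using only the standard library. The download step, where the pinned digest is stored, and the helper names are assumptions. The digest used in the example is the SHA-256 of empty input.

```python
import hashlib
import hmac


def verify_sha256(data: bytes, expected_hex: str) -> bool:
    """Return True only if `data` hashes to the pinned SHA-256 hex digest."""
    actual_hex = hashlib.sha256(data).hexdigest()
    # Constant-time comparison of the two ASCII hex strings.
    return hmac.compare_digest(actual_hex, expected_hex.strip().lower())


def require_verified(data: bytes, expected_hex: str) -> bytes:
    """Raise instead of returning unverified installer bytes."""
    if not verify_sha256(data, expected_hex):
        raise ValueError("installer checksum mismatch; not running it")
    return data


EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
print(verify_sha256(b"", EMPTY_SHA256))  # True
print(verify_sha256(b"echo hi\n", EMPTY_SHA256))  # False
```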
medium · Security checks · Software dependencies · conf 0.90 · ✓ Repobility · 7 occurrences · requirements.txt: `sphinx-gallery` has no version pin
An unpinned pip requirement means every fresh install may resolve a different version, and a newer release can introduce malicious code (for example, after a maintainer account compromise). Reproducible installs need exact pins.
.binder/requirements.txt:1, 2, 3, 4, 5, 6, 7 (7 hits)
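A simple text check is enough to audit the remaining unpinned lines when entries use plain `name` or `name==version` syntax. The sketch below treats anything without `==` as unpinned and deliberately skips pip options such as `-r` or `-e`. It is a heuristic rather than a full requirements parser, and the versions in the sample are illustrative only.

```python
def unpinned_requirements(text):
    """Return requirement entries that are not pinned with an exact `==`."""
    flagged = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or -e
        if "==" not in line:
            flagged.append(line)
    return flagged


requirements_text = """\
sphinx-gallery
numpy==2.1.0   # exact pin
pandas>=2.0
"""
print(unpinned_requirements(requirements_text))  # ['sphinx-gallery', 'pandas>=2.0']
```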
medium · System graph · Security · Coverage · conf 1.00 · No auth library detected
The scanner did not find any standard auth library (JWT, OAuth, NextAuth, Auth0, etc.). Either auth lives in custom code, in a separate service, or is missing.
Tags: auth
low · Security checks · Quality · conf 0.60 · 9 occurrences · Duplicated implementation block across source files
Duplicate implementation blocks are maintenance debt. Keep them visible, but they are not a high-severity defect unless the duplicated logic is security-sensitive or drifting.
8 files, 9 locations
skrub/_string_encoder.py:223, 225 (2 hits)
skrub/_apply_to_each_col.py:55
skrub/_apply_to_sub_frame.py:137
skrub/_datetime_encoder.py:404
skrub/_minhash_encoder.py:254
skrub/_single_column_transformer.py:37
skrub/_text_encoder.py:329
skrub/selectors/_selectors.py:30
Tags: duplication, quality
low · System graph · Quality · Maintenance · conf 1.00 · 40 TODO/FIXME markers
High count of TODO/FIXME/HACK markers — track them as issues so they're not forgotten.
low · System graph · Software · Dead code candidate · conf 1.00 · 28 occurrences · File has no detected symbols
Source files with no class/function declarations: possibly config, dead code, or scratch files.
28 files, 28 locations
build_tools/generate_data_ops_stub.py
doc/_static/scripts/sg_plotly_resize.js
doc/api_reference.py
doc/tutorials/0000_getting_started.py
doc/tutorials/1110_data_ops_intro.py
examples/0010_encodings.py
examples/0030_datetime_encoder.py
examples/0040_fuzzy_joining.py
examples/0050_deduplication.py
examples/0060_multiple_key_join.py
examples/0070_join_aggregation.py
examples/0080_interpolation_join.py
examples/0090_apply_to_cols.py
examples/0100_squashing_scaler.py
examples/data_ops/1120_multiple_tables.py
examples/data_ops/1130_choices.py
examples/data_ops/1140_subsampling.py
examples/FIXME/07_grid_searching_with_the_tablevectorizer.py
skrub/_reporting/js_tests/cypress.config.js
skrub/_reporting/js_tests/cypress/e2e/column-filter.cy.js
skrub/_reporting/js_tests/cypress/e2e/column-summaries.cy.js
skrub/_reporting/js_tests/cypress/e2e/copybutton.cy.js
skrub/_reporting/js_tests/cypress/e2e/open-tab.cy.js
skrub/_reporting/js_tests/cypress/e2e/summary-statistics.cy.js
skrub/_reporting/js_tests/cypress/e2e/tabs.cy.js
skrub/_reporting/js_tests/cypress/support/commands.js
skrub/_reporting/js_tests/cypress/support/e2e.js
skrub/core.py
low · System graph · Quality · Integrity · conf 1.00 · 9 occurrences · Near-duplicate function bodies in 2 places
Functions with the same first-5-line body hash: skrub/_matching.py:fit, skrub/_matching.py:fit. This is the classic AI-coder failure mode (4× more duplication in vibe-coded repos; see https://jw.hn/ai-code-hygiene). Consolidate them or document why they are separate.
9 occurrences
repo-level (9 hits)
Tags: duplicates, duplication
low · System graph · Quality · Integrity · conf 1.00 · 8 occurrences · Near-duplicate function bodies in 3 places
Functions with the same first-5-line body hash: skrub/_apply_to_sub_frame.py:transform, skrub/_apply_to_cols.py:transform, skrub/_apply_to_each_col.py:transform.
8 occurrences
repo-level (8 hits)
Tags: duplicates, duplication
low · System graph · Quality · Integrity · conf 1.00 · Near-duplicate function bodies in 4 places
Functions with the same first-5-line body hash: skrub/_agg_joiner.py:fit_transform, skrub/_agg_joiner.py:fit, skrub/_agg_joiner.py:fit_transform, skrub/_agg_joiner.py:fit.
Tags: duplicates, duplication
low · System graph · Quality · Integrity · conf 1.00 · Near-duplicate function bodies in 5 places
Functions with the same first-5-line body hash: skrub/_table_vectorizer.py:fit_transform, skrub/_table_vectorizer.py:fit_transform, skrub/_table_vectorizer.py:fit, skrub/_table_vectorizer.py:fit.
Tags: duplicates, duplication
low · System graph · Quality · Integrity · conf 1.00 · Near-duplicate function bodies in 6 places
Functions with the same first-5-line body hash: skrub/_apply_to_cols.py:get_feature_names_out, skrub/_select_cols.py:get_feature_names_out, skrub/_select_cols.py:get_feature_names_out, skrub/_apply_to_each_col.py:get_feature_names_out.
Tags: duplicates, duplication
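In scikit-learn-style estimators, `fit` and `fit_transform` can begin with the same validation lines, and a first-5-line hash picks that up. One common way to consolidate them is to keep the logic in `fit_transform` and have `fit` delegate to it. The sketch below uses a hypothetical `MinMaxScaler1D` class, written without scikit-learn so it runs on its own. It is not taken from the flagged skrub files.

```python
class MinMaxScaler1D:
    """Hypothetical estimator used only to illustrate the delegation pattern."""

    def fit_transform(self, X, y=None):
        # The shared fitting logic lives in exactly one place.
        self.min_ = min(X)
        self.range_ = (max(X) - self.min_) or 1.0  # avoid dividing by zero
        return self.transform(X)

    def fit(self, X, y=None):
        # `fit` delegates instead of repeating the body of `fit_transform`.
        self.fit_transform(X, y)
        return self

    def transform(self, X):
        return [(x - self.min_) / self.range_ for x in X]


print(MinMaxScaler1D().fit_transform([0, 5, 10]))  # [0.0, 0.5, 1.0]
```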
low · System graph · Quality · Integrity · conf 1.00 · 3 occurrences · Old/deprecated-named symbols
Names with suffixes like `_old`, `_v1`, `_deprecated` usually indicate replaced-but-not-removed code (typical AI-coder leftover). Confirm and delete, or rename if it's the active version.
3 files, 3 locations
`days_old`: skrub/_data_ops/tests/test_utils.py:23
`toxicity_v1`: skrub/datasets/_fetching.py:355
`toxicity_v1`: skrub/datasets/_utils.py:109
Tags: old marker, Dead code
low · System graph · Software · Dead code · conf 1.00 · 20 occurrences · Possibly dead Python functions
No callers detected by AST scan in this repo. Each could still be exported for external callers or be a framework handler.
18 functions, 20 locations
call_garbage_collector: doc/conf.py:461
day_tick_formatter: doc/demo_periodic_features.py:76
decorator: skrub/_data_ops/_data_ops.py:392, skrub/_dispatch.py:235, skrub/_utils.py:299 (3 locations)
default_score: skrub/_data_ops/_estimator.py:1155
do_GET: skrub/_reporting/_serve.py:39
half_hour_tick_formatter: doc/demo_periodic_features.py:69
handle_choice_match: skrub/_data_ops/_evaluation.py:1049
handle_estimator: skrub/_data_ops/_evaluation.py:252
handle_mapping: skrub/_data_ops/_evaluation.py:288
handle_seq: skrub/_data_ops/_evaluation.py:276
handle_slice: skrub/_data_ops/_evaluation.py:302
hour_tick_formatter: doc/demo_periodic_features.py:62
log_message: skrub/_reporting/_serve.py:36
ngram_similarity: skrub/_string_distances.py:95
objective: skrub/_data_ops/_optuna.py:261
prune: skrub/_data_ops/_evaluation.py:499
repr_dict: skrub/_utils.py:145
reset_skrub_config: doc/conf.py:456
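Static AST scans miss callers that are resolved by name at runtime. For example, the standard library's `http.server.BaseHTTPRequestHandler` calls `do_GET` through `getattr(self, 'do_' + self.command)` and calls `log_message` from its own logging methods, which fits the `do_GET` and `log_message` entries. The sketch below shows the same pattern with a hypothetical `Evaluator` class. It is not skrub's `_evaluation.py`, and whether the `handle_*` functions there are dispatched this way is an assumption you would need to check.

```python
class Evaluator:
    """Hypothetical dispatcher: handlers are looked up by name at runtime,
    so no line of source code calls them directly."""

    def evaluate(self, value):
        kind = type(value).__name__  # e.g. "int", "list", "dict"
        handler = getattr(self, f"handle_{kind}", self.handle_default)
        return handler(value)

    def handle_int(self, value):
        return value * 2

    def handle_list(self, value):
        return [self.evaluate(item) for item in value]

    def handle_dict(self, value):
        return {key: self.evaluate(item) for key, item in value.items()}

    def handle_default(self, value):
        return value


print(Evaluator().evaluate({"a": [1, 2], "b": "x"}))  # {'a': [2, 4], 'b': 'x'}
```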
low · System graph · Frontend quality · conf 1.00 · Stray `console.log` in TS/JS: skrub/_reporting/_data/templates/data_ops/data_ops-report.js:32
Replace it with a toast helper or an error boundary, or remove it; `console.warn` and `console.error` are acceptable. Why: hygiene, since stray logging easily leaks debug output. Rule id: fq.console-leak
Tags: Fq console leak
low · System graph · Quality · Integrity · conf 1.00 · Stub function `_check_not_pandas_sparse` (body is just `pass`/`return`) in skrub/_check_input.py:49
Likely an AI scaffold that was never filled in. Remove or implement it, unless the empty body is an intentional no-op.
Tags: Empty handler, Dead code
low · System graph · Quality · Integrity · conf 1.00 · Stub function `_null_value_for_polars` (body is just `pass`/`return`) in skrub/_dataframe/_common.py:324
Likely an AI scaffold that was never filled in. Remove or implement it, unless the empty body is an intentional no-op.
Tags: Empty handler, Dead code
low · System graph · Quality · Complexity · conf 1.00 · Very large file: skrub/_data_ops/_data_ops.py (2135 lines)
Files with >800 lines often hide complexity hotspots and discourage tests.
low · System graph · Quality · Complexity · conf 1.00 · Very large file: skrub/_data_ops/_skrub_namespace.py (3725 lines)
Files with >800 lines often hide complexity hotspots and discourage tests.
API access

This page is publicly accessible at: https://repobility.com/scan/f8fbe2ac-1fee-44fb-921a-41af7da12550/

To check status programmatically (no auth required):

curl -s https://repobility.com/api/v1/public/scan/f8fbe2ac-1fee-44fb-921a-41af7da12550/

Important: please don't re-submit the same URL repeatedly. The submission endpoint is idempotent, so re-submitting the same git URL returns this same scan_token rather than a new one. To re-scan this repo, sign up for free and use the dashboard.