119 of your 142 findings came from Repobility's proprietary detections. ✓ Repobility tags below mark them.

Scan timing: clone 2.68s · analysis 9.78s · 8.7 MB · GitHub API rate-limit (preflight)

skrub-data/skrub

https://github.com/skrub-data/skrub · scanned 2026-06-05 14:27 UTC (5 days, 5 hours ago) · 10 languages

387 raw signals (139 security + 248 graph) · 80th percentile · Python · medium (20-100K LoC) · System graph score 92 (lower by 13)


Complete repo analysis

Last scanned 5 days, 5 hours ago · v2 · 161 actionable findings from 2 signal sources. 102 repeated signals grouped for readability. Security checks, system graph analysis, and verified AI-agent feedback are merged into one review queue.

JSON
Score breakdown · 2026-05-18-v5
Component             Sub-score   Weight   Contribution
structure_score            60.0     0.15           9.00
security_score             95.2     0.25          23.80
testing_score              97.0     0.20          19.40
documentation_score        81.0     0.15          12.15
practices_score            70.0     0.15          10.50
code_quality               43.1     0.10           4.31
Overall                             1.00           79.2
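
The table's columns suggest that each contribution is the sub-score multiplied by its weight, and that the overall score is the sum of the contributions. The page does not state the formula, so the sketch below is an assumption checked only against the numbers shown; the names `COMPONENTS` and `overall_score` are hypothetical.

```python
# Sub-scores and weights copied from the score breakdown table.
# Assumption: overall = sum(sub_score * weight); the page does not state this.
COMPONENTS = {
    "structure_score": (60.0, 0.15),
    "security_score": (95.2, 0.25),
    "testing_score": (97.0, 0.20),
    "documentation_score": (81.0, 0.15),
    "practices_score": (70.0, 0.15),
    "code_quality": (43.1, 0.10),
}


def overall_score(components):
    """Return the weighted sum of sub-scores, requiring weights to sum to 1."""
    total_weight = sum(weight for _, weight in components.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total_weight}, expected 1.0")
    return sum(score * weight for score, weight in components.values())


# Per-component contributions, rounded the way the table displays them.
contributions = {
    name: round(score * weight, 2)
    for name, (score, weight) in COMPONENTS.items()
}

print(contributions)
print(round(overall_score(COMPONENTS), 1))
```

Under that assumption the contributions sum to 79.16, which matches the reported overall of 79.2 after rounding to one decimal place.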

All 2610 nodes from the latest scan, grouped by kind. Each node is a unit the engine identified (file, function, endpoint, table…). Most users won't need this view — it's primarily for debugging the engine's graph extraction or for AI agents that want to enumerate the project structure.

Label Layer Status Path
cast_column_names_to_strings software healthy skrub/_check_input.py:15
_column_names_to_strings software healthy skrub/_check_input.py:21
_deduplicated_column_names software healthy skrub/_check_input.py:33
_cleaned_column_names software healthy skrub/_check_input.py:44
_check_not_pandas_sparse software healthy skrub/_check_input.py:49
_check_not_pandas_sparse_pandas software healthy skrub/_check_input.py:54
_check_is_dataframe software healthy skrub/_check_input.py:66
fit software healthy skrub/_check_input.py:108
fit_transform software healthy skrub/_check_input.py:112
transform software healthy skrub/_check_input.py:128
_handle_array software healthy skrub/_check_input.py:155
get_feature_names_out software healthy skrub/_check_input.py:171
__init__ software healthy skrub/_apply_to_sub_frame.py:133
fit software healthy skrub/_apply_to_sub_frame.py:145
fit_transform software healthy skrub/_apply_to_sub_frame.py:168
transform software healthy skrub/_apply_to_sub_frame.py:228
get_feature_names_out software healthy skrub/_apply_to_sub_frame.py:267
__init__ software healthy skrub/_matching.py:196
fit software healthy skrub/_matching.py:109
match software healthy skrub/_matching.py:34
_get_reference_distances software healthy skrub/_matching.py:200
_rescale_distances software healthy skrub/_matching.py:53
_sample_pairs software healthy skrub/_matching.py:65
_check_inputs software healthy skrub/_matching.py:128
__init__ software healthy skrub/_minhash_encoder.py:118
_get_murmur_hash software healthy skrub/_minhash_encoder.py:133
_get_fast_hash software healthy skrub/_minhash_encoder.py:161
_compute_hash_batched software healthy skrub/_minhash_encoder.py:191
fit software healthy skrub/_minhash_encoder.py:216
transform software healthy skrub/_minhash_encoder.py:257
get_feature_names_out software healthy skrub/_minhash_encoder.py:301
_example_data_dict software healthy skrub/conftest.py:13
_pl_from_dict software healthy skrub/conftest.py:119
all_dataframe_modules software healthy skrub/conftest.py:163
pd_module software healthy skrub/conftest.py:168
pl_module software healthy skrub/conftest.py:173
df_module software healthy skrub/conftest.py:181
example_data_dict software healthy skrub/conftest.py:230
use_fit_transform software healthy skrub/conftest.py:235
reset_config_to_base software healthy skrub/conftest.py:254
__init__ software healthy skrub/_apply_to_cols.py:294
fit software healthy skrub/_apply_to_cols.py:313
fit_transform software healthy skrub/_apply_to_cols.py:336
transform software healthy skrub/_apply_to_cols.py:396
get_feature_names_out software healthy skrub/_apply_to_cols.py:417
__getattr__ software healthy skrub/_apply_to_cols.py:434
__init__ software healthy skrub/_utils.py:21
__getitem__ software healthy skrub/_utils.py:25
__setitem__ software healthy skrub/_utils.py:33
__contains__ software healthy skrub/_utils.py:41

Showing first 50 of this kind. Full payload available via the JSON button at the top of the page.
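
For readers who want to work with these listings outside the page, the rows can be split into fields. This is a minimal sketch that assumes each row is whitespace-separated text in the order label, layer, status, path, with an optional `:line` suffix on the path; the `Node` type and `parse_node_row` helper are hypothetical and are not part of Repobility's JSON payload, whose schema is not shown here. Rows without a path are not handled.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Node(NamedTuple):
    label: str
    layer: str
    status: str
    path: str
    line: Optional[int]  # None when the path carries no ":line" suffix


def parse_node_row(row: str) -> Node:
    """Parse one 'label layer status path[:line]' row from the listing."""
    # Split from the right so labels containing spaces stay intact.
    parts = row.rsplit(maxsplit=3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {row!r}")
    label, layer, status, location = parts
    path, sep, line = location.rpartition(":")
    if sep and line.isdigit():
        return Node(label, layer, status, path, int(line))
    return Node(label, layer, status, location, None)


rows = [
    "cast_column_names_to_strings software healthy skrub/_check_input.py:15",
    "core.py software warning skrub/core.py",
    "gpu (detected) hardware healthy skrub/_text_encoder.py",
]
nodes = [parse_node_row(r) for r in rows]

# Tally nodes by status to spot anything that is not "healthy".
print(Counter(node.status for node in nodes))
```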

Label Layer Status Path
.pre-commit-config.yaml software healthy .pre-commit-config.yaml
pyproject.toml software healthy pyproject.toml
CODE_OF_CONDUCT.md software healthy CODE_OF_CONDUCT.md
codecov.yml software healthy codecov.yml
_check_input.py software healthy skrub/_check_input.py
_apply_to_sub_frame.py software healthy skrub/_apply_to_sub_frame.py
_matching.py software healthy skrub/_matching.py
_minhash_encoder.py software healthy skrub/_minhash_encoder.py
__init__.py software healthy skrub/__init__.py
conftest.py software healthy skrub/conftest.py
_apply_to_cols.py software healthy skrub/_apply_to_cols.py
_utils.py software healthy skrub/_utils.py
_string_encoder.py software healthy skrub/_string_encoder.py
_drop_uninformative.py software healthy skrub/_drop_uninformative.py
_tabular_pipeline.py software healthy skrub/_tabular_pipeline.py
_scaling_factor.py software healthy skrub/_scaling_factor.py
_single_column_transformer.py software healthy skrub/_single_column_transformer.py
_select_cols.py software healthy skrub/_select_cols.py
_fuzzy_join.py software healthy skrub/_fuzzy_join.py
_config.py software healthy skrub/_config.py
_multi_agg_joiner.py software healthy skrub/_multi_agg_joiner.py
_to_str.py software healthy skrub/_to_str.py
_to_datetime.py software healthy skrub/_to_datetime.py
_clean_categories.py software healthy skrub/_clean_categories.py
_table_vectorizer.py software healthy skrub/_table_vectorizer.py
_duration_to_float.py software healthy skrub/_duration_to_float.py
_similarity_encoder.py software healthy skrub/_similarity_encoder.py
_fast_hash.py software healthy skrub/_fast_hash.py
_apply_to_each_col.py software healthy skrub/_apply_to_each_col.py
_datetime_encoder.py software healthy skrub/_datetime_encoder.py
_deduplicate.py software healthy skrub/_deduplicate.py
_dispatch.py software healthy skrub/_dispatch.py
_string_distances.py software healthy skrub/_string_distances.py
core.py software warning skrub/core.py
_to_categorical.py software healthy skrub/_to_categorical.py
_clean_null_strings.py software healthy skrub/_clean_null_strings.py
_sklearn_compat.py software healthy skrub/_sklearn_compat.py
_gap_encoder.py software healthy skrub/_gap_encoder.py
_wrap_transformer.py software healthy skrub/_wrap_transformer.py
_column_associations.py software healthy skrub/_column_associations.py
_join_utils.py software healthy skrub/_join_utils.py
_to_float.py software healthy skrub/_to_float.py
_squashing_scaler.py software healthy skrub/_squashing_scaler.py
_agg_joiner.py software healthy skrub/_agg_joiner.py
_interpolation_joiner.py software healthy skrub/_interpolation_joiner.py
_text_encoder.py software healthy skrub/_text_encoder.py
_joiner.py software healthy skrub/_joiner.py
__init__.py software healthy skrub/datasets/__init__.py
_utils.py software healthy skrub/datasets/_utils.py
_fetching.py software healthy skrub/datasets/_fetching.py

Showing first 50 of this kind. Full payload available via the JSON button at the top of the page.

Label Layer Status Path
CheckInputDataFrame software healthy skrub/_check_input.py:75
ApplyToSubFrame software healthy skrub/_apply_to_sub_frame.py:14
Matching software healthy skrub/_matching.py:8
RandomPairs software healthy skrub/_matching.py:78
SelfJoinNeighbor software healthy skrub/_matching.py:140
OtherNeighbor software healthy skrub/_matching.py:177
MinHashEncoder software healthy skrub/_minhash_encoder.py:23
ApplyToCols software healthy skrub/_apply_to_cols.py:16
LRUDict software healthy skrub/_utils.py:15
Repr software healthy skrub/_utils.py:142
_ShortRepr software healthy skrub/_utils.py:175
PassThrough software healthy skrub/_utils.py:261
StringEncoder software healthy skrub/_string_encoder.py:19
DropUninformative software healthy skrub/_drop_uninformative.py:12
RejectColumn software healthy skrub/_single_column_transformer.py:38
SingleColumnTransformer software healthy skrub/_single_column_transformer.py:123
SelectCols software healthy skrub/_select_cols.py:7
DropCols software healthy skrub/_select_cols.py:102
Drop software healthy skrub/_select_cols.py:199
MultiAggJoiner software healthy skrub/_multi_agg_joiner.py:20
ToStr software healthy skrub/_to_str.py:7
ToDatetime software healthy skrub/_to_datetime.py:85
CleanCategories software healthy skrub/_clean_categories.py:38
PassThrough software healthy skrub/_table_vectorizer.py:33
ShortReprDict software healthy skrub/_table_vectorizer.py:52
Cleaner software healthy skrub/_table_vectorizer.py:184
TableVectorizer software healthy skrub/_table_vectorizer.py:542
DurationToFloat software healthy skrub/_duration_to_float.py:25
SimilarityEncoder software healthy skrub/_similarity_encoder.py:133
ApplyToEachCol software healthy skrub/_apply_to_each_col.py:18
DatetimeEncoder software healthy skrub/_datetime_encoder.py:100
_BasePeriodicEncoder software healthy skrub/_datetime_encoder.py:522
_SplineEncoder software healthy skrub/_datetime_encoder.py:557
_CircularEncoder software healthy skrub/_datetime_encoder.py:647
DataFrameModuleInfo software healthy skrub/_dispatch.py:142
ToCategorical software healthy skrub/_to_categorical.py:7
CleanNullStrings software healthy skrub/_clean_null_strings.py:52
ParamsValidationMixin software healthy skrub/_sklearn_compat.py:30
InputTags software healthy skrub/_sklearn_compat.py:274
TargetTags software healthy skrub/_sklearn_compat.py:334
TransformerTags software healthy skrub/_sklearn_compat.py:372
ClassifierTags software healthy skrub/_sklearn_compat.py:391
RegressorTags software healthy skrub/_sklearn_compat.py:418
Tags software healthy skrub/_sklearn_compat.py:439
GapEncoder software healthy skrub/_gap_encoder.py:26
ToFloat software healthy skrub/_to_float.py:7
_MinMaxScaler software healthy skrub/_squashing_scaler.py:54
SquashingScaler software healthy skrub/_squashing_scaler.py:85
AggJoiner software healthy skrub/_agg_joiner.py:171
AggTarget software healthy skrub/_agg_joiner.py:396

Showing first 50 of this kind. Full payload available via the JSON button at the top of the page.

Label Layer Status Path
skrub software healthy skrub
datasets software healthy skrub/datasets
tests software healthy skrub/datasets/tests
_reporting software healthy skrub/_reporting
tests software healthy skrub/_reporting/tests
js_tests software healthy skrub/_reporting/js_tests
cypress software healthy skrub/_reporting/js_tests/cypress
fixtures software healthy skrub/_reporting/js_tests/cypress/fixtures
support software healthy skrub/_reporting/js_tests/cypress/support
e2e software healthy skrub/_reporting/js_tests/cypress/e2e
_data software healthy skrub/_reporting/_data
templates software healthy skrub/_reporting/_data/templates
pure-3.0.0 software healthy skrub/_reporting/_data/templates/pure-3.0.0
icons software healthy skrub/_reporting/_data/templates/icons
data_ops software healthy skrub/_reporting/_data/templates/data_ops
tests software healthy skrub/tests
_data_ops software healthy skrub/_data_ops
tests software healthy skrub/_data_ops/tests
selectors software healthy skrub/selectors
tests software healthy skrub/selectors/tests
_dataframe software healthy skrub/_dataframe
tests software healthy skrub/_dataframe/tests
.circleci software healthy .circleci
build_tools software healthy build_tools
circle software healthy build_tools/circle
doc software healthy doc
binder software healthy doc/binder
sphinxext software healthy doc/sphinxext
_templates software healthy doc/_templates
_static software healthy doc/_static
scripts software healthy doc/_static/scripts
css software healthy doc/_static/css
tutorials software healthy doc/tutorials
examples software healthy examples
FIXME software healthy examples/FIXME
data_ops software healthy examples/data_ops
.github software healthy .github
ISSUE_TEMPLATE software healthy .github/ISSUE_TEMPLATE
PULL_REQUEST_TEMPLATE software healthy .github/PULL_REQUEST_TEMPLATE
workflows software healthy .github/workflows

Label Layer Status Path
test cicd healthy .github/workflows/testing.yml
check_run_nightly cicd healthy .github/workflows/testing.yml
test_against_nightly cicd healthy .github/workflows/testing.yml
welcome cicd healthy .github/workflows/welcome_action.yaml
check cicd healthy .github/workflows/changelog.yml
test cicd healthy .github/workflows/test-javascript.yml
circleci_artifacts_redirector_job cicd healthy .github/workflows/main.yml
run-pre-commit-checks cicd healthy .github/workflows/run-code-format-checks.yaml
update_lock_files cicd healthy .github/workflows/update_pixi_lock_files.yml
check-pyi-diff cicd healthy .github/workflows/check_stub_files_diff.yaml

Label Layer Status Path
gha::testing cicd healthy .github/workflows/testing.yml
gha::welcome_action cicd healthy .github/workflows/welcome_action.yaml
gha::changelog cicd healthy .github/workflows/changelog.yml
gha::test-javascript cicd healthy .github/workflows/test-javascript.yml
gha::main cicd healthy .github/workflows/main.yml
gha::run-code-format-checks cicd healthy .github/workflows/run-code-format-checks.yaml
gha::update_pixi_lock_files cicd healthy .github/workflows/update_pixi_lock_files.yml
gha::check_stub_files_diff cicd healthy .github/workflows/check_stub_files_diff.yaml
circleci cicd healthy .circleci/config.yml

Label Layer Status Path
BOT_GITHUB_TOKEN cicd healthy
CODECOV_TOKEN cicd healthy
CIRCLE_CI cicd healthy
GITHUB_TOKEN cicd healthy

Label Layer Status Path
repobility-clone-xj89mdeu software healthy /tmp/repobility-clone-xj89mdeu

Label Layer Status Path
sqlite data healthy skrub/_data_ops/_skrub_namespace.py

Label Layer Status Path
gpu (detected) hardware healthy skrub/_text_encoder.py

Label Layer Status Path
vps::aws hardware healthy skrub/_reporting/js_tests/package-lock.json
API access

This page is publicly accessible at: https://repobility.com/scan/f8fbe2ac-1fee-44fb-921a-41af7da12550/

To check status programmatically (no auth required):

curl -s https://repobility.com/api/v1/public/scan/f8fbe2ac-1fee-44fb-921a-41af7da12550/
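
The same unauthenticated request can be made from Python with the standard library. This is a sketch only: it assumes the endpoint returns a JSON body, which the page implies but does not document, and it makes no assumption about the fields in that body. The helper names `scan_status_url` and `fetch_scan_status`, and the injectable `opener` parameter, are hypothetical.

```python
import json
from urllib.request import urlopen

# Token and endpoint copied from the curl command.
SCAN_TOKEN = "f8fbe2ac-1fee-44fb-921a-41af7da12550"
BASE_URL = "https://repobility.com/api/v1/public/scan/"


def scan_status_url(token: str) -> str:
    """Build the public status URL for a scan token."""
    return f"{BASE_URL}{token}/"


def fetch_scan_status(token: str, opener=urlopen, timeout: float = 10.0):
    """Fetch and decode the scan status.

    Assumption: the response body is JSON. `opener` defaults to
    urllib's urlopen and can be swapped for a stub in tests.
    """
    with opener(scan_status_url(token), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_scan_status(SCAN_TOKEN)` performs one GET request and returns whatever the endpoint sends, decoded from JSON. Inspect the result before relying on any particular key.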

Important — please don't re-submit the same URL repeatedly. The submission endpoint is idempotent: re-submitting the same git URL returns this same scan_token, not a new one. To re-scan this repo, sign up free and use the dashboard.