Scan timing: clone 2.68s · analysis 9.78s · 8.7 MB · GitHub API rate-limit (preflight)
https://github.com/skrub-data/skrub · scanned 2026-06-05 14:27 UTC (5 days, 5 hours ago) · 10 languages
387 raw signals (139 security + 248 graph) · 80th percentile · Python · medium (20-100K LoC) · System graph score 92 (lower by 13)
Last scanned 5 days, 5 hours ago · v2 · 161 actionable findings from 2 signal sources. 102 repeated signals grouped for readability. Security checks, system graph analysis, and verified AI-agent feedback are merged into one review queue.
| Component | Sub-score | Weight | Contribution |
|---|---|---|---|
| structure_score | 60.0 | 0.15 | 9.00 |
| security_score | 95.2 | 0.25 | 23.80 |
| testing_score | 97.0 | 0.20 | 19.40 |
| documentation_score | 81.0 | 0.15 | 12.15 |
| practices_score | 70.0 | 0.15 | 10.50 |
| code_quality | 43.1 | 0.10 | 4.31 |
| Overall | | 1.00 | 79.2 |
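The Overall figure appears to be the weighted sum of the sub-scores. The sketch below recomputes it from the values in the table, assuming each contribution is simply sub-score times weight; the names `COMPONENTS` and `weighted_overall` are illustrative and not part of the scanner.

```python
# Sub-scores and weights copied from the table above: name -> (sub_score, weight).
COMPONENTS = {
    "structure_score": (60.0, 0.15),
    "security_score": (95.2, 0.25),
    "testing_score": (97.0, 0.20),
    "documentation_score": (81.0, 0.15),
    "practices_score": (70.0, 0.15),
    "code_quality": (43.1, 0.10),
}


def weighted_overall(components):
    """Return (total_weight, overall), where overall = sum(sub_score * weight)."""
    total_weight = sum(weight for _, weight in components.values())
    overall = sum(score * weight for score, weight in components.values())
    return total_weight, overall


total_weight, overall = weighted_overall(COMPONENTS)
print(f"weights sum to {total_weight:.2f}, overall = {overall:.2f}")
```

Under that assumption the contributions add up to 79.16, which matches the reported 79.2 after rounding to one decimal place.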
All 2610 nodes from the latest scan, grouped by kind. Each node is a unit the engine identified (file, function, endpoint, table…). Most users won't need this view — it's primarily for debugging the engine's graph extraction or for AI agents that want to enumerate the project structure.
| Label | Layer | Status | Path |
|---|---|---|---|
| cast_column_names_to_strings | software | healthy | skrub/_check_input.py:15 |
| _column_names_to_strings | software | healthy | skrub/_check_input.py:21 |
| _deduplicated_column_names | software | healthy | skrub/_check_input.py:33 |
| _cleaned_column_names | software | healthy | skrub/_check_input.py:44 |
| _check_not_pandas_sparse | software | healthy | skrub/_check_input.py:49 |
| _check_not_pandas_sparse_pandas | software | healthy | skrub/_check_input.py:54 |
| _check_is_dataframe | software | healthy | skrub/_check_input.py:66 |
| fit | software | healthy | skrub/_check_input.py:108 |
| fit_transform | software | healthy | skrub/_check_input.py:112 |
| transform | software | healthy | skrub/_check_input.py:128 |
| _handle_array | software | healthy | skrub/_check_input.py:155 |
| get_feature_names_out | software | healthy | skrub/_check_input.py:171 |
| __init__ | software | healthy | skrub/_apply_to_sub_frame.py:133 |
| fit | software | healthy | skrub/_apply_to_sub_frame.py:145 |
| fit_transform | software | healthy | skrub/_apply_to_sub_frame.py:168 |
| transform | software | healthy | skrub/_apply_to_sub_frame.py:228 |
| get_feature_names_out | software | healthy | skrub/_apply_to_sub_frame.py:267 |
| __init__ | software | healthy | skrub/_matching.py:196 |
| fit | software | healthy | skrub/_matching.py:109 |
| match | software | healthy | skrub/_matching.py:34 |
| _get_reference_distances | software | healthy | skrub/_matching.py:200 |
| _rescale_distances | software | healthy | skrub/_matching.py:53 |
| _sample_pairs | software | healthy | skrub/_matching.py:65 |
| _check_inputs | software | healthy | skrub/_matching.py:128 |
| __init__ | software | healthy | skrub/_minhash_encoder.py:118 |
| _get_murmur_hash | software | healthy | skrub/_minhash_encoder.py:133 |
| _get_fast_hash | software | healthy | skrub/_minhash_encoder.py:161 |
| _compute_hash_batched | software | healthy | skrub/_minhash_encoder.py:191 |
| fit | software | healthy | skrub/_minhash_encoder.py:216 |
| transform | software | healthy | skrub/_minhash_encoder.py:257 |
| get_feature_names_out | software | healthy | skrub/_minhash_encoder.py:301 |
| _example_data_dict | software | healthy | skrub/conftest.py:13 |
| _pl_from_dict | software | healthy | skrub/conftest.py:119 |
| all_dataframe_modules | software | healthy | skrub/conftest.py:163 |
| pd_module | software | healthy | skrub/conftest.py:168 |
| pl_module | software | healthy | skrub/conftest.py:173 |
| df_module | software | healthy | skrub/conftest.py:181 |
| example_data_dict | software | healthy | skrub/conftest.py:230 |
| use_fit_transform | software | healthy | skrub/conftest.py:235 |
| reset_config_to_base | software | healthy | skrub/conftest.py:254 |
| __init__ | software | healthy | skrub/_apply_to_cols.py:294 |
| fit | software | healthy | skrub/_apply_to_cols.py:313 |
| fit_transform | software | healthy | skrub/_apply_to_cols.py:336 |
| transform | software | healthy | skrub/_apply_to_cols.py:396 |
| get_feature_names_out | software | healthy | skrub/_apply_to_cols.py:417 |
| __getattr__ | software | healthy | skrub/_apply_to_cols.py:434 |
| __init__ | software | healthy | skrub/_utils.py:21 |
| __getitem__ | software | healthy | skrub/_utils.py:25 |
| __setitem__ | software | healthy | skrub/_utils.py:33 |
| __contains__ | software | healthy | skrub/_utils.py:41 |
Showing first 50 of this kind. Full payload available via the JSON button at the top of the page.
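Each Path value in the table above combines a file with a line number (for example `skrub/_check_input.py:15`). For an agent enumerating the project structure, a minimal sketch of splitting those values and grouping labels by file could look like the following. The `ROWS` sample is copied from the table; `split_path` and `group_by_file` are hypothetical helper names, and the shape of the full JSON payload is not assumed here.

```python
from collections import defaultdict

# A few (label, path) pairs copied from the table above.
ROWS = [
    ("cast_column_names_to_strings", "skrub/_check_input.py:15"),
    ("_column_names_to_strings", "skrub/_check_input.py:21"),
    ("fit", "skrub/_matching.py:109"),
    ("match", "skrub/_matching.py:34"),
    ("__init__", "skrub/_utils.py:21"),
]


def split_path(path):
    """Split 'file.py:15' into ('file.py', 15); the line is None when absent."""
    file_name, sep, line = path.rpartition(":")
    if sep and line.isdigit():
        return file_name, int(line)
    return path, None


def group_by_file(rows):
    """Map each file to its (line, label) entries, ordered by line number."""
    grouped = defaultdict(list)
    for label, path in rows:
        file_name, line = split_path(path)
        grouped[file_name].append((line, label))
    # Entries without a line number sort last instead of raising on None.
    return {
        name: sorted(entries, key=lambda e: (e[0] is None, e[0] or 0, e[1]))
        for name, entries in grouped.items()
    }


for file_name, entries in group_by_file(ROWS).items():
    print(file_name, entries)
```

Grouping by file makes it easier to see that labels such as `fit` or `__init__` repeat across files and are only unique in combination with their path.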
| Label | Layer | Status | Path |
|---|---|---|---|
| .pre-commit-config.yaml | software | healthy | .pre-commit-config.yaml |
| pyproject.toml | software | healthy | pyproject.toml |
| CODE_OF_CONDUCT.md | software | healthy | CODE_OF_CONDUCT.md |
| codecov.yml | software | healthy | codecov.yml |
| _check_input.py | software | healthy | skrub/_check_input.py |
| _apply_to_sub_frame.py | software | healthy | skrub/_apply_to_sub_frame.py |
| _matching.py | software | healthy | skrub/_matching.py |
| _minhash_encoder.py | software | healthy | skrub/_minhash_encoder.py |
| __init__.py | software | healthy | skrub/__init__.py |
| conftest.py | software | healthy | skrub/conftest.py |
| _apply_to_cols.py | software | healthy | skrub/_apply_to_cols.py |
| _utils.py | software | healthy | skrub/_utils.py |
| _string_encoder.py | software | healthy | skrub/_string_encoder.py |
| _drop_uninformative.py | software | healthy | skrub/_drop_uninformative.py |
| _tabular_pipeline.py | software | healthy | skrub/_tabular_pipeline.py |
| _scaling_factor.py | software | healthy | skrub/_scaling_factor.py |
| _single_column_transformer.py | software | healthy | skrub/_single_column_transformer.py |
| _select_cols.py | software | healthy | skrub/_select_cols.py |
| _fuzzy_join.py | software | healthy | skrub/_fuzzy_join.py |
| _config.py | software | healthy | skrub/_config.py |
| _multi_agg_joiner.py | software | healthy | skrub/_multi_agg_joiner.py |
| _to_str.py | software | healthy | skrub/_to_str.py |
| _to_datetime.py | software | healthy | skrub/_to_datetime.py |
| _clean_categories.py | software | healthy | skrub/_clean_categories.py |
| _table_vectorizer.py | software | healthy | skrub/_table_vectorizer.py |
| _duration_to_float.py | software | healthy | skrub/_duration_to_float.py |
| _similarity_encoder.py | software | healthy | skrub/_similarity_encoder.py |
| _fast_hash.py | software | healthy | skrub/_fast_hash.py |
| _apply_to_each_col.py | software | healthy | skrub/_apply_to_each_col.py |
| _datetime_encoder.py | software | healthy | skrub/_datetime_encoder.py |
| _deduplicate.py | software | healthy | skrub/_deduplicate.py |
| _dispatch.py | software | healthy | skrub/_dispatch.py |
| _string_distances.py | software | healthy | skrub/_string_distances.py |
| core.py | software | warning | skrub/core.py |
| _to_categorical.py | software | healthy | skrub/_to_categorical.py |
| _clean_null_strings.py | software | healthy | skrub/_clean_null_strings.py |
| _sklearn_compat.py | software | healthy | skrub/_sklearn_compat.py |
| _gap_encoder.py | software | healthy | skrub/_gap_encoder.py |
| _wrap_transformer.py | software | healthy | skrub/_wrap_transformer.py |
| _column_associations.py | software | healthy | skrub/_column_associations.py |
| _join_utils.py | software | healthy | skrub/_join_utils.py |
| _to_float.py | software | healthy | skrub/_to_float.py |
| _squashing_scaler.py | software | healthy | skrub/_squashing_scaler.py |
| _agg_joiner.py | software | healthy | skrub/_agg_joiner.py |
| _interpolation_joiner.py | software | healthy | skrub/_interpolation_joiner.py |
| _text_encoder.py | software | healthy | skrub/_text_encoder.py |
| _joiner.py | software | healthy | skrub/_joiner.py |
| __init__.py | software | healthy | skrub/datasets/__init__.py |
| _utils.py | software | healthy | skrub/datasets/_utils.py |
| _fetching.py | software | healthy | skrub/datasets/_fetching.py |
Showing first 50 of this kind. Full payload available via the JSON button at the top of the page.
| Label | Layer | Status | Path |
|---|---|---|---|
| CheckInputDataFrame | software | healthy | skrub/_check_input.py:75 |
| ApplyToSubFrame | software | healthy | skrub/_apply_to_sub_frame.py:14 |
| Matching | software | healthy | skrub/_matching.py:8 |
| RandomPairs | software | healthy | skrub/_matching.py:78 |
| SelfJoinNeighbor | software | healthy | skrub/_matching.py:140 |
| OtherNeighbor | software | healthy | skrub/_matching.py:177 |
| MinHashEncoder | software | healthy | skrub/_minhash_encoder.py:23 |
| ApplyToCols | software | healthy | skrub/_apply_to_cols.py:16 |
| LRUDict | software | healthy | skrub/_utils.py:15 |
| Repr | software | healthy | skrub/_utils.py:142 |
| _ShortRepr | software | healthy | skrub/_utils.py:175 |
| PassThrough | software | healthy | skrub/_utils.py:261 |
| StringEncoder | software | healthy | skrub/_string_encoder.py:19 |
| DropUninformative | software | healthy | skrub/_drop_uninformative.py:12 |
| RejectColumn | software | healthy | skrub/_single_column_transformer.py:38 |
| SingleColumnTransformer | software | healthy | skrub/_single_column_transformer.py:123 |
| SelectCols | software | healthy | skrub/_select_cols.py:7 |
| DropCols | software | healthy | skrub/_select_cols.py:102 |
| Drop | software | healthy | skrub/_select_cols.py:199 |
| MultiAggJoiner | software | healthy | skrub/_multi_agg_joiner.py:20 |
| ToStr | software | healthy | skrub/_to_str.py:7 |
| ToDatetime | software | healthy | skrub/_to_datetime.py:85 |
| CleanCategories | software | healthy | skrub/_clean_categories.py:38 |
| PassThrough | software | healthy | skrub/_table_vectorizer.py:33 |
| ShortReprDict | software | healthy | skrub/_table_vectorizer.py:52 |
| Cleaner | software | healthy | skrub/_table_vectorizer.py:184 |
| TableVectorizer | software | healthy | skrub/_table_vectorizer.py:542 |
| DurationToFloat | software | healthy | skrub/_duration_to_float.py:25 |
| SimilarityEncoder | software | healthy | skrub/_similarity_encoder.py:133 |
| ApplyToEachCol | software | healthy | skrub/_apply_to_each_col.py:18 |
| DatetimeEncoder | software | healthy | skrub/_datetime_encoder.py:100 |
| _BasePeriodicEncoder | software | healthy | skrub/_datetime_encoder.py:522 |
| _SplineEncoder | software | healthy | skrub/_datetime_encoder.py:557 |
| _CircularEncoder | software | healthy | skrub/_datetime_encoder.py:647 |
| DataFrameModuleInfo | software | healthy | skrub/_dispatch.py:142 |
| ToCategorical | software | healthy | skrub/_to_categorical.py:7 |
| CleanNullStrings | software | healthy | skrub/_clean_null_strings.py:52 |
| ParamsValidationMixin | software | healthy | skrub/_sklearn_compat.py:30 |
| InputTags | software | healthy | skrub/_sklearn_compat.py:274 |
| TargetTags | software | healthy | skrub/_sklearn_compat.py:334 |
| TransformerTags | software | healthy | skrub/_sklearn_compat.py:372 |
| ClassifierTags | software | healthy | skrub/_sklearn_compat.py:391 |
| RegressorTags | software | healthy | skrub/_sklearn_compat.py:418 |
| Tags | software | healthy | skrub/_sklearn_compat.py:439 |
| GapEncoder | software | healthy | skrub/_gap_encoder.py:26 |
| ToFloat | software | healthy | skrub/_to_float.py:7 |
| _MinMaxScaler | software | healthy | skrub/_squashing_scaler.py:54 |
| SquashingScaler | software | healthy | skrub/_squashing_scaler.py:85 |
| AggJoiner | software | healthy | skrub/_agg_joiner.py:171 |
| AggTarget | software | healthy | skrub/_agg_joiner.py:396 |
Showing first 50 of this kind. Full payload available via the JSON button at the top of the page.
| Label | Layer | Status | Path |
|---|---|---|---|
| skrub | software | healthy | skrub |
| datasets | software | healthy | skrub/datasets |
| tests | software | healthy | skrub/datasets/tests |
| _reporting | software | healthy | skrub/_reporting |
| tests | software | healthy | skrub/_reporting/tests |
| js_tests | software | healthy | skrub/_reporting/js_tests |
| cypress | software | healthy | skrub/_reporting/js_tests/cypress |
| fixtures | software | healthy | skrub/_reporting/js_tests/cypress/fixtures |
| support | software | healthy | skrub/_reporting/js_tests/cypress/support |
| e2e | software | healthy | skrub/_reporting/js_tests/cypress/e2e |
| _data | software | healthy | skrub/_reporting/_data |
| templates | software | healthy | skrub/_reporting/_data/templates |
| pure-3.0.0 | software | healthy | skrub/_reporting/_data/templates/pure-3.0.0 |
| icons | software | healthy | skrub/_reporting/_data/templates/icons |
| data_ops | software | healthy | skrub/_reporting/_data/templates/data_ops |
| tests | software | healthy | skrub/tests |
| _data_ops | software | healthy | skrub/_data_ops |
| tests | software | healthy | skrub/_data_ops/tests |
| selectors | software | healthy | skrub/selectors |
| tests | software | healthy | skrub/selectors/tests |
| _dataframe | software | healthy | skrub/_dataframe |
| tests | software | healthy | skrub/_dataframe/tests |
| .circleci | software | healthy | .circleci |
| build_tools | software | healthy | build_tools |
| circle | software | healthy | build_tools/circle |
| doc | software | healthy | doc |
| binder | software | healthy | doc/binder |
| sphinxext | software | healthy | doc/sphinxext |
| _templates | software | healthy | doc/_templates |
| _static | software | healthy | doc/_static |
| scripts | software | healthy | doc/_static/scripts |
| css | software | healthy | doc/_static/css |
| tutorials | software | healthy | doc/tutorials |
| examples | software | healthy | examples |
| FIXME | software | healthy | examples/FIXME |
| data_ops | software | healthy | examples/data_ops |
| .github | software | healthy | .github |
| ISSUE_TEMPLATE | software | healthy | .github/ISSUE_TEMPLATE |
| PULL_REQUEST_TEMPLATE | software | healthy | .github/PULL_REQUEST_TEMPLATE |
| workflows | software | healthy | .github/workflows |
| Label | Layer | Status | Path |
|---|---|---|---|
| test | cicd | healthy | .github/workflows/testing.yml |
| check_run_nightly | cicd | healthy | .github/workflows/testing.yml |
| test_against_nightly | cicd | healthy | .github/workflows/testing.yml |
| welcome | cicd | healthy | .github/workflows/welcome_action.yaml |
| check | cicd | healthy | .github/workflows/changelog.yml |
| test | cicd | healthy | .github/workflows/test-javascript.yml |
| circleci_artifacts_redirector_job | cicd | healthy | .github/workflows/main.yml |
| run-pre-commit-checks | cicd | healthy | .github/workflows/run-code-format-checks.yaml |
| update_lock_files | cicd | healthy | .github/workflows/update_pixi_lock_files.yml |
| check-pyi-diff | cicd | healthy | .github/workflows/check_stub_files_diff.yaml |
| Label | Layer | Status | Path |
|---|---|---|---|
| gha::testing | cicd | healthy | .github/workflows/testing.yml |
| gha::welcome_action | cicd | healthy | .github/workflows/welcome_action.yaml |
| gha::changelog | cicd | healthy | .github/workflows/changelog.yml |
| gha::test-javascript | cicd | healthy | .github/workflows/test-javascript.yml |
| gha::main | cicd | healthy | .github/workflows/main.yml |
| gha::run-code-format-checks | cicd | healthy | .github/workflows/run-code-format-checks.yaml |
| gha::update_pixi_lock_files | cicd | healthy | .github/workflows/update_pixi_lock_files.yml |
| gha::check_stub_files_diff | cicd | healthy | .github/workflows/check_stub_files_diff.yaml |
| circleci | cicd | healthy | .circleci/config.yml |
| Label | Layer | Status | Path |
|---|---|---|---|
| BOT_GITHUB_TOKEN | cicd | healthy | — |
| CODECOV_TOKEN | cicd | healthy | — |
| CIRCLE_CI | cicd | healthy | — |
| GITHUB_TOKEN | cicd | healthy | — |
| Label | Layer | Status | Path |
|---|---|---|---|
| repobility-clone-xj89mdeu | software | healthy | /tmp/repobility-clone-xj89mdeu |
| Label | Layer | Status | Path |
|---|---|---|---|
| sqlite | data | healthy | skrub/_data_ops/_skrub_namespace.py |
| Label | Layer | Status | Path |
|---|---|---|---|
| gpu (detected) | hardware | healthy | skrub/_text_encoder.py |
| Label | Layer | Status | Path |
|---|---|---|---|
| vps::aws | hardware | healthy | skrub/_reporting/js_tests/package-lock.json |
This page is publicly accessible at:
https://repobility.com/scan/f8fbe2ac-1fee-44fb-921a-41af7da12550/
To check status programmatically (no auth required):
curl -s https://repobility.com/api/v1/public/scan/f8fbe2ac-1fee-44fb-921a-41af7da12550/
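The same status check can be sketched in Python with only the standard library. The endpoint URL is taken from the curl command above; the helper names `status_url` and `fetch_status` are illustrative, and treating the response body as JSON is an assumption, since the response format is not documented on this page.

```python
import json
import urllib.request

# Base URL copied from the curl command above.
API_BASE = "https://repobility.com/api/v1/public/scan/"


def status_url(scan_token):
    """Build the public status URL for a scan token (no auth required)."""
    return f"{API_BASE}{scan_token}/"


def fetch_status(scan_token, timeout=10.0):
    """GET the status endpoint and decode the body as JSON (assumed format)."""
    with urllib.request.urlopen(status_url(scan_token), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_status("f8fbe2ac-1fee-44fb-921a-41af7da12550")` would issue the same request as the curl command. Polling should reuse that token rather than re-submitting the repository URL.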
Important — please don't re-submit the same URL repeatedly. The submission endpoint is idempotent: re-submitting the same git URL returns this same scan_token, not a new one. To re-scan this repo, sign up free and use the dashboard.