Public scan: anyone with this URL can view this analysis.
75 of your 119 findings came from Repobility's proprietary detections. ✓ Repobility tags below mark them.

Scan timing: clone 2.74s · analysis 1.33s · 24.8 MB · GitHub API rate-limit (preflight)

microsoft/markitdown

https://github.com/microsoft/markitdown.git · scanned 2026-05-27 03:30 UTC (2 weeks, 3 days ago) · 10 languages

278 raw signals (115 security + 163 graph) · 67th percentile · Python · small (2-20K LoC) · System graph score 89 (lower by 13)

UNIFIED Repobility · multi-layer engine · AI coders

Complete repo analysis

Last scanned 2 weeks, 3 days ago · v3 · 82 actionable findings from 2 signal sources. 82 repeated signals grouped for readability. Security checks, system graph analysis, and verified AI-agent feedback are merged into one review queue.

Score breakdown (2026-05-18-v5)
Component            Sub-score  Weight  Contribution
structure_score           55.0    0.15          8.25
security_score            94.7    0.25         23.68
testing_score             85.0    0.20         17.00
documentation_score       80.0    0.15         12.00
practices_score           75.0    0.15         11.25
code_quality              43.6    0.10          4.36
Overall                           1.00          76.5
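The overall figure appears to be the sum of each sub-score multiplied by its weight. A minimal sketch that reproduces the arithmetic from the table (the dictionary layout and the function name `overall_score` are illustrative, not Repobility's implementation):

```python
# Sub-scores and weights copied from the score breakdown table.
COMPONENTS = {
    "structure_score": (55.0, 0.15),
    "security_score": (94.7, 0.25),
    "testing_score": (85.0, 0.20),
    "documentation_score": (80.0, 0.15),
    "practices_score": (75.0, 0.15),
    "code_quality": (43.6, 0.10),
}


def overall_score(components):
    """Return the weighted sum of (sub_score, weight) pairs."""
    return sum(score * weight for score, weight in components.values())


total = overall_score(COMPONENTS)
print(f"{total:.1f}")  # prints 76.5
```

The weights sum to 1.00, so the result stays on the same 0-100 scale as the sub-scores.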
Active filters: excluding tests
Scan summary Quality grade B+ (76/100). Dimensions: security 95, maintainability 55. 115 findings (7 security). 12,558 lines analyzed.

Showing 64 of 82 actionable findings. 164 raw detector signals were grouped into reader-sized issues.

low Security checks quality Quality conf 1.00 ✓ Repobility [MINED006] Overcatch BaseException: `except BaseException: ...` also catches KeyboardInterrupt and SystemExit, so Ctrl+C and a normal interpreter exit can be swallowed.
Review and fix per the pattern semantics. See CWE-705 for context.
packages/markitdown/src/markitdown/converters/_rss_converter.py:68
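As a hedged illustration of the usual remedy, the sketch below catches `Exception` instead of `BaseException`, which lets `KeyboardInterrupt` and `SystemExit` propagate. The function `parse_feed` and its `parse` argument are hypothetical and are not taken from `_rss_converter.py`:

```python
import logging

logger = logging.getLogger(__name__)


def parse_feed(parse, data):
    """Call a hypothetical parser; return None when parsing fails."""
    try:
        return parse(data)
    except Exception:
        # KeyboardInterrupt and SystemExit derive from BaseException,
        # not Exception, so they are not caught here.
        logger.debug("feed parsing failed", exc_info=True)
        return None
```

Narrowing further to the specific exception types the parser is known to raise would be stricter still, where those types are known.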
high Security checks quality Quality conf 1.00 ✓ Repobility 20 occurrences `self._extract_and_ocr_images` used but never assigned in __init__
Method `convert` of class `DocxConverterWithOCR` reads `self._extract_and_ocr_images`, but no assignment to it exists in __init__ (and no class-level fallback). This raises AttributeError the first time the method runs against an instance.
5 files, 20 locations
packages/markitdown/src/markitdown/_markitdown.py:182, 185, 188, 191, 192, 193, 194, 195 (8 hits)
packages/markitdown-ocr/src/markitdown_ocr/_pptx_converter_with_ocr.py:91, 141, 142, 146 (4 hits)
packages/markitdown-ocr/src/markitdown_ocr/_xlsx_converter_with_ocr.py:82, 86, 139, 191 (4 hits)
packages/markitdown-ocr/src/markitdown_ocr/_docx_converter_with_ocr.py:88, 99 (2 hits)
packages/markitdown-ocr/src/markitdown_ocr/_pdf_converter_with_ocr.py:193, 309 (2 hits)
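The pattern this finding describes can be sketched with two hypothetical classes (neither is code from the repository). Note that if the flagged name is a method defined on the class, reading `self.<name>` is safe, because methods are class attributes; in that case the finding would be a candidate false positive worth checking before changing anything.

```python
class BrokenConverter:
    """Hypothetical: reads an instance attribute that nothing assigns."""

    def convert(self):
        return self._ocr_enabled  # raises AttributeError when called


class FixedConverter:
    """Hypothetical: the attribute is assigned in __init__."""

    def __init__(self, ocr_enabled=False):
        self._ocr_enabled = ocr_enabled

    def convert(self):
        return self._ocr_enabled
```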
high Security checks software dependencies conf 0.90 ✓ Repobility Dockerfile FROM `python:3.13-slim-bullseye` not pinned by digest
`FROM python:3.13-slim-bullseye` resolves the tag at build time. The registry CAN re-push a different image for the same tag, so every build is potentially different. Production images should pin to `image@sha256:...` for reproducibility + supply-chain integrity.
packages/markitdown-mcp/Dockerfile:1
high Security checks software dependencies conf 0.90 ✓ Repobility Dockerfile FROM `python:3.13-slim-bullseye` not pinned by digest
`FROM python:3.13-slim-bullseye` resolves the tag at build time. The registry CAN re-push a different image for the same tag, so every build is potentially different. Production images should pin to `image@sha256:...` for reproducibility + supply-chain integrity.
Dockerfile:1
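A rough way to audit this locally is to scan each `FROM` line for an `@sha256:` digest. The checker below is a simplified sketch, not Repobility's detector; the function name is an assumption, and it does not distinguish references to earlier build stages (for example `FROM builder`), which it would also report.

```python
import re

# A digest-pinned reference ends in "@sha256:" plus 64 hex characters.
_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_base_images(dockerfile_text):
    """Return image references from FROM lines that lack a sha256 digest."""
    unpinned = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].upper() == "FROM":
            # Skip flags such as --platform=... that may precede the image.
            image = next((p for p in parts[1:] if not p.startswith("--")), None)
            if image and image != "scratch" and not _DIGEST.search(image):
                unpinned.append(image)
    return unpinned
```

Called on a Dockerfile whose first line is `FROM python:3.13-slim-bullseye`, this returns that tag; a reference of the form `python@sha256:<64 hex digits>` is not reported.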
low Security checks cicd CI/CD security conf 0.90 ✓ Repobility 5 occurrences GitHub Action is tag-pinned rather than SHA-pinned
Action `actions/checkout` is pinned to `@v5`, a mutable tag or branch. Pin external actions to a reviewed full commit SHA when the workflow is security-sensitive.
2 files, 5 locations
.github/workflows/pre-commit.yml:8, 10 (3 hits)
.github/workflows/tests.yml:8, 9 (2 hits)
CI/CD security · Supply chain · GitHub Actions
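To see which `uses:` references in a workflow are not pinned to a full commit SHA, a line-based check is often enough. The sketch below is an assumption-laden approximation (regex-based, no YAML parsing, hypothetical function name) and not the scanner's own logic:

```python
import re

_USES = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s#]+)")
_FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def mutable_action_refs(workflow_text):
    """Return `uses:` targets whose ref is not a 40-character commit SHA."""
    refs = []
    for line in workflow_text.splitlines():
        match = _USES.match(line)
        if not match:
            continue
        action = match.group(1).strip("'\"")
        if action.startswith("./") or action.startswith("docker://"):
            continue  # local actions and Docker references have no git ref
        _, _, ref = action.partition("@")
        if not _FULL_SHA.match(ref):
            refs.append(action)
    return refs
```

A step written as `uses: actions/checkout@v5` is reported, while the same action followed by `@` and a 40-character hexadecimal commit SHA is not.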
high Security checks software dependencies conf 0.90 ✓ Repobility pre-commit hook `https://github.com/psf/black` pinned to mutable rev `23.7.0`
`.pre-commit-config.yaml` references `https://github.com/psf/black` at `rev: 23.7.0`. If `23.7.0` is a branch or version tag, the repo owner can push new code there and `pre-commit install --install-hooks` will fetch it on every developer's machine.
.pre-commit-config.yaml:2
low Security checks quality Error handling conf 1.00 3 occurrences [ERR001] Silent Exception Swallowing: Silently swallowing all exceptions hides bugs. Even in cleanup code, log at DEBUG level.
Log the error: `except Exception: logger.debug('cleanup failed', exc_info=True)`. Or handle specific exception types.
3 files, 3 locations
packages/markitdown-ocr/src/markitdown_ocr/_docx_converter_with_ocr.py:155
packages/markitdown-ocr/src/markitdown_ocr/_pptx_converter_with_ocr.py:121
packages/markitdown-ocr/src/markitdown_ocr/_xlsx_converter_with_ocr.py:211
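The suggested fix can be sketched as follows. The helper `remove_temp_file` is hypothetical and stands in for whatever cleanup the flagged handlers perform; the point is that the failure is narrowed to `OSError` and logged at DEBUG rather than discarded.

```python
import logging
import os

logger = logging.getLogger(__name__)


def remove_temp_file(path):
    """Best-effort cleanup that records failures instead of hiding them."""
    try:
        os.remove(path)
        return True
    except OSError:
        # Cleanup is optional, so only log; exc_info keeps the traceback.
        logger.debug("cleanup failed for %s", path, exc_info=True)
        return False
```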
medium Security checks quality Quality conf 1.00 [SEC123] Production stack trace / debug output exposed: Debug mode left on in production exposes stack traces, environment variables, framework internals — sometimes triggers RCE (Django debug page with arbitrary template eval).
Set DEBUG=False / APP_DEBUG=false in production. Provide a generic 500 handler that logs to backend but returns a sanitized page to clients.
packages/markitdown-mcp/src/markitdown_mcp/__main__.py:129
low Security checks quality Quality conf 1.00 [SEC136] AI-typical over-broad exception handler swallowing all errors: Catch-all exception block that silently returns success or no-ops. AI agents reach for this pattern when a flaky test or an unfamiliar API throws — wrap, swallow, return success. Real bugs are masked, observability is destroyed, and callers think the operation worked. CWE-396 (improperly-generalized exception). Distinct from intentional fallback because there's no log line and the success value is fabricated.
Catch the specific exception type, log at error level with full exception info, and return a failure-shaped result. If the operation is genuinely best-effort, log at warning and document why in a comment so the next reader (or scanner) knows.
packages/markitdown/src/markitdown/converters/_llm_caption.py:22
low Security checks quality Quality conf 1.00 [SEC136] AI-typical over-broad exception handler swallowing all errors: Catch-all exception block that silently returns success or no-ops. AI agents reach for this pattern when a flaky test or an unfamiliar API throws — wrap, swallow, return success. Real bugs are masked, observability is destroyed, and callers think the operation worked. CWE-396 (improperly-generalized exception). Distinct from intentional fallback because there's no log line and the success value is fabricated.
Catch the specific exception type, log at error level with full exception info, and return a failure-shaped result. If the operation is genuinely best-effort, log at warning and document why in a comment so the next reader (or scanner) knows.
packages/markitdown/src/markitdown/converters/_image_converter.py:110
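One way to follow this advice is to catch only the expected exception types and return a result that records the failure. The names below (`CaptionResult`, `caption_from_response`) and the assumed JSON shape `{"caption": "..."}` are hypothetical and are not taken from `_llm_caption.py` or `_image_converter.py`:

```python
import json
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class CaptionResult:
    ok: bool
    text: Optional[str] = None
    error: Optional[str] = None


def caption_from_response(raw):
    """Parse a hypothetical JSON response of the form {"caption": "..."}."""
    try:
        return CaptionResult(ok=True, text=json.loads(raw)["caption"])
    except (json.JSONDecodeError, KeyError) as exc:
        # Log with the traceback and return a failure-shaped result.
        logger.error("caption parsing failed", exc_info=True)
        return CaptionResult(ok=False, error=f"{type(exc).__name__}: {exc}")
```

Callers can then branch on `result.ok` instead of mistaking a fabricated success value for a real caption. Exceptions outside the two listed types still propagate.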
low Security checks quality Error handling conf 0.55 ✓ Repobility 24 occurrences Broad exception handler needs review
This handler catches Exception/BaseException. It is actionable when it swallows errors without logging, re-raising, or returning a structured error. Handlers that intentionally convert exceptions into typed error results should not be treated as high risk.
12 files, 23 locations
packages/markitdown-ocr/src/markitdown_ocr/_pdf_converter_with_ocr.py:120, 297, 302, 380, 386, 413, 419 (7 hits)
packages/markitdown/src/markitdown/_markitdown.py:79, 268, 630 (3 hits)
packages/markitdown/src/markitdown/converters/_youtube_converter.py:114, 176, 232 (3 hits)
packages/markitdown-ocr/src/markitdown_ocr/_ocr_service.py:78, 107 (2 hits)
packages/markitdown-ocr/src/markitdown_ocr/_docx_converter_with_ocr.py:152
packages/markitdown-ocr/src/markitdown_ocr/_pptx_converter_with_ocr.py:248
packages/markitdown-ocr/src/markitdown_ocr/_xlsx_converter_with_ocr.py:208
packages/markitdown/src/markitdown/converter_utils/docx/pre_process.py:150
Error handling · quality
medium Security checks cicd CI/CD security conf 0.76 Dockerfile copies broad context with incomplete .dockerignore
COPY . or ADD . is safer when .dockerignore excludes secrets, git history, keys, and generated artifacts.
packages/markitdown-mcp/Dockerfile:17 · CI/CD security · containers
medium Security checks cicd CI/CD security conf 0.76 Dockerfile copies broad context with incomplete .dockerignore
COPY . or ADD . is safer when .dockerignore excludes secrets, git history, keys, and generated artifacts.
Dockerfile:22 · CI/CD security · containers
medium Security checks quality Quality conf 1.00 ✓ Repobility Mutable default argument in `__init__` (list)
`def __init__(... = []/{}/set())` — Python's default value is constructed ONCE at function definition time and shared across all calls. Mutating it in one call mutates it for every future call too.
packages/markitdown/src/markitdown/converters/_doc_intel_converter.py:133
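The sharing behaviour is easy to demonstrate with plain functions; the same applies to `__init__`. The function names here are illustrative. Whether the flagged `__init__` actually mutates its default is not shown in the report, and a default that is only read is harmless.

```python
def add_type_shared(name, types=[]):
    """Buggy: the default list is created once and reused by every call."""
    types.append(name)
    return types


def add_type_safe(name, types=None):
    """Fixed: use None as a sentinel and build a fresh list per call."""
    if types is None:
        types = []
    types.append(name)
    return types


first = add_type_shared("pdf")
second = add_type_shared("docx")
print(first is second, second)  # prints: True ['pdf', 'docx']
```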
low Security checks cicd CI/CD security conf 0.72 .dockerignore misses sensitive defaults
.dockerignore exists but does not cover common secret or VCS patterns.
.dockerignore · CI/CD security · containers
low Security checks cicd CI/CD security conf 0.74 Dockerfile leaves apt package indexes in the image layer
Package indexes increase image size and can expose stale metadata in the final image layer.
packages/markitdown-mcp/Dockerfile:10 · CI/CD security · containers
low Security checks cicd CI/CD security conf 0.74 Dockerfile leaves apt package indexes in the image layer
Package indexes increase image size and can expose stale metadata in the final image layer.
Dockerfile:8 · CI/CD security · containers
low Security checks quality Quality conf 0.60 27 occurrences Duplicated implementation block across source files
Duplicate implementation blocks are maintenance debt. Keep them visible, but they are not a high-severity defect unless the duplicated logic is security-sensitive or drifting.
12 files, 21 locations
packages/markitdown/src/markitdown/converters/_pdf_converter.py:342, 350, 354 (3 hits)
packages/markitdown/src/markitdown/converters/_pptx_converter.py:26, 30, 41 (3 hits)
packages/markitdown/src/markitdown/converters/_docx_converter.py:26, 37 (2 hits)
packages/markitdown/src/markitdown/converters/_epub_converter.py:22, 26 (2 hits)
packages/markitdown/src/markitdown/converters/_html_converter.py:4, 18 (2 hits)
packages/markitdown/src/markitdown/converters/_image_converter.py:15, 23 (2 hits)
packages/markitdown/src/markitdown/converters/_youtube_converter.py:41, 42 (2 hits)
packages/markitdown-ocr/src/markitdown_ocr/_pdf_converter_with_ocr.py:105
duplication · quality
low System graph hardware Coverage conf 1.00 Containers defined but no K8s/orchestration manifest found
Repo has Dockerfiles/compose but no Kubernetes/Nomad manifests. If the target deployment is K8s, the manifests may live in a separate ops repo.
Deployment
low System graph hardware Supply chain conf 1.00 Docker base image is tag-pinned but not digest-pinned: python:3.13-slim-bullseye
Container tags can be retagged upstream. Pin production base images to a reviewed digest (`image@sha256:...`) when reproducibility and supply-chain integrity matter.
Dockerfile:1 · containers · Pinned dependencies
low System graph hardware Supply chain conf 1.00 Docker base image is tag-pinned but not digest-pinned: python:3.13-slim-bullseye
Container tags can be retagged upstream. Pin production base images to a reviewed digest (`image@sha256:...`) when reproducibility and supply-chain integrity matter.
packages/markitdown-mcp/Dockerfile:1 · containers · Pinned dependencies
low System graph software Dead code candidate conf 1.00 File has no detected symbols: packages/markitdown-mcp/src/markitdown_mcp/__about__.py
Source file with no class/function declarations — possible config, dead code, or scratch file.
low System graph software Dead code candidate conf 1.00 File has no detected symbols: packages/markitdown-ocr/src/markitdown_ocr/__about__.py
Source file with no class/function declarations — possible config, dead code, or scratch file.
low System graph software Dead code candidate conf 1.00 File has no detected symbols: packages/markitdown-sample-plugin/src/markitdown_sample_plugin/__about__.py
Source file with no class/function declarations — possible config, dead code, or scratch file.
low System graph software Dead code candidate conf 1.00 File has no detected symbols: packages/markitdown/src/markitdown/__about__.py
Source file with no class/function declarations — possible config, dead code, or scratch file.
low System graph software Dead code candidate conf 1.00 File has no detected symbols: packages/markitdown/src/markitdown/converter_utils/docx/math/latex_dict.py
Source file with no class/function declarations — possible config, dead code, or scratch file.
low System graph security security conf 1.00 Insecure pattern 'debug_true' in packages/markitdown-mcp/src/markitdown_mcp/__main__.py:129
Found a known-risky pattern (debug_true). Review and replace if possible.
packages/markitdown-mcp/src/markitdown_mcp/__main__.py:129 · Debug true
low System graph quality Integrity conf 1.00 Near-duplicate function bodies in 18 places
Functions with the same first-5-line body hash: packages/markitdown/src/markitdown/_base_converter.py:accepts, packages/markitdown/src/markitdown/converters/_audio_converter.py:accepts, packages/markitdown/src/markitdown/converters/_youtube_converter.py:accepts, packages/markitdown/src/markitdown/c…
duplicates · duplication
low System graph quality Integrity conf 1.00 Near-duplicate function bodies in 19 places
Functions with the same first-5-line body hash: packages/markitdown/src/markitdown/_base_converter.py:convert, packages/markitdown/src/markitdown/converters/_audio_converter.py:convert, packages/markitdown/src/markitdown/converters/_youtube_converter.py:convert, packages/markitdown/src/markitdown/c…
duplicates · duplication
low System graph quality Integrity conf 1.00 5 occurrences Near-duplicate function bodies in 2 places
Functions with the same first-5-line body hash: packages/markitdown/src/markitdown/_base_converter.py:text_content, packages/markitdown/src/markitdown/_base_converter.py:text_content. This is *the* AI-coder failure mode (4× more duplication in vibe-coded repos; see https://jw.hn/ai-code-hygiene). …
5 occurrences
repo-level (5 hits)
duplicates · duplication
low System graph quality Integrity conf 1.00 Near-duplicate function bodies in 3 places
Functions with the same first-5-line body hash: packages/markitdown/src/markitdown/converters/_markdownify.py:convert_a, packages/markitdown/src/markitdown/converters/_markdownify.py:convert_img, packages/markitdown/src/markitdown/converters/_markdownify.py:convert_input. This is *the* AI-coder fai…
duplicates · duplication
low System graph quality Integrity conf 1.00 Near-duplicate function bodies in 7 places
Functions with the same first-5-line body hash: packages/markitdown-ocr/src/markitdown_ocr/_pptx_converter_with_ocr.py:convert, packages/markitdown-ocr/src/markitdown_ocr/_docx_converter_with_ocr.py:convert, packages/markitdown-ocr/src/markitdown_ocr/_xlsx_converter_with_ocr.py:convert, packages/ma…
duplicates · duplication
low System graph quality Integrity conf 1.00 Near-duplicate function bodies in 8 places
Functions with the same first-5-line body hash: packages/markitdown-ocr/src/markitdown_ocr/_pptx_converter_with_ocr.py:accepts, packages/markitdown-ocr/src/markitdown_ocr/_docx_converter_with_ocr.py:accepts, packages/markitdown-ocr/src/markitdown_ocr/_xlsx_converter_with_ocr.py:accepts, packages/ma…
duplicates · duplication
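The "same first-5-line body hash" heuristic named in these findings can be approximated as below. This is a sketch under stated assumptions (whitespace-stripped source lines, SHA-256, a hypothetical function name), not the scanner's actual algorithm. It also suggests why short overrides of a shared interface, such as `accepts` or `convert`, may collide if they open with the same lines.

```python
import ast
import hashlib
from collections import defaultdict


def body_hash_groups(source, head_lines=5):
    """Group function names by a hash of the first lines of their bodies."""
    lines = source.splitlines()
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            start = node.body[0].lineno - 1  # index of first body statement
            end = min(node.end_lineno, start + head_lines)
            head = "\n".join(line.strip() for line in lines[start:end])
            digest = hashlib.sha256(head.encode()).hexdigest()
            groups[digest].append(node.name)
    # Only hashes shared by more than one function are near-duplicates.
    return [names for names in groups.values() if len(names) > 1]
```

Two functions whose bodies both consist of `return x + 1` land in the same group, while one that returns `y + 1` does not.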
low System graph software Dead code conf 1.00 Possibly dead Python function: convert_a
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converters/_markdownify.py:39
low System graph software Dead code conf 1.00 Possibly dead Python function: convert_img
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converters/_markdownify.py:85
low System graph software Dead code conf 1.00 Possibly dead Python function: convert_input
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converters/_markdownify.py:112
low System graph software Dead code conf 1.00 Possibly dead Python function: convert_url
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/_markitdown.py:409
low System graph software Dead code conf 1.00 Possibly dead Python function: do_acc
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:200
low System graph software Dead code conf 1.00 Possibly dead Python function: do_bar
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:210
low System graph software Dead code conf 1.00 Possibly dead Python function: do_brk
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:149
low System graph software Dead code conf 1.00 Possibly dead Python function: do_common
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:153
low System graph software Dead code conf 1.00 Possibly dead Python function: do_d
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:219
low System graph software Dead code conf 1.00 Possibly dead Python function: do_eqarr
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:302
low System graph software Dead code conf 1.00 Possibly dead Python function: do_f
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:248
low System graph software Dead code conf 1.00 Possibly dead Python function: do_fname
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:265
low System graph software Dead code conf 1.00 Possibly dead Python function: do_func
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:257
low System graph software Dead code conf 1.00 Possibly dead Python function: do_groupchr
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:281
low System graph software Dead code conf 1.00 Possibly dead Python function: do_lim
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:330
low System graph software Dead code conf 1.00 Possibly dead Python function: do_limlow
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:312
low System graph software Dead code conf 1.00 Possibly dead Python function: do_limupp
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:323
low System graph software Dead code conf 1.00 Possibly dead Python function: do_m
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:336
low System graph software Dead code conf 1.00 Possibly dead Python function: do_mr
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:348
low System graph software Dead code conf 1.00 Possibly dead Python function: do_nary
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:356
low System graph software Dead code conf 1.00 Possibly dead Python function: do_r
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:369
low System graph software Dead code conf 1.00 Possibly dead Python function: do_rad
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:290
low System graph software Dead code conf 1.00 Possibly dead Python function: do_spre
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:234
low System graph software Dead code conf 1.00 Possibly dead Python function: do_sub
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:240
low System graph software Dead code conf 1.00 Possibly dead Python function: do_sup
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:244
low System graph software Dead code conf 1.00 Possibly dead Python function: handle_sse
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown-mcp/src/markitdown_mcp/__main__.py:43
low System graph software Dead code conf 1.00 Possibly dead Python function: handle_streamable_http
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown-mcp/src/markitdown_mcp/__main__.py:55
low System graph software Dead code conf 1.00 Possibly dead Python function: load_string
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:49
low System graph software Dead code conf 1.00 Possibly dead Python function: register_page_converter
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown/src/markitdown/_markitdown.py:656
low System graph software Dead code conf 1.00 Possibly dead Python function: replace_img
No callers detected by AST scan in this repo. Could be exported for external callers or a framework handler.
packages/markitdown-ocr/src/markitdown_ocr/_docx_converter_with_ocr.py:175
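One common reason an AST scan finds no callers is name-based dispatch: the method name is assembled at runtime and looked up with `getattr`, so it never appears as a call target in the source. The report does not show whether the flagged functions are reached this way; the `Handler` class below is a hypothetical illustration of the effect.

```python
import ast

SOURCE = '''
class Handler:
    def do_sub(self, value):
        return "_" + value

    def do_sup(self, value):
        return "^" + value

    def process(self, tag, value):
        # The method name is built at runtime, so no call site names do_sub.
        return getattr(self, "do_" + tag)(value)
'''


def called_names(source):
    """Collect names that appear as direct call targets in the source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):
                names.add(node.func.id)
            elif isinstance(node.func, ast.Attribute):
                names.add(node.func.attr)
    return names


namespace = {}
exec(SOURCE, namespace)  # define Handler so the dispatch can be exercised
```

Here `called_names(SOURCE)` contains `getattr` but neither `do_sub` nor `do_sup`, even though `Handler().process("sub", "x")` does execute `do_sub`. Checking for this kind of dispatch before deleting a "possibly dead" function is a reasonable precaution.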
low System graph quality Integrity conf 1.00 Stub function `process_unknow` (body is just `pass`/`return`) — packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py:123
Likely an AI scaffold that was never filled in. Remove or implement.
Empty handler · Dead code
API access

This page is publicly accessible at: https://repobility.com/scan/1d43486e-77ff-4a8a-83a6-48892d32677a/

To check status programmatically (no auth required):

curl -s https://repobility.com/api/v1/public/scan/1d43486e-77ff-4a8a-83a6-48892d32677a/
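The same request can be made from Python with the standard library. This sketch assumes the endpoint returns a JSON body, which the page does not state; the function names are illustrative.

```python
import json
import urllib.request

API_BASE = "https://repobility.com/api/v1/public/scan/"


def status_url(scan_token):
    """Build the public status URL for a scan token."""
    return f"{API_BASE}{scan_token}/"


def fetch_status(scan_token, timeout=10):
    """GET the public scan endpoint and decode the body as JSON (assumed)."""
    with urllib.request.urlopen(status_url(scan_token), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_status("1d43486e-77ff-4a8a-83a6-48892d32677a")` issues the same unauthenticated GET as the curl command above.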

Important — please don't re-submit the same URL repeatedly. The submission endpoint is idempotent: re-submitting the same git URL returns this same scan_token, not a new one. To re-scan this repo, sign up free and use the dashboard.