rasbt/LLMs-from-scratch

Component	Sub-score	Weight	Contribution
`structure_score`	85.0	0.15	12.75
`security_score`	16.0	0.25	4.00
`testing_score`	97.0	0.20	19.40
`documentation_score`	65.0	0.15	9.75
`practices_score`	77.0	0.15	11.55
`code_quality`	28.0	0.10	2.80
Overall		1.00	60.2
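Each contribution is sub-score × weight, and the overall score is the sum of the contributions. The sketch below recomputes that sum from the table. It uses `decimal.Decimal` so that float rounding noise does not creep in. The dictionary only mirrors the table and is not part of the scanner's output format.

```python
from decimal import Decimal

# (sub-score, weight) pairs copied from the table above
COMPONENTS = {
    "structure_score": (Decimal("85.0"), Decimal("0.15")),
    "security_score": (Decimal("16.0"), Decimal("0.25")),
    "testing_score": (Decimal("97.0"), Decimal("0.20")),
    "documentation_score": (Decimal("65.0"), Decimal("0.15")),
    "practices_score": (Decimal("77.0"), Decimal("0.15")),
    "code_quality": (Decimal("28.0"), Decimal("0.10")),
}

def overall_score(components):
    """Return the weighted sum of sub-scores; the weights must sum to 1."""
    total_weight = sum(weight for _, weight in components.values())
    if total_weight != 1:
        raise ValueError(f"weights sum to {total_weight}, not 1")
    return sum(score * weight for score, weight in components.values())

print(overall_score(COMPONENTS))  # prints 60.250; the table shows it as 60.2
```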

critical Security checks security secrets conf 0.95 Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

Gitleaks detected a committed secret or credential pattern.

ch05/01_main-chapter-code/ch05.ipynb:305
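The usual remedy for a committed key has two parts. First, revoke the key. Second, have the code read credentials from the environment at runtime instead of storing them in the notebook. The sketch below covers the second part. The variable name `API_KEY` is a hypothetical placeholder and not necessarily the name the notebook uses.

```python
import os

def load_api_key(var_name="API_KEY"):
    """Read a secret from the environment instead of hard-coding it in source."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable; do not commit keys to notebooks."
        )
    return key
```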

critical Security checks software dependencies conf 0.88 torch: GHSA-53q9-r3pm-6pq6

PyTorch: `torch.load` with `weights_only=True` leads to remote code execution

requirements.txt
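GHSA-53q9-r3pm-6pq6 affects `torch` up to 2.5.1 and is fixed in 2.6.0. A quick guard is to compare the installed version against that floor before calling `torch.load` on checkpoints you did not produce yourself. The sketch below uses only the standard library. Its simple numeric parse is an assumption: it drops local tags such as `+cu124` and does not order pre-releases before their final release. `packaging.version` handles those cases properly.

```python
import re
from importlib import metadata

def version_tuple(version):
    """Parse the leading numeric release segment, e.g. '2.5.0+cu124' -> (2, 5, 0)."""
    release = version.split("+", 1)[0]
    parts = []
    for piece in release.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad so that '2.6' compares equal to '2.6.0'
    return tuple(parts + [0] * (3 - len(parts)))

def torch_is_patched(minimum="2.6.0"):
    """Return True if the installed torch is at least `minimum` (False if absent)."""
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)
```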

critical Security checks software dependencies conf 0.88 transformers: GHSA-3863-2447-669p

transformers has a Deserialization of Untrusted Data vulnerability

ch02/02_bonus_bytepair-encoder/requirements-extra.txt

low Security checks quality Quality conf 1.00 ✓ Repobility [MINED006] Overcatch BaseException: `except BaseException: ...` also catches KeyboardInterrupt and SystemExit, so Ctrl+C and sys.exit() no longer stop the program.

Review and fix per the pattern semantics. See CWE-705 for context.

ch05/05_bonus_hparam_tuning/hparam_search.py:206

low Security checks quality Quality conf 1.00 ✓ Repobility [MINED006] Overcatch BaseException: `except BaseException: ...` also catches KeyboardInterrupt and SystemExit, so Ctrl+C and sys.exit() no longer stop the program.

Review and fix per the pattern semantics. See CWE-705 for context.

ch05/03_bonus_pretraining_on_gutenberg/pretraining_simple.py:141
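Catching `Exception` instead of `BaseException` keeps the fallback for ordinary errors. It also lets `KeyboardInterrupt` and `SystemExit` propagate, because neither one derives from `Exception`. In the sketch below, `run_trial` and its callable argument are hypothetical stand-ins for whatever the flagged code wraps.

```python
import logging

logger = logging.getLogger(__name__)

def run_trial(train_fn, *args, **kwargs):
    """Run one trial; log and skip ordinary failures, but never swallow Ctrl+C or SystemExit."""
    try:
        return train_fn(*args, **kwargs)
    except Exception:  # does not match KeyboardInterrupt, SystemExit, or GeneratorExit
        logger.exception("Trial failed; continuing with the next configuration")
        return None
```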

high Security checks security path traversal conf 0.80 3 occurrences [SEC013] Path Traversal — User Input in File Path: User-controlled input used in file path without sanitization. Allows reading arbitrary files.

Use os.path.realpath() and verify the path starts with your expected base directory. Use secure_filename() for uploads.

3 files, 3 locations

appendix-E/01_main-chapter-code/gpt_download.py:41

ch05/01_main-chapter-code/gpt_download.py:42

ch05/01_main-chapter-code/gpt_generate.py:56
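The remediation advice comes down to two steps: resolve the final path, then confirm that it stays inside an expected base directory. In the sketch below, `resolve_inside` is a hypothetical helper name. It uses `os.path.commonpath` rather than a plain `startswith` check, so a sibling directory such as `/models-evil` is not mistaken for `/models`.

```python
import os

def resolve_inside(base_dir, user_path):
    """Join user_path onto base_dir and reject results that escape base_dir."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, user_path))
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"{user_path!r} resolves outside {base}")
    return target
```

An absolute `user_path` makes `os.path.join` discard `base`. The containment check then rejects it, unless it already points inside the base directory.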

high Security checks software Resource exhaustion conf 1.00 [SEC035] Unbounded Resource Allocation — DoS risk: Allocating resources (buffers, recursion stack, large ranges) based on user input without an upper bound. Attackers send `size=10000000` to exhaust memory, or trigger expensive computation. CWE-770/400. Examples: CVE-2023-44487 (HTTP/2 Rapid Reset), countless YAML/XML billion-laughs variants.

Cap user-controlled sizes BEFORE allocation, e.g. `size = min(int(request.args.get('n', 100)), MAX_SIZE)`. Set framework-level limits: Flask: `app.config['MAX_CONTENT_LENGTH'] = 10 * 1024 * 1024`; FastAPI: use middleware to enforce request size; Django: `DATA_UPLOAD_MAX_MEMORY_SIZE` in settings.py …

ch06/03_bonus_imdb-classification/download_prepare_dataset.py:46
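This finding and the next one both point at the same line. That suggests the flagged allocation is the archive extraction, not a web request handler, so the framework settings above may not apply here. For archives, one way to apply a cap is to check the declared member count and total uncompressed size before extracting anything. In the sketch below, `check_archive_limits` and both limits are illustrative assumptions, not values taken from the repository.

```python
import tarfile

MAX_MEMBERS = 100_000            # illustrative limits; tune to the expected dataset
MAX_TOTAL_BYTES = 1 * 1024 ** 3  # 1 GiB of uncompressed data

def check_archive_limits(archive_path, max_members=MAX_MEMBERS, max_total=MAX_TOTAL_BYTES):
    """Raise ValueError if the archive declares too many members or too much data."""
    total = 0
    with tarfile.open(archive_path) as tar:  # transparently handles .gz/.bz2/.xz
        for count, member in enumerate(tar, start=1):
            if count > max_members:
                raise ValueError(f"archive has more than {max_members} members")
            total += member.size
            if total > max_total:
                raise ValueError(f"archive expands to more than {max_total} bytes")
    return total
```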

high Security checks quality Quality conf 1.00 [SEC080] Python: tarfile.extractall without filter: tarfile.extract*() without filter='data' allows path traversal (CVE-2007-4559, mitigated by the extraction filters added in PEP 706 for Python 3.12). Ported from bandit B202 (Apache-2.0).

Add `filter='data'` (Python ≥ 3.12, or a 3.8 to 3.11 security release that includes the PEP 706 backport) or manually validate member paths against `os.path.abspath`.

ch06/03_bonus_imdb-classification/download_prepare_dataset.py:46
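The sketch below applies the suggested fix. Where the running Python provides the `data` filter (detected through `tarfile.data_filter`), it uses that filter. Otherwise it validates every member path by hand before extracting. `safe_extract` is a hypothetical helper name. The fallback also rejects link members, which is stricter than the `data` filter but simpler to get right.

```python
import os
import tarfile

def safe_extract(archive_path, dest_dir):
    """Extract archive_path into dest_dir, refusing members that escape dest_dir."""
    dest_dir = os.path.realpath(dest_dir)
    with tarfile.open(archive_path) as tar:
        if hasattr(tarfile, "data_filter"):
            # PEP 706 filter: blocks absolute paths, '..' escapes, unsafe links and modes
            tar.extractall(dest_dir, filter="data")
            return
        # Fallback for interpreters without extraction filters
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest_dir, member.name))
            if os.path.commonpath([dest_dir, target]) != dest_dir:
                raise ValueError(f"blocked path traversal: {member.name}")
            if member.issym() or member.islnk():
                raise ValueError(f"refusing link member: {member.name}")
        tar.extractall(dest_dir)
```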

high Security checks quality Quality conf 1.00 ✓ Repobility 25 occurrences `self.cache_k` used but never assigned in __init__

Method `forward` of class `MultiHeadAttention` reads `self.cache_k`, but no assignment to it exists in __init__ (and no class-level fallback). This raises AttributeError the first time the method runs against an instance.

6 files, 25 locations

ch04/10_kv-sharing/gpt_with_kv_sharing.py:61, 62, 64, 65, 66, 69, 70, 121 (12 hits)

ch04/10_kv-sharing/gpt_with_kv_mha.py:58, 59, 61, 62, 63, 111 (9 hits)

ch04/03_kv-cache/gpt_with_kv_cache.py:57

ch06/01_main-chapter-code/previous_chapters.py:99

ch06/02_bonus_additional-experiments/previous_chapters.py:103

ch06/03_bonus_imdb-classification/previous_chapters.py:100

high Security checks software dependencies conf 0.88 chainlit: GHSA-2g59-m95p-pgfq

Chainlit contains a server-side request forgery (SSRF) vulnerability

ch05/06_user_interface/requirements-extra.txt

high Security checks software dependencies conf 0.90 ✓ Repobility Dockerfile `ADD https://astral.sh/uv/install.sh`

Dockerfile `ADD <url>` downloads a remote artifact into the image with no integrity check. If the host or DNS is compromised between layers — or if the URL serves a different file later — malicious content gets baked into the image.

setup/03_optional-docker-environment/.devcontainer/Dockerfile:11
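Whatever build mechanism is used, the core fix is the same. Record a digest for a reviewed copy of the installer, compare the downloaded bytes against that digest, and refuse to run the installer if they differ. The sketch below shows that check in Python. `verify_sha256` is a hypothetical helper, and the expected digest would have to come from a copy of `install.sh` that you have reviewed.

```python
import hashlib

def verify_sha256(data: bytes, expected_hex: str) -> bytes:
    """Return data unchanged if its SHA-256 matches expected_hex, else raise."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_hex.lower():
        raise ValueError(f"checksum mismatch: expected {expected_hex}, got {actual}")
    return data
```

In a build step, the result of `urllib.request.urlopen(url).read()` would pass through `verify_sha256` before anything is written to disk or executed.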

high Security checks software dependencies conf 0.90 ✓ Repobility Dockerfile FROM `pytorch/pytorch:2.5.0-cuda12.4-cudnn9-runtime` not pinned by digest

`FROM pytorch/pytorch:2.5.0-cuda12.4-cudnn9-runtime` resolves the tag at build time. A registry tag can be re-pushed to point at a different image, so two builds of the same Dockerfile may use different base images. Production images should pin to `image@sha256:...` for reproducibility and supply-chain integrity.

setup/03_optional-docker-environment/.devcontainer/Dockerfile:2
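A digest-pinned reference looks like `image:tag@sha256:<64 hex characters>`. The sketch below flags `FROM` lines that lack such a digest. `unpinned_base_images` is a hypothetical helper, and the regular expressions cover only the common `FROM [--flag=...] image [AS name]` form.

```python
import re

# "FROM [--platform=...] image[:tag][@sha256:<digest>] [AS name]"
FROM_RE = re.compile(r"^\s*FROM\s+(?:--\S+\s+)*(?P<image>\S+)", re.IGNORECASE)
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")

def unpinned_base_images(dockerfile_text):
    """Return FROM image references that are not pinned by a sha256 digest."""
    unpinned = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if match and not DIGEST_RE.search(match.group("image")):
            unpinned.append(match.group("image"))
    return unpinned
```

A later `FROM <stage-name>` that refers to an earlier build stage would also be reported, so such names need an allow-list in practice.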

low Security checks cicd CI/CD security conf 0.90 ✓ Repobility 34 occurrences GitHub Action is tag-pinned rather than SHA-pinned

Action `actions/checkout` is pinned to `@v6`, a mutable tag that can later be moved to a different commit. Pin external actions to a reviewed full commit SHA when the workflow is security-sensitive.

12 files, 34 locations

.github/workflows/basic-tests-windows-uv-pip.yml:27, 30 (4 hits)

.github/workflows/basic-tests-latest-python.yml:25, 28 (3 hits)

.github/workflows/basic-tests-linux-uv.yml:31, 34 (3 hits)

.github/workflows/basic-tests-macos-uv.yml:31, 34 (3 hits)

.github/workflows/basic-tests-old-pytorch.yml:29, 32 (3 hits)

.github/workflows/basic-tests-pip.yml:31, 34 (3 hits)

.github/workflows/basic-tests-pytorch-rc.yml:25, 28 (3 hits)

.github/workflows/check-links.yml:16, 19 (3 hits)

CI/CD security, Supply chain, GitHub Actions
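A full commit SHA is 40 hexadecimal characters, so a reference such as `actions/checkout@v6` can be told apart from a SHA-pinned one by pattern alone. The sketch below scans workflow text for `uses:` lines. `tag_pinned_actions` is a hypothetical helper. It skips local actions (`./...`) and `docker://` references, and it parses lines with regular expressions rather than a YAML parser.

```python
import re

USES_RE = re.compile(r"^\s*-?\s*uses:\s*['\"]?(?P<ref>[^'\"\s#]+)")
SHA_RE = re.compile(r"@[0-9a-f]{40}$")

def tag_pinned_actions(workflow_text):
    """Return external `uses:` references whose ref is not a full 40-char commit SHA."""
    flagged = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if not match:
            continue
        ref = match.group("ref")
        if ref.startswith(("./", "docker://")):
            continue  # local actions and Docker images are out of scope here
        if not SHA_RE.search(ref):
            flagged.append(ref)
    return flagged
```

A trailing comment such as `# v6` after the SHA keeps the human-readable version visible while the SHA does the pinning.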

high Security checks software dependencies conf 0.88 jupyterlab: GHSA-44cc-43rp-5947

JupyterLab vulnerable to potential authentication and CSRF tokens leak

requirements.txt

high Security checks software dependencies conf 0.88 jupyterlab: GHSA-9q39-rmj3-p4r2

HTML injection in Jupyter Notebook and JupyterLab leading to DOM Clobbering

requirements.txt

high Security checks software dependencies conf 0.88 jupyterlab: GHSA-mqcg-5x36-vfcg

JupyterLab's command linker attributes in HTML enable one-click command execution from untrusted content

requirements.txt

high Security checks software dependencies conf 0.88 jupyterlab: GHSA-rch3-82jr-f9w9

Jupyter Notebook Vulnerable to Authentication Token Theft via CommandLinker XSS

requirements.txt

high Security checks software dependencies conf 0.88 jupyterlab: PYSEC-2026-164

JupyterLab is an extensible environment for interactive and reproducible computing, based on the Jupyter Notebook Architecture. From 4.0.0 to 4.5.6, the allow-list of extensions that can be installed from PyPI Extension Manager (allowed_extensions_uris) is not correctly enforced by JupyterLab. The …

requirements.txt

high Security checks software dependencies conf 0.88 scikit-learn: PYSEC-2024-110

A sensitive data leakage vulnerability was identified in scikit-learn's TfidfVectorizer, specifically in versions up to and including 1.4.1.post1, which was fixed in version 1.5.0. The vulnerability arises from the unexpected storage of all tokens present in the training data within the `stop_words…

ch06/03_bonus_imdb-classification/requirements-extra.txt

high Security checks software dependencies conf 0.88 sentencepiece: GHSA-38vq-g6vr-w8wf

Sentencepiece has a heap overflow issue

ch05/07_gpt_to_llama/requirements-extra.txt

high Security checks software dependencies conf 0.88 torch: PYSEC-2024-259

In PyTorch <=2.4.1, the RemoteModule has Deserialization RCE. NOTE: this is disputed by multiple parties because this is intended behavior in PyTorch distributed computing.

requirements.txt

high Security checks software dependencies conf 0.88 torch: PYSEC-2025-191

A vulnerability, which was classified as problematic, has been found in PyTorch 2.6.0+cu124. Affected by this issue is the function torch.mkldnn_max_pool2d. The manipulation leads to denial of service. An attack has to be approached locally. The exploit has been disclosed to the public and may be u…

requirements.txt

high Security checks software dependencies conf 0.88 torch: PYSEC-2025-198

In PyTorch through 2.6.0, when eager is used, nn.PairwiseDistance(p=2) produces incorrect results.

requirements.txt

high Security checks software dependencies conf 0.88 torch: PYSEC-2025-203

An issue in the component torch.linalg.lu of pytorch v2.8.0 allows attackers to cause a Denial of Service (DoS) when performing a slice operation.

requirements.txt

high Security checks software dependencies conf 0.88 torch: PYSEC-2025-204

pytorch v2.8.0 was discovered to display unexpected behavior when the components torch.rot90 and torch.randn_like are used together.

requirements.txt

high Security checks software dependencies conf 0.88 torch: PYSEC-2025-205

A syntax error in the component proxy_tensor.py of pytorch v2.7.0 allows attackers to cause a Denial of Service (DoS).

requirements.txt

high Security checks software dependencies conf 0.88 torch: PYSEC-2025-206

pytorch v2.8.0 was discovered to contain an integer overflow in the component torch.nan_to_num-.long().

requirements.txt

high Security checks software dependencies conf 0.88 torch: PYSEC-2025-207

A NameError occurs in pytorch v2.7.0 when a PyTorch model consists of torch.cummin and is compiled by Inductor, leading to a Denial of Service (DoS).

requirements.txt

high Security checks software dependencies conf 0.88 torch: PYSEC-2025-208

A buffer overflow occurs in pytorch v2.7.0 when a PyTorch model consists of torch.nn.Conv2d, torch.nn.functional.hardshrink, and torch.Tensor.view-torch.mv() and is compiled by Inductor, leading to a Denial of Service (DoS).

requirements.txt

high Security checks software dependencies conf 0.88 torch: PYSEC-2025-209

An issue in pytorch v2.7.0 can lead to a Denial of Service (DoS) when a PyTorch model consists of torch.Tensor.to_sparse() and torch.Tensor.to_dense() and is compiled by Inductor.

requirements.txt

high Security checks software dependencies conf 0.88 torch: PYSEC-2026-139

A vulnerability was identified in PyTorch 2.10.0. The affected element is an unknown function of the component pt2 Loading Handler. The manipulation leads to deserialization. The attack can only be performed from a local environment. The exploit is publicly available and might be used. The project …

requirements.txt

high Security checks software dependencies conf 0.88 transformers: PYSEC-2023-301

Deserialization of Untrusted Data in GitHub repository huggingface/transformers prior to 4.36.

ch02/02_bonus_bytepair-encoder/requirements-extra.txt

high Security checks software dependencies conf 0.88 transformers: PYSEC-2024-227

Hugging Face Transformers MobileViTV2 Deserialization of Untrusted Data Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerability in tha…

ch02/02_bonus_bytepair-encoder/requirements-extra.txt

high Security checks software dependencies conf 0.88 transformers: PYSEC-2024-228

Hugging Face Transformers MaskFormer Model Deserialization of Untrusted Data Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerability i…

ch02/02_bonus_bytepair-encoder/requirements-extra.txt

high Security checks software dependencies conf 0.88 transformers: PYSEC-2024-229

Hugging Face Transformers Trax Model Deserialization of Untrusted Data Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerability in that…

ch02/02_bonus_bytepair-encoder/requirements-extra.txt

high Security checks software dependencies conf 0.88 transformers: PYSEC-2025-211

Hugging Face Transformers Perceiver Model Deserialization of Untrusted Data Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerability in…

ch02/02_bonus_bytepair-encoder/requirements-extra.txt

high Security checks software dependencies conf 0.88 transformers: PYSEC-2025-212

Hugging Face Transformers Transformer-XL Model Deserialization of Untrusted Data Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerabili…

ch02/02_bonus_bytepair-encoder/requirements-extra.txt

high Security checks software dependencies conf 0.88 transformers: PYSEC-2025-213

Hugging Face Transformers megatron_gpt2 Deserialization of Untrusted Data Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerability in t…

ch02/02_bonus_bytepair-encoder/requirements-extra.txt

high Security checks software dependencies conf 0.88 transformers: PYSEC-2025-214

Hugging Face Transformers SEW convert_config Code Injection Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerability in that the target…

ch02/02_bonus_bytepair-encoder/requirements-extra.txt

high Security checks software dependencies conf 0.88 transformers: PYSEC-2025-215

Hugging Face Transformers SEW-D convert_config Code Injection Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerability in that the targ…

ch02/02_bonus_bytepair-encoder/requirements-extra.txt

high Security checks software dependencies conf 0.88 transformers: PYSEC-2025-216

Hugging Face Transformers HuBERT convert_config Code Injection Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerability in that the tar…

ch02/02_bonus_bytepair-encoder/requirements-extra.txt

high Security checks software dependencies conf 0.88 transformers: PYSEC-2025-217

Hugging Face Transformers X-CLIP Checkpoint Conversion Deserialization of Untrusted Data Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vul…

ch02/02_bonus_bytepair-encoder/requirements-extra.txt

high Security checks software dependencies conf 0.88 transformers: PYSEC-2025-218

Hugging Face Transformers GLM4 Deserialization of Untrusted Data Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Hugging Face Transformers. User interaction is required to exploit this vulnerability in that the t…

ch02/02_bonus_bytepair-encoder/requirements-extra.txt

high Security checks software dependencies conf 0.88 transformers: PYSEC-2025-40

A vulnerability in the `preprocess_string()` function of the `transformers.testing_utils` module in huggingface/transformers version v4.48.3 allows for a Regular Expression Denial of Service (ReDoS) attack. The regular expression used to process code blocks in docstrings contains nested quantifiers…

ch02/02_bonus_bytepair-encoder/requirements-extra.txt

high System graph security security conf 1.00 Insecure pattern 'eval_used' in appendix-A/01_main-chapter-code/DDP-script-torchrun.py:160
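A textual match on `eval` cannot tell two things apart. The builtin `eval()` executes a string as Python code. `torch.nn.Module.eval()` only switches a model to evaluation mode, which disables dropout, for example. The report does not show which of the two appears at line 160, so it is worth checking. The sketch below is a hypothetical AST-based check that reports only calls to the builtin. It does not catch aliases such as `f = eval`.

```python
import ast

def builtin_eval_calls(source):
    """Return line numbers of calls to the builtin eval(), ignoring obj.eval() methods."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "eval"):
            lines.append(node.lineno)
    return sorted(lines)
```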