The Python Ecosystem’s Binary Security Gap: A 373-Binary Audit

Aljefra Security Research
Published: 2026-04-12
Authors: Aljefra Security Research Team

TL;DR

We audited 373 ELF binaries across 46 popular Python packages and found that 93% lack stack canaries, 98% lack full RELRO, and 94% lack FORTIFY_SOURCE – hardening flags that every Linux distribution has mandated for system packages since 2012.
Together these binaries contain 65,636 ROP gadgets, which gives an attacker a Turing-complete instruction set for exploitation after any single memory corruption bug in a process that loads them.
Rust-based extensions (cryptography, pydantic_core, bcrypt) score 100% on RELRO and BIND_NOW because the Rust compiler enables them by default. C extensions score near zero.
The fix is two environment variables. We are proposing a PR to pip/setuptools to make these hardening flags the default for all C extension builds.

Introduction

Every Python developer who has run pip install numpy or pip install Pillow has loaded native C code into their Python process. These C extensions – shared objects (.so files on Linux) – are where Python’s performance comes from. NumPy’s linear algebra, Pillow’s image decoding, lxml’s XML parsing, gRPC’s network handling: all implemented in C or C++, compiled into shared libraries, and loaded into the same address space as your Python code.

But how well are these binaries protected against memory corruption attacks?

We set out to answer that question systematically. Over the past several weeks, we built an ELF-aware binary analyzer, pointed it at the shared libraries installed by 46 of PyPI’s most popular packages, and measured their hardening posture against the same standards that Debian, Fedora, and Ubuntu apply to their system packages.

The results were sobering. The Python ecosystem’s C extensions are, on average, a decade behind Linux distributions in binary security hardening. The gap is not caused by exotic or expensive protections. It exists because pip and setuptools simply do not pass the standard hardening flags to the compiler when building C extensions from source.

Methodology

Toolchain

Our analysis pipeline consists of:

ELF Header Analysis (readelf -l, readelf -d, readelf -s): Extracts RELRO status, stack canary presence (via __stack_chk_fail symbol), FORTIFY_SOURCE usage (via *_chk function variants), RPATH/RUNPATH values, BIND_NOW flag, and symbol visibility.
ROP Gadget Enumeration (ROPgadget --binary): Counts available Return-Oriented Programming gadgets per binary. A gadget is a short instruction sequence ending in ret that an attacker can chain together to achieve arbitrary computation.
Unsafe Function Detection (nm -D + pattern matching): Identifies imports of known-dangerous C functions (strcpy, sprintf, strcat, gets, strtok, etc.) that have safer alternatives.
Co-occurrence Analysis (130K GitHub repository dataset): Maps which Python packages are commonly installed together, allowing us to compute the aggregate attack surface of real-world Python environments.
v2 Deep Analyzer: Our upgraded analysis engine that performs per-binary scoring across 8 dimensions: hardening flags, ROP gadget density, GOT entry count, unsafe function imports, RPATH safety, symbol visibility, binding mode, and cross-binary composition risk.
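
The symbol-based checks in the list above (stack canary via __stack_chk_fail, FORTIFY_SOURCE via *_chk variants) can be sketched as follows. This is a minimal illustration that parses the text printed by readelf -s --wide; the helper names (symbol_names, classify_symbols, read_symbols) are hypothetical and this is not the analyzer’s actual code.

```python
import re
import subprocess

# Fortified libc wrappers look like __strcpy_chk, __sprintf_chk, __memcpy_chk.
FORTIFY_RE = re.compile(r"__\w+_chk")


def symbol_names(readelf_symbols: str) -> set[str]:
    """Collect symbol names from the text printed by `readelf -s --wide`."""
    names = set()
    for line in readelf_symbols.splitlines():
        fields = line.split()
        # Symbol rows start with an index such as "12:"; the name is column 8.
        if len(fields) >= 8 and fields[0].rstrip(":").isdigit():
            names.add(fields[7].split("@")[0])  # drop "@GLIBC_2.4" suffixes
    return names


def classify_symbols(readelf_symbols: str) -> dict[str, bool]:
    """Report canary and FORTIFY_SOURCE evidence found in the symbol table."""
    names = symbol_names(readelf_symbols)
    return {
        "canary": "__stack_chk_fail" in names,
        "fortify": any(FORTIFY_RE.fullmatch(name) for name in names),
    }


def read_symbols(path: str) -> str:
    """Run readelf on a shared object (requires GNU Binutils on PATH)."""
    result = subprocess.run(
        ["readelf", "-s", "--wide", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling classify_symbols(read_symbols("/path/to/extension.so")) would report both properties for one binary. As the Limitations section notes, the absence of *_chk symbols is only indirect evidence that FORTIFY_SOURCE was not used.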

Scope

We analyzed 373 shared objects from 46 packages including: numpy, scipy, pandas, cryptography, Pillow (PIL), lxml, grpc, aiohttp, psycopg2, pydantic_core, bcrypt, pyarrow, greenlet, capstone, h5py, zstandard, MySQLdb, charset_normalizer, psutil, and others. We also included system libraries (glibc, libcrypto, libssl, libm, libdb) as a baseline.

Finding 1: The Hardening Gap

The central finding is a massive, systematic gap between the hardening posture of system libraries (compiled by Linux distributions with mandatory security flags) and Python C extensions (compiled by pip/setuptools with no security flags).

What These Protections Do

Stack Canaries (-fstack-protector-strong): The compiler places a random value (the “canary”) on the stack between local variables and the return address. Before a function returns, it checks whether the canary was overwritten. If it was – indicating a buffer overflow – the program aborts immediately instead of executing attacker-controlled code. Without canaries, a buffer overflow silently overwrites the return address and the attacker gains control when the function returns.

Full RELRO (-Wl,-z,relro,-z,now): RELRO (Relocation Read-Only) controls whether the Global Offset Table (GOT) – a table of function pointers that the dynamic linker fills in at runtime – is made read-only after initialization. “Partial RELRO” leaves the GOT writable for the life of the process, meaning an attacker who achieves a write primitive can overwrite any function pointer in the GOT to redirect execution. “Full RELRO” resolves all symbols at startup and then marks the GOT read-only, closing this attack vector.

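FORTIFY_SOURCE (-D_FORTIFY_SOURCE=2): At compile time, the compiler replaces calls to dangerous functions like strcpy, sprintf, and strcat with bounds-checked versions (__strcpy_chk, __sprintf_chk, etc.) whenever it can determine the destination buffer size. The checked versions abort the program at runtime when a write would overflow the destination, catching many buffer overflows that the programmer missed. The macro only takes effect when the code is compiled with optimization enabled (-O1 or higher).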

The Numbers

Hardening Property	System Libraries	Python C Extensions	Delta
Stack canaries	83%	7.0% (26/373)	-76 pts
Full RELRO	83%	2.4% (9/373)	-81 pts
FORTIFY_SOURCE	83%	5.6% (21/373)	-77 pts

To put this in perspective: 347 out of 373 binaries have no stack canary. 364 out of 373 have only partial RELRO (writable GOT). 352 out of 373 have no FORTIFY_SOURCE.
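
The percentages and deltas in the table follow directly from the raw counts. The short sketch below reproduces that arithmetic; the function and variable names are ours, and the 83% baseline is taken from the System Libraries column.

```python
TOTAL_BINARIES = 373
SYSTEM_BASELINE_PCT = 83  # from the "System Libraries" column

# Number of binaries that HAVE each protection, as reported in the table.
hardened_counts = {
    "Stack canaries": 26,
    "Full RELRO": 9,
    "FORTIFY_SOURCE": 21,
}


def summarize(name: str, count: int) -> dict:
    """Derive the table row for one hardening property from its raw count."""
    present_pct = 100 * count / TOTAL_BINARIES
    return {
        "property": name,
        "present_pct": round(present_pct, 1),
        "missing": TOTAL_BINARIES - count,
        "delta_pts": round(present_pct - SYSTEM_BASELINE_PCT),
    }


for name, count in hardened_counts.items():
    print(summarize(name, count))
```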

The 9 binaries with full RELRO are: the Rust-based extensions (cryptography, pydantic_core, bcrypt) and the system libraries (libc, libcrypto, libm). Every pure-C Python extension in our dataset has only partial RELRO.

Finding 2: 65,636 ROP Gadgets

What is ROP?

Return-Oriented Programming (ROP) is a technique that allows an attacker to execute arbitrary code without injecting any new code. Instead of writing shellcode into memory (which modern systems prevent with NX/DEP), the attacker chains together short sequences of existing instructions – called “gadgets” – that each end with a ret instruction. By carefully crafting a sequence of return addresses on the stack, the attacker makes the program “return” through a series of gadgets that collectively perform any desired computation.

A single useful gadget might be something like pop rdi; ret (load a value into a register and return). Chain enough gadgets together and you can make system calls, spawn shells, read files, or exfiltrate data.
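
To make the pop rdi; ret example concrete: on x86-64 that gadget is the two-byte sequence 5f c3, so a naive search is a byte-pattern scan. The sketch below is a deliberately simplified illustration with a hypothetical helper name; real tools such as ROPgadget disassemble backwards from every ret rather than matching one fixed encoding.

```python
POP_RDI_RET = bytes([0x5F, 0xC3])  # x86-64 encoding of "pop rdi; ret"


def find_pattern(code: bytes, pattern: bytes = POP_RDI_RET) -> list[int]:
    """Return every offset in `code` where `pattern` starts."""
    offsets = []
    start = code.find(pattern)
    while start != -1:
        offsets.append(start)
        start = code.find(pattern, start + 1)
    return offsets


# mov rbp, rsp; pop rdi; ret; nop; pop r15; ret
sample = bytes.fromhex("48 89 e5 5f c3 90 41 5f c3")
print(find_pattern(sample))
```

The second hit falls in the middle of pop r15 (41 5f): jumping one byte into that instruction decodes as pop rdi; ret. Gadgets that start at unintended offsets like this are one reason gadget counts grow quickly with code size.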

The Scale

Across the 373 binaries in our dataset, we found 65,636 unique ROP gadgets. This is a Turing-complete instruction set available to any attacker who achieves a stack buffer overflow in any loaded binary.

Top 10 Riskiest Binaries by ROP Gadget Count

Binary	ROP Gadgets	GOT Entries	Canary	RELRO	Unsafe Functions
grpc/cygrpc.cpython-312	3,418	121	Yes	Partial	sscanf, strcpy
numpy/_multiarray_umath.cpython-312	3,215	245	No	Partial	strcpy, strtok, fscanf
scipy.libs/libscipy_openblas	3,058	139	No	Partial	sprintf, strcat
numpy.libs/libscipy_openblas64_	3,033	6	No	Partial	strcat, sprintf
cryptography/_rust.abi3	2,147	184	No	Full	strcpy, strcat, dlopen
scipy/_core.cpython-312	2,024	89	No	Partial	sscanf
libcrypto	1,884	8	Yes	Full	strcat, dlopen, strcpy
lxml/etree.cpython-312	1,558	211	No	Partial	sprintf, sscanf, strcat
pydantic_core/_pydantic_core.cpython-312	1,327	210	No	Full	realpath
libc	1,136	61	No	Full	strcpy, sscanf, gets

The distribution is heavily skewed: 4 binaries have 3,000+ gadgets each, contributing 12,724 gadgets (19% of the total). But even the “small” binaries matter – 236 binaries in the 1-99 gadget range collectively contribute 10,918 gadgets.
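
The skew figures can be checked against the table with a few lines of arithmetic. The gadget counts below are copied from the top four rows; the variable names are ours.

```python
TOTAL_GADGETS = 65_636

top4 = {
    "grpc/cygrpc": 3_418,
    "numpy/_multiarray_umath": 3_215,
    "scipy.libs/libscipy_openblas": 3_058,
    "numpy.libs/libscipy_openblas64_": 3_033,
}

top4_sum = sum(top4.values())
top4_share_pct = 100 * top4_sum / TOTAL_GADGETS

# The 236 binaries in the 1-99 range contribute 10,918 gadgets in total.
small_binary_average = 10_918 / 236

print(top4_sum, round(top4_share_pct, 1), round(small_binary_average, 1))
```

The small binaries average roughly 46 gadgets each, which is why they still add up to a sizeable share of the total.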

Why This Matters

ROP gadgets by themselves are not a vulnerability. But they are a prerequisite for exploitation of any memory corruption bug. The difference between a crash and remote code execution often comes down to whether the attacker has enough gadgets to build a useful ROP chain. With 65,636 gadgets available, the answer is almost always yes, even though not every gadget is usable in every attack scenario.

Finding 3: Unsafe C Functions

We identified 22 packages that import known-dangerous C functions – functions that the C standards community, CERT, and OWASP have recommended against for decades:

Function	Packages Using It	Risk
strcpy	22 packages (numpy, scipy, grpc, lxml, …)	CWE-120: Buffer overflow, no bounds checking
sprintf	14 packages (PIL, lxml, scipy, psycopg2, …)	CWE-134/CWE-120: Format string / buffer overflow
strcat	9 packages (numpy, lxml, cryptography, …)	CWE-120: Buffer overflow, no bounds checking
dlopen	8 packages (cryptography, PIL, psutil, …)	CWE-426: Untrusted search path
sscanf	7 packages (grpc, numpy, lxml, scipy, …)	CWE-120: Buffer overflow
strtok	7 packages (numpy, scipy, cryptography, …)	CWE-362: Not thread-safe

These functions are particularly dangerous in Python C extensions because:

No FORTIFY_SOURCE: Without -D_FORTIFY_SOURCE=2, the compiler does not replace these with bounds-checked variants. A strcpy into a stack buffer is a straight buffer overflow.
No stack canary: The overflow is not detected.
Partial RELRO: The attacker can overwrite GOT entries after the overflow.

The combination is what makes this an ecosystem-level risk rather than individual bugs. Each unsafe function call is a potential entry point, and the lack of hardening ensures that exploitation is straightforward if any one of them is reachable with attacker-controlled input.
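
A minimal sketch of the import check, assuming the text output of nm -D as input. The function name and the exact set of flagged functions are illustrative choices, not the pattern list used in the audit.

```python
UNSAFE_FUNCTIONS = {"strcpy", "sprintf", "strcat", "gets", "strtok", "sscanf"}


def unsafe_imports(nm_output: str, unsafe: set[str] = UNSAFE_FUNCTIONS) -> set[str]:
    """Return the unsafe libc functions a binary imports, per `nm -D` output."""
    found = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined (imported) symbols have no address column: "U strcpy@GLIBC_2.2.5".
        if len(fields) == 2 and fields[0] == "U":
            name = fields[1].split("@")[0]  # strip the symbol version, if shown
            if name in unsafe:
                found.add(name)
    return found
```

A binary built with FORTIFY_SOURCE imports __strcpy_chk where the compiler could determine the buffer size, so a fortified call site does not show up as a plain strcpy import in this check.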

Finding 4: RPATH Hijack Vectors

We found 42 libraries with RPATH or RUNPATH entries that contain deep directory traversal patterns. RPATH tells the dynamic linker where to search for shared library dependencies. When an RPATH contains relative paths or paths that an attacker can influence, it creates a library injection vector.

The Attack

A Python C extension has an RPATH like $ORIGIN/../../../lib
The attacker places a malicious shared library at the path the RPATH resolves to
When Python loads the extension, the dynamic linker finds the attacker’s library first
The attacker’s code executes with the privileges of the Python process

This is particularly dangerous in:

Virtual environments: where the directory structure is user-controlled
Container images: where layers may be combined from untrusted sources
Shared hosting: where multiple users share a filesystem

Our dataset includes 42 RPATH_HIJACK findings with risk scores up to 9.0/10.0.
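
The check for traversal-style search paths can be sketched as below, working on the text printed by readelf -d. The heuristic (flag relative entries and entries containing ..) and the function name are assumptions made for illustration; they are not the risk scoring used for the findings above.

```python
import posixpath
import re

# Matches the RPATH/RUNPATH rows printed by `readelf -d`.
SEARCH_PATH_RE = re.compile(
    r"\((RPATH|RUNPATH)\)\s+Library (?:rpath|runpath): \[(.*)\]"
)


def risky_search_paths(readelf_dynamic: str, origin: str) -> list[tuple[str, str]]:
    """Return (entry, resolved_path) pairs for relative or traversing entries.

    `origin` is the directory containing the binary, i.e. what the dynamic
    linker substitutes for $ORIGIN.
    """
    risky = []
    for match in SEARCH_PATH_RE.finditer(readelf_dynamic):
        for entry in match.group(2).split(":"):
            expanded = entry.replace("${ORIGIN}", origin).replace("$ORIGIN", origin)
            is_relative = not expanded.startswith("/")
            traverses = ".." in entry.split("/")
            if is_relative or traverses:
                risky.append((entry, posixpath.normpath(expanded)))
    return risky
```

For an extension installed at /venv/lib/python3.12/site-packages/pkg, the entry $ORIGIN/../../../lib resolves to /venv/lib/lib, a directory outside the package that whoever controls the environment layout can populate.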

Finding 5: Symbol Visibility Bloat

Most Python C extensions export far more symbols than they need to. The Python C API requires extensions to export exactly one symbol: PyInit_<modulename>. In practice, extensions export hundreds or thousands of internal symbols.

Notable examples from our dataset:

Binary	Exported Symbols	Required Symbols
scipy.libs/libscipy_openblas	11,330	~10
numpy.libs/libscipy_openblas64_	11,346	~10
libcrypto	5,363	~200 (public API)
libdb	1,805	~50 (public API)
lxml/etree.cpython-312	1,738	1
lxml/objectify.cpython-312	1,391	1

Total exported symbols across all 373 binaries: 45,474.

Each exported symbol is a potential target for symbol interposition attacks, where a malicious library loaded earlier in the search order overrides the symbol with attacker-controlled code. The default symbol visibility in GCC is default (exported). Extensions should use -fvisibility=hidden and explicitly export only the PyInit_ symbol.

92.4% of the 43,254 strong exports are never imported by any other library in the process — 39,980 symbols that serve no purpose but expand the attack surface.
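
As a rough illustration of how the export counts can be gathered, the sketch below lists the defined dynamic symbols from nm -D --defined-only output and separates the module init function from everything else. The helper names are hypothetical.

```python
def exported_symbols(nm_output: str) -> list[str]:
    """Names of defined dynamic symbols from `nm -D --defined-only` output."""
    names = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols are printed as "<address> <type> <name>".
        if len(fields) == 3:
            names.append(fields[2].split("@")[0])  # strip symbol versions
    return names


def unneeded_exports(nm_output: str) -> list[str]:
    """Exports other than the PyInit_<modulename> entry point."""
    return [n for n in exported_symbols(nm_output) if not n.startswith("PyInit_")]


# Share of strong exports never imported elsewhere, from the figures above.
unused_share_pct = 100 * 39_980 / 43_254
print(round(unused_share_pct, 1))
```

This applies to extension modules only; a shared library such as OpenBLAS legitimately exports its public API, which is why the table lists a larger required count for those binaries.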

Finding 6: ASLR Entropy Sharing

Address Space Layout Randomization (ASLR) is the operating system’s primary defense against code-reuse attacks. It randomizes the base addresses of loaded libraries so that an attacker cannot predict where gadgets are located in memory.

However, all shared libraries loaded into a single Python process share one ASLR randomization event (the process startup). Once an attacker leaks a single address from any loaded library – through a format string bug, an info leak, or a side channel – they can calculate the base address of every other library in the process.

If all 373 libraries in our dataset are loaded into one process, as in a large scientific Python environment, the attacker has 373 places to look for an info leak, and a single success reveals the layout of all 65,636 ROP gadgets.

28 bits of ASLR entropy protect a typical Python process, shared across all 373 loaded libraries. An info leak in any single library defeats ASLR for the entire process — and with 39,980 unnecessarily exported symbols and 12,020 GOT entries, the info-leak surface is enormous.
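
The arithmetic behind an info leak is a single subtraction. The sketch below uses made-up addresses and offsets purely for illustration: given one leaked runtime pointer to a symbol whose offset inside its library is known, the attacker recovers the library base and from it the address of any gadget in that library.

```python
ASLR_BITS = 28      # entropy figure quoted above
PAGE_SIZE = 0x1000  # library bases are page-aligned


def library_base(leaked_address: int, symbol_offset: int) -> int:
    """Recover a library's load address from one leaked pointer."""
    base = leaked_address - symbol_offset
    if base % PAGE_SIZE != 0:
        raise ValueError("base is not page-aligned; wrong symbol or offset")
    return base


def runtime_address(base: int, offset: int) -> int:
    """Runtime address of a gadget at a known offset inside the library."""
    return base + offset


# Hypothetical values: a leaked function pointer and offsets read from the file.
base = library_base(leaked_address=0x7F3A1C2451A0, symbol_offset=0x451A0)
gadget = runtime_address(base, offset=0x2A3E5)
print(hex(base), hex(gadget), 2 ** ASLR_BITS)
```

Without a leak the attacker faces 2^28 (about 268 million) candidate positions; with one, none. The same subtraction extends to other libraries once the distance between two mappings is known, which is the property the paragraphs above rely on.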

Finding 7: Lazy Binding and Writable GOT

364 out of 373 binaries use lazy binding. This means:

The GOT (Global Offset Table) remains writable for the entire lifetime of the process
Function addresses are resolved one-at-a-time on first call, not at startup
Each unresolved GOT entry initially points to the PLT (Procedure Linkage Table), which invokes the dynamic linker

An attacker who achieves a write primitive – through a buffer overflow, a use-after-free, or any other memory corruption – can overwrite any GOT entry to redirect the next call to that function to attacker-controlled code.

In our dataset, the 373 binaries collectively expose 12,020 GOT entries. Every single one is a potential hijack target in binaries without full RELRO.

The fix is adding -Wl,-z,now (BIND_NOW) to the linker flags. This forces eager symbol resolution at startup and, combined with -Wl,-z,relro, makes the GOT read-only. The one-time startup cost is typically under 1 millisecond for Python extensions (median GOT size: 17 entries).
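
Whether the fix took effect can be verified from readelf output. A minimal sketch, assuming the program headers come from readelf -l and the dynamic section from readelf -d; the function names are hypothetical.

```python
def binds_now(dynamic_section: str) -> bool:
    """True if the dynamic section requests eager binding."""
    for line in dynamic_section.splitlines():
        # Shown as a "(BIND_NOW)" entry or as "(FLAGS) BIND_NOW".
        if "BIND_NOW" in line:
            return True
        # Shown as "(FLAGS_1) Flags: NOW", possibly with other flags.
        if "(FLAGS_1)" in line and "NOW" in line.split():
            return True
    return False


def relro_status(program_headers: str, dynamic_section: str) -> str:
    """Classify a binary as 'none', 'partial', or 'full' RELRO."""
    if "GNU_RELRO" not in program_headers:
        return "none"
    return "full" if binds_now(dynamic_section) else "partial"
```

A binary linked with -Wl,-z,relro,-z,now should classify as full; one linked with only -z relro has the GNU_RELRO segment but no BIND_NOW marker and classifies as partial.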

We identified 72 LAZY_BINDING_RISK findings in our analysis, with risk scores up to 6.0/10.0.

Finding 8: Cross-Language Comparison

The most striking contrast in our data is between C extensions and Rust extensions.

Rust Extensions

Package	Binary	RELRO	BIND_NOW	Canary	Gadgets
cryptography	_rust.abi3	Full	Yes	No	2,147
pydantic_core	_pydantic_core.cpython-312	Full	Yes	No	1,327
bcrypt	_bcrypt.abi3	Full	Yes	No	363

Rust extensions achieve 100% on RELRO and BIND_NOW because the Rust compiler (rustc) enables full RELRO by default on Linux targets. This is a compiler-level decision, not something individual package maintainers opted into.

C Extensions (Representative Sample)

Package	Binary	RELRO	BIND_NOW	Canary	Gadgets
numpy	_multiarray_umath.cpython-312	Partial	No	No	3,215
scipy	_core.cpython-312	Partial	No	No	2,024
lxml	etree.cpython-312	Partial	No	No	1,558
grpc	cygrpc.cpython-312	Partial	No	Yes	3,418
Pillow	_imaging.cpython-312	Partial	No	No	~400
aiohttp	_http_parser.cpython-312	Partial	No	No	~80

C extensions score near 0% on RELRO and BIND_NOW because GCC does not enable these by default – they must be explicitly requested via compiler/linker flags. Since pip/setuptools don’t set these flags, the extensions ship unhardened.

The Lesson

The Rust compiler made a security-first default choice. The C compiler requires explicit opt-in. pip/setuptools should bridge this gap for the C ecosystem the same way Linux distributions do.

The Fix

The fix is remarkably simple. Two environment variables, applied at build time, would harden the entire Python C extension ecosystem:

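# These flags only affect packages that pip builds from source; add --no-binary :all: to force a source build.
export CFLAGS="-fstack-protector-strong -D_FORTIFY_SOURCE=2 -fstack-clash-protection"
export LDFLAGS="-Wl,-z,relro,-z,now"
pip install <any-package>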

For individual users, this works today. Set these in your shell profile or CI/CD pipeline and every pip install from source will produce hardened binaries.

For the Ecosystem

We are drafting a PR to pypa/pip and pypa/setuptools to make these flags the default for all C extension builds. The proposal includes:

Default flag injection in pip’s build environment
An opt-out mechanism (--no-binary-hardening / PIP_NO_BINARY_HARDENING=1) for constrained platforms
Platform detection to apply the correct flags on Linux, skip on WASM/Windows, and use macOS equivalents where applicable

The full PR proposal document is available in our repository: pip_hardening_pr.md.
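
One way the flag selection in such a proposal could look is sketched below. This is an assumption-laden illustration, not the content of the PR: the function name is hypothetical, PIP_NO_BINARY_HARDENING is the proposed (not existing) opt-out variable, and macOS is left without flags because the proposal’s macOS equivalents are not specified here.

```python
import os
import sys

LINUX_CFLAGS = [
    "-fstack-protector-strong",
    "-D_FORTIFY_SOURCE=2",
    "-fstack-clash-protection",
]
LINUX_LDFLAGS = ["-Wl,-z,relro,-z,now"]


def default_hardening_flags(platform=None, environ=None):
    """Return (cflags, ldflags) to inject for the given sys.platform value."""
    platform = sys.platform if platform is None else platform
    environ = os.environ if environ is None else environ
    # Proposed opt-out for constrained platforms (hypothetical variable).
    if environ.get("PIP_NO_BINARY_HARDENING") == "1":
        return [], []
    if platform.startswith("linux"):
        return list(LINUX_CFLAGS), list(LINUX_LDFLAGS)
    # Windows ("win32"), WASM ("emscripten", "wasi") and, in this sketch,
    # macOS ("darwin") receive no injected flags.
    return [], []
```

Returning copies of the lists keeps callers from mutating the defaults when they append project-specific flags.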

What About Pre-Built Wheels?

The flags only apply when building from source. Pre-built wheels (the default for most pip install operations) are already compiled. To harden wheels:

PyPI wheel builders (manylinux, cibuildwheel) should include these flags in their default build configuration
Package maintainers should add these flags to their setup.py / pyproject.toml / CI configurations
PEP proposal: We recommend a PEP to standardize build-time hardening flags across all Python build backends
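
For maintainers who build with setuptools, one possible way to add the flags is through the extra_compile_args and extra_link_args parameters of setuptools.Extension. The helper below is a hypothetical convenience wrapper, and the module and source names in the comment are placeholders.

```python
HARDENING_COMPILE_ARGS = [
    "-fstack-protector-strong",
    "-D_FORTIFY_SOURCE=2",
    "-fstack-clash-protection",
]
HARDENING_LINK_ARGS = ["-Wl,-z,relro,-z,now"]


def hardened_extension_kwargs(**kwargs):
    """Append the hardening flags to keyword arguments for setuptools.Extension."""
    merged = dict(kwargs)
    merged["extra_compile_args"] = (
        list(kwargs.get("extra_compile_args", [])) + HARDENING_COMPILE_ARGS
    )
    merged["extra_link_args"] = (
        list(kwargs.get("extra_link_args", [])) + HARDENING_LINK_ARGS
    )
    return merged


# Usage in setup.py (placeholder names, Linux/GCC-style flags only):
#   from setuptools import Extension, setup
#   setup(ext_modules=[
#       Extension("demo._speedups",
#                 **hardened_extension_kwargs(sources=["src/speedups.c"])),
#   ])
```

These flags are GCC/Clang syntax, so a cross-platform project would apply them only when building on Linux.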

Per-Package Scorecards

We generated security scorecards for each of the 46 packages in our analysis. Each scorecard includes:

Hardening score (0-100) based on canary, RELRO, FORTIFY presence
ROP gadget count and density
Unsafe function inventory
RPATH safety assessment
Symbol visibility analysis
GOT entry count and binding mode
Overall risk rating

Selected scores:

Package	Hardening Score	Binaries	Key Risk
MySQLdb	66.7	1	Moderate – has some hardening
bcrypt (Rust)	33.3	1	Full RELRO but missing canary/fortify
PIL/Pillow	0.0	8	8 unhardened binaries, uses sprintf/dlopen
numpy	0.0	~20	3,215 gadgets in core binary, strcpy usage
scipy	0.0	~30	3,058 gadgets in OpenBLAS, no hardening
aiohttp	0.0	4	4 unhardened binaries
lxml	0.0	~5	1,558 gadgets, sprintf/sscanf/strcat usage
capstone	0.0	1	504 gadgets, sprintf/strcpy usage

Full scorecard data is available in ecosystem_audit.json and composition_vulnerabilities.jsonl.
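
The published scores are consistent with a simple formula: each binary earns one third of 100 for each of canary, full RELRO, and FORTIFY_SOURCE, and the package score is the mean over its binaries. That formula is our inference from the values in the table (33.3 for one of three properties, 66.7 for two), not a confirmed description of the scorecard generator.

```python
from statistics import mean


def binary_score(canary: bool, full_relro: bool, fortify: bool) -> float:
    """Score one binary: a third of 100 per hardening property present."""
    return 100 * sum([canary, full_relro, fortify]) / 3


def package_score(binaries: list[tuple[bool, bool, bool]]) -> float:
    """Mean per-binary score, rounded as in the scorecard table."""
    return round(mean(binary_score(*flags) for flags in binaries), 1)


# bcrypt: one binary with full RELRO but no canary and no FORTIFY_SOURCE.
print(package_score([(False, True, False)]))
# Pillow: eight binaries with none of the three properties.
print(package_score([(False, False, False)] * 8))
```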

Methodology Details

v2 Analyzer Architecture

Our v2 analyzer (deep_audit.py) improves on the initial scan in several ways:

Per-binary scoring: Instead of aggregating at the package level, we score each .so individually and then compose the package score.
Cross-binary composition analysis: We compute the aggregate attack surface when multiple extensions are loaded together. A buffer overflow in library A can leverage ROP gadgets in library B and GOT entries in library C.
Co-occurrence weighting: Using our 130K GitHub repository dataset, we weight findings by how frequently packages are installed together. A vulnerability in numpy (installed in 47% of Python repos) is weighted differently than one in a niche package.
Automated patch generation: For each finding, the analyzer generates specific remediation commands:
Compiler flags to add to setup.py / pyproject.toml
readelf commands to verify the fix
Expected before/after values

Tools Used

readelf (GNU Binutils)     -- ELF header and section analysis
ROPgadget v6.x             -- ROP gadget enumeration
nm (GNU Binutils)          -- Symbol table analysis
objdump (GNU Binutils)     -- Disassembly verification
checksec.sh                -- Cross-reference for hardening flags
Python 3.12                -- Analysis pipeline

Limitations

Our analysis covers x86_64 Linux binaries only. ARM64 results may differ.
We analyzed libraries as installed by pip on Ubuntu 24.04. Other distributions may produce different results if packages ship distro-compiled wheels.
ROP gadget counts are an upper bound; not all gadgets are practically usable in every attack scenario.
FORTIFY_SOURCE detection relies on the presence of _chk function variants; some compilers may optimize differently.

Conclusion

The Python ecosystem has a binary security gap. It is not caused by negligent package maintainers or exotic threats. It is caused by a missing default: pip and setuptools do not pass standard hardening flags to the C compiler when building extensions from source.

The consequence is that 93% of Python C extensions lack stack canaries, 98% lack full RELRO, and 94% lack FORTIFY_SOURCE. These are protections that every Linux distribution has mandated for over a decade. The Rust compiler enables full RELRO and BIND_NOW by default. The Python build toolchain enables none of them.

The fix is two lines of environment variables. We are proposing that pip and setuptools add these as defaults, with an opt-out mechanism for constrained platforms.

For immediate protection, add this to your CI/CD pipeline or shell profile:

export CFLAGS="-fstack-protector-strong -D_FORTIFY_SOURCE=2 -fstack-clash-protection"
export LDFLAGS="-Wl,-z,relro,-z,now"

Every pip install from source will then produce hardened binaries.

For the ecosystem, we urge the pip and setuptools maintainers to adopt these defaults. The performance cost is negligible, the backward compatibility risk is low, and the security benefit is immediate and measurable.

The data is clear. The fix is simple. The Python ecosystem deserves the same binary hardening that every Linux distribution has provided for its system packages since 2012.

Aljefra Security Research
Full dataset: 373 binaries, 46 packages, 65,636 ROP gadgets, 12,020 GOT entries, 45,474 exported symbols
Tools and data available at: https://repos.aljefra.com/admin/composition
