pydepgate

pydepgate is a zero-runtime-dependency static analyzer for Python supply-chain malware hiding in interpreter startup paths.

It scans wheels, source distributions, installed packages, and loose Python files for code that can execute before a user script meaningfully starts: .pth import lines, sitecustomize.py, usercustomize.py, package __init__.py, and setup.py. It also scans ordinary library files.

It is built for analyzing hostile package artifacts, for forensic repeatability, and for CI use, and it is designed to grow into a package-intake control system.

Install

pip install pydepgate

Requires Python 3.11 or later. At runtime, pydepgate uses only the Python standard library, so there are no third-party dependencies to install.

First scan

# Scan a wheel
pydepgate scan some-package-1.0.0-py3-none-any.whl

# Scan an installed package by name
pydepgate scan litellm

# Scan a source distribution
pydepgate scan some-package-1.0.0.tar.gz

# Scan one loose file
pydepgate scan --single suspicious_module.py

Exit code 0 means clean at the active threshold. Exit code 2 means at least one HIGH or CRITICAL finding. See Exit Codes for the full contract.
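A CI wrapper usually branches on these codes. The following is a minimal sketch that handles only the two codes named above. The helper names `classify_exit` and `run_scan` are hypothetical, and any other code is treated as an error rather than guessed at. Check Exit Codes for the authoritative list.

```python
import subprocess

# Exit codes stated in this README; see Exit Codes for the full contract.
CLEAN = 0
HIGH_OR_CRITICAL = 2


def classify_exit(code: int) -> str:
    """Map a pydepgate exit code to a coarse CI outcome (hypothetical helper)."""
    if code == CLEAN:
        return "clean"
    if code == HIGH_OR_CRITICAL:
        return "blocking-findings"
    # Assumption: any other code is surfaced as an error, not interpreted.
    return "error"


def run_scan(target: str) -> str:
    """Run `pydepgate scan <target>` (assumes pydepgate is on PATH) and classify it."""
    result = subprocess.run(["pydepgate", "scan", target], check=False)
    return classify_exit(result.returncode)
```

For example, `run_scan("some-package-1.0.0-py3-none-any.whl")` returns `"clean"` only when the scan exits with 0.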

Preserve evidence

pydepgate can store scan evidence locally so a finding does not disappear into terminal scrollback.

pydepgate scan package.whl --save-to-db
pydepgate db list-runs
pydepgate db explain --run-id <run-id>

The evidence database records scan runs, artifact identity and hashes, static findings, decoded payload trees, and CVE findings.

Use this when you need reproducible findings, incident notes, maintainer reports, or later correlation by package name, version, or artifact hash.

See CLI Reference: db.

Capture scan lifecycle events

pydepgate can write a JSONL event stream for each scan. The event log records scan authorization, engine creation, scan start and completion, decode start and completion, evidence writes, and final run completion.

pydepgate scan package.whl --event-log scan.events.jsonl

Use event logs when a scan is part of CI, package intake, incident notes, or local evidence capture. The event log is not the finding report. Use JSON, SARIF, or human output for findings, and event JSONL for lifecycle evidence.
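Each line of a JSONL event log is an independent JSON object, so the log can be consumed incrementally. The sketch below tallies records by type. The key name `event` is a placeholder assumption, not the documented envelope field, so check Event Log JSONL for the real schema before relying on it.

```python
import json
from collections import Counter
from typing import Iterable


def tally_events(lines: Iterable[str], key: str = "event") -> Counter:
    """Count JSONL records by `key`; `key` is a placeholder field name."""
    counts: Counter = Counter()
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno} is not valid JSON: {exc}") from exc
        if not isinstance(record, dict):
            counts["<non-object>"] += 1
            continue
        counts[record.get(key, "<missing>")] += 1
    return counts
```

Usage: `with open("scan.events.jsonl", encoding="utf-8") as fh: print(tally_events(fh))`.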

See Guide: Event Logs and Event Log JSONL.

Check known vulnerable versions

Static analysis and vulnerability matching answer different questions.

pydepgate cvedb update
pydepgate cvescan package.whl
pydepgate cvescan package.whl --save-to-db

pydepgate scan asks: “Does this artifact contain suspicious startup behavior?”

pydepgate cvescan asks: “Is this package name and version known to be vulnerable or malicious in the OSV PyPI feed?”

Use both when you want broader artifact coverage.
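Identity matching works from a package's name and version, not from its code. The following rough sketch shows that idea. It assumes the standard wheel filename layout `{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl` and PEP 503 name normalization. `KNOWN_BAD` is a hypothetical stand-in for the local OSV-derived database, not its real format. Real OSV records also express affected versions as ranges, which this exact-string match does not handle.

```python
import re
from pathlib import PurePath


def normalize(name: str) -> str:
    """PEP 503 normalization: lowercase, with runs of '-', '_', '.' collapsed to '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def wheel_identity(filename: str) -> tuple[str, str]:
    """Extract (normalized name, version) from a wheel filename."""
    base = PurePath(filename).name
    if not base.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = base[:-4].split("-")
    # name-version-[build-]python-abi-platform: 5 or 6 dash-separated fields.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    return normalize(parts[0]), parts[1]


# Hypothetical stand-in for the local OSV PyPI database.
KNOWN_BAD: dict[str, set[str]] = {"example-pkg": {"1.0.0"}}


def is_known_bad(filename: str) -> bool:
    """Exact name/version lookup; no version ranges or PEP 440 normalization."""
    name, version = wheel_identity(filename)
    return version in KNOWN_BAD.get(name, set())
```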

A combined scan mode is planned. Its timing depends on Roadmap 0.6.0 items.

See CLI Reference: cvedb and CLI Reference: cvescan.

What pydepgate detects

pydepgate focuses on attack shapes that execute silently during package installation, interpreter startup, or import-time initialization.

It detects:

  • Encoded payloads in Python source: base64, hex, zlib, gzip, bzip2, and lzma chains
  • Decode-then-execute patterns such as encoded content passed to exec, eval, compile, or __import__
  • Dynamic execution and import patterns
  • Obfuscated string construction that resolves to execution primitives or dangerous stdlib calls
  • Suspicious stdlib use in startup contexts: subprocess, shell execution, network access, and native code loading
  • Code-density anomalies: high-entropy strings, homoglyphs, invisible Unicode, machine-generated identifiers, deeply nested AST shapes, and low-signal obfuscation patterns
  • Embedded PEM and DER certificate material
  • Multi-layer decoded payload trees through the recursive decode pipeline

The goal is to catch adversarial-shape code in places Python runs automatically.
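One of the code-density signals listed above, high-entropy strings, rests on a simple measure: Shannon entropy in bits per character. The sketch below computes it. The `threshold` and `min_len` values in `looks_high_entropy` are arbitrary assumptions, not pydepgate's settings. Entropy alone is a weak signal, because a string of length n can score at most log2(n) bits per character.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of `text` in bits per character (0.0 for empty input)."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def looks_high_entropy(literal: str, threshold: float = 4.5, min_len: int = 32) -> bool:
    """Flag long literals at or above a hypothetical entropy threshold."""
    return len(literal) >= min_len and shannon_entropy(literal) >= threshold
```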

Why startup vectors matter

Python intentionally runs certain files and hooks during installation, interpreter startup, or package import. Those features are useful. They are also attractive to attackers.

Important startup vectors include:

  • .pth files in site-packages
  • sitecustomize.py
  • usercustomize.py
  • package __init__.py
  • setup.py

The .pth vector is especially dangerous because import lines inside .pth files are executed by Python’s startup machinery. The vector is tracked in CPython issue #113659, and the broader attack class maps to MITRE ATT&CK T1546.018.

Safe payload decoding

pydepgate can inspect encoded payloads without executing them.

pydepgate scan \
  --peek \
  --decode-payload-depth 4 \
  --decode-iocs full \
  package.whl

The decode pipeline unwraps supported encodings and compression formats, classifies the terminal content, reconstructs the payload chain, and can write IOC sidecars or encrypted malware-research archives.

It never executes decoded content. Pickle data is detected but not deserialized.
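The following rough sketch shows unwrapping without executing. It peels zlib, gzip, bzip2, lzma, hex, and base64 layers until nothing more decodes or a depth limit is reached, and it records each layer. The try-each-decoder heuristic is a simplification and an assumption, not pydepgate's classifier. The result is only ever handled as bytes. The default `max_depth=4` mirrors the `--decode-payload-depth 4` example above, but pydepgate's depth semantics may differ.

```python
import base64
import binascii
import bz2
import gzip
import lzma
import zlib
from typing import Callable

# Compressed formats are tried first because their headers make false positives
# unlikely. Hex is tried before base64 because hex digits are also valid base64.
DECODERS: list[tuple[str, Callable[[bytes], bytes]]] = [
    ("zlib", zlib.decompress),
    ("gzip", gzip.decompress),
    ("bzip2", bz2.decompress),
    ("lzma", lzma.decompress),
    ("hex", lambda b: binascii.unhexlify(b.strip())),
    ("base64", lambda b: base64.b64decode(b.strip(), validate=True)),
]

DECODE_ERRORS = (ValueError, zlib.error, OSError, lzma.LZMAError, EOFError)


def unwrap(data: bytes, max_depth: int = 4) -> tuple[list[str], bytes]:
    """Peel encoding layers; return (layer labels, terminal bytes). Never executes."""
    layers: list[str] = []
    for _ in range(max_depth):
        for label, decode in DECODERS:
            try:
                decoded = decode(data)
            except DECODE_ERRORS:
                continue
            if decoded and decoded != data:
                layers.append(label)
                data = decoded
                break
        else:
            break  # no decoder accepted this layer: treat it as terminal content
    return layers, data
```

For example, `unwrap(base64.b64encode(zlib.compress(b"print('hi')")))` yields the layers `["base64", "zlib"]` and the terminal bytes `b"print('hi')"`, which are never passed to `exec`.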

See Guide: Decode Payloads.

CI, SARIF, and Docker

For CI:

pydepgate scan --ci --min-severity high dist/*.whl

For SARIF:

pydepgate scan --format sarif --sarif-srcroot "$PWD" dist/*.whl > pydepgate.sarif
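SARIF is plain JSON. In the SARIF 2.1.0 layout, findings live under `runs[].results[]`, and each result may carry a `level`. You can therefore summarize a report without extra tooling. This sketch assumes pydepgate fills those standard fields, so verify it against Output Formats. It treats a missing `level` as `warning`, SARIF's fallback default, and ignores any per-rule default configuration.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_sarif(sarif: dict) -> Counter:
    """Count SARIF results per level across all runs."""
    counts: Counter = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            counts[result.get("level", "warning")] += 1
    return counts


def summarize_sarif_file(path: Path) -> Counter:
    """Load a SARIF file such as pydepgate.sarif and summarize it."""
    return summarize_sarif(json.loads(path.read_text(encoding="utf-8")))
```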

For Docker:

docker run --rm \
  -v "$(pwd):/scan" \
  ghcr.io/nuclear-treestump/pydepgate:latest \
  scan --ci --min-severity high package.whl

The Docker image supports linux/amd64 and linux/arm64, runs as a non-root user, and is designed for local scans, CI pipelines, and package-intake workflows.

See Guide: CI Integration, Guide: SARIF Integration, and Guide: Docker Image.

Finding fingerprints

pydepgate v0.5.0 includes the Finding Fingerprint v1 specification.

A finding fingerprint is a deterministic, content-addressed identifier for what a specific pydepgate version found in a specific artifact. The goal is simple: a researcher should be able to report a finding today, and a maintainer should be able to independently reproduce it later from the artifact and tool version.

The v1 specification is available now. CLI validation support is planned for a later release.
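The following toy construction illustrates what "deterministic, content-addressed" means. It serializes the identifying inputs canonically, with sorted keys and no insignificant whitespace, and then hashes them. The chosen fields (tool version, artifact SHA-256, signal ID, path, line) are hypothetical. This is not the Finding Fingerprint v1 algorithm. Consult the specification for the real inputs and encoding.

```python
import hashlib
import json


def toy_fingerprint(
    tool_version: str, artifact_sha256: str, signal_id: str, path: str, line: int
) -> str:
    """Hash a canonical JSON encoding of hypothetical finding fields."""
    payload = {
        "tool_version": tool_version,
        "artifact_sha256": artifact_sha256,
        "signal_id": signal_id,
        "path": path,
        "line": line,
    }
    # sort_keys plus compact separators give one byte encoding per set of inputs.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Identical inputs always produce the same identifier, and changing any input (including the tool version) produces a different one. This is the property that lets a maintainer reproduce a reported finding independently.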

See Finding Fingerprint v1.

Documentation

  • Getting Started (first scan, reading output, using explain)
  • CLI Reference (top-level command structure and global flags)
  • CLI Reference: scan (static startup-vector scanning)
  • CLI Reference: cvedb (build and inspect the local OSV PyPI database)
  • CLI Reference: cvescan (match wheel identity against known CVE / malware records)
  • CLI Reference: db (store, query, and explain scan evidence)
  • CLI Reference: explain (signal and rule lookup)
  • Exit Codes (public exit-code contract for CI)
  • Output Formats (human, JSON, SARIF, and decoded-tree schemas)
  • Event Log JSONL (event envelope schema and scan lifecycle telemetry)
  • Guide: Event Logs (capturing and consuming scan lifecycle JSONL)
  • Environment Variables (all PYDEPGATE_* variables)
  • Rules File (pydepgate.gate format and precedence)
  • Signals Reference (signal IDs, severity mapping, and detection rationale)
  • Finding Fingerprint v1 (deterministic finding fingerprint specification)
  • Guide: CI Integration (GitHub Actions, GitLab CI, Docker, and pre-commit)
  • Guide: Docker Image (container tags, digests, signatures, attestations, and reproducibility)
  • Guide: Custom Rules (suppressing false positives and adjusting severity)
  • Guide: Decode Payloads (recursive decode pipeline, IOC sidecars, encrypted archives)
  • Guide: SARIF Integration (GitHub Code Scanning ingestion)

Current limitations

pydepgate exec and pydepgate preflight are planned commands for runtime and environment auditing. They are documented as roadmap items but are not functional yet.

pydepgate cvescan currently supports only wheel artifacts.

License

Apache 2.0. See LICENSE.