SARIF Integration
pydepgate emits SARIF 2.1.0 documents compatible with GitHub Code Scanning, Azure DevOps, and any SARIF consumer that follows the OASIS spec. This page covers generating SARIF output and ingesting it into GitHub Code Scanning.
Generating SARIF output
Pass --format sarif to emit a SARIF document on stdout:
pydepgate scan --format sarif some-package.whl > scan.sarif
The exit code is the same as for human and JSON output: 0 for a clean scan, 1 when findings are present but none is HIGH or CRITICAL, 2 when at least one finding is HIGH or CRITICAL, and 3 for a tool error. The scan step therefore exits non-zero whenever findings are present, which your workflow must account for. See the GitHub Actions section below for the standard handling.
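For callers that wrap the CLI from a script rather than from a workflow step, the exit codes listed above could be handled roughly as follows. This is a minimal sketch: the function names and the "fail on 2 or higher" policy are illustrative assumptions, not part of pydepgate; only the command line and the exit code meanings come from this page.

```python
import subprocess

# Exit code meanings as documented above. The dictionary itself is
# illustrative; pydepgate does not export these names.
EXIT_MEANINGS = {
    0: "clean",
    1: "findings below HIGH",
    2: "HIGH or CRITICAL findings",
    3: "tool error",
}


def describe_exit(code: int) -> str:
    """Translate a pydepgate exit code into a short description."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def should_fail_build(code: int) -> bool:
    """Hypothetical policy: block on HIGH/CRITICAL findings and tool errors."""
    return code >= 2


def run_scan(artifact: str, sarif_path: str) -> int:
    """Run the scan, write the SARIF document to sarif_path, return the exit code."""
    with open(sarif_path, "w", encoding="utf-8") as out:
        proc = subprocess.run(
            ["pydepgate", "scan", "--format", "sarif", artifact],
            stdout=out,
            check=False,  # a non-zero exit is expected when findings are present
        )
    return proc.returncode
```

Keeping check=False matters here: with check=True, subprocess.run would raise on any finding and the SARIF file would never reach a later upload step.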
The SARIF document
Each scan produces a single SARIF 2.1.0 document containing:
- The full rules catalog under tool.driver.rules, with rule descriptions, help text, and common evasions for each signal pydepgate knows about.
- Per-finding results with severity mapped to SARIF levels: CRITICAL and HIGH map to error, MEDIUM maps to warning, LOW and INFO map to note.
- GitHub-compatible security-severity numeric scores on each result for correct placement in the GitHub vulnerability severity scale.
- 24-character partial fingerprints on each result for cross-run alert deduplication. Results for the same finding across different runs of the same artifact are recognized as the same alert rather than accumulating as new ones.
- automationDetails.id of the form pydepgate/{artifact_kind}/ for cross-run grouping. Deep scans (--deep) suffix _deep to the artifact kind so deep and non-deep runs group separately.
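A consumer script can lean on the level mapping and the partial fingerprints described in the list above. The sketch below is a reading aid, not pydepgate code: the helper names are made up, and the fingerprint comparison uses every entry in partialFingerprints because this page does not name the key pydepgate writes. Results without an explicit level are counted as warning here, which simplifies the SARIF defaulting rules.

```python
from collections import Counter

# Severity-to-level mapping as stated in the list above.
SEVERITY_TO_LEVEL = {
    "CRITICAL": "error",
    "HIGH": "error",
    "MEDIUM": "warning",
    "LOW": "note",
    "INFO": "note",
}


def iter_results(sarif: dict):
    """Yield every result object from every run in a SARIF document."""
    for run in sarif.get("runs", []):
        yield from run.get("results", [])


def count_levels(sarif: dict) -> Counter:
    """Count results per SARIF level ("error", "warning", "note")."""
    return Counter(r.get("level", "warning") for r in iter_results(sarif))


def fingerprint(result: dict) -> tuple:
    """Hashable key built from all partialFingerprints entries of a result."""
    return tuple(sorted(result.get("partialFingerprints", {}).items()))


def new_results(previous: dict, current: dict) -> list:
    """Return results in `current` whose fingerprint was not seen in `previous`."""
    seen = {fingerprint(r) for r in iter_results(previous)}
    return [r for r in iter_results(current) if fingerprint(r) not in seen]
```

Loading two documents with json.load and passing them to new_results approximates, in a simplified form, the cross-run deduplication that a SARIF consumer performs with these fingerprints.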
Content-blind emission
The SARIF document describes what was called, not what was passed. A finding for subprocess.Popen() says “subprocess.Popen() called” in the message text. It does not include the arguments, the command line, any URLs, or the payload bytes. This is by design.
SARIF documents flow into CI logs, code scanning UIs, and artifact downloads. Embedding payload content in the document would replicate the exact threat the analyzer is detecting. A defender can publish a pydepgate SARIF document publicly without re-leaking the underlying attack content.
codeFlows for decoded payloads
When --decode-payload-depth is enabled alongside --format sarif, findings reached through the decode pipeline are surfaced with codeFlows encoding the attack chain. Each threadFlow walks from the outer high-entropy literal through each decode layer to the innermost detection. Multi-layer payloads are encoded as nested threadFlow locations, with nestingLevel reflecting the decode depth.
In GitHub’s code scanning UI, this appears as “Show paths” on a finding. Each step in the chain is visible.
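Outside the GitHub UI, the same chain can be read directly from the document. The sketch below walks the standard SARIF 2.1.0 structure (codeFlows, then threadFlows, then locations, each carrying an optional nestingLevel). The helper names are hypothetical, and the message texts used in the usage note are made-up placeholders rather than real pydepgate output.

```python
def iter_chain_steps(result: dict):
    """Yield (nestingLevel, message text) for each step of each threadFlow."""
    for code_flow in result.get("codeFlows", []):
        for thread_flow in code_flow.get("threadFlows", []):
            for step in thread_flow.get("locations", []):
                level = step.get("nestingLevel", 0)
                text = step.get("location", {}).get("message", {}).get("text", "")
                yield level, text


def format_chain(result: dict) -> str:
    """Render the decode chain as an indented outline, one step per line."""
    return "\n".join("  " * level + text for level, text in iter_chain_steps(result))
```

For a result whose threadFlow has three steps at nesting levels 0, 1, and 2, format_chain returns three lines, each indented two spaces deeper than the previous one, which mirrors the outer literal, the decode layer, and the innermost detection.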
Source root configuration
For in-repo scans where GitHub Code Scanning needs to resolve paths relative to the repository root, set --sarif-srcroot:
pydepgate scan --format sarif --sarif-srcroot /path/to/repo some-package.whl > scan.sarif
This populates originalUriBaseIds.PROJECTROOT in the document and tags on-disk artifact locations with uriBaseId: "PROJECTROOT". Without this, GitHub Code Scanning may not be able to link findings to source lines in the repository viewer.
The environment variable equivalent is PYDEPGATE_SARIF_SRCROOT.
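To illustrate what a consumer does with originalUriBaseIds, the sketch below resolves an artifact location against its base. It follows the general SARIF 2.1.0 layout, in which each originalUriBaseIds entry is an object with a uri property. The function name is hypothetical, and the file:///path/to/repo/ base in the usage note is an assumed value, not something pydepgate is documented to emit in that exact form.

```python
from urllib.parse import urljoin


def resolve_uri(run: dict, artifact_location: dict) -> str:
    """Resolve artifact_location.uri against the run's originalUriBaseIds."""
    uri = artifact_location.get("uri", "")
    base_id = artifact_location.get("uriBaseId")
    if base_id is None:
        return uri  # no base declared, so the URI is used as written
    base = run.get("originalUriBaseIds", {}).get(base_id, {}).get("uri")
    if not base:
        return uri  # base id is named but not defined in this run
    if not base.endswith("/"):
        base += "/"  # urljoin drops the last path segment without a trailing slash
    return urljoin(base, uri)
```

With a PROJECTROOT base of file:///path/to/repo/ and a location URI of src/pkg/setup.py, the function returns file:///path/to/repo/src/pkg/setup.py. A location without uriBaseId is returned unchanged, which is the case in which a viewer may fail to link the finding to a source line.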
GitHub Actions
The standard workflow scans the built artifact, writes SARIF output, and uploads it to GitHub Code Scanning:
name: pydepgate SARIF scan
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
permissions:
  contents: read
  security-events: write
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Build wheel
        run: |
          pip install build
          python -m build --wheel
      - name: Install pydepgate
        run: pip install pydepgate
      - name: Run pydepgate scan
        run: |
          pydepgate scan \
            --format sarif \
            --sarif-srcroot "$" \
            dist/*.whl > pydepgate.sarif
        continue-on-error: true
      - name: Upload SARIF to GitHub Code Scanning
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: pydepgate.sarif
          category: pydepgate
continue-on-error: true on the scan step is required because pydepgate exits non-zero when findings are present. Without it, the upload step never runs. GitHub Code Scanning ingests the document regardless of whether findings are present; the alerts appear in the Security tab of the repository.
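The upload action needs security-events: write to submit results to GitHub Code Scanning. Because this workflow declares an explicit permissions block, any scope not listed there is set to none, so the permission must be granted explicitly regardless of repository visibility. contents: read is what the checkout step needs on private repositories.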
With decode pipeline
To include codeFlow encoding for multi-layer payloads:
      - name: Run pydepgate scan
        run: |
          pydepgate scan \
            --format sarif \
            --sarif-srcroot "$" \
            --peek \
            --decode-payload-depth 4 \
            dist/*.whl > pydepgate.sarif
        continue-on-error: true
This produces all expected outputs (SARIF document with codeFlows) from a single decode pass.
Validation
pydepgate’s SARIF emission is validated on every pull request in CI using the Microsoft SARIF Multitool. The workflow runs three synthetic fixtures: a clean scan, a scan with findings, and a scan with multi-layer codeFlows. Validation hard-fails on any warning or error.
To validate a SARIF document locally, install the SARIF Multitool via the .NET SDK (8.0 or later) and run:
dotnet tool install -g Sarif.Multitool
sarif validate pydepgate.sarif
A local equivalent of the CI workflow is available at scripts/validate_sarif.sh in the repository.
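Where the .NET SDK is not available, a quick structural sanity check can catch gross problems before upload. The sketch below is not a substitute for the SARIF Multitool and is not part of pydepgate; the function name is hypothetical, and it checks only a handful of SARIF 2.1.0 basics (the version string, the runs array, the driver name, and that each ruleId appears in the rules catalog when one is present).

```python
def basic_sarif_problems(doc: dict) -> list[str]:
    """Return human-readable problems found by a few shallow checks."""
    problems = []
    if doc.get("version") != "2.1.0":
        problems.append("version is not 2.1.0")
    runs = doc.get("runs")
    if not isinstance(runs, list):
        problems.append("runs is missing or not a list")
        return problems
    for i, run in enumerate(runs):
        driver = run.get("tool", {}).get("driver", {})
        if not driver.get("name"):
            problems.append(f"runs[{i}].tool.driver.name is missing")
        # Only cross-check rule ids when the run ships a rules catalog.
        rule_ids = {rule.get("id") for rule in driver.get("rules", [])}
        for j, result in enumerate(run.get("results", [])):
            rule_id = result.get("ruleId")
            if rule_ids and rule_id is not None and rule_id not in rule_ids:
                problems.append(
                    f"runs[{i}].results[{j}].ruleId {rule_id!r} is not in the rules catalog"
                )
    return problems
```

An empty list means only that these shallow checks passed. Full conformance still needs the sarif validate command shown above.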