Rules File

The .gate file format is the on-disk representation of pydepgate’s rules engine. This page is the formal specification: discovery, parsing, schema, validation, and precedence. For task-oriented walkthroughs (suppressing a specific signal, lowering a severity, restricting to a path), see Custom Rules.

File discovery

When pydepgate scan runs, it checks four locations in order and loads the first rules file it finds. Files that exist at later locations are reported as also_found but are not loaded. The walk is implemented in discover_rules_files in src/pydepgate/rules/loader.py.

1. The explicit path passed via --rules-file PATH.
2. The path in the PYDEPGATE_RULES_FILE environment variable.
3. ./pydepgate.gate (the current working directory).
4. <sys.prefix>/pydepgate.gate (the active virtual environment root).

A file must end with the suffix .gate to be loadable. Passing a path with a different extension raises a load error rather than silently failing.
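The lookup order and the suffix check above can be sketched as follows. This is an illustration under assumptions, not the code in loader.py: the helper names candidate_paths, split_loaded and require_gate_suffix are hypothetical, and the sketch simply skips locations where no file exists, which may differ from how discover_rules_files treats a missing explicit path.

```python
import os
import sys
from pathlib import Path


def require_gate_suffix(path):
    """Reject any path that does not end in .gate (hypothetical helper)."""
    path = Path(path)
    if path.suffix != ".gate":
        raise ValueError(f"rules file must end in .gate: {path}")
    return path


def candidate_paths(cli_path=None, environ=None, cwd=None, prefix=None):
    """Return the existing candidate files in the documented priority order."""
    environ = os.environ if environ is None else environ
    cwd = Path.cwd() if cwd is None else Path(cwd)
    prefix = Path(sys.prefix) if prefix is None else Path(prefix)

    ordered = []
    if cli_path:
        ordered.append(Path(cli_path))            # 1. --rules-file PATH
    env_path = environ.get("PYDEPGATE_RULES_FILE")
    if env_path:
        ordered.append(Path(env_path))            # 2. environment variable
    ordered.append(cwd / "pydepgate.gate")        # 3. current working directory
    ordered.append(prefix / "pydepgate.gate")     # 4. virtual environment root
    return [p for p in ordered if p.is_file()]


def split_loaded(found):
    """First match is loaded; everything after it is only reported."""
    if not found:
        return None, []
    return require_gate_suffix(found[0]), found[1:]
```

With a file present in both the working directory and the environment root, split_loaded returns the working-directory file as the loaded one and the other in the also-found list.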

Format detection

pydepgate auto-detects whether the file is JSON or TOML by attempting JSON first and falling back to TOML on parse failure. The file extension does not determine the format; the content does.

JSON files should declare their format and version explicitly with two top-level keys:

{
  "_pydepgate_format": "json",
  "_pydepgate_version": 1,
  "rules": [ ... ]
}

Both declaration keys are optional. The loader emits a warning when either is missing but loads the file anyway. If _pydepgate_format is set to a value other than "json", or _pydepgate_version is set to a value other than 1, the load fails.
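The warn-on-missing, fail-on-mismatch behavior can be sketched like this. The function name check_declarations, the message wording, and the use of ValueError are assumptions made for illustration; the loader's actual exception type is not documented here.

```python
def check_declarations(data):
    """Return warnings for missing keys; raise ValueError for wrong values."""
    warnings = []

    if "_pydepgate_format" not in data:
        warnings.append("missing _pydepgate_format declaration")
    elif data["_pydepgate_format"] != "json":
        raise ValueError(f"unsupported format: {data['_pydepgate_format']!r}")

    if "_pydepgate_version" not in data:
        warnings.append("missing _pydepgate_version declaration")
    elif data["_pydepgate_version"] != 1:
        raise ValueError(f"unsupported version: {data['_pydepgate_version']!r}")

    return warnings
```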

TOML files have no format-declaration keys. TOML’s comment-only top-level grammar cannot host them. Any valid TOML file is treated as version 1.

File structure

JSON

Rules live in a top-level rules array (plural):

{
  "_pydepgate_format": "json",
  "_pydepgate_version": 1,
  "rules": [
    {
      "id": "example",
      "signal_id": "DYN002",
      "action": "set_severity",
      "severity": "critical"
    }
  ]
}

TOML

Rules use the array-of-tables syntax under the key rule (singular, not rules):

[[rule]]
id = "example"
signal_id = "DYN002"
action = "set_severity"
severity = "critical"

A common mistake is writing [[rules]] in a TOML file. The loader reads data.get("rule", []) for TOML format; a file with [[rules]] parses successfully but contributes zero rules to the engine.

Rule schema

Every rule is an object with these fields. The complete valid-field set is defined as _VALID_RULE_FIELDS in src/pydepgate/rules/loader.py.

Identity

Field	Type	Required	Description
`id`	string	recommended	A short stable identifier for the rule. The loader prefixes it with the source name in uppercase, so `id = "MY_RULE"` from a user file becomes `USER_MY_RULE` in output. If omitted, an auto-generated identifier is used.
`explain`	string	no	Human-readable rationale for the rule, displayed by `pydepgate explain <rule-id>`.

Match fields

A rule matches a signal when every non-null match field matches. A rule with no match fields is a catch-all that matches every signal (rare but sometimes useful).

Field	Type	Description
`signal_id`	string	Exact match against `Signal.signal_id`. For example, `"DYN002"`, `"ENC001"`, `"DENS010"`.
`analyzer`	string	Exact match against `Signal.analyzer`. Matches every signal from one analyzer. Values are the analyzer module names: `encoding_abuse`, `dynamic_execution`, `string_ops`, `suspicious_stdlib`, `code_density`.
`file_kind`	string	Match against the triage decision’s `FileKind`. Allowed values listed below.
`scope`	string	Match against `Signal.scope`. Allowed values listed below.
`path_glob`	string	Match against the file’s internal path within the artifact using `fnmatch` glob syntax. For example, `"tests/*"` or `"vendor/*.py"`.
`context_contains`	object	Legacy match against entries in `Signal.context`. Each `{key: value}` is shimmed at construction time into an `eq` predicate. Prefer `context_predicates` for new rules.
`context_predicates`	object	Structured predicates over `Signal.context` fields. See Context predicates below.
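The all-non-null-fields-must-match semantics can be sketched as below. Rules and signals are plain dicts here, the signal key "path" is a hypothetical name, and file_kind is read from the signal for simplicity even though the real engine takes it from the triage decision. Context predicates are left out of this sketch.

```python
from fnmatch import fnmatch

EXACT_FIELDS = ("signal_id", "analyzer", "file_kind", "scope")


def rule_matches(rule, signal):
    """Return True when every match field set on the rule matches the signal."""
    for field in EXACT_FIELDS:
        wanted = rule.get(field)
        if wanted is not None and signal.get(field) != wanted:
            return False

    glob = rule.get("path_glob")
    if glob is not None and not fnmatch(signal.get("path", ""), glob):
        return False

    # No match fields set at all: the rule is a catch-all.
    return True
```

Note that in fnmatch syntax `*` also matches `/`, so `"tests/*"` covers files in nested directories under tests/ as well.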

Effect fields

Every rule must specify an action and may carry effect-specific fields.

Field	Type	Required	Description
`action`	string	yes	One of `suppress`, `set_severity`, `set_description`.
`severity`	string	when `action = "set_severity"`	Target severity. Allowed values listed below.
`description`	string	when `action = "set_description"`	Replacement description text for the finding.

Allowed values

`action`

Value	Effect
`suppress`	The signal still fires but does not produce an active finding. The suppression is recorded in `suppressed_findings` for audit visibility.
`set_severity`	The finding is produced at the severity given in the `severity` field. Overrides the mechanical confidence-to-severity fallback.
`set_description`	The finding’s description text is replaced with the `description` field. Severity falls back to the mechanical confidence-to-severity mapping unless another matching rule sets it.

`severity`

info, low, medium, high, critical.

`file_kind`

pth, setup_py, init_py, sitecustomize, usercustomize, library_py, entry_points.

The skip value also exists in the FileKind enum but represents files the triage layer decided not to analyze. A rule with file_kind = "skip" will never match any signal in practice.

`scope`

module, class_body, function, nested_function, unknown.

Scope values are case-insensitive on input. The loader lowercases them, so the loaded rule always reports the lowercase form.

Context predicates

context_predicates matches against specific fields of the signal’s context dictionary. Each predicate is a one-key mapping where the key is an operator and the value is the comparand.

[[rule]]
id = "escalate-huge-base64"
signal_id = "DENS010"
action = "set_severity"
severity = "critical"
context_predicates = { length = { gte = 10240 } }

This matches DENS010 signals whose context.length is 10240 or greater.

Operators

The complete operator set is defined as VALID_OPERATORS in src/pydepgate/rules/base.py.

Category	Operators	Value type
Numeric and comparable	`eq`, `ne`, `gt`, `gte`, `lt`, `lte`	int or float (any orderable, in practice)
String	`eq`, `ne`, `contains`, `startswith`, `endswith`	string
Collection	`in`, `not_in`	list, tuple, or set

eq and ne work for any type. contains works on strings (substring match) and on lists or tuples (membership). in and not_in test whether the actual value appears in the predicate’s collection.

Predicate evaluation

Each predicate dict must contain exactly one operator key. Loaders reject:

An empty predicate dict.
A predicate dict with multiple operator keys.
A non-dict predicate value.
An unknown operator name (with a typo suggestion via difflib).
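A sketch of these shape checks follows. The operator tuple is copied from the table on this page (the authoritative set is VALID_OPERATORS in base.py), and the function name validate_predicate and the message wording are hypothetical. The typo suggestion is omitted to keep the sketch short.

```python
OPERATORS = (
    "eq", "ne", "gt", "gte", "lt", "lte",
    "contains", "startswith", "endswith",
    "in", "not_in",
)


def validate_predicate(field, predicate):
    """Return a list of error strings; an empty list means the shape is valid."""
    if not isinstance(predicate, dict):
        kind = type(predicate).__name__
        return [f"{field}: predicate must be a mapping, got {kind}"]
    if not predicate:
        return [f"{field}: predicate is empty"]
    if len(predicate) > 1:
        return [f"{field}: multiple operators {sorted(predicate)}"]

    (operator_name,) = predicate  # exactly one key at this point
    if operator_name not in OPERATORS:
        return [f"{field}: unknown operator '{operator_name}'"]
    return []
```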

To AND multiple conditions on the same field, write multiple rules. The engine has no OR primitive; OR is expressed by writing two rules with the same effect.

Type mismatches at evaluation time return False rather than raising. For example, { length = { gte = 1024 } } evaluated against a signal whose context.length is a string silently fails to match. This means a single mistyped predicate cannot crash a scan.
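The fail-closed behavior can be illustrated with a small evaluator. This is a sketch under assumptions rather than the body of ContextPredicate.evaluate: the function name and argument order are hypothetical, and it treats any TypeError or AttributeError raised by the comparison as a non-match. How the real evaluator handles eq and ne across mismatched types is not specified on this page; here they follow ordinary Python equality.

```python
import operator

_COMPARISONS = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "gte": operator.ge,
    "lt": operator.lt, "lte": operator.le,
}


def evaluate(op, expected, actual):
    """Apply one operator; incompatible types yield False instead of raising."""
    try:
        if op in _COMPARISONS:
            return bool(_COMPARISONS[op](actual, expected))
        if op == "contains":      # substring or membership in the actual value
            return expected in actual
        if op == "startswith":
            return actual.startswith(expected)
        if op == "endswith":
            return actual.endswith(expected)
        if op == "in":            # actual value appears in the rule's collection
            return actual in expected
        if op == "not_in":
            return actual not in expected
    except (TypeError, AttributeError):
        return False
    raise ValueError(f"unknown operator: {op}")
```

For example, evaluate("gte", 1024, "2048") returns False because Python cannot order a string against an integer, and the resulting TypeError is swallowed.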

Predicates with multiple fields

Multiple predicates on different fields are AND-ed:

[[rule]]
id = "huge-base64-in-init"
signal_id = "DENS010"
file_kind = "init_py"
action = "set_severity"
severity = "critical"

[rule.context_predicates]
length = { gte = 10240 }
alphabet = { eq = "base64" }

This matches only when both predicates pass: length is at least 10240 and alphabet equals "base64".

`context_contains` shim

context_contains is the legacy match field, preserved for backward compatibility. At rule-construction time, each entry is converted to an eq predicate and merged into context_predicates. The original context_contains dict remains readable on the loaded Rule for inspection but is not consulted during matching; only context_predicates is.

When both fields are provided, keys that appear in only one of them are merged. When both define the same key, the explicit context_predicates entry wins.
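The shim amounts to a dictionary merge in which explicit predicates take priority. A minimal sketch, with a hypothetical function name:

```python
def merge_context(context_contains=None, context_predicates=None):
    """Convert legacy entries to eq predicates, then overlay explicit ones."""
    merged = {
        key: {"eq": value}
        for key, value in (context_contains or {}).items()
    }
    # dict.update overwrites existing keys, so explicit predicates win.
    merged.update(context_predicates or {})
    return merged


print(merge_context(
    {"alphabet": "base64", "length": 10},
    {"length": {"gte": 10240}},
))
# {'alphabet': {'eq': 'base64'}, 'length': {'gte': 10240}}
```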

Precedence

When multiple rules match a signal, the winner is chosen by this three-level comparison:

1. Source priority. User rules win over system rules win over defaults, regardless of specificity. RuleSource values: default = 0, system = 1, user = 2.
2. Specificity. Among rules of the same source, the rule with more match fields wins. Each non-null match field counts as one. Each context_predicates entry counts as one.
3. Load order. Among ties on source and specificity, the rule loaded first wins.
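The three-level comparison maps naturally onto a sort key. The sketch below is not _select_winning_rule itself: the Rule dataclass, its field names, and the SOURCE_PRIORITY mapping are hypothetical stand-ins built from the numbers given above, and match_fields is assumed to exclude context_predicates so that each predicate entry is counted once.

```python
from dataclasses import dataclass, field

SOURCE_PRIORITY = {"default": 0, "system": 1, "user": 2}


@dataclass
class Rule:
    rule_id: str
    source: str
    match_fields: dict = field(default_factory=dict)
    context_predicates: dict = field(default_factory=dict)


def specificity(rule):
    """One point per non-null match field plus one per predicate entry."""
    non_null = sum(v is not None for v in rule.match_fields.values())
    return non_null + len(rule.context_predicates)


def select_winner(matching_rules):
    """Pick the winner from rules that already matched, given in load order."""
    if not matching_rules:
        return None
    # max() returns the first maximal element, so ties fall to load order.
    return max(
        matching_rules,
        key=lambda r: (SOURCE_PRIORITY[r.source], specificity(r)),
    )
```

Because the source priority is the first element of the key, a one-field user rule beats a two-field default rule.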

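This contract means a user rule always wins over a default rule that matches the same signal, even when the default has more match fields. Specificity only breaks ties between rules of the same source. To leave a default rule in effect for some signals, narrow your rule with additional match fields so that it does not match those signals at all.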

When the winning rule’s action is suppress, the engine still computes what the finding would have been if only default rules had been considered and records that as would_have_been on the suppression entry. This is how the suppression audit trail surfaces what was hidden and why.

When no rule matches a signal at all, the engine falls back to the mechanical confidence-to-severity mapping in src/pydepgate/engines/base.py. See Signals for the mapping table.

Validation

The loader is strict: any validation error rejects the entire file. A partial rule set cannot silently change scan behavior. Errors are accumulated across all rules in the file and reported together so a hand-authored file with several mistakes surfaces them all in one pass rather than one at a time.
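The collect-then-reject pattern can be sketched as follows. RulesFileError, validate_all, and the message wording are hypothetical, and only the action-related checks are shown.

```python
VALID_ACTIONS = {"suppress", "set_severity", "set_description"}


class RulesFileError(Exception):
    """Raised once per file, carrying every validation error found."""

    def __init__(self, errors):
        super().__init__("\n".join(errors))
        self.errors = list(errors)


def validate_all(rules):
    """Validate every rule before failing, so all mistakes surface together."""
    errors = []
    for index, rule in enumerate(rules):
        action = rule.get("action")
        if action not in VALID_ACTIONS:
            errors.append(f"rule {index}: unknown or missing action {action!r}")
        elif action == "set_severity" and "severity" not in rule:
            errors.append(f"rule {index}: set_severity requires severity")
        elif action == "set_description" and "description" not in rule:
            errors.append(f"rule {index}: set_description requires description")

    if errors:
        # Any error rejects the whole file; no partial rule set is returned.
        raise RulesFileError(errors)
    return rules
```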

Typo suggestions

Unknown field names and unknown enum values trigger a difflib-based similarity suggestion with a cutoff of 0.6:

warning: rule MY_RULE: unknown field 'singal_id'. Did you mean 'signal_id'?

The suggestion mechanism is implemented in _suggest_field and is applied to field names, action names, severity values, file kind values, scope values, and operator names.
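The standard library's difflib.get_close_matches provides this kind of suggestion directly. The sketch below assumes _suggest_field behaves roughly like this; the function name suggest and the field tuple (assembled from the schema tables on this page) are illustrative.

```python
import difflib

RULE_FIELDS = (
    "id", "explain", "signal_id", "analyzer", "file_kind", "scope",
    "path_glob", "context_contains", "context_predicates",
    "action", "severity", "description",
)


def suggest(unknown, candidates, cutoff=0.6):
    """Return the closest candidate at or above the cutoff, or None."""
    matches = difflib.get_close_matches(unknown, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest("singal_id", RULE_FIELDS))  # signal_id
print(suggest("zzz", RULE_FIELDS))        # None
```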

Error categories

Category	Example	Behavior
Unknown field	`singal_id = "DYN002"`	Suggestion offered, file rejected
Unknown action	`action = "supresss"`	Suggestion offered, file rejected
Unknown severity	`severity = "criticla"`	Suggestion offered, file rejected
Unknown file_kind	`file_kind = "setyp_py"`	Suggestion offered, file rejected
Unknown scope	`scope = "modul"`	Suggestion offered, file rejected
Missing required field	`action = "set_severity"` without `severity`	File rejected
Empty predicate dict	`context_predicates = { length = {} }`	File rejected
Multiple operators in predicate	`context_predicates = { length = { gte = 1, lte = 2 } }`	File rejected
Non-dict predicate value	`context_predicates = { length = 1024 }`	File rejected
Invalid encoding	not valid UTF-8	File rejected
Neither valid JSON nor valid TOML	malformed syntax	File rejected with TOML parser error
Wrong file extension	path does not end in `.gate`	File rejected before parse

Warnings (non-fatal)

Category	Example	Behavior
JSON missing format declaration	no `_pydepgate_format` key	File loads, warning written to stderr
JSON missing version declaration	no `_pydepgate_version` key	File loads, warning written to stderr
Also-found file not loaded	second `.gate` found during discovery	First match still loads; note about the unloaded file written to stderr

Identity prefixing

The loader prefixes every rule’s user-provided id with the source’s uppercase name. The transformation is in _build_rule in the loader:

prefix = source.value.upper()
rule_id = f"{prefix}_{explicit_id}"

So a rule with id = "MY_RULE" loaded from a user file appears in output as USER_MY_RULE. A rule with id = "MY_RULE" loaded as a system rule would appear as SYSTEM_MY_RULE. Default rules use rule_ids assigned by the engine and follow the convention default_<signal>_<context> (for example, default_dyn002_in_setup_py).
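A runnable version of the prefixing step is shown below. It assumes RuleSource is an enum whose values are lowercase strings, which is what the call to source.value.upper() in the excerpt implies; the enum definition and the function name build_rule_id are written for this sketch, and the auto-generated identifier for a missing id is not modeled.

```python
from enum import Enum


class RuleSource(Enum):
    # Assumed string values; the real enum lives in pydepgate's rules package.
    DEFAULT = "default"
    SYSTEM = "system"
    USER = "user"


def build_rule_id(source, explicit_id):
    """Prefix a user-provided id with the uppercase source name."""
    prefix = source.value.upper()
    return f"{prefix}_{explicit_id}"


print(build_rule_id(RuleSource.USER, "MY_RULE"))    # USER_MY_RULE
print(build_rule_id(RuleSource.SYSTEM, "MY_RULE"))  # SYSTEM_MY_RULE
```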

When the id field is omitted entirely, the loader assigns an auto-generated identifier in its place.

Complete examples

Minimal suppression (TOML)

[[rule]]
id = "ignore_dens020"
signal_id = "DENS020"
action = "suppress"

Severity escalation with predicate (TOML)

[[rule]]
id = "block_large_high_entropy_strings"
signal_id = "DENS010"
action = "set_severity"
severity = "critical"
context_predicates = { length = { gte = 10240 } }
explain = "Anything with 10 KB+ of high-entropy content is shellcode-shaped."

Path-restricted suppression (TOML)

[[rule]]
id = "vendored_dens040_ok"
signal_id = "DENS040"
path_glob = "vendor/**"
action = "suppress"

Description override (TOML)

[[rule]]
id = "clarify_dyn001"
signal_id = "DYN001"
action = "set_description"
description = "exec() called with literal argument; review for intent."

Equivalent JSON (severity escalation with predicate)

{
  "_pydepgate_format": "json",
  "_pydepgate_version": 1,
  "rules": [
    {
      "id": "block_large_high_entropy_strings",
      "signal_id": "DENS010",
      "action": "set_severity",
      "severity": "critical",
      "context_predicates": {
        "length": { "gte": 10240 }
      },
      "explain": "Anything with 10 KB+ of high-entropy content is shellcode-shaped."
    }
  ]
}

Implementation references

Component	Location
File discovery	`discover_rules_files` in `src/pydepgate/rules/loader.py`
Format detection	`_detect_format_and_parse` in `src/pydepgate/rules/loader.py`
Rule construction	`_build_rule` in `src/pydepgate/rules/loader.py`
Field validation	`_validate_rule_dict` in `src/pydepgate/rules/loader.py`
Predicate validation	`_validate_context_predicates` in `src/pydepgate/rules/loader.py`
Operator set	`VALID_OPERATORS` in `src/pydepgate/rules/base.py`
Predicate evaluation	`ContextPredicate.evaluate` in `src/pydepgate/rules/base.py`
Match logic	`matches` in `src/pydepgate/rules/base.py`
Precedence resolution	`_select_winning_rule` in `src/pydepgate/rules/base.py`
Confidence fallback	`confidence_to_severity_v01` in `src/pydepgate/engines/base.py`