Rules File
The .gate file format is the on-disk representation of pydepgate’s rules engine. This page is the formal specification: discovery, parsing, schema, validation, and precedence. For task-oriented walkthroughs (suppressing a specific signal, lowering a severity, restricting to a path), see Custom Rules.
File discovery
When pydepgate scan runs, it walks four locations in order and loads the first match. Locations later in the order are reported as also_found but not loaded. The walk is implemented in discover_rules_files in src/pydepgate/rules/loader.py.
- The explicit path passed via
--rules-file PATH. - The path in the
PYDEPGATE_RULES_FILEenvironment variable. ./pydepgate.gate(the current working directory).<sys.prefix>/pydepgate.gate(the active virtual environment root).
A file must end with the suffix .gate to be loadable. Passing a path with a different extension raises a load error rather than silently failing.
Format detection
pydepgate auto-detects whether the file is JSON or TOML by attempting JSON first and falling back to TOML on parse failure. The file extension does not determine the format; the content does.
JSON files should declare their format and version explicitly with two top-level keys:
{
"_pydepgate_format": "json",
"_pydepgate_version": 1,
"rules": [ ... ]
}
Both declaration keys are optional. The loader emits a warning when either is missing but loads the file anyway. If _pydepgate_format is set to a value other than "json", or _pydepgate_version is set to a value other than 1, the load fails.
TOML files have no format-declaration keys. TOML’s comment-only top-level grammar cannot host them. Any valid TOML file is treated as version 1.
File structure
JSON
Rules live in a top-level rules array (plural):
{
"_pydepgate_format": "json",
"_pydepgate_version": 1,
"rules": [
{
"id": "example",
"signal_id": "DYN002",
"action": "set_severity",
"severity": "critical"
}
]
}
TOML
Rules use the array-of-tables syntax under the key rule (singular, not rules):
[[rule]]
id = "example"
signal_id = "DYN002"
action = "set_severity"
severity = "critical"
A common mistake is writing [[rules]] in a TOML file. The loader reads data.get("rule", []) for TOML format; a file with [[rules]] parses successfully but contributes zero rules to the engine.
Rule schema
Every rule is an object with these fields. The complete valid-field set is defined as _VALID_RULE_FIELDS in src/pydepgate/rules/loader.py.
Identity
| Field | Type | Required | Description |
|---|---|---|---|
id | string | recommended | A short stable identifier for the rule. The loader prefixes it with the source name in uppercase, so id = "MY_RULE" from a user file becomes USER_MY_RULE in output. If omitted, an auto-generated identifier is used. |
explain | string | no | Human-readable rationale for the rule, displayed by pydepgate explain <rule-id>. |
Match fields
A rule matches a signal when every non-null match field matches. A rule with no match fields is a catch-all that matches every signal (rare but sometimes useful).
| Field | Type | Description |
|---|---|---|
signal_id | string | Exact match against Signal.signal_id. For example, "DYN002", "ENC001", "DENS010". |
analyzer | string | Exact match against Signal.analyzer. Matches every signal from one analyzer. Values are the analyzer module names: encoding_abuse, dynamic_execution, string_ops, suspicious_stdlib, code_density. |
file_kind | string | Match against the triage decision’s FileKind. Allowed values listed below. |
scope | string | Match against Signal.scope. Allowed values listed below. |
path_glob | string | Match against the file’s internal path within the artifact using fnmatch glob syntax. For example, "tests/**" or "vendor/*.py". |
context_contains | object | Legacy match against entries in Signal.context. Each {key: value} is shimmed at construction time into an eq predicate. Prefer context_predicates for new rules. |
context_predicates | object | Structured predicates over Signal.context fields. See Context predicates below. |
Effect fields
Every rule must specify an action and may carry effect-specific fields.
| Field | Type | Required | Description |
|---|---|---|---|
action | string | yes | One of suppress, set_severity, set_description. |
severity | string | when action = "set_severity" | Target severity. Allowed values listed below. |
description | string | when action = "set_description" | Replacement description text for the finding. |
Allowed values
action
| Value | Effect |
|---|---|
suppress | The signal still fires but does not produce an active finding. The suppression is recorded in suppressed_findings for audit visibility. |
set_severity | The finding is produced at the severity given in the severity field. Overrides the mechanical confidence-to-severity fallback. |
set_description | The finding’s description text is replaced with the description field. Severity falls back to the mechanical confidence-to-severity mapping unless another matching rule sets it. |
severity
info, low, medium, high, critical.
file_kind
pth, setup_py, init_py, sitecustomize, usercustomize, library_py, entry_points.
The skip value also exists in the FileKind enum but represents files the triage layer decided not to analyze. A rule with file_kind = "skip" will never match any signal in practice.
scope
module, class_body, function, nested_function, unknown.
Lowercased. Case-insensitive on input but always reported lowercase in the loaded rule.
Context predicates
context_predicates matches against specific fields of the signal’s context dictionary. Each predicate is a one-key mapping where the key is an operator and the value is the comparand.
[[rule]]
id = "escalate-huge-base64"
signal_id = "DENS010"
action = "set_severity"
severity = "critical"
context_predicates = { length = { gte = 10240 } }
This matches DENS010 signals whose context.length is 10240 or greater.
Operators
The complete operator set is defined as VALID_OPERATORS in src/pydepgate/rules/base.py.
| Category | Operators | Value type |
|---|---|---|
| Numeric and comparable | eq, ne, gt, gte, lt, lte | int or float (any orderable, in practice) |
| String | eq, ne, contains, startswith, endswith | string |
| Collection | in, not_in | list, tuple, or set |
eq and ne work for any type. contains works on strings (substring match) and on lists or tuples (membership). in and not_in test whether the actual value appears in the predicate’s collection.
Predicate evaluation
Each predicate dict must contain exactly one operator key. Loaders reject:
- An empty predicate dict.
- A predicate dict with multiple operator keys.
- A non-dict predicate value.
- An unknown operator name (with a typo suggestion via
difflib).
To AND multiple conditions on the same field, write multiple rules. The engine has no OR primitive; OR is expressed by writing two rules with the same effect.
Type mismatches at evaluation time return False rather than raising. For example, { length = { gte = 1024 } } evaluated against a signal whose context.length is a string silently fails to match. This means a single mistyped predicate cannot crash a scan.
Predicates with multiple fields
Multiple predicates on different fields are AND-ed:
[[rule]]
id = "huge-base64-in-init"
signal_id = "DENS010"
file_kind = "init_py"
action = "set_severity"
severity = "critical"
[rule.context_predicates]
length = { gte = 10240 }
alphabet = { eq = "base64" }
This matches only when both predicates pass: length is at least 10240 and alphabet equals "base64".
context_contains shim
context_contains is the legacy match field, preserved for backward compatibility. At rule-construction time, each entry is converted to an eq predicate and merged into context_predicates. The original context_contains dict remains readable on the loaded Rule for inspection but is not consulted during matching; only context_predicates is.
When both fields are provided for the same key, the explicit predicate wins on key collision and non-colliding keys merge.
Precedence
When multiple rules match a signal, the winner is chosen by this three-level comparison:
- Source priority. User rules win over system rules win over defaults, regardless of specificity.
RuleSourcevalues:default = 0,system = 1,user = 2. - Specificity. Among rules of the same source, the rule with more match fields wins. Each non-null match field counts as one. Each
context_predicatesentry counts as one. - Load order. Among ties on source and specificity, the rule loaded first wins.
This contract means a user rule with the same shape as a default rule always wins. To let your rule lose to a more-specific default, write fewer match fields than the rule you want to override.
When the winning rule’s action is suppress, the engine still computes what the finding would have been if only default rules had been considered and records that as would_have_been on the suppression entry. This is how the suppression audit trail surfaces what was hidden and why.
When no rule matches a signal at all, the engine falls back to the mechanical confidence-to-severity mapping in src/pydepgate/engines/base.py. See Signals for the mapping table.
Validation
The loader is strict: any validation error rejects the entire file. A partial rule set cannot silently change scan behavior. Errors are accumulated across all rules in the file and reported together so a hand-authored file with several mistakes surfaces them all in one pass rather than one at a time.
Typo suggestions
Unknown field names and unknown enum values trigger a difflib-based similarity suggestion with a cutoff of 0.6:
warning: rule MY_RULE: unknown field 'singal_id'. Did you mean 'signal_id'?
The suggestion mechanism is implemented in _suggest_field and is applied to field names, action names, severity values, file kind values, scope values, and operator names.
Error categories
| Category | Example | Behavior |
|---|---|---|
| Unknown field | singal_id = "DYN002" | Suggestion offered, file rejected |
| Unknown action | action = "supresss" | Suggestion offered, file rejected |
| Unknown severity | severity = "criticla" | Suggestion offered, file rejected |
| Unknown file_kind | file_kind = "setyp_py" | Suggestion offered, file rejected |
| Unknown scope | scope = "modul" | Suggestion offered, file rejected |
| Missing required field | action = "set_severity" without severity | File rejected |
| Empty predicate dict | context_predicates = { length = {} } | File rejected |
| Multiple operators in predicate | context_predicates = { length = { gte = 1, lte = 2 } } | File rejected |
| Non-dict predicate value | context_predicates = { length = 1024 } | File rejected |
| Invalid encoding | not valid UTF-8 | File rejected |
| Neither valid JSON nor valid TOML | malformed syntax | File rejected with TOML parser error |
| Wrong file extension | path does not end in .gate | File rejected before parse |
Warnings (non-fatal)
| Category | Example | Behavior |
|---|---|---|
| JSON missing format declaration | no _pydepgate_format key | File loads, warning written to stderr |
| JSON missing version declaration | no _pydepgate_version key | File loads, warning written to stderr |
| Also-found file not loaded | second .gate found during discovery | File loads, note written to stderr |
Identity prefixing
The loader prefixes every rule’s user-provided id with the source’s uppercase name. The transformation is in _build_rule in the loader:
prefix = source.value.upper()
rule_id = f"{prefix}_{explicit_id}"
So a rule with id = "MY_RULE" loaded from a user file appears in output as USER_MY_RULE. A rule with id = "MY_RULE" loaded as a system rule would appear as SYSTEM_MY_RULE. Default rules use rule_ids assigned by the engine and follow the convention default_<signal>_<context> (for example, default_dyn002_in_setup_py).
When the id field is omitted entirely, the loader assigns an auto-generated identifier in its place.
Complete examples
Minimal suppression (TOML)
[[rule]]
id = "ignore_dens020"
signal_id = "DENS020"
action = "suppress"
Severity escalation with predicate (TOML)
[[rule]]
id = "block_large_high_entropy_strings"
signal_id = "DENS010"
action = "set_severity"
severity = "critical"
context_predicates = { length = { gte = 10240 } }
explain = "Anything with 10 KB+ of high-entropy content is shellcode-shaped."
Path-restricted suppression (TOML)
[[rule]]
id = "vendored_dens040_ok"
signal_id = "DENS040"
path_glob = "vendor/**"
action = "suppress"
Description override (TOML)
[[rule]]
id = "clarify_dyn001"
signal_id = "DYN001"
action = "set_description"
description = "exec() called with literal argument; review for intent."
Equivalent JSON
{
"_pydepgate_format": "json",
"_pydepgate_version": 1,
"rules": [
{
"id": "block_large_high_entropy_strings",
"signal_id": "DENS010",
"action": "set_severity",
"severity": "critical",
"context_predicates": {
"length": { "gte": 10240 }
},
"explain": "Anything with 10 KB+ of high-entropy content is shellcode-shaped."
}
]
}
Implementation references
| Component | Location |
|---|---|
| File discovery | discover_rules_files in src/pydepgate/rules/loader.py |
| Format detection | _detect_format_and_parse in src/pydepgate/rules/loader.py |
| Rule construction | _build_rule in src/pydepgate/rules/loader.py |
| Field validation | _validate_rule_dict in src/pydepgate/rules/loader.py |
| Predicate validation | _validate_context_predicates in src/pydepgate/rules/loader.py |
| Operator set | VALID_OPERATORS in src/pydepgate/rules/base.py |
| Predicate evaluation | ContextPredicate.evaluate in src/pydepgate/rules/base.py |
| Match logic | matches in src/pydepgate/rules/base.py |
| Precedence resolution | _select_winning_rule in src/pydepgate/rules/base.py |
| Confidence fallback | confidence_to_severity_v01 in src/pydepgate/engines/base.py |