Overview
ARGUS doesn't just detect failures — it learns from them. When the semantic judge (LLM) identifies a new failure pattern during a run, it proposes a candidate signature. You review and approve it in the Approvals page, and the heuristic engine uses it for all future runs — without needing an LLM call.
This creates a feedback loop: expensive LLM analysis discovers patterns once, and cheap heuristic matching catches them forever after.
How It Works
- Discovery: the LLM investigator analyzes a failure and extracts a reusable pattern (regex, substring match, etc.)
- Proposal: the pattern is saved as a candidate in .argus/candidates.json with confidence, evidence, and reasoning
- Review: you open argus ui and go to the Approvals page. Each candidate shows the pattern, match strategy, severity, confidence score, and source evidence
- Approval: approve as Private (local only) or Shared (synced to all ARGUS users via cloud)
- Detection: the approved pattern is loaded into the heuristic engine and matched against every future node output
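The detection step can be pictured as plain pattern matching with no LLM call involved. The following is a minimal sketch, not ARGUS's actual implementation: the Signature class, its field names, and the two strategy names "regex" and "substring" are assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    # Hypothetical record shape; field names are illustrative, not ARGUS's schema.
    pattern: str
    match_strategy: str  # "regex" or "substring" in this sketch
    severity: str = "warning"


def matches(sig: Signature, output: str) -> bool:
    """Return True if the signature fires on a node output (no LLM call)."""
    if sig.match_strategy == "regex":
        return re.search(sig.pattern, output, flags=re.IGNORECASE) is not None
    if sig.match_strategy == "substring":
        return sig.pattern.lower() in output.lower()
    raise ValueError(f"unknown match strategy: {sig.match_strategy}")


def scan(signatures, output):
    """Collect every approved signature that fires on the output."""
    return [sig for sig in signatures if matches(sig, output)]


approved = [
    Signature(pattern=r"\bTODO\b|lorem ipsum", match_strategy="regex"),
    Signature(pattern="I cannot", match_strategy="substring"),
]
hits = scan(approved, "I cannot find the flour")
print([sig.pattern for sig in hits])
```

Matching like this is cheap enough to run on every node output, which is why an approved pattern no longer needs the LLM once it has been discovered.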

Three-Tier Signature Registry
The heuristic engine loads signatures from three sources, merged and deduplicated at startup:

- Bundled: ships with ARGUS. Core patterns for placeholder outputs, semantic degradation markers, corrupted JSON, and repeated filler text. Stored in data/signatures.json inside the package.
- Private: patterns you approved as "Private". Stored locally in .argus/custom_signatures.json. Only your instance uses these.
- Shared: community-contributed patterns synced from the cloud. Stored in .argus/shared_signatures_cache.json. When you approve a pattern as "Shared", it gets pushed to the cloud database and becomes available to every ARGUS user.
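The merge of the three sources at startup might look roughly like the sketch below. It assumes each file holds a JSON list of objects with "pattern" and "match_strategy" keys, and it assumes a precedence order of private, then shared, then bundled. Neither the file layout nor the precedence is confirmed by this page, and the "tier" field exists only to make the demo readable.

```python
import json
from pathlib import Path

# Paths as named in the text; the JSON layout (a list of objects with
# "pattern" and "match_strategy" keys) is an assumption of this sketch.
PRIVATE_PATH = Path(".argus/custom_signatures.json")
SHARED_PATH = Path(".argus/shared_signatures_cache.json")


def load_tier(path):
    """Read one tier from disk, treating a missing file as an empty tier."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def merge_tiers(*tiers):
    """Merge tiers in priority order, keeping the first signature per key."""
    merged = {}
    for tier in tiers:
        for sig in tier:
            key = (sig["pattern"], sig["match_strategy"])
            merged.setdefault(key, sig)  # earlier tiers win on duplicates
    return list(merged.values())


# In-memory demo data; the "tier" field is hypothetical.
bundled = [{"pattern": "lorem ipsum", "match_strategy": "substring", "tier": "bundled"}]
private = [
    {"pattern": "lorem ipsum", "match_strategy": "substring", "tier": "private"},
    {"pattern": r"^\s*$", "match_strategy": "regex", "tier": "private"},
]
registry = merge_tiers(private, [], bundled)
print([(sig["pattern"], sig["tier"]) for sig in registry])
```

Keying a dictionary on the pair means a pattern that appears in more than one tier is stored once, so it is also evaluated once per node output.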
Deduplication
Signatures are deduplicated by (pattern, match_strategy): the most specific version wins. No pattern runs twice.
Semantic Judge Override
The heuristic engine is fast but context-blind — it matches patterns without understanding meaning. A cookie-baking agent that outputs "I cannot find the flour" would trigger the "I cannot" refusal pattern, even though it's a legitimate response.
When semantic_judge=True, the LLM judge runs after heuristic detection and can override false positives. If a node failed only due to heuristic signals (no structural issues, no validator failures, no tool errors), the judge reviews the full input/output context and can clear the flag.
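The override rule in the paragraph above can be sketched as a small decision function. This is an illustration under stated assumptions, not ARGUS's internal code: NodeResult, its fields, and the judge_says_false_positive callable (a stand-in for the LLM judge) are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class NodeResult:
    # Hypothetical container for the signals collected on one node.
    tool_errors: list = field(default_factory=list)
    structural_issues: list = field(default_factory=list)
    validator_failures: list = field(default_factory=list)
    heuristic_hits: list = field(default_factory=list)

    @property
    def failed(self):
        return bool(
            self.tool_errors
            or self.structural_issues
            or self.validator_failures
            or self.heuristic_hits
        )


def heuristic_only(result):
    """True when heuristic hits are the sole reason the node failed."""
    return bool(result.heuristic_hits) and not (
        result.tool_errors or result.structural_issues or result.validator_failures
    )


def apply_judge(result, judge_says_false_positive):
    """Clear heuristic hits when the judge calls a heuristic-only failure benign.

    judge_says_false_positive is a stand-in callable for the LLM judge.
    """
    if heuristic_only(result) and judge_says_false_positive(result):
        result.heuristic_hits = []
    return result


# Heuristic-only failure: the judge may clear it.
flagged = apply_judge(NodeResult(heuristic_hits=["I cannot"]), lambda r: True)
# Failure with a tool error as well: the judge cannot clear it.
mixed = apply_judge(
    NodeResult(heuristic_hits=["I cannot"], tool_errors=["timeout"]), lambda r: True
)
print(flagged.failed, mixed.failed)
```

Restricting the override to heuristic-only failures keeps the judge from masking tool errors or structural problems, which do not depend on interpreting meaning.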
# The judge overrides heuristic false positives automatically
watcher = ArgusWatcher(semantic_judge=True)
# Detection pipeline:
# 1. Tool failure scan
# 2. Structural inspection (missing fields, type mismatches)
# 3. Heuristic signature scan (pattern matching)
# 4. Behavioral anomaly detection
# 5. Semantic judge (LLM): can override step 3 if it's a false positive
Approvals UI
The Approvals page in argus ui has three tabs:
- Pending: candidates discovered by the LLM investigator, awaiting your review. Each card shows pattern, strategy, severity, confidence, evidence, and source runs.
- Private: your locally approved patterns. You can remove patterns from here if they turn out to cause false positives.
- Shared: community patterns synced from the cloud. Click "Sync" to pull the latest shared signatures.
# Open the UI and navigate to Approvals
argus ui
# Cloud sync requires login
argus login
Human-in-the-loop
