Overview
ArgusWatcher is the main class you interact with. It instruments your graph, records execution data, runs detectors, and produces traces. Use one watcher per execution run.
Basic Usage
from argus import ArgusWatcher
# Minimal — all defaults
watcher = ArgusWatcher()
watcher.watch(graph)
app = graph.compile()
result = app.invoke(state)
watcher.finalize()
# Access results
trace = watcher.get_trace()
print(trace.summary)
Parameters
All parameters are optional. Pass them to the ArgusWatcher() constructor to override config file and environment variable values.
Core
max_field_size (int)
Maximum characters to capture per state field. Fields exceeding this are truncated with a marker.
Default: 50_000
strict (bool)
When True, raises an exception if any detector fires during finalize(). Useful for CI/CD quality gates.
Default: False
investigate (bool | "always")
Run forensic root cause analysis when detections are found. Set to "always" to analyze every trace, whether or not anything was detected.
Default: True
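To make the max_field_size behaviour concrete, here is a minimal sketch of per-field truncation. It is not ArgusWatcher's implementation: the helper name truncate_field and the marker text are assumptions made for illustration, and the real marker format may differ.

```python
def truncate_field(value, max_field_size=50_000, marker="...[TRUNCATED]"):
    """Return the string form of value, cut to max_field_size characters.

    Hypothetical helper: the marker text is an assumption, not Argus's format.
    """
    text = str(value)
    if len(text) <= max_field_size:
        return text  # short enough, captured unchanged
    return text[:max_field_size] + marker


short = truncate_field("hello", max_field_size=10)
long = truncate_field("x" * 25, max_field_size=10)
print(short)
print(long)
```

A field at or under the limit is captured whole; anything longer keeps only its first max_field_size characters plus the marker, so a reader of the trace can tell the value was cut.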
Security
redact_keys (list[str])
List of state field names to redact in traces. Values are replaced with [REDACTED]. Supports glob patterns.
Default: None
validators (dict)
Custom validation functions keyed by field name. Each function receives the field value and returns True/False.
Default: {}
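Glob matching for redact_keys can be pictured with Python's standard fnmatch module. This is a sketch, not the library's code: the redact function is hypothetical, and it assumes patterns are matched against top-level field names, so "*.secret" matches a field literally named "db.secret". How Argus treats nested state is not covered here.

```python
from fnmatch import fnmatchcase


def redact(state, redact_keys):
    """Return a copy of state with matching field values replaced.

    Hypothetical helper; assumes patterns apply to top-level field names.
    """
    redacted = {}
    for name, value in state.items():
        if any(fnmatchcase(name, pattern) for pattern in redact_keys):
            redacted[name] = "[REDACTED]"
        else:
            redacted[name] = value
    return redacted


state = {"api_key": "sk-123", "db.secret": "hunter2", "output": "done"}
clean = redact(state, ["api_key", "password", "*.secret"])
print(clean)
```

The original state is left untouched; only the copy that would go into the trace has its sensitive values replaced.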
# Redact sensitive fields
watcher = ArgusWatcher(
    redact_keys=["api_key", "password", "*.secret"],
)
# Custom validators
watcher = ArgusWatcher(
    validators={
        "output": lambda v: len(str(v)) > 10,
        "confidence": lambda v: 0 <= v <= 1,
    }
)
Replay & Eval
persist_state (bool)
Save full state at each step to enable replay. Disable to reduce storage usage when replay isn't needed.
Default: True
record_http (bool)
Record HTTP requests made during execution. Enables replaying with mocked external calls.
Default: False
semantic_judge (bool)
Enable LLM-as-judge semantic evaluation. Adds latency and cost but catches subtle quality issues.
Default: False
judge_model (str)
Which LLM to use for semantic judging. Any OpenAI-compatible model string.
Default: "gpt-4o"
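Since semantic judging adds latency and cost, one option is to derive the constructor arguments from the deployment environment so the judge only runs where it is wanted. The sketch below only builds a keyword-argument dictionary. The function name watcher_kwargs and the APP_ENV variable are assumptions for illustration; the parameter names are the ones documented in this section.

```python
import os


def watcher_kwargs(env=None):
    """Build ArgusWatcher keyword arguments for an environment name.

    Hypothetical helper: APP_ENV is an assumed variable, not an Argus setting.
    """
    if env is None:
        env = os.environ.get("APP_ENV", "production")
    use_judge = env in {"staging", "ci"}
    kwargs = {
        "persist_state": True,        # keep replay available everywhere
        "semantic_judge": use_judge,  # pay for the judge only in staging/CI
    }
    if use_judge:
        kwargs["judge_model"] = "gpt-4o"
    return kwargs


ci_kwargs = watcher_kwargs("ci")
prod_kwargs = watcher_kwargs("production")
print(ci_kwargs)
print(prod_kwargs)
```

The resulting dictionary would then be passed on as ArgusWatcher(**watcher_kwargs()).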
Cost warning
semantic_judge sends node outputs to an LLM for evaluation. This adds API cost and latency proportional to the number of nodes in your graph. Use it selectively in staging/CI rather than on every production run.
Lifecycle
A Watcher goes through four phases:
- Created: constructor called, parameters loaded, storage initialized
- Watching: watch() called, graph instrumented, ready for execution
- Recording: pipeline is running, watcher is capturing node inputs/outputs/timing
- Finalized: finalize() called, detectors run, forensics generated, trace stored
# Full lifecycle
watcher = ArgusWatcher(strict=True) # Created
watcher.watch(graph) # Watching
app = graph.compile()
result = app.invoke(state) # Recording (happens during invoke)
watcher.finalize() # Finalized
# After finalize, access everything
trace = watcher.get_trace()
detections = trace.detections
forensics = trace.forensics
Do not reuse
After finalize(), create a new Watcher for the next execution. Calling watch() on a finalized Watcher raises WatcherStateError.
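The four phases and the no-reuse rule can be modelled as a small state machine. The class below is a toy stand-in, not ArgusWatcher itself: ToyWatcher, Phase, and the locally defined WatcherStateError only mirror names used on this page, and the Recording transition is triggered by hand here, whereas the real watcher enters it during invoke().

```python
from enum import Enum


class Phase(Enum):
    CREATED = "created"
    WATCHING = "watching"
    RECORDING = "recording"
    FINALIZED = "finalized"


class WatcherStateError(RuntimeError):
    """Local stand-in for the error named in the docs."""


class ToyWatcher:
    """Hypothetical model of the watcher lifecycle, for illustration only."""

    def __init__(self):
        self.phase = Phase.CREATED

    def watch(self, graph):
        if self.phase is Phase.FINALIZED:
            raise WatcherStateError("watch() called on a finalized watcher")
        self.phase = Phase.WATCHING

    def record(self):
        # Stands in for the pipeline running under instrumentation.
        self.phase = Phase.RECORDING

    def finalize(self):
        self.phase = Phase.FINALIZED


def run_once(graph):
    watcher = ToyWatcher()  # a fresh watcher for every execution
    watcher.watch(graph)
    watcher.record()
    watcher.finalize()
    return watcher


first = run_once("graph-a")
try:
    first.watch("graph-b")  # reusing a finalized watcher
    reuse_allowed = True
except WatcherStateError:
    reuse_allowed = False
print(first.phase, reuse_allowed)
```

Wrapping each execution in a helper like run_once keeps the one-watcher-per-run rule hard to break by accident.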