This document provides AI coding agents with everything needed to write consistent, idiomatic code for the EvidenceForge project.
EvidenceForge generates realistic synthetic security logs for cybersecurity threat hunting training and research. The system uses a two-phase hybrid architecture:
Phase 1 - Scenario Creation (Skill-assisted): Claude Code Skills guide users through scenario creation via structured interviews. Skills research TTPs via MITRE ATT&CK, expand high-level descriptions into detailed execution plans, and output structured YAML scenario files with companion research markdown.
Phase 2 - Log Generation (Deterministic): Generation engine executes the detailed scenario plan WITHOUT any LLM calls, producing large-scale, temporally consistent datasets across multiple log formats (Windows Event Logs, Zeek, Syslog, Snort/Suricata, web logs) with coordinated cross-references (matching LogonIDs, PIDs, session data).
This architecture combines LLM flexibility/realism with deterministic speed, cost-efficiency, and reproducibility.
Key Principle: The eforge CLI is a deterministic tool. Creative/interactive work happens through Claude Code Skills, not built-in LLM calls. Phase 2 is a deterministic renderer that executes the plan. Never call LLMs during generation. LLM integration is not built-in; scenario creation uses Claude Code Skills.
Storyline Events (Phase 8.4): Storyline entries use typed events lists, not free-text keyword matching. Each event has a type field (process, logon, connection, ssh_session, etc.) with per-type validated fields. The activity field is documentation only (for GROUND_TRUTH.md). See docs/reference/scenario-reference.md for the full event type reference.
Baseline Realism: The baseline engine includes: Hawkes self-exciting temporal model for bursty user activity (parameters derived from persona risk_profile), periodic+jitter timing for system/service traffic, day-of-week variation (Monday login storms, weekend near-zero), 26 legitimate lateral movement patterns (backup, monitoring, AD replication, app→DB, etc.), process→network correlation (browsers→HTTPS, DB clients→SQL, etc.), enriched stale account noise (Kerberos failures, lingering tasks, service startup failures), network-level red herrings (suspicious DNS, unusual outbound, scan overlaps), Linux syslog depth (18 categories including SSH login/key exchange, apt/dnf, systemd timers, logrotate, journald), diversified command pools with per-user parameterization, and entity lifecycle validation (boot time tracking, PID existence checks). Lateral movement patterns are conditional on environment topology — assign roles to systems to enable specific patterns.
Causal Expansion Engine: The CausalExpansionEngine (src/evidenceforge/generation/causal/) auto-generates prerequisite and consequent events via composable rules. DNS lookups before TCP connections, Kerberos/DC-bundle TGT/TGS evidence before domain logons, ProcessAccess after lsass injection, and supplementary audit events from command-line patterns are all handled automatically — scenario authors should NOT manually specify these as prerequisites. Authors CAN still specify these event types when they are part of the attack narrative itself (e.g., DNS tunneling, golden ticket forging). The validator warns on potentially redundant manual specifications. See docs/ARCHITECTURE.md § Causal Expansion Engine for implementation details.
CRITICAL: Read this section first before doing ANY work on this project.
This project uses TODO.md as the durable roadmap and backlog. It also uses
tracked worklogs under docs/worklog/ for multi-session effort memory. This is
designed to preserve agent continuity without forcing every branch to edit the
same high-conflict roadmap file.
-
START OF SESSION (BEFORE ANY WORK):
- ALWAYS read
TODO.mdfirst to understand:- What phase/milestone the project is in
- What durable backlog items remain
- Which worklogs contain active handoff context
- If
TODO.mdreferences a relevant worklog, read that worklog before changing code. - If
TODO.mddoesn't exist, create it with the initial durable roadmap based on the PRD.
- ALWAYS read
-
DURING NORMAL TASK WORK:
- Do not edit
TODO.mdjust to mark a task as started, in progress, or completed. - For one-shot fixes, use commits, PR descriptions, and final response notes as the task record.
- For multi-session efforts, assessment loops, investigations, or branch-local
progress that future agents need, create or update one focused worklog under
docs/worklog/. - Use one worklog per effort, named
YYYY-MM-DD-<effort-slug>.md.
- Do not edit
-
WHEN TO UPDATE
TODO.md:- Add, remove, or reprioritize durable backlog items.
- Record a milestone-level completion or major roadmap pivot.
- Reconcile roadmap state during release/integration work.
- Replace detailed progress history with a short pointer to a worklog or changelog entry.
-
WHEN ADDING NEW WORK:
- If it is durable product/project work, add it to
TODO.md. - If it is branch-local execution detail, loop history, command output, review notes, or handoff context, add it to a worklog instead.
- If it is durable product/project work, add it to
When a phase is fully complete, collapse its tasks in TODO.md to a 2-3 line
summary and move the detailed release history to CHANGELOG.md. Keep ongoing
handoff details in docs/worklog/ until they are no longer useful.
Core:
- Python 3.11+ (required for latest type hint features including
Self,TypedDictimprovements) - uv for package management, virtual environments, and script running
- Pydantic v2 for all data validation and schema management
CLI & Output:
- Typer for CLI framework (excellent Pydantic integration)
- Rich for progress bars, tables, and console formatting
- Jinja2 for log format templates
- PyYAML for configuration/scenario parsing
- pytz for timezone handling (UTC internal, configurable output)
Testing:
- pytest with pytest-cov, pytest-asyncio, pytest-mock, pytest-benchmark
- Default test runs should avoid coverage instrumentation: use
uv run pytest --no-covfor normal local and feature-PR validation. Coverage is a release/readiness gate beforedev→main, run explicitly withuv run pytest --cov=evidenceforge --cov-report=term-missing --cov-report=xml --cov-fail-under=70. - Separate test markers:
@pytest.mark.slowfor large dataset/workload tests (not run by default). Run slow tests with--no-covunless you are specifically profiling coverage behavior, because coverage instrumentation makes the generator workload much slower. - Do not combine
--include-slowwith coverage during release validation. Slow tests are intentionally run with--no-cov; coverage is measured on the default non-slow suite. - Enforced release coverage gate: 70% minimum. Aspirational target: 95%+ overall and 95%+ for the core generation engine.
Format Support:
- json-logic-qubit for format definition validation rules
- Standard library json/csv for text formats
- XML output via string templates (no python-evtx dependency)
Use uv for all dependency management (never pip). pyproject.toml is the source of truth.
- Type hints everywhere — all functions, methods, and variables must have type hints
- Pydantic for data — use Pydantic models for any structured data (configs, scenarios, API responses)
- Explicit over implicit — prefer clarity over cleverness
- Fail fast — validate inputs early, fail with clear error messages
- No magic — avoid metaclasses, dynamic imports, or other "clever" patterns unless absolutely necessary
- Conventional Commits — prefix every commit message with a type:
feat:,fix:,docs:,test:,refactor:,chore:. See CONTRIBUTING.md for details and examples.
The version is declared in three places that must always match:
pyproject.toml→version = "X.Y.Z"src/evidenceforge/__init__.py→__version__ = "X.Y.Z"uv.lock→ updated automatically byuv syncafter editingpyproject.toml
Bump rules (SemVer):
| Commit type(s) on branch | Bump |
|---|---|
| Any breaking change | MAJOR (x+1.0.0) |
Any non-breaking feat: commit |
MINOR (x.y+1.0) |
Only fix: / docs: / test: / refactor: / chore: |
PATCH (x.y.z+1) |
When to bump: Once per PR from dev to main, on the dev branch, as the last commit before opening that PR. Do not bump on feature branches or per-commit.
How: Before running gh pr create targeting main, inspect git log main..dev --oneline, determine the correct bump, update both version files, run uv sync to regenerate uv.lock, and commit all four (three version artifacts + CHANGELOG.md, see below) with:
chore: bump version to X.Y.Z
Changelog update (required with every version bump): as part of the same bump commit, prepend a new ## vX.Y.Z (YYYY-MM-DD) section to CHANGELOG.md summarizing every commit since the previous version entry. Drive the summary from git log main..dev --oneline (or git log vPREV..HEAD --oneline if tagged). Group related commits into themed subsections (e.g., "Explicit proxy path modeling", "TLS & X.509 realism", "CLI & config"), cite the short SHAs inline in parentheses, and skip pure merge commits and unrelated dependabot bumps. Version-bump-only releases (no code changes) still get an entry noting that. The changelog entry and the version bump land in the same commit.
Release automation: .github/workflows/release.yml enforces release hygiene
for every PR to main and every push to main. On PRs targeting main, it
verifies that pyproject.toml, src/evidenceforge/__init__.py, and uv.lock
all declare the same X.Y.Z version, then checks that remote tag vX.Y.Z does
not already exist. On pushes to main, it repeats those checks, creates an
annotated tag on the merged commit, pushes the tag, and creates the GitHub
Release entry so it appears under Releases. Remote tags are immutable release
history; never use git tag -f, git push --force, or delete/recreate a
vX.Y.Z tag to repair a missed bump. If the tag already exists, bump to the next
correct SemVer version before merging to main.
Manual release tag guard (fallback only): if the release workflow is
unavailable and a maintainer must validate a main PR by hand, run:
VERSION=$(python - <<'PY'
import ast
import re
import tomllib
from pathlib import Path
pyproject_version = tomllib.loads(
Path("pyproject.toml").read_text(encoding="utf-8")
)["project"]["version"]
init_tree = ast.parse(Path("src/evidenceforge/__init__.py").read_text(encoding="utf-8"))
init_version = None
for node in init_tree.body:
if not isinstance(node, ast.Assign):
continue
for target in node.targets:
if isinstance(target, ast.Name) and target.id == "__version__":
if isinstance(node.value, ast.Constant) and isinstance(node.value.value, str):
init_version = node.value.value
lock = tomllib.loads(Path("uv.lock").read_text(encoding="utf-8"))
lock_version = None
for package in lock.get("package", []):
if package.get("name") == "evidence-forge":
lock_version = package.get("version")
break
if not re.fullmatch(r"\d+\.\d+\.\d+", pyproject_version):
raise SystemExit(f"pyproject.toml version must be X.Y.Z, got {pyproject_version!r}")
if init_version != pyproject_version:
raise SystemExit(
"src/evidenceforge/__init__.py __version__ "
f"({init_version!r}) does not match pyproject.toml ({pyproject_version!r})"
)
if lock_version != pyproject_version:
raise SystemExit(
f"uv.lock evidence-forge version ({lock_version!r}) "
f"does not match pyproject.toml ({pyproject_version!r})"
)
print(pyproject_version)
PY
)
TAG="v${VERSION}"
git ls-remote --exit-code --tags origin "$TAG" && {
echo "Remote release tag $TAG already exists; bump the version before opening the main PR."
exit 1
}
Manual release tagging (fallback only): if the release workflow did not run
after a dev → main PR was merged, create an annotated tag on the merge commit
that landed on main, push it, and create the GitHub Release entry from that tag
so it appears under Releases.
git fetch origin main:refs/remotes/origin/main --tags
VERSION=$(python - <<'PY'
import ast
import re
import tomllib
from pathlib import Path
pyproject_version = tomllib.loads(
Path("pyproject.toml").read_text(encoding="utf-8")
)["project"]["version"]
init_tree = ast.parse(Path("src/evidenceforge/__init__.py").read_text(encoding="utf-8"))
init_version = None
for node in init_tree.body:
if not isinstance(node, ast.Assign):
continue
for target in node.targets:
if isinstance(target, ast.Name) and target.id == "__version__":
if isinstance(node.value, ast.Constant) and isinstance(node.value.value, str):
init_version = node.value.value
lock = tomllib.loads(Path("uv.lock").read_text(encoding="utf-8"))
lock_version = None
for package in lock.get("package", []):
if package.get("name") == "evidence-forge":
lock_version = package.get("version")
break
if not re.fullmatch(r"\d+\.\d+\.\d+", pyproject_version):
raise SystemExit(f"pyproject.toml version must be X.Y.Z, got {pyproject_version!r}")
if init_version != pyproject_version:
raise SystemExit(
"src/evidenceforge/__init__.py __version__ "
f"({init_version!r}) does not match pyproject.toml ({pyproject_version!r})"
)
if lock_version != pyproject_version:
raise SystemExit(
f"uv.lock evidence-forge version ({lock_version!r}) "
f"does not match pyproject.toml ({pyproject_version!r})"
)
print(pyproject_version)
PY
)
TAG="v${VERSION}"
MERGE_SHA=$(git rev-parse origin/main)
test -z "$(git tag --list "$TAG")" || {
echo "Local tag $TAG already exists; inspect it instead of overwriting it."
exit 1
}
git ls-remote --exit-code --tags origin "$TAG" && {
echo "Remote tag $TAG already exists; inspect it instead of overwriting it."
exit 1
}
git tag -a "$TAG" "$MERGE_SHA" -m "EvidenceForge $TAG"
git push origin "$TAG"
gh release create "$TAG" --repo Cisco-Talos/EvidenceForge --title "EvidenceForge $TAG" \
--verify-tag --fail-on-no-commits --generate-notes
The version on dev between releases will be ahead of main by one unreleased bump — this is expected and correct. Feature branches never touch the version.
- Before committing: always run
uv run ruff check .anduv run ruff format --check .and fix any errors. Apre-commithook enforces this, but verify manually when in doubt. - Ruff configuration is in
pyproject.toml— do not add# noqacomments without justification.
- Line length: 100 characters
- Indentation: 4 spaces
- Double quotes for strings (except to avoid escaping)
- Import order: stdlib, third-party, local (enforced by ruff's
Irules)
- Use modern Python 3.11+ built-in types:
list[User],dict[int, str]— nottyping.List,typing.Dict - Use
X | None— notOptional[X] - Always include return types on function signatures
- Annotate variables when the type isn't obvious from the assignment
- Google-style docstrings for public functions/classes. Test functions: name is the doc.
- Define specific custom exceptions inheriting from
EvidenceForgeErrorbase - Place exceptions in appropriate modules (
models/scenario.py,generation/engine.py,validation/schema.py) - Provide actionable error messages: say what's wrong and how to fix it
- Never catch bare
Exception— always use specific types
- Use
logging.getLogger(__name__)in every module - Use
%sformatting, not f-strings (lazy evaluation) - Console output:
warninganderroronly (configurable vialogging.console_level) - File output: all levels based on
logging.levelconfig (default:info); file location:{output_dir}/generation.log - Never log secrets, credentials, or full exception tracebacks to users
- Log retries at DEBUG, final failure at ERROR, progress milestones at INFO
- Use
Field()for descriptions and constraints - Use
field_validatorfor complex validation - Set
extra="forbid"to catch typos/unknown fields - Use
frozen=Truefor immutable configs - Provide clear error messages in validators
- Always use
pathlib.Path, never string paths. Resolve paths early at boundaries.
These rules prevent the recurring anti-patterns that make generated data look synthetic. Apply them to ALL generation code — engine, emitters, baseline, storyline.
Derived values (hostname, hash, domain) must be computed once and attached to the SecurityEvent. Emitters must never independently derive shared values via REVERSE_DNS, hashlib, or RNG. If two log formats need the same value (e.g., DNS query domain = SSL SNI = proxy hostname), it must come from a single source on the event object.
Anti-pattern: Each emitter does its own REVERSE_DNS.get(dst_ip) → produces inconsistent results.
Correct: Resolve hostname once in generate_connection(), pass to all downstream consumers.
When generated output is wrong, fix the root cause where that truth is owned. Do not patch symptoms in one emitter when the bad value, timing, routing, or correlation was created upstream. Prefer canonical context, state, planner, generation, routing/visibility, or data-config fixes over emitter patches when the issue involves shared event truth, actor/session/process ownership, timing, correlation IDs, source/destination semantics, or cross-source evidence. Do not force every fix into context objects: bad event ordering belongs in planner/generator timing, bad host visibility belongs in routing/visibility, bad enumerables belong in YAML config + loaders/validation, and bad source-native formatting belongs in emitters/templates.
Anti-pattern: A Windows emitter rewrites a bad LogonID or source IP just before rendering.
Correct: The planner/generator builds the right AuthContext/state relationship, and the emitter renders the source-native view of that already-correct event.
When adding a new correlated log source or expanding an existing one, target the canonical event/context/state layer first, then add the emitter as a source-native renderer. New integrations should reuse or extend SecurityEvent contexts, state relationships, causal/timing rules, visibility/routing rules, and data-driven config so the new source agrees with existing evidence by construction. Avoid private low-level emitter pipelines for evidence that must correlate with auth, process, network, file, registry, DNS, proxy, firewall, IDS, or other shared activity. Direct emitter-only generation is acceptable for purely source-local health/status noise, source-specific rendering details, or explicit raw escape hatches that do not need cross-source correlation.
Anti-pattern: A new EDR-like source independently invents process IDs, parent images, hashes, and connection tuples inside its emitter.
Correct: The source renders existing ProcessContext, NetworkContext, FileContext, state IDs, and causal relationships, adding new canonical context fields only when the source needs facts that no existing context owns.
Enumerable pools (domains, processes, user agents, file paths, OUI prefixes, TLS issuers) belong in YAML files under src/evidenceforge/config/ with cached loaders, not hardcoded Python lists. Follow the existing pattern:
src/evidenceforge/config/activity/spawn_rules.yaml+generation/activity/spawn_rules.pysrc/evidenceforge/config/activity/bash_commands.yaml+generation/activity/bash_commands.pysrc/evidenceforge/config/activity/tls_issuers.yaml+generation/activity/tls_issuers.pysrc/evidenceforge/config/activity/network_params.yaml
Anti-pattern: _ISSUERS = ["CN=R3, O=Let's Encrypt...", ...] hardcoded in generator.py.
Correct: Load from YAML at module level, cache after first call. Use from evidenceforge.config import get_activity_directory for paths.
Every function returning a fallback/default value must check os_category. Grep for hardcoded Windows paths (C:\\, explorer.exe, svchost.exe) outside explicitly Windows-only code blocks — each one is a potential Linux data corruption.
Anti-pattern: return r"C:\Windows\explorer.exe" without checking OS.
Correct: return "/usr/bin/bash" if os_category == "linux" else r"C:\Windows\explorer.exe"
Every event type that creates state must have a corresponding termination event with realistic timing:
logon→ pairedlogoff(immediate for type 3, end-of-day for type 2/10)process_create→process_terminatedhcp_lease(DISCOVER) → periodic renewals (REQUEST/ACK at T/2)connection→ duration + close
Anti-pattern: generate_machine_account_logon() creates type 3 session but never generates 4634 logoff.
Correct: Emit paired logoff 1-30 seconds after type 3 logon.
Any generation loop must use randomized counts, probabilistic skipping, and per-entity jitter. Fixed counts, fixed intervals, and uniform iteration are tells that reveal synthetic generation.
Anti-pattern: for slot_base in [0, 900, 1800, 2700]: → exactly 4 events/hour on every host.
Correct: num_tasks = rng.randint(2, 5); slots = sorted(rng.sample(range(0, 3600, 300), num_tasks))
Use _stable_seed(f"context_{scope_key}") for all deterministic derivation. Never use:
- Python's
hash()— non-deterministic across processes (PYTHONHASHSEED) - Globally-seeded
random.Random(42)— produces identical sequences for different entities
Anti-pattern: hash(f"mac_{system.ip}") for MAC generation; Random(42) for LogonIDs.
Correct: _stable_seed(f"mac_{system.ip}") for MAC; per-host RNG Random(_stable_seed(f"logon_ids_{hostname}")).
WorldModel / WorldPlanner (src/evidenceforge/generation/world_model.py) sit above the canonical event model and compile environment intent into operational decisions:
WorldModelresolves canonical host/user capabilities once from scenario fields such asuser.primary_system,system.assigned_user,system.roles, andsystem.servicesWorldPlannerowns session bootstrap semantics (interactive, network, SSH, RDP) and may allocate session state inStateManagerbeforeActivityGeneratoremits the correlated host/network evidence- Baseline and storyline code should call this layer for persona placement, remote admin source selection, and shared session bootstrap instead of re-implementing heuristics locally
Action bundles represent real-world activities that can produce multiple pieces of
evidence. They sit above SecurityEvent and are the preferred home for
cross-event lifecycle, timing, observation, and durable identity ownership.
A SecurityEvent represents one logical evidence-producing occurrence. It may
carry multiple contexts when those contexts describe facets of that same
occurrence and need to share truth. Contexts are not mini-event queues. Do not
cram a multi-phase activity into one SecurityEvent just because a context exists.
If an activity has distinct lifecycle phases (connection, auth accepted, session
opened, process created, command executed, session closed), model those as
distinct SecurityEvents coordinated by an action bundle.
Use action bundles for correlated behavior families across storyline, baseline, red herrings, and scanners. The source of intent may differ, but evidence construction should share the same bundle/lifecycle/timing path.
Contracts compose only through canonical semantic layers. Higher-level bundles may
request lower-level bundle contracts (for example SSH -> network connection, or
browser -> proxy/network/DNS/file transfer), but rendered source artifacts must
not become generation triggers. A Zeek conn.log row, eCAR FLOW row, Syslog
line, or Windows event may have validation/rendering contracts, but it must not
generate sibling evidence. Model those siblings as canonical events or contexts
owned by the appropriate bundle before rendering.
Source-observation profiles own source-level gaps and collection delay after
canonical events are built, but before rendering. Observation decisions must be
coherent for source-local lifecycle groups: process create/dependent/terminate
rows for one process, logon/logoff rows for one session, and network/protocol
companions for one UID/tuple should share the same source-local drop/delay
decision. Do not add emitter-local missingness or one-off timestamp delays that
can orphan a same-source lifecycle. Use observation profiles for coverage shape,
SourceTimingPlanner for source-native timestamp relationships, and bundles for
canonical lifecycle ownership.
For SSH specifically, modeled sessions from typed storyline events, baseline
remote-admin noise, and scp transfers to modeled Linux receivers should route
through the SSH action bundle. The SSH bundle owns SSH auth/session/PAM/logind
semantics, shell readiness, and session close intent; it delegates the TCP/22
transport occurrence to the canonical network-connection contract. Keep
transfer-specific receiver artifacts (for example, target-side file creation)
separate from the bundle only after the bundle has owned SSH auth/session timing
and the network contract has owned transport tuple, Zeek UID, packet accounting,
visibility, and transport close time. When source ports are allocated during
execution, use the resolved execution anchor rather than the pre-reservation
intent anchor for tuple-specific identity. Source-native SSH close evidence must
be lifecycle-compatible with the transport close, but should not be bit-identical
to Zeek close timing unless the source semantics require exact equality.
Compatibility paths that receive a Linux remote-interactive logon intent
(logon_type=10 with a remote source) must delegate to the SSH bundle as the
single owner. Do not emit a generic Linux logon event before or after the SSH
bundle for the same session; that creates duplicate endpoint USER_SESSION
evidence and breaks transport-before-auth ordering. In eCAR/EDR rendering, the
matching inbound SSH FLOW row is the transport observation and must render
before the bundle-owned SSH USER_SESSION/LOGIN row for the same source tuple.
If outbound client process visibility would delay the source-host SSH FLOW
past remote authentication, omit that FLOW's process identity rather than moving
the transport observation later. SSH syslog auth timing must account for the
canonical event's eCAR/EDR FLOW source-latency window, not only the network
sensor's final rendered connection timestamp.
For RDP specifically, modeled remote interactive Windows sessions should route
through the RDP action bundle. The bundle owns the source-side RDP client process
when a modeled source host is available, the TCP/3389 transport, target Type 10
logon/session metadata, source-port reuse, and temporal ordering between
source-visible transport evidence and target authentication evidence. It also
protects the source mstsc.exe process and its owning interactive session
through the transport close so endpoint process telemetry cannot end before the
visible RDP flow. Successful RDP logons must consume an SF transport interval
from the network-connection contract, not a failed, reset, or tiny placeholder
flow. Compatibility paths must not fabricate self-sourced Type 10 logons when no
real remote source exists; use the RDP bundle with a source, or downgrade the
event to local interactive semantics. Do not emit independent port 3389
connections or Type 10 logons for the same modeled RDP session outside the
bundle.
As with SSH, endpoint FLOW rows for RDP transport should stay near the
transport open; if late process visibility would invert transport-before-auth
ordering, drop PID/principal from the FLOW instead of delaying it.
For Windows remote administration specifically, explicit credential use and
remote service installation should route through the Windows remote-admin action
bundles. The bundles own 4648 subject/caller-process alignment, source endpoint
semantics, source-visible caller timing, service-control transport, dropped
service-binary evidence, and target service-install records. Keep tool-specific
authoring (runas, PsExec, wmic, schtasks) in scenario/storyline layers;
the generated evidence semantics belong in the bundle path.
For explicit forward proxy traffic, logical client-to-origin HTTP/HTTPS requests from hosts with explicit proxy routes should route through the proxy transaction action bundle. The bundle owns client-to-proxy evidence, proxy access semantics, tunnel reuse, cache/deny terminal behavior, proxy-origin DNS, proxy-to-origin egress, and the timing relationship between source-visible proxy requests and origin-side activity. Keep proxy format rendering in emitters and proxy route selection in planning/config layers.
For canonical network connections, route connection orchestration through the
network-connection action bundle. The bundle owns the boundary for source and
destination semantics, source-port allocation, DNS/TLS/HTTP/file/proxy/firewall
metadata, IDS/EDR flow correlation, source endpoint process ownership, Zeek UID
and connection state identity, visibility handoff, and source-native timing.
Higher-level bundles may request connections through the public generator
entrypoint, but they should not duplicate tuple allocation, hostname resolution,
packet accounting, or endpoint-flow ownership locally. Endpoint FLOW timestamps
must remain inside the canonical source-visible connection interval recorded in
state after network observation jitter is resolved; if a very short connection
cannot also satisfy source-visible process-create ordering, omit process actor
identity for that FLOW row instead of moving endpoint telemetry after the
transport close. Higher-level auth/session bundles such as SSH and RDP should
anchor successful authentication evidence to that interval, not to their original
intent timestamp. For inbound SSH, eCAR FLOW is transport evidence and must not
be delayed behind the USER_SESSION login solely to wait for destination-side
sshd process visibility; omit the listener PID/principal when that identity
would invert transport-before-session ordering.
For NTP specifically, ntp.log fan-out is a parser view of a response-bearing
UDP/123 connection. The network-connection contract must own the shared UID,
conn_state, history, byte/packet accounting, duration, and NtpContext.
Do not emit Zeek NTP rows for S0/no-response connections. Baseline NTP traffic
should follow a stable per-client/server poll schedule with source-observation
texture, not one exact hourly tick per host.
For DHCP leases, route acquisition and renewal transactions through the DHCP
lease action bundle. The bundle owns lease identity, MAC/IP/server/domain
metadata, Zeek DHCP plus connection fan-out, link-local visibility semantics,
and Linux dhclient syslog companion ordering. Do not hand-roll separate DHCP
syslog or Zeek rows for the same lease transaction outside the bundle.
For automatic DNS lookups, route prerequisite resolver evidence through the DNS
lookup action bundle. The bundle owns resolver selection, cache behavior,
query/answer semantics, TTL observations, Zeek DNS plus UDP/53 connection
fan-out, Sysmon DNS visibility, AD SRV discovery companions, and low-volume
resolver companion questions. Storyline dns_query, dga_queries, and
dns_tunnel events may still model DNS as the attack narrative, but connection
prerequisite DNS should not be duplicated at call sites.
For browser-like HTTP/S sessions, page loads and their subresources should route
through the browser-session action bundle. The bundle owns request grouping,
transaction depth, referrer chains, page/subresource timing, static-asset cache
suppression, response metadata, and direct-vs-explicit-proxy handoff through
canonical connection generation. Single tool requests, raw storyline HTTP
events, and source-local web server noise may remain direct canonical events
unless they model a browser session.
Browser/proxy HTTP planning owns source-native web semantics before render:
HTTPS-first domains should redirect rather than serve plaintext HTTP success
pages, non-browser service endpoints should keep compatible User-Agents, and
installer/download paths should carry binary MIME and body-size semantics.
The first request on a persistent plaintext HTTP flow must carry flow-level
transaction count and body-byte budgets so later same-flow requests can reuse the
Zeek UID and render trans_depth > 1; do not disable that accounting for
browser-style page/subresource sessions.
For scanner/probe activity, typed port_scan and web_scan storyline events,
scheduled scanner-overlap suspicious noise, and nmap process side effects should
route through scanner/probe action bundles. The bundles own bulk target/request
expansion, scanner timing, per-probe connection profiles, firewall/IDS/HTTP
contexts, source process attribution when present, and ground-truth summaries.
They may still request canonical network connections through the generator; do
not duplicate scanner target fan-out, IDS scanner selection, or nmap transport
side effects at individual call sites.
For IDS alerts, build alert context through the IDS alert action bundle when a
data-driven signature or preset rule is attached to canonical network evidence.
The bundle owns (gid, sid, rev) identity, message/classification/priority
normalization, and signature-owned DNS payload construction for DNS alerts.
Emitters should only render IdsContext; they should not choose signatures or
invent alert/DNS payloads.
For modeled file transfers, use the file-transfer action bundles for transfer
identity, Zeek files.log metadata, receiver endpoint file evidence, and
source-visible timing. HTTP response bodies, substantial SMB reads/writes,
staged-archive SMB reads, and SCP receiver-side file creation should share the
bundle helpers instead of independently inventing FUIDs, hashes, filenames,
transfer direction, or target process ownership. SSH/RDP/proxy bundles still own
their transport/session semantics; file-transfer bundles own the transfer/file
evidence layered on top. Large or download-scale HTTP responses with
source-native MIME/body metadata should attach the HTTP file-transfer bundle
deterministically, even when the HttpContext was supplied by a browser-session,
proxy, process-command, or storyline path.
For Linux shell command execution, route bash-history emission and correlated foreground process telemetry through the Linux shell-command action bundle. The bundle owns the command execution sequence: resolve activity keys to concrete commands, align commands after SSH/session readiness, schedule per-user bash-history timestamps, emit bash history, and then emit optional process telemetry through shared adapter hooks. Do not hand-roll separate bash-history and process timing paths for the same modeled command.
For process execution, route canonical process create/terminate lifecycle and
process-owned side effects through the process-execution action bundle. The
bundle owns the boundary between modeled execution intent and SecurityEvent
evidence: session/parent ownership, source-visible create/terminate timing,
command-owned network effects, guaranteed process-image file evidence, and
probabilistic file/module/registry endpoint side effects. Other bundles may call
the public process entrypoints, but they should not duplicate process lifecycle
or side-effect generation locally.
For authentication and session lifecycle, route successful logons, failed logons, logoffs, service logons, machine-account logons, anonymous logons, NTLM validation, and workstation lock/unlock evidence through the auth/session action bundles. The bundles own the boundary for session allocation/reuse, logon IDs, lock state, source endpoint semantics, transport/syslog companions, DC-side validation evidence, failure-network companions, and session termination ordering. Other bundles may request logon or logoff evidence, but they should not locally invent duplicate session IDs, source ports, auth-package fields, lock/unlock re-authentication, or validation companion traffic.
For Kerberos/DC evidence, route domain-logon TGT/TGS companions, visible KDC-flow audit repair, TGT requests, TGT renewals, service-ticket requests, and pre-authentication failures through the Kerberos/DC action bundles. The bundles own DC-side Kerberos source endpoint semantics, source-port reservation, TGT cache behavior, service-principal identity, source-native ticket timing, and optional companion KDC network evidence. Do not independently emit 4768, 4769, 4770, or 4771 rows at call sites or patch Kerberos source ports in emitters.
For Windows audit/account-management evidence, route log-cleared, scheduled-task, account-created/deleted/changed, password reset/change, group-membership, create-remote-thread, and process-access evidence through the Windows audit action bundles. The bundles own subject/session ownership, target identity, source timing, process/thread lifecycle validation, and shared Sysmon/eCAR context. Do not duplicate 1102, 4698, 472x/4738/475x, Sysmon Event 8, or Sysmon Event 10 construction at storyline/causal call sites or patch these fields in emitters.
When fixing realism defects:
- Cross-event ordering, lifecycle, source timing, observation, and durable identities belong in bundle/lifecycle/timing/observation layers.
- Use the temporal constraint graph for relationships that span multiple evidence timestamps or source observations. Local timestamp clamps are still acceptable at narrow boundaries, but new family-level timing ownership should express preferred times, hard bounds, lifecycle windows, and causal edges in the graph so dependent evidence cannot invert.
- Shared facts for one occurrence belong on
SecurityEventcontexts. - Emitters render source-native views; they do not invent shared facts or repair upstream lifecycle contradictions.
The generation engine uses a canonical event model — an intermediate representation between activity generation and log rendering. ActivityGenerator builds SecurityEvent objects carrying composable context dataclasses (HostContext, AuthContext, ProcessContext, NetworkContext, DnsContext, FileContext, RegistryContext, IdsContext, SyslogContext). An EventDispatcher routes each event to StateManager.apply() and to matching emitters based on can_handle() and network visibility.
Core principle: consistency by construction, not by coordination. Two emitters cannot disagree about a port number because there is only one port number — on the event object.
Two-phase build + dispatch: (1) Allocate IDs from StateManager in the responsible planning/generation layer (WorldPlanner for planner-owned sessions, ActivityGenerator for processes/connections and direct logons), (2) build a complete SecurityEvent with those IDs, (3) dispatch to emitters. StateManager.apply() records state from a fully-constructed event — it does NOT allocate IDs. RawLogEntry exists solely for the user-facing raw event type in scenario YAML. All internal engine code uses canonical SecurityEvent dispatch exclusively — including baseline IDS alerts, web access, syslog daemon messages, DHCP leases, anomaly records, anonymous logons, and sensor startup.
Full design details: docs/design/event-model-prd.md. Key types: src/evidenceforge/events/.
StateManager (src/evidenceforge/generation/state_manager.py) is the single source of truth for runtime state:
- Planning/generation layers write state —
WorldPlannermay allocate/register sessions;ActivityGeneratorallocates processes/connections and may emit against preallocated sessions - Emitters only read state — to get LogonIDs, PIDs for rendered events; never mutate StateManager
apply(event)records state from a fully-constructed SecurityEvent — handles teardown (logoff, process termination) and updates (connection bytes); does NOT allocate IDs- Events are transient (GC'd after dispatch); StateManager owns durable state
- Thread-safe for reads, single-threaded for writes
All emitters inherit from LogEmitter ABC (src/evidenceforge/generation/emitters/base.py):
- Each emitter declares
_supported_typesand implementscan_handle(event)for dispatcher self-selection emit()receivesSecurityEventobjects, builds a field dict via_render_{event_type}(), passes to Jinja2 templateemit_raw()is the escape hatch forRawLogEntry- Buffer writes (10K events), use atomic flush, always flush on close
- Handle timezone conversion (UTC → system/format timezone)
- OS-specific emitters check
event.host.os_categoryincan_handle() - Each emitter runs in separate thread, writes to separate file
Format definitions are YAML files in src/evidenceforge/config/formats/, not code. Each defines fields, variants, JSON Logic validators, and Jinja2 output templates. Loaded via formats/loader.py. Adding a new format requires only a new YAML file.
All YAML lookup/reference data lives in src/evidenceforge/config/ with subdirectories:
config/formats/— format definitions (field schemas, validators, templates)config/evaluation/— evaluation rules (causal pairs, co-occurrence, distributions)config/activity/— activity generation data (DNS registry, spawn rules, TLS issuers, etc.)config/personas/— pre-built persona definitions
Rule: When adding new YAML data files, place them in the appropriate config/ subdirectory. Never scatter YAML data files alongside Python code. Loader modules import path helpers from evidenceforge.config and handle caching/validation in their own domain.
Pattern for new data files:
- Add the YAML file to the appropriate
config/subdirectory - Create or update a loader in the relevant domain module
- Use
from evidenceforge.config import get_{category}_directoryfor path resolution - Follow the cached-loader pattern (module-level
_CACHED_DATA, load-on-first-call)
- Store all datetimes in UTC internally (
datetime.timezone.utc) - Convert to output timezone only when rendering logs
- Support per-system timezone overrides with pattern matching
- Default timezone from
environment.timezone.default
When adding or significantly modifying event types, emitters, or the event schema, update ALL of the following:
- Documentation —
docs/reference/EVIDENCE_FORMATS.md,docs/reference/scenario-reference.md,README.md - Architecture & design docs —
docs/ARCHITECTURE.md(emitter tree, event type list, causal expansion rules),docs/design/event-model-prd.md(supported types tables) - Skills —
commands/eforge/scenario.md(event type list + examples), other skills as relevant - Validation —
src/evidenceforge/validation/schema.py(event type sets, OS-gating, expansion redundancy) - Evaluation —
src/evidenceforge/evaluation/dimensions/(signal_integrity, noise_realism),src/evidenceforge/evaluation/rules/(co_occurrence.yaml, causal_pairs.yaml) - Causal expansion —
src/evidenceforge/generation/causal/rules.py(if the new event type needs auto-generated prerequisites),src/evidenceforge/generation/causal/registry.py(register new rules) - Coverage test prompt —
scenarios/COVERAGE-TEST-PROMPT.md(event type count, storyline steps, format-specific verification sections)
- Use Typer with
Annotatedtype hints for all options/arguments - Use Rich for progress bars (SpinnerColumn, BarColumn, TimeElapsedColumn, TimeRemainingColumn)
- Handle
KeyboardInterrupt→ exit code 130
Exit codes (per PRD spec):
| Code | Category | Description |
|---|---|---|
| 0 | Success | Operation completed successfully |
| 1 | Input Error | Malformed YAML or file I/O error |
| 2 | Schema Validation | Pydantic validation failure |
| 21 | Generation Error | Invalid state or unrecoverable generation failure |
| 22 | Format Error | Format definition loading/validation error |
| 130 | SIGINT | User interrupted (Ctrl+C) |
Organization: tests/unit/ (fast, no I/O), tests/integration/ (file I/O OK), tests/fixtures/ (shared data)
Coverage targets: the enforced release gate is 70% minimum. Aspirational
targets are 95%+ overall, 95%+ core engine, 90%+ formats, and 85%+ CLI. Exclude:
__main__.py, type stubs, test fixtures.
Default validation: run uv run pytest --no-cov for normal development and
feature PRs. Run the explicit coverage command only for release readiness before
opening or updating a dev → main PR. Do not combine --include-slow with
coverage during release validation.
Conventions:
- Test naming:
test_<function>_<scenario>_<expected_result> - Use Arrange/Act/Assert pattern
- Use
tmp_pathfor all file I/O in tests - No LLM calls during generation — all tests are deterministic
- Write deterministic tests: seed randomness, mock time, use fixed test data
- Use Hypothesis for property-based testing where appropriate (e.g., unique PIDs)
- Never use mutable default arguments
Claude Code Skills handle the interactive, creative aspects of scenario creation.
Location: commands/eforge/ directory
Skills:
/eforge scenario— Guided scenario creation through a structured interview, producing a validated YAML scenario file/eforge generate— Generation workflow that validates a scenario and runs the deterministic engine/eforge validate— Validate a scenario file for schema correctness and cross-reference integrity/eforge evaluate— Run data quality evaluation on generated output
Skills are markdown prompt files (.md), not Python code. They run inside Claude Code, not inside the eforge CLI process. They follow a hybrid interview pattern (structured questions first, then free-form refinement) and reference docs/reference/scenario-reference.md for schema validity.
Important: When modifying the scenario schema (adding/removing/changing fields in Pydantic models or docs/reference/scenario-reference.md), always update the corresponding skills in commands/eforge/ to reflect the changes — especially scenario.md (YAML templates and validation rules) and validate.md (error handling guidance).
- Create
commands/eforge/{name}.mdwith the skill prompt - Follow the hybrid interview pattern
- Reference
docs/reference/scenario-reference.mdfor output validity - Test interactively in Claude Code
- Update
install-skillscommand if needed
- Sysmon Event 22 shows svchost.exe, not the originating process: Windows DNS Client service (dnscache, hosted by svchost.exe) proxies all application DNS queries via DnsQuery_A()/getaddrinfo(). Real Sysmon Event 22 attributes to svchost.exe — this is correct Windows behavior, not a data loss bug.
- StrictUndefined removed from Jinja templates: Intentional (commit 5a4e7db). Templates use
| default(...)for optional fields. SandboxedEnvironment remains for SSTI protection. Template completeness tests intest_sysmon_new_events.pycatch variable name typos for required fields. - Overwrite swap uses backup-restore transaction: Staging directory protects against generation failures. The final swap backs up old output, installs new output, and restores the backup on any failure (including KeyboardInterrupt). Old GROUND_TRUTH.md + new data would be semantically invalid, so the swap is all-or-nothing — both files are always kept as a matched pair.
Key docs:
- Full PRD:
docs/design/PRD.md - Event model design:
docs/design/event-model-prd.md - Scenario schema:
docs/reference/scenario-reference.md - Evidence formats:
docs/reference/EVIDENCE_FORMATS.md - Data quality:
docs/design/data-quality-prd.md