chore(agents): iast context (#15381)

avara1986 · smola · web-flow · commit 90b4d182120c · 2025-11-25T10:31:56.000+01:00
Following up on and splitting out #15193. This PR adds IAST context for AI agents.

---------

Co-authored-by: Santiago M. Mola <santiago.mola@datadoghq.com>
diff --git a/.cursor/rules/iast.mdc b/.cursor/rules/iast.mdc
@@ -0,0 +1,286 @@
+---
+description: IAST (Interactive Application Security Testing) - How it works and development guidelines
+globs:
+  - "**/appsec/_iast/**"
+  - "**/appsec/iast/**"
+  - "**/tests/appsec/iast/**"
+  - "**/tests/appsec/iast_aggregated_memcheck/**"
+  - "**/tests/appsec/iast_memcheck/**"
+  - "**/tests/appsec/iast_packages/**"
+  - "**/tests/appsec/iast_tdd_propagation/**"
+  - "**/tests/appsec/integrations/**"
+---
+
+# IAST Development Guide
+
+**Note:** General development patterns, code style, and testing guidelines are in `AGENTS.md` under "IAST/AppSec Development".
+
+**Synonyms:** IAST = Code Security = Runtime Code Analysis (all refer to the same product)
+
+## What is IAST?
+
+IAST (Interactive Application Security Testing) analyzes running Python applications for security vulnerabilities by:
+1. **Taint tracking** - Following data from untrusted sources (user input, HTTP requests, etc.) as it flows through the application
+2. **Detecting vulnerabilities** - Identifying when tainted data reaches security-sensitive functions (sinks)
+
+**Official Documentation**: 
+- [Datadog Code Security (IAST) Overview](https://docs.datadoghq.com/security/code_security/iast/)
+- [Getting Started with Code Security](https://docs.datadoghq.com/security/code_security/iast/setup/)
+- [Code Security Troubleshooting](https://docs.datadoghq.com/security/code_security/iast/troubleshooting/)
+
+## How IAST Works: High-Level Architecture
+
+### 1. AST Patching (Code Instrumentation)
+
+IAST rewrites each module's AST (Abstract Syntax Tree) at import time, before the module is compiled to bytecode:
+
+- **Module Watchdog**: Hooks into Python's import system (`ddtrace.internal.module.ModuleWatchdog`)
+- **AST Visitor**: Analyzes and modifies Python AST before compilation (`ddtrace/appsec/_iast/_ast/visitor.py`)
+- **String Operations**: Patches string operations (concat, slice, format, etc.) to propagate taint
+- **Call Sites**: Instruments function calls to track taint flow
+
+**Location**: `ddtrace/appsec/_iast/_ast/`
+- `ast_patching.py` - Main AST patching logic
+- `visitor.py` - AST visitor for code transformation
+- `iastpatch.c` - C extension for fast AST manipulation
+
+**Activation**: Via `ModuleWatchdog.register_pre_exec_module_hook()` in `ddtrace/appsec/_iast/__init__.py`
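+
+The AST-patching idea can be illustrated with nothing but the standard `ast` module. This is a minimal sketch, not the code in `visitor.py`. Before the source is compiled, it rewrites every `a + b` into a call to `add_aspect`, a hypothetical propagation helper defined in the snippet. That is roughly the kind of transformation applied to string operations.
+
+```python
+import ast
+
+
+def add_aspect(left, right):
+    """Hypothetical aspect: count the call, then perform the real addition."""
+    add_aspect.calls += 1
+    return left + right
+
+
+add_aspect.calls = 0
+
+
+class ConcatRewriter(ast.NodeTransformer):
+    """Replace every `left + right` expression with `add_aspect(left, right)`."""
+
+    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
+        self.generic_visit(node)  # rewrite nested operands first
+        if isinstance(node.op, ast.Add):
+            call = ast.Call(
+                func=ast.Name(id="add_aspect", ctx=ast.Load()),
+                args=[node.left, node.right],
+                keywords=[],
+            )
+            return ast.copy_location(call, node)
+        return node
+
+
+def patch_source(source: str, filename: str = "<patched>"):
+    """Parse, rewrite, and compile module source, as an import hook might."""
+    tree = ConcatRewriter().visit(ast.parse(source, filename))
+    ast.fix_missing_locations(tree)
+    return compile(tree, filename, "exec")
+
+
+namespace = {"add_aspect": add_aspect, "name": "world"}
+exec(patch_source("greeting = 'Hello, ' + name + '!'"), namespace)
+print(namespace["greeting"], add_aspect.calls)  # Hello, world! 2
+```
+
+The rewrite operates on the tree, so the patched module computes the same values as before. The only difference is the extra helper call at each point where taint would be propagated.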
+
+### 2. Taint Tracking (C++ Native Extension)
+
+Taint information is stored and propagated efficiently using a C++ native extension:
+
+- **TaintedObject**: Associates Python objects with taint metadata (source, ranges)
+- **Taint Ranges**: Track which parts of strings/bytes are tainted
+- **Context Management**: Per-request taint state using context-local storage
+- **Propagation Aspects**: Functions that propagate taint through operations
+
+**Location**: `ddtrace/appsec/_iast/_taint_tracking/`
+- Native C++ code compiled with CMake
+- `aspects.py` - Python API for taint propagation
+- `_native.cpython-*.so` - Compiled C++ extension
+
+**Key Concepts**:
+- **Taint Source**: Where untrusted data enters (HTTP params, headers, body)
+- **Taint Propagation**: Following data through operations (concat, slice, replace, etc.)
+- **Taint Range**: Start position and length of a tainted region within a string
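+
+A deliberately simplified pure-Python model makes ranges and propagation concrete. It is not the C++ implementation. `TaintRange`, `Tainted`, `concat`, and `slice_` are hypothetical names, and the real extension associates metadata with existing objects instead of wrapping them in a new type.
+
+```python
+from dataclasses import dataclass, field
+from typing import List
+
+
+@dataclass(frozen=True)
+class TaintRange:
+    start: int   # offset of the first tainted character
+    length: int  # number of tainted characters
+    source: str  # where the data came from (hypothetical label format)
+
+
+@dataclass
+class Tainted:
+    """Hypothetical wrapper: a string value plus its taint ranges."""
+    value: str
+    ranges: List[TaintRange] = field(default_factory=list)
+
+
+def concat(left: Tainted, right: Tainted) -> Tainted:
+    # Left ranges keep their offsets; right ranges shift by len(left.value).
+    shift = len(left.value)
+    moved = [TaintRange(r.start + shift, r.length, r.source) for r in right.ranges]
+    return Tainted(left.value + right.value, left.ranges + moved)
+
+
+def slice_(text: Tainted, start: int, stop: int) -> Tainted:
+    # Keep the overlap of each range with [start, stop), re-based to 0.
+    kept = []
+    for r in text.ranges:
+        lo, hi = max(r.start, start), min(r.start + r.length, stop)
+        if lo < hi:
+            kept.append(TaintRange(lo - start, hi - lo, r.source))
+    return Tainted(text.value[start:stop], kept)
+
+
+user_id = Tainted("1 OR 1=1", [TaintRange(0, 8, "http.request.parameter")])
+query = concat(Tainted("SELECT * FROM users WHERE id = "), user_id)
+print(query.ranges[0].start, query.ranges[0].length)  # 31 8
+print(slice_(query, 29, 34).value)  # = 1 O (tainted from offset 2, length 3)
+```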
+
+### 3. Module Patching (Taint Sinks)
+
+IAST wraps security-sensitive functions to detect vulnerabilities:
+
+- **IASTFunction**: Wraps target functions using the `wrapt` library
+- **Taint Sinks**: Security-sensitive functions (exec, eval, SQL, file operations, etc.)
+- **Vulnerability Detection**: Checks if tainted data reaches sinks
+
+**Location**: `ddtrace/appsec/_iast/_patch_modules.py`
+
+**Supported Vulnerability Types** (`ddtrace/appsec/_iast/taint_sinks/`):
+- `sql_injection.py`
+- `command_injection.py`
+- `path_traversal.py`
+- `ssrf.py`
+- `code_injection.py`
+- `header_injection.py`
+- `weak_hash.py`
+- `weak_cipher.py`
+- `weak_randomness.py`
+- `insecure_cookie.py`
+- `unvalidated_redirect.py`
+- `untrusted_serialization.py`
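+
+At its core, a sink check wraps the sensitive function and inspects the arguments before delegating to it. To stay dependency-free, this sketch uses the standard-library `functools.wraps` in place of `wrapt`. `TaintedStr`, `taint_sink`, `REPORTS`, and `run_command` are hypothetical stand-ins for the native taint lookup, the reporting path, and a real sink such as `os.system`.
+
+```python
+import functools
+from typing import Callable, List
+
+REPORTS: List[str] = []  # vulnerabilities collected by the toy reporter
+
+
+class TaintedStr(str):
+    """Toy marker type used only in this sketch."""
+
+
+def taint_sink(vuln_type: str) -> Callable:
+    """Hypothetical: wrap a sensitive function so tainted arguments are reported."""
+
+    def decorator(func: Callable) -> Callable:
+        @functools.wraps(func)
+        def wrapper(*args, **kwargs):
+            for arg in (*args, *kwargs.values()):
+                if isinstance(arg, TaintedStr):
+                    REPORTS.append(f"{vuln_type}: {arg}")
+            return func(*args, **kwargs)  # the original call still runs
+        return wrapper
+
+    return decorator
+
+
+@taint_sink("COMMAND_INJECTION")
+def run_command(cmd: str) -> str:
+    """Hypothetical sink standing in for something like os.system."""
+    return f"ran {cmd}"
+
+
+run_command("ls")                        # literal string: nothing reported
+run_command(TaintedStr("ls; rm -rf /"))  # user-controlled value: reported
+print(REPORTS)  # ['COMMAND_INJECTION: ls; rm -rf /']
+```
+
+The wrapper only observes; it never blocks the call. IAST is a detection tool, so the application keeps its normal behavior.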
+
+### 4. Overhead Control Engine (OCE)
+
+Limits the performance overhead that IAST adds to the application:
+
+- **Request Sampling**: Analyze only a configurable percentage of requests (default: 30%)
+- **Vulnerability Limits**: Max vulnerabilities per request
+- **Concurrent Request Limits**: Max requests analyzed simultaneously
+- **Per-Vulnerability Quotas**: Limit overhead per vulnerability type
+
+**Location**: `ddtrace/appsec/_iast/_overhead_control_engine.py`
+
+**Configuration**: Settings defined in `ddtrace/internal/settings/asm.py` (e.g., `_iast_request_sampling`, `_iast_sink_points_enabled`)
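+
+The sketch below shows how the first three controls might compose: a sampling decision at request start, a cap on concurrently analyzed requests, and a per-request vulnerability budget. `OverheadControl` and its parameters are hypothetical, and the 30% sampling default is the only value taken from the description above. The real bookkeeping lives in `_overhead_control_engine.py`.
+
+```python
+import random
+import threading
+from typing import Callable
+
+
+class OverheadControl:
+    """Hypothetical controller: sampling, concurrency cap, per-request budget."""
+
+    def __init__(self, sampling_pct: float = 30.0, max_concurrent: int = 2,
+                 max_vulns_per_request: int = 2,
+                 rand: Callable[[], float] = random.random) -> None:
+        # max_concurrent and max_vulns_per_request defaults are hypothetical.
+        self.sampling_pct = sampling_pct
+        self.max_vulns = max_vulns_per_request
+        self._slots = threading.BoundedSemaphore(max_concurrent)
+        self._rand = rand  # injectable so the decision can be made deterministic
+
+    def acquire_request(self) -> bool:
+        """Decide at request start whether this request gets analyzed."""
+        if self._rand() * 100 >= self.sampling_pct:
+            return False                           # not sampled
+        return self._slots.acquire(blocking=False)  # False if all slots are busy
+
+    def release_request(self) -> None:
+        """Free the slot when an analyzed request finishes."""
+        self._slots.release()
+
+    def allow_vulnerability(self, reported_so_far: int) -> bool:
+        """Per-request quota: stop reporting once the budget is spent."""
+        return reported_so_far < self.max_vulns
+
+
+oce = OverheadControl(sampling_pct=100.0, max_concurrent=1, rand=lambda: 0.5)
+print(oce.acquire_request())  # True: sampled and a slot is free
+print(oce.acquire_request())  # False: the only slot is taken
+oce.release_request()
+print(oce.allow_vulnerability(1), oce.allow_vulnerability(2))  # True False
+```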
+
+### 5. Vulnerability Reporting
+
+When a vulnerability is detected:
+1. Evidence is collected (tainted data, location, stack trace)
+2. Vulnerability is reported via the tracer span
+3. Deduplication prevents duplicate reports
+4. Data is sent to Datadog backend for analysis
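+
+Deduplication (step 3) needs a stable identity for each finding. For illustration only, this sketch assumes that a finding is identified by its vulnerability type and code location. Under that assumption, repeated hits on the same sink with different inputs collapse into a single report. `Deduplicator` is a hypothetical name, and the real key may include other fields.
+
+```python
+from typing import Set, Tuple
+
+Key = Tuple[str, str, int]  # (vulnerability type, file path, line number)
+
+
+class Deduplicator:
+    """Hypothetical: suppress repeat reports of the same finding."""
+
+    def __init__(self) -> None:
+        self._seen: Set[Key] = set()
+
+    def should_report(self, vuln_type: str, path: str, line: int) -> bool:
+        # Evidence is left out of the key on purpose, so different user
+        # inputs reaching the same sink count as one vulnerability.
+        key = (vuln_type, path, line)
+        if key in self._seen:
+            return False
+        self._seen.add(key)
+        return True
+
+
+dedup = Deduplicator()
+print(dedup.should_report("SQL_INJECTION", "app/views.py", 42))  # True
+print(dedup.should_report("SQL_INJECTION", "app/views.py", 42))  # False
+print(dedup.should_report("SQL_INJECTION", "app/views.py", 57))  # True
+```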
+
+## Key IAST Concepts for Development
+
+### Taint Tracking Terminology
+
+- **Taint Sources** (Origins): Where untrusted data enters the application (HTTP params, headers, body, cookies)
+- **Taint Propagation**: How tainted data flows through string operations (concat, slice, replace, format, etc.)
+- **Taint Ranges**: Specific byte/character offsets within strings that are tainted (start position + length)
+- **Sink Points**: Security-sensitive functions where vulnerabilities are detected (SQL execute, OS commands, file operations, eval/exec)
+- **Update Origins**: Adding or modifying taint source information to track data lineage
+
+### Call Site Instrumentation
+
+IAST uses **Call Site Instrumentation** (CSI) instead of traditional callee instrumentation:
+- Modifies calls to target functions rather than the functions themselves
+- Enables selective instrumentation based on context (e.g., skip calls made inside the Python standard library or framework internals)
+- Reduces overhead by instrumenting only application code, not low-level library internals
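+
+A small example makes the difference from callee instrumentation easier to see. This is a sketch built on the standard `ast` module, not ddtrace's visitor. Only call sites in the *application* source are rewritten so that `sep.join(items)` goes through `join_aspect`, a hypothetical helper. `str.join` itself is untouched, so library code that calls it directly pays no extra cost.
+
+```python
+import ast
+
+CALLS = []  # receivers seen by the aspect
+
+
+def join_aspect(receiver, items):
+    """Hypothetical aspect: record the call, then defer to the real .join()."""
+    CALLS.append(receiver)
+    return receiver.join(items)
+
+
+class JoinCallSiteRewriter(ast.NodeTransformer):
+    """Rewrite `<expr>.join(<arg>)` call sites as `join_aspect(<expr>, <arg>)`."""
+
+    def visit_Call(self, node: ast.Call) -> ast.AST:
+        self.generic_visit(node)
+        func = node.func
+        if (isinstance(func, ast.Attribute) and func.attr == "join"
+                and len(node.args) == 1 and not node.keywords):
+            new = ast.Call(
+                func=ast.Name(id="join_aspect", ctx=ast.Load()),
+                args=[func.value, node.args[0]],
+                keywords=[],
+            )
+            return ast.copy_location(new, node)
+        return node
+
+
+# Only application code goes through the rewriter.
+app_tree = JoinCallSiteRewriter().visit(ast.parse("path = '/'.join(parts)"))
+ns = {"join_aspect": join_aspect, "parts": ["", "home", "user"]}
+exec(compile(ast.fix_missing_locations(app_tree), "<app>", "exec"), ns)
+
+# "Library" code is not rewritten, so it calls str.join directly.
+library_result = "-".join(["a", "b"])
+print(ns["path"], CALLS, library_result)  # /home/user ['/'] a-b
+```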
+
+### Tainted Ranges and Offsets
+
+Ranges track which parts of strings contain tainted data:
+- **Offset**: Starting position of the tainted substring (units depend on the object type: Unicode code points for `str`, bytes for `bytes`)
+- **Length**: Size of tainted region
+- **Source**: Reference to the origin of the tainted data
+- Used in vulnerability evidence to highlight exactly which user input caused the issue
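+
+Offset units depend on the object type, so the same user input can produce different ranges in a `str` and in its encoded `bytes`. The hypothetical `highlight` helper below is an illustration, not the real evidence formatter. It shows how (offset, length) ranges let evidence point at exactly the tainted part of a value.
+
+```python
+from typing import Iterable, Tuple
+
+
+def highlight(value: str, ranges: Iterable[Tuple[int, int]]) -> str:
+    """Hypothetical: wrap each tainted (offset, length) region in [brackets].
+
+    Assumes the ranges do not overlap.
+    """
+    out, cursor = [], 0
+    for offset, length in sorted(ranges):
+        out.append(value[cursor:offset])
+        out.append("[" + value[offset:offset + length] + "]")
+        cursor = offset + length
+    out.append(value[cursor:])
+    return "".join(out)
+
+
+query = "SELECT * FROM users WHERE name = 'José'"
+start = query.index("José")
+print(highlight(query, [(start, len("José"))]))
+# SELECT * FROM users WHERE name = '[José]'
+
+# "é" is one code point but two UTF-8 bytes, so a byte-based range
+# for the same input is one unit longer.
+print(len("José"), len("José".encode("utf-8")))  # 4 5
+```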
+
+### Security Controls (Validators & Sanitizers)
+
+User-configurable validation/sanitization functions that apply **secure marks** to tainted ranges:
+- **Input Validators**: Check if input is safe, apply marks to input arguments
+- **Sanitizers**: Transform input to make it safe, apply marks to return value
+- **Secure Marks**: Flags indicating a range is safe for specific vulnerability types
+- If all ranges reaching a sink have appropriate secure marks, the vulnerability is suppressed
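+
+The suppression rule can be modeled with a bit flag per vulnerability type. This is a sketch under stated assumptions: `SecureMark`, `Range`, `apply_sanitizer`, and `is_vulnerable` are hypothetical names, and the real validators and sanitizers live in `ddtrace/appsec/_iast/secure_marks/`.
+
+```python
+from dataclasses import dataclass
+from enum import IntFlag
+from typing import List
+
+
+class SecureMark(IntFlag):
+    """Hypothetical per-vulnerability marks carried by a taint range."""
+    NONE = 0
+    SQL_INJECTION = 1
+    COMMAND_INJECTION = 2
+    PATH_TRAVERSAL = 4
+
+
+@dataclass
+class Range:
+    start: int
+    length: int
+    marks: SecureMark = SecureMark.NONE
+
+
+def apply_sanitizer(ranges: List[Range], marks: SecureMark) -> None:
+    # A sanitizer marks the ranges of its return value; a validator would
+    # mark the ranges of its input arguments instead.
+    for r in ranges:
+        r.marks |= marks
+
+
+def is_vulnerable(ranges: List[Range], vuln: SecureMark) -> bool:
+    # Report only if at least one tainted range lacks the matching mark.
+    return any(not (r.marks & vuln) for r in ranges)
+
+
+ranges = [Range(31, 8)]
+print(is_vulnerable(ranges, SecureMark.SQL_INJECTION))      # True
+apply_sanitizer(ranges, SecureMark.SQL_INJECTION)
+print(is_vulnerable(ranges, SecureMark.SQL_INJECTION))      # False
+print(is_vulnerable(ranges, SecureMark.COMMAND_INJECTION))  # True: wrong mark
+```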
+
+### Vulnerability Detection Flow
+
+1. **Taint data at sources** - Mark HTTP request data with origin information
+2. **Propagate through operations** - Track tainted ranges through string manipulations via aspects
+3. **Check at sink points** - When tainted data reaches a vulnerable function, report if not secured
+4. **Apply overhead controls** - Request sampling, vulnerability quotas, and deduplication limit impact
+
+### Implementation References
+
+- **Taint Sinks**: `ddtrace/appsec/_iast/taint_sinks/` - Each file handles a specific vulnerability type
+- **Aspects**: `ddtrace/appsec/_iast/_taint_tracking/aspects.py` - Propagation functions for string operations
+- **Patch Modules**: `ddtrace/appsec/_iast/_patch_modules.py` - Registry of instrumented sink points
+- **Vulnerability Base**: `ddtrace/appsec/_iast/taint_sinks/_base.py` - Base class for all vulnerability types
+
+## Important Technical Details
+
+### Flask Applications
+
+Flask apps need special patching for main module instrumentation:
+
+```python
+from flask import Flask
+
+from ddtrace.appsec._iast import ddtrace_iast_flask_patch
+
+app = Flask(__name__)
+
+if __name__ == "__main__":
+    ddtrace_iast_flask_patch()  # Call before app.run()
+    app.run()
+```
+
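+The entry-point script runs as `__main__` rather than being imported, so import-time AST patching never sees it. This call patches that module so that IAST also analyzes functions defined in the main file (e.g. `app.py`).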
+
+### Gevent Compatibility
+
+IMPORTANT: Avoid top-level `import inspect` in IAST code - it interferes with gevent's monkey patching and causes sporadic worker timeouts in Gunicorn applications.
+
+**Solution**: Import `inspect` locally within functions when needed.
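+
+A minimal sketch of the pattern follows. `caller_location` is a hypothetical helper, and the point is only where the import sits: importing this module never imports `inspect`, which is loaded the first time the function actually runs.
+
+```python
+def caller_location(depth: int = 1):
+    """Hypothetical helper: (filename, lineno) of the frame `depth` levels up."""
+    import inspect  # local import, per the gevent guideline above
+
+    frame = inspect.currentframe()  # this function's own frame
+    try:
+        for _ in range(depth):      # depth=1 -> the direct caller
+            frame = frame.f_back
+        info = inspect.getframeinfo(frame)
+        return info.filename, info.lineno
+    finally:
+        del frame  # drop the frame reference to avoid reference cycles
+```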
+
+### Native Code Development
+
+When working with IAST's C++ taint tracking code:
+
+1. **Prefer**: Native C++ types (`std::string`, `int`, `char`)
+2. **If needed**: CPython API with `PyObject*` (careful with reference counting!)
+3. **Last resort**: pybind11 (adds complexity)
+
+**Build & Test C++ Code**:
+```bash
+cmake -DCMAKE_BUILD_TYPE=Debug -DPYTHON_EXECUTABLE=python \
+  -S ddtrace/appsec/_iast/_taint_tracking \
+  -B ddtrace/appsec/_iast/_taint_tracking
+
+make -f ddtrace/appsec/_iast/_taint_tracking/tests/Makefile native_tests
+ddtrace/appsec/_iast/_taint_tracking/tests/native_tests
+```
+
+## Testing IAST Code
+
+### Python Tests
+
+```bash
+# Run IAST tests
+python -m pytest -vv -s --no-cov tests/appsec/iast/
+
+# Run specific vulnerability tests
+python -m pytest -vv tests/appsec/iast/taint_sinks/test_sql_injection.py
+
+# Run with IAST enabled
+DD_IAST_ENABLED=true python -m pytest tests/appsec/iast/
+```
+
+### End-to-End Tests
+
+E2E tests use test servers defined in `tests/appsec/appsec_utils.py`:
+- `django_server` - Django test application
+- `flask_server` - Flask test application
+- `fast_api` - FastAPI test application
+
+Test application location: `tests/appsec/integrations/django_tests/django_app`
+
+**Running E2E tests**:
+```bash
+# Start testagent
+docker compose up -d testagent
+
+# Run E2E tests
+python -m pytest tests/appsec/iast/test_integration.py -v
+```
+
+### C++ Native Tests
+
+```bash
+# Run the C++ tests (build them first as shown under Native Code Development)
+./ddtrace/appsec/_iast/_taint_tracking/tests/native_tests
+```
+
+## Key Files Reference
+
+**Core Implementation**:
+- `ddtrace/appsec/_iast/__init__.py` - Entry point, initialization, fork safety
+- `ddtrace/appsec/_iast/_overhead_control_engine.py` - Performance control (OCE)
+- `ddtrace/appsec/_iast/_patch_modules.py` - Module patching registry
+
+**AST Patching**:
+- `ddtrace/appsec/_iast/_ast/ast_patching.py` - AST transformation
+- `ddtrace/appsec/_iast/_ast/visitor.py` - AST visitor
+- `ddtrace/appsec/_iast/_loader.py` - Patched module execution
+
+**Taint Tracking**:
+- `ddtrace/appsec/_iast/_taint_tracking/` - C++ native taint tracking
+- `ddtrace/appsec/_iast/_taint_tracking/aspects.py` - Taint propagation API
+
+**Vulnerability Detection**:
+- `ddtrace/appsec/_iast/taint_sinks/` - All vulnerability detectors
+- `ddtrace/appsec/_iast/taint_sinks/_base.py` - Base vulnerability class
+
+**Security Controls**:
+- `ddtrace/appsec/_iast/secure_marks/` - Validators and sanitizers
+
+## Environment Variables
+
+**Public Configuration**: All public IAST environment variables are documented in the [ddtrace Configuration Guide](https://ddtrace.readthedocs.io/en/stable/configuration.html#code-security).
+
+**Private/Internal Environment Variables** (for development and debugging):
+
+```bash
+# Enable debug-level taint propagation logging
+_DD_IAST_PROPAGATION_DEBUG=true
+
+# Enable IAST internal debug logging
+_DD_IAST_DEBUG=true
+
+# Enable specific taint sink detection (comma-separated list)
+_DD_IAST_SINK_POINTS_ENABLED=sql_injection,command_injection,path_traversal
+
+# Specify modules to patch for AST instrumentation
+_DD_IAST_PATCH_MODULES=benchmarks.,tests.appsec.,scripts.iast.
+
+# Build-time only: fast build mode that skips some compilation optimizations (development only)
+DD_FAST_BUILD=1
+```
+
+**Note**: Private environment variables (prefixed with `_DD_`) are not officially supported and may change without notice. They are primarily for internal development and debugging.
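+
+`_DD_IAST_SINK_POINTS_ENABLED` and `_DD_IAST_PATCH_MODULES` take comma-separated lists. The sketch below shows one way such values could be interpreted. It assumes that each `_DD_IAST_PATCH_MODULES` entry is matched as a module-name prefix, which would explain why the example values end with a dot. `parse_csv` and `should_patch` are hypothetical helpers, and ddtrace's actual parsing may differ. In practice the strings would come from `os.environ`.
+
+```python
+from typing import List
+
+
+def parse_csv(value: str) -> List[str]:
+    """Hypothetical: split a comma-separated setting, dropping blank entries."""
+    return [item.strip() for item in value.split(",") if item.strip()]
+
+
+def should_patch(module_name: str, prefixes: List[str]) -> bool:
+    # Assumed prefix semantics: with a trailing dot, "tests.appsec." matches
+    # "tests.appsec.iast" but not a sibling module like "tests.appsec_utils".
+    return any(module_name.startswith(prefix) for prefix in prefixes)
+
+
+prefixes = parse_csv("benchmarks.,tests.appsec.,scripts.iast.")
+print(should_patch("tests.appsec.iast.test_foo", prefixes))  # True
+print(should_patch("tests.appsec_utils", prefixes))          # False
+print(parse_csv("sql_injection, command_injection"))  # ['sql_injection', 'command_injection']
+```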
diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
@@ -112,6 +112,7 @@ tests/internal/symbol_db/           @DataDog/debugger-python
 .gitlab/tests/debugging.yml         @DataDog/debugger-python
 
 # ASM
+.cursor/rules/iast.mdc              @DataDog/asm-python
 .gitlab/tests/appsec.yml            @DataDog/asm-python
 benchmarks/appsec*                  @DataDog/asm-python
 benchmarks/bm/iast_utils*           @DataDog/asm-python