| 1 | +--- |
| 2 | +description: IAST (Interactive Application Security Testing) - How it works and development guidelines |
| 3 | +globs: |
| 4 | + - "**/appsec/_iast/**" |
| 5 | + - "**/appsec/iast/**" |
| 6 | + - "**/tests/appsec/iast/**" |
| 7 | + - "**/tests/appsec/iast_aggregated_memcheck/**" |
| 8 | + - "**/tests/appsec/iast_memcheck/**" |
| 9 | + - "**/tests/appsec/iast_packages/**" |
| 10 | + - "**/tests/appsec/iast_tdd_propagation/**" |
| 11 | + - "**/tests/appsec/integrations/**" |
| 12 | +--- |
| 13 | + |
| 14 | +# IAST Development Guide |
| 15 | + |
| 16 | +**Note:** General development patterns, code style, and testing guidelines are in `AGENTS.md` under "IAST/AppSec Development". |
| 17 | + |
| 18 | +**Synonyms:** IAST = Code Security = Runtime Code Analysis (all refer to the same product) |
| 19 | + |
| 20 | +## What is IAST? |
| 21 | + |
| 22 | +IAST (Interactive Application Security Testing) analyzes running Python applications for security vulnerabilities by: |
| 23 | +1. **Taint tracking** - Marking data from untrusted sources (user input, HTTP requests, etc.) as tainted and following it through the application
| 24 | +2. **Detecting vulnerabilities** - Identifying when tainted data reaches security-sensitive functions (sinks) |
| 25 | + |
| 26 | +**Official Documentation**: |
| 27 | +- [Datadog Code Security (IAST) Overview](https://docs.datadoghq.com/security/code_security/iast/) |
| 28 | +- [Getting Started with Code Security](https://docs.datadoghq.com/security/code_security/iast/setup/) |
| 29 | +- [Understanding Vulnerability Types](https://docs.datadoghq.com/security/code_security/iast/troubleshooting/) |
| 30 | + |
| 31 | +## How IAST Works: High-Level Architecture |
| 32 | + |
| 33 | +### 1. AST Patching (Code Instrumentation) |
| 34 | + |
| 35 | +IAST rewrites Python modules at import time by patching their AST (Abstract Syntax Tree) before it is compiled to bytecode:
| 36 | + |
| 37 | +- **Module Watchdog**: Hooks into Python's import system (`ddtrace.internal.module.ModuleWatchdog`) |
| 38 | +- **AST Visitor**: Analyzes and modifies Python AST before compilation (`ddtrace/appsec/_iast/_ast/visitor.py`) |
| 39 | +- **String Operations**: Patches string operations (concat, slice, format, etc.) to propagate taint |
| 40 | +- **Call Sites**: Instruments function calls to track taint flow |
| 41 | + |
| 42 | +**Location**: `ddtrace/appsec/_iast/_ast/` |
| 43 | +- `ast_patching.py` - Main AST patching logic |
| 44 | +- `visitor.py` - AST visitor for code transformation |
| 45 | +- `iastpatch.c` - C extension for fast AST manipulation |
| 46 | + |
| 47 | +**Activation**: Via `ModuleWatchdog.register_pre_exec_module_hook()` in `ddtrace/appsec/_iast/__init__.py` |
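
The transformation described above can be sketched with the standard library `ast` module. This is a minimal illustration, not the code in `visitor.py`: `add_aspect`, `AddRewriter`, and `compile_patched` are hypothetical names, and only the `+` operator is rewritten.

```python
import ast


def add_aspect(left, right):
    """Hypothetical stand-in for a taint-propagating replacement of `+`."""
    add_aspect.calls += 1  # count calls so the rewrite is observable
    return left + right


add_aspect.calls = 0


class AddRewriter(ast.NodeTransformer):
    """Rewrite every `a + b` expression into `add_aspect(a, b)`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested operands first
        if isinstance(node.op, ast.Add):
            call = ast.Call(
                func=ast.Name(id="add_aspect", ctx=ast.Load()),
                args=[node.left, node.right],
                keywords=[],
            )
            return ast.copy_location(call, node)
        return node


def compile_patched(source, filename="<patched>"):
    """Parse, transform, and compile source, roughly as an import hook might."""
    tree = AddRewriter().visit(ast.parse(source, filename))
    ast.fix_missing_locations(tree)
    return compile(tree, filename, "exec")


# The patched module needs the aspect function in its namespace.
namespace = {"add_aspect": add_aspect, "name": "world"}
exec(compile_patched("greeting = 'hello ' + name + '!'"), namespace)
print(namespace["greeting"], add_aspect.calls)  # hello world! 2
```

The rewritten code behaves like the original, but every concatenation now passes through a function that could inspect and propagate taint.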
| 48 | + |
| 49 | +### 2. Taint Tracking (C++ Native Extension) |
| 50 | + |
| 51 | +Taint information is stored and propagated efficiently using a C++ native extension: |
| 52 | + |
| 53 | +- **TaintedObject**: Associates Python objects with taint metadata (source, ranges) |
| 54 | +- **Taint Ranges**: Track which parts of strings/bytes are tainted |
| 55 | +- **Context Management**: Per-request taint state using context-local storage |
| 56 | +- **Propagation Aspects**: Functions that propagate taint through operations |
| 57 | + |
| 58 | +**Location**: `ddtrace/appsec/_iast/_taint_tracking/` |
| 59 | +- Native C++ code compiled with CMake |
| 60 | +- `aspects.py` - Python API for taint propagation |
| 61 | +- `_native.cpython-*.so` - Compiled C++ extension |
| 62 | + |
| 63 | +**Key Concepts**: |
| 64 | +- **Taint Source**: Where untrusted data enters (HTTP params, headers, body) |
| 65 | +- **Taint Propagation**: Following data through operations (concat, slice, replace, etc.) |
| 66 | +- **Taint Range**: Start/end positions in strings that are tainted |
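
To make these concepts concrete, here is a pure-Python sketch of range propagation through concatenation. The real implementation lives in the C++ extension; `TaintRange`, `concat_with_ranges`, and the `"http.query"` source label are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaintRange:
    start: int   # index of the first tainted character
    length: int  # number of tainted characters
    source: str  # hypothetical label for the taint source


def concat_with_ranges(left, left_ranges, right, right_ranges):
    """Concatenate two strings and carry their taint ranges to the result."""
    # Ranges from the right operand move right by the length of the left operand.
    shifted = [
        TaintRange(r.start + len(left), r.length, r.source) for r in right_ranges
    ]
    return left + right, list(left_ranges) + shifted


prefix = "SELECT * FROM users WHERE name = '"
user_input = "alice"  # pretend this came from an HTTP query parameter

query, ranges = concat_with_ranges(
    prefix, [], user_input, [TaintRange(0, len(user_input), "http.query")]
)
tainted_parts = [query[r.start:r.start + r.length] for r in ranges]
print(tainted_parts)  # ['alice']
```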
| 67 | + |
| 68 | +### 3. Module Patching (Taint Sinks) |
| 69 | + |
| 70 | +IAST wraps security-sensitive functions to detect vulnerabilities: |
| 71 | + |
| 72 | +- **IASTFunction**: Wraps target functions using the `wrapt` library
| 73 | +- **Taint Sinks**: Security-sensitive functions (exec, eval, SQL, file operations, etc.) |
| 74 | +- **Vulnerability Detection**: Checks if tainted data reaches sinks |
| 75 | + |
| 76 | +**Location**: `ddtrace/appsec/_iast/_patch_modules.py` |
| 77 | + |
| 78 | +**Supported Vulnerability Types** (`ddtrace/appsec/_iast/taint_sinks/`): |
| 79 | +- `sql_injection.py` |
| 80 | +- `command_injection.py` |
| 81 | +- `path_traversal.py` |
| 82 | +- `ssrf.py` |
| 83 | +- `code_injection.py` |
| 84 | +- `header_injection.py` |
| 85 | +- `weak_hash.py` |
| 86 | +- `weak_cipher.py` |
| 87 | +- `weak_randomness.py` |
| 88 | +- `insecure_cookie.py` |
| 89 | +- `unvalidated_redirect.py` |
| 90 | +- `untrusted_serialization.py` |
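
The wrapping idea can be sketched as follows. To stay dependency-free, this example uses `functools.wraps` rather than `wrapt`, and a `str` subclass rather than the native taint tracker; `TaintedStr`, `sink`, `execute`, and `REPORTS` are all hypothetical names.

```python
import functools

REPORTS = []  # collected (vulnerability_type, function_name) tuples


class TaintedStr(str):
    """Hypothetical marker type; real taint state lives in the native extension."""


def sink(vulnerability_type):
    """Wrap a security-sensitive function so tainted arguments are reported."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            values = list(args) + list(kwargs.values())
            if any(isinstance(value, TaintedStr) for value in values):
                REPORTS.append((vulnerability_type, func.__name__))
            # The original function always runs; detection does not block it.
            return func(*args, **kwargs)

        return wrapper

    return decorator


@sink("SQL_INJECTION")
def execute(sql):
    """Stand-in for a database cursor's execute method."""
    return f"executed: {sql}"


execute("SELECT 1")  # constant query: nothing reported
execute(TaintedStr("SELECT * FROM t WHERE id = 1 OR 1=1"))  # tainted: reported
print(REPORTS)  # [('SQL_INJECTION', 'execute')]
```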
| 91 | + |
| 92 | +### 4. Overhead Control Engine (OCE) |
| 93 | + |
| 94 | +The OCE limits the performance overhead of IAST through:
| 95 | +
| 96 | +- **Request Sampling**: Analyze only a configurable percentage of requests (default: 30%)
| 97 | +- **Vulnerability Limits**: Max vulnerabilities per request |
| 98 | +- **Concurrent Request Limits**: Max requests analyzed simultaneously |
| 99 | +- **Per-Vulnerability Quotas**: Limit overhead per vulnerability type |
| 100 | + |
| 101 | +**Location**: `ddtrace/appsec/_iast/_overhead_control_engine.py` |
| 102 | + |
| 103 | +**Configuration**: Settings defined in `ddtrace/internal/settings/asm.py` (e.g., `_iast_request_sampling`, `_iast_sink_points_enabled`) |
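
A rough sketch of two of these controls, request sampling and a per-request vulnerability limit, is shown below. It does not mirror `_overhead_control_engine.py`; the class names, the deterministic sampling scheme, and the limit of 2 vulnerabilities per request are assumptions for illustration.

```python
class RequestBudget:
    """Counts down the vulnerabilities a single request may still report."""

    def __init__(self, max_vulnerabilities):
        self.remaining = max_vulnerabilities

    def acquire(self):
        """Return True if one more vulnerability may be reported."""
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


class OverheadControl:
    """Toy overhead controller: request sampling plus a per-request quota."""

    def __init__(self, sampling_pct=30, max_vulns_per_request=2):
        self.sampling_pct = sampling_pct
        self.max_vulns_per_request = max_vulns_per_request  # assumed value
        self._requests_seen = 0

    def acquire_request(self):
        """Return a fresh budget if this request is sampled, else None."""
        self._requests_seen += 1
        n = self._requests_seen
        # Accept request n only when the running quota crosses an integer
        # boundary, which yields exactly sampling_pct accepted per 100 requests.
        before = ((n - 1) * self.sampling_pct) // 100
        after = (n * self.sampling_pct) // 100
        return RequestBudget(self.max_vulns_per_request) if after > before else None


oce = OverheadControl(sampling_pct=30, max_vulns_per_request=2)
budgets = [oce.acquire_request() for _ in range(100)]
sampled = [budget for budget in budgets if budget is not None]
print(len(sampled))  # 30
```

A sampled request then calls `acquire()` on its budget before reporting each vulnerability, so analysis cost stays bounded even for a request that hits many sinks.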
| 104 | + |
| 105 | +### 5. Vulnerability Reporting |
| 106 | + |
| 107 | +When a vulnerability is detected: |
| 108 | +1. Evidence is collected (tainted data, location, stack trace) |
| 109 | +2. Vulnerability is reported via the tracer span |
| 110 | +3. Deduplication prevents duplicate reports |
| 111 | +4. Data is sent to the Datadog backend for analysis
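
The deduplication step can be illustrated with a small sketch. The choice of key (vulnerability type, file, and line) and the `VulnerabilityReporter` class are assumptions; the source does not state how the real reporter identifies duplicates.

```python
class VulnerabilityReporter:
    """Toy reporter that drops vulnerabilities it has already seen."""

    def __init__(self):
        self._seen = set()
        self.reported = []  # stands in for data attached to the tracer span

    def report(self, vuln_type, filename, line, evidence):
        """Record the vulnerability once; return False for duplicates."""
        key = (vuln_type, filename, line)  # assumed deduplication key
        if key in self._seen:
            return False
        self._seen.add(key)
        self.reported.append(
            {"type": vuln_type, "file": filename, "line": line, "evidence": evidence}
        )
        return True


reporter = VulnerabilityReporter()
first = reporter.report("SQL_INJECTION", "app.py", 42, "SELECT ... 'alice'")
second = reporter.report("SQL_INJECTION", "app.py", 42, "SELECT ... 'bob'")
third = reporter.report("SQL_INJECTION", "app.py", 57, "SELECT ... 'bob'")
print(first, second, third)  # True False True
```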
| 112 | + |
| 113 | +## Key IAST Concepts for Development |
| 114 | + |
| 115 | +### Taint Tracking Terminology |
| 116 | + |
| 117 | +- **Taint Sources** (Origins): Where untrusted data enters the application (HTTP params, headers, body, cookies) |
| 118 | +- **Taint Propagation**: How tainted data flows through string operations (concat, slice, replace, format, etc.) |
| 119 | +- **Taint Ranges**: Specific byte/character offsets within strings that are tainted (start position + length) |
| 120 | +- **Sink Points**: Security-sensitive functions where vulnerabilities are detected (SQL execute, OS commands, file operations, eval/exec) |
| 121 | +- **Update Origins**: Adding or modifying taint source information to track data lineage |
| 122 | + |
| 123 | +### Call Site Instrumentation |
| 124 | + |
| 125 | +IAST uses **Call Site Instrumentation** (CSI) instead of traditional callee instrumentation: |
| 126 | +- Modifies calls to target functions rather than the functions themselves |
| 127 | +- Enables selective instrumentation based on context (e.g., skip internal standard-library/framework calls)
| 128 | +- Reduces overhead by instrumenting only application code, not low-level library internals |
| 129 | + |
| 130 | +### Tainted Ranges and Offsets |
| 131 | + |
| 132 | +Ranges track which parts of strings contain tainted data: |
| 133 | +- **Offset**: Starting position of tainted substring (type-dependent in Python: code point index for `str`, byte index for `bytes`/`bytearray`)
| 134 | +- **Length**: Size of tainted region |
| 135 | +- **Source**: Reference to the origin of the tainted data |
| 136 | +- Used in vulnerability evidence to highlight exactly which user input caused the issue |
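
As an example of how offsets and lengths behave under an operation, the sketch below recomputes ranges for a slice. It models ranges as plain `(offset, length)` tuples and assumes non-negative slice bounds with no step; `slice_ranges` is a hypothetical helper, not part of the IAST API.

```python
def slice_ranges(ranges, start, stop):
    """Return the ranges of text[start:stop], given ranges over text."""
    result = []
    for offset, length in ranges:
        # Clip each range to the slice window.
        lo = max(offset, start)
        hi = min(offset + length, stop)
        if lo < hi:
            # Re-base the surviving part so offsets are relative to the slice.
            result.append((lo - start, hi - lo))
    return result


text = "name=alice&role=admin"
ranges = [(5, 5), (16, 5)]  # "alice" and "admin" are tainted

piece = text[7:18]
piece_ranges = slice_ranges(ranges, 7, 18)
print(piece, piece_ranges)  # ice&role=ad [(0, 3), (9, 2)]
print([piece[o:o + n] for o, n in piece_ranges])  # ['ice', 'ad']
```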
| 137 | + |
| 138 | +### Security Controls (Validators & Sanitizers) |
| 139 | + |
| 140 | +User-configurable validation/sanitization functions that apply **secure marks** to tainted ranges: |
| 141 | +- **Input Validators**: Check if input is safe, apply marks to input arguments |
| 142 | +- **Sanitizers**: Transform input to make it safe, apply marks to return value |
| 143 | +- **Secure Marks**: Flags indicating a range is safe for specific vulnerability types |
| 144 | +- If all ranges reaching a sink have appropriate secure marks, the vulnerability is suppressed |
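
The suppression rule can be sketched with bit flags. `SecureMark`, `MarkedRange`, `apply_mark`, and `is_vulnerable` are hypothetical names chosen for this example and do not reflect the classes under `secure_marks/`.

```python
from dataclasses import dataclass
from enum import IntFlag


class SecureMark(IntFlag):
    NONE = 0
    SQL_INJECTION = 1
    COMMAND_INJECTION = 2
    PATH_TRAVERSAL = 4


@dataclass(frozen=True)
class MarkedRange:
    start: int
    length: int
    marks: SecureMark = SecureMark.NONE


def apply_mark(ranges, mark):
    """What a sanitizer or validator does: flag ranges as safe for `mark`."""
    return [MarkedRange(r.start, r.length, r.marks | mark) for r in ranges]


def is_vulnerable(ranges, required_mark):
    """A sink reports only if some tainted range lacks the required mark."""
    return any(not (r.marks & required_mark) for r in ranges)


tainted = [MarkedRange(0, 5)]
escaped = apply_mark(tainted, SecureMark.SQL_INJECTION)

print(is_vulnerable(tainted, SecureMark.SQL_INJECTION))      # True
print(is_vulnerable(escaped, SecureMark.SQL_INJECTION))      # False
print(is_vulnerable(escaped, SecureMark.COMMAND_INJECTION))  # True
```

Marks are specific to a vulnerability type, so a value sanitized for SQL is still reported if it reaches a command-execution sink.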
| 145 | + |
| 146 | +### Vulnerability Detection Flow |
| 147 | + |
| 148 | +1. **Taint data at sources** - Mark HTTP request data with origin information |
| 149 | +2. **Propagate through operations** - Track tainted ranges through string manipulations via aspects |
| 150 | +3. **Check at sink points** - When tainted data reaches a vulnerable function, report it unless every tainted range carries the relevant secure mark
| 151 | +4. **Apply overhead controls** - Request sampling, vulnerability quotas, and deduplication limit impact |
| 152 | + |
| 153 | +### Implementation References |
| 154 | + |
| 155 | +- **Taint Sinks**: `ddtrace/appsec/_iast/taint_sinks/` - Each file handles a specific vulnerability type |
| 156 | +- **Aspects**: `ddtrace/appsec/_iast/_taint_tracking/aspects.py` - Propagation functions for string operations |
| 157 | +- **Patch Modules**: `ddtrace/appsec/_iast/_patch_modules.py` - Registry of instrumented sink points |
| 158 | +- **Vulnerability Base**: `ddtrace/appsec/_iast/taint_sinks/_base.py` - Base class for all vulnerability types |
| 159 | + |
| 160 | +## Important Technical Details |
| 161 | + |
| 162 | +### Flask Applications |
| 163 | + |
| 164 | +Flask apps need special patching for main module instrumentation: |
| 165 | + |
| 166 | +```python |
| 167 | +from ddtrace.appsec._iast import ddtrace_iast_flask_patch |
| 168 | + |
| 169 | +if __name__ == "__main__": |
| 170 | + ddtrace_iast_flask_patch() # Call before app.run() |
| 171 | + app.run() |
| 172 | +``` |
| 173 | + |
| 174 | +This patches the main Flask app file so IAST works on functions defined in `app.py`. |
| 175 | + |
| 176 | +### Gevent Compatibility |
| 177 | + |
| 178 | +IMPORTANT: Avoid top-level `import inspect` in IAST code - it interferes with gevent's monkey patching and causes sporadic worker timeouts in Gunicorn applications. |
| 179 | + |
| 180 | +**Solution**: Import `inspect` locally within functions when needed. |
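
A minimal sketch of the local-import pattern follows. `describe_signature` is a hypothetical helper used only to show where the import goes.

```python
def describe_signature(func):
    """Return the signature of `func` as a string."""
    # Deferred import: `inspect` is loaded on first call, not when this
    # module is imported.
    import inspect

    return str(inspect.signature(func))


def example(a, b=1):
    return a + b


print(describe_signature(example))  # (a, b=1)
```

Python caches imported modules in `sys.modules`, so repeated calls pay only a dictionary lookup after the first one.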
| 181 | + |
| 182 | +### Native Code Development |
| 183 | + |
| 184 | +When working with IAST's C++ taint tracking code: |
| 185 | + |
| 186 | +1. **Prefer**: Native C++ types (`std::string`, `int`, `char`) |
| 187 | +2. **If needed**: CPython API with `PyObject*` (careful with reference counting!) |
| 188 | +3. **Last resort**: Pybind11 (adds complexity) |
| 189 | + |
| 190 | +**Build & Test C++ Code**: |
| 191 | +```bash |
| 192 | +cmake -DCMAKE_BUILD_TYPE=Debug -DPYTHON_EXECUTABLE=python \ |
| 193 | + -S ddtrace/appsec/_iast/_taint_tracking \ |
| 194 | + -B ddtrace/appsec/_iast/_taint_tracking |
| 195 | + |
| 196 | +make -f ddtrace/appsec/_iast/_taint_tracking/tests/Makefile native_tests |
| 197 | +ddtrace/appsec/_iast/_taint_tracking/tests/native_tests |
| 198 | +``` |
| 199 | + |
| 200 | +## Testing IAST Code |
| 201 | + |
| 202 | +### Python Tests |
| 203 | + |
| 204 | +```bash |
| 205 | +# Run IAST tests |
| 206 | +python -m pytest -vv -s --no-cov tests/appsec/iast/ |
| 207 | + |
| 208 | +# Run specific vulnerability tests |
| 209 | +python -m pytest -vv tests/appsec/iast/taint_sinks/test_sql_injection.py |
| 210 | + |
| 211 | +# Run with IAST enabled |
| 212 | +DD_IAST_ENABLED=true python -m pytest tests/appsec/iast/ |
| 213 | +``` |
| 214 | + |
| 215 | +### End-to-End Tests |
| 216 | + |
| 217 | +E2E tests use test servers defined in `tests/appsec/appsec_utils.py`: |
| 218 | +- `django_server` - Django test application |
| 219 | +- `flask_server` - Flask test application |
| 220 | +- `fast_api` - FastAPI test application |
| 221 | + |
| 222 | +Test application location: `tests/appsec/integrations/django_tests/django_app` |
| 223 | + |
| 224 | +**Running E2E tests**: |
| 225 | +```bash |
| 226 | +# Start testagent |
| 227 | +docker compose up -d testagent |
| 228 | + |
| 229 | +# Run E2E tests |
| 230 | +python -m pytest tests/appsec/iast/test_integration.py -v |
| 231 | +``` |
| 232 | + |
| 233 | +### C++ Native Tests |
| 234 | + |
| 235 | +```bash |
| 236 | +# Run the C++ tests (build the native_tests binary first, as shown under "Native Code Development")
| 237 | +./ddtrace/appsec/_iast/_taint_tracking/tests/native_tests |
| 238 | +``` |
| 239 | + |
| 240 | +## Key Files Reference |
| 241 | + |
| 242 | +**Core Implementation**: |
| 243 | +- `ddtrace/appsec/_iast/__init__.py` - Entry point, initialization, fork safety |
| 244 | +- `ddtrace/appsec/_iast/_overhead_control_engine.py` - Performance control (OCE) |
| 245 | +- `ddtrace/appsec/_iast/_patch_modules.py` - Module patching registry |
| 246 | + |
| 247 | +**AST Patching**: |
| 248 | +- `ddtrace/appsec/_iast/_ast/ast_patching.py` - AST transformation |
| 249 | +- `ddtrace/appsec/_iast/_ast/visitor.py` - AST visitor |
| 250 | +- `ddtrace/appsec/_iast/_loader.py` - Patched module execution |
| 251 | + |
| 252 | +**Taint Tracking**: |
| 253 | +- `ddtrace/appsec/_iast/_taint_tracking/` - C++ native taint tracking |
| 254 | +- `ddtrace/appsec/_iast/_taint_tracking/aspects.py` - Taint propagation API |
| 255 | + |
| 256 | +**Vulnerability Detection**: |
| 257 | +- `ddtrace/appsec/_iast/taint_sinks/` - All vulnerability detectors |
| 258 | +- `ddtrace/appsec/_iast/taint_sinks/_base.py` - Base vulnerability class |
| 259 | + |
| 260 | +**Security Controls**: |
| 261 | +- `ddtrace/appsec/_iast/secure_marks/` - Validators and sanitizers |
| 262 | + |
| 263 | +## Environment Variables |
| 264 | + |
| 265 | +**Public Configuration**: All public IAST environment variables are documented in the [ddtrace Configuration Guide](https://ddtrace.readthedocs.io/en/stable/configuration.html#code-security). |
| 266 | + |
| 267 | +**Private/Internal Environment Variables** (for development and debugging): |
| 268 | + |
| 269 | +```bash |
| 270 | +# Enable debug-level taint propagation logging |
| 271 | +_DD_IAST_PROPAGATION_DEBUG=true |
| 272 | + |
| 273 | +# Enable IAST internal debug logging |
| 274 | +_DD_IAST_DEBUG=true |
| 275 | + |
| 276 | +# Enable specific taint sink detection (comma-separated list) |
| 277 | +_DD_IAST_SINK_POINTS_ENABLED=sql_injection,command_injection,path_traversal |
| 278 | + |
| 279 | +# Specify modules to patch for AST instrumentation |
| 280 | +_DD_IAST_PATCH_MODULES=benchmarks.,tests.appsec.,scripts.iast. |
| 281 | + |
| 282 | +# Fast build mode - skips some compilation optimizations (development only) |
| 283 | +DD_FAST_BUILD=1 |
| 284 | +``` |
| 285 | + |
| 286 | +**Note**: Private environment variables (prefixed with `_DD_`) are not officially supported and may change without notice. They are primarily for internal development and debugging. |