Commit 90b4d18

avara1986 and smola authored
chore(agents): iast context (#15381)
Following up on and splitting out #15193.

This PR adds IAST context for AI agents.

Co-authored-by: Santiago M. Mola <[email protected]>
1 parent 777a339 commit 90b4d18

2 files changed

+287
-0
lines changed

.cursor/rules/iast.mdc

Lines changed: 286 additions & 0 deletions
@@ -0,0 +1,286 @@
---
description: IAST (Interactive Application Security Testing) - How it works and development guidelines
globs:
- "**/appsec/_iast/**"
- "**/appsec/iast/**"
- "**/tests/appsec/iast/**"
- "**/tests/appsec/iast_aggregated_memcheck/**"
- "**/tests/appsec/iast_memcheck/**"
- "**/tests/appsec/iast_packages/**"
- "**/tests/appsec/iast_tdd_propagation/**"
- "**/tests/appsec/integrations/**"
---

# IAST Development Guide

**Note:** General development patterns, code style, and testing guidelines are in `AGENTS.md` under "IAST/AppSec Development".

**Synonyms:** IAST = Code Security = Runtime Code Analysis (all refer to the same product)

## What is IAST?

IAST (Interactive Application Security Testing) analyzes running Python applications for security vulnerabilities by:
1. **Taint tracking** - Performing taint analysis on data from untrusted sources (user input, HTTP requests, etc.)
2. **Detecting vulnerabilities** - Identifying when tainted data reaches security-sensitive functions (sinks)

**Official Documentation**:
- [Datadog Code Security (IAST) Overview](https://docs.datadoghq.com/security/code_security/iast/)
- [Getting Started with Code Security](https://docs.datadoghq.com/security/code_security/iast/setup/)
- [Understanding Vulnerability Types](https://docs.datadoghq.com/security/code_security/iast/troubleshooting/)

## How IAST Works: High-Level Architecture

### 1. AST Patching (Code Instrumentation)

IAST rewrites Python modules at import time using AST (Abstract Syntax Tree) patching, before the source is compiled to bytecode:

- **Module Watchdog**: Hooks into Python's import system (`ddtrace.internal.module.ModuleWatchdog`)
- **AST Visitor**: Analyzes and modifies Python AST before compilation (`ddtrace/appsec/_iast/_ast/visitor.py`)
- **String Operations**: Patches string operations (concat, slice, format, etc.) to propagate taint
- **Call Sites**: Instruments function calls to track taint flow

**Location**: `ddtrace/appsec/_iast/_ast/`
- `ast_patching.py` - Main AST patching logic
- `visitor.py` - AST visitor for code transformation
- `iastpatch.c` - C extension for fast AST manipulation

**Activation**: Via `ModuleWatchdog.register_pre_exec_module_hook()` in `ddtrace/appsec/_iast/__init__.py`
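
To make the rewriting step concrete, here is a minimal sketch that uses only the standard-library `ast` module. It is not the ddtrace visitor: `add_aspect`, `AddToAspect`, and `patch_and_run` are hypothetical names, and the real implementation hooks the import system instead of calling `exec` directly. The sketch only illustrates the general idea of replacing a `+` expression with a call to a propagation function before compilation.

```python
import ast

CALLS = []  # records every intercepted addition, for demonstration only


def add_aspect(left, right):
    """Hypothetical stand-in for a taint-propagation aspect."""
    CALLS.append((left, right))
    return left + right


class AddToAspect(ast.NodeTransformer):
    """Rewrite every `a + b` expression into `add_aspect(a, b)`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform nested operands first
        if not isinstance(node.op, ast.Add):
            return node
        call = ast.Call(
            func=ast.Name(id="add_aspect", ctx=ast.Load()),
            args=[node.left, node.right],
            keywords=[],
        )
        return ast.copy_location(call, node)


def patch_and_run(source):
    """Parse, transform, compile, and execute `source`; return its namespace."""
    tree = AddToAspect().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    namespace = {"add_aspect": add_aspect}
    exec(compile(tree, "<patched>", "exec"), namespace)
    return namespace


SOURCE = "def greet(name):\n    return 'hello, ' + name\n"
greet = patch_and_run(SOURCE)["greet"]
result = greet("world")
```

The patched function behaves like the original, but every concatenation now passes through `add_aspect`, which is the point where a real aspect would copy taint metadata onto the result.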

### 2. Taint Tracking (C++ Native Extension)

Taint information is stored and propagated efficiently using a C++ native extension:

- **TaintedObject**: Associates Python objects with taint metadata (source, ranges)
- **Taint Ranges**: Track which parts of strings/bytes are tainted
- **Context Management**: Per-request taint state using context-local storage
- **Propagation Aspects**: Functions that propagate taint through operations

**Location**: `ddtrace/appsec/_iast/_taint_tracking/`
- Native C++ code compiled with CMake
- `aspects.py` - Python API for taint propagation
- `_native.cpython-*.so` - Compiled C++ extension

**Key Concepts**:
- **Taint Source**: Where untrusted data enters (HTTP params, headers, body)
- **Taint Propagation**: Following data through operations (concat, slice, replace, etc.)
- **Taint Range**: Start/end positions in strings that are tainted
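
The key concepts above can be sketched in pure Python. This is an illustration only, not the native extension's data model: `TaintRange`, `TaintedStr`, and `concat` are hypothetical names, and the real implementation stores ranges outside the string object. The sketch shows why concatenation has to shift the ranges of the right-hand operand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaintRange:
    start: int   # offset of the tainted region within the string
    length: int  # size of the tainted region
    source: str  # label for where the data entered (hypothetical format)


@dataclass(frozen=True)
class TaintedStr:
    value: str
    ranges: tuple = ()  # empty tuple means "nothing tainted"


def concat(left, right):
    """Concatenate two values and carry their taint ranges to the result."""
    offset = len(left.value)
    # Ranges from the right operand move right by the length of the left one.
    shifted = tuple(
        TaintRange(r.start + offset, r.length, r.source) for r in right.ranges
    )
    return TaintedStr(left.value + right.value, left.ranges + shifted)


prefix = TaintedStr("SELECT * FROM users WHERE name = '")
user_input = TaintedStr("bob", (TaintRange(0, 3, "http.parameter:name"),))
query = concat(prefix, user_input)
```

After the call, `query.ranges` holds a single range that starts at `len(prefix.value)` and still covers exactly the characters `bob`.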

### 3. Module Patching (Taint Sinks)

IAST wraps security-sensitive functions to detect vulnerabilities:

- **IASTFunction**: Wraps target functions using `wrapt` library
- **Taint Sinks**: Security-sensitive functions (exec, eval, SQL, file operations, etc.)
- **Vulnerability Detection**: Checks if tainted data reaches sinks

**Location**: `ddtrace/appsec/_iast/_patch_modules.py`
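
As a rough picture of what wrapping a sink means, the sketch below uses `functools.wraps` from the standard library in place of `wrapt`, so that it stays self-contained. Everything in it is an assumption made for illustration: the `Tainted` marker type, the `sink` decorator, and the `REPORTED` list do not exist in ddtrace, and `run_command` only pretends to execute something.

```python
import functools

REPORTED = []  # collected (vulnerability type, function name) pairs


class Tainted(str):
    """Hypothetical marker type: a string known to come from user input."""


def sink(vulnerability_type):
    """Wrap a function so that tainted arguments are reported before the call."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            values = (*args, *kwargs.values())
            if any(isinstance(v, Tainted) for v in values):
                REPORTED.append((vulnerability_type, func.__name__))
            # The wrapped function always runs; detection does not block it.
            return func(*args, **kwargs)

        return wrapper

    return decorator


@sink("COMMAND_INJECTION")
def run_command(cmd):
    # Stand-in for a real sink such as a subprocess call; nothing is executed.
    return f"ran: {cmd}"


safe_result = run_command("ls")
unsafe_result = run_command(Tainted("ls; cat /etc/passwd"))
```

Only the second call is recorded, because only it passes a value marked as tainted.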

**Supported Vulnerability Types** (`ddtrace/appsec/_iast/taint_sinks/`):
- `sql_injection.py`
- `command_injection.py`
- `path_traversal.py`
- `ssrf.py`
- `code_injection.py`
- `header_injection.py`
- `weak_hash.py`
- `weak_cipher.py`
- `weak_randomness.py`
- `insecure_cookie.py`
- `unvalidated_redirect.py`
- `untrusted_serialization.py`

### 4. Overhead Control Engine (OCE)

Performance optimization to limit IAST overhead:

- **Request Sampling**: Analyze only a sampled percentage of requests (default: 30%)
- **Vulnerability Limits**: Max vulnerabilities per request
- **Concurrent Request Limits**: Max requests analyzed simultaneously
- **Per-Vulnerability Quotas**: Limit overhead per vulnerability type

**Location**: `ddtrace/appsec/_iast/_overhead_control_engine.py`

**Configuration**: Settings defined in `ddtrace/internal/settings/asm.py` (e.g., `_iast_request_sampling`, `_iast_sink_points_enabled`)
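
The following toy model shows how request sampling and a concurrency limit can interact. It is a sketch under stated assumptions, not the logic in `_overhead_control_engine.py`: the `OverheadControl` class, the credit-based sampling scheme, and the `max_concurrent=2` default are invented for illustration. Only the 30% sampling default comes from the description above.

```python
class OverheadControl:
    """Toy overhead controller: deterministic sampling plus a concurrency cap."""

    def __init__(self, sampling_percent=30.0, max_concurrent=2):
        self.sampling_percent = sampling_percent
        self.max_concurrent = max_concurrent  # assumed value, not from ddtrace
        self._credit = 0.0  # accumulated sampling budget, in percent
        self._active = 0    # requests currently being analyzed

    def acquire_request(self):
        """Return True if the incoming request should be analyzed."""
        self._credit += self.sampling_percent
        if self._credit < 100.0 or self._active >= self.max_concurrent:
            return False
        self._credit -= 100.0
        self._active += 1
        return True

    def release_request(self):
        """Mark an analyzed request as finished."""
        self._active = max(0, self._active - 1)


oce = OverheadControl()
decisions = []
for _ in range(10):
    analyzed = oce.acquire_request()
    decisions.append(analyzed)
    if analyzed:
        oce.release_request()
```

With these settings, three of the ten sequential requests are analyzed. A production engine would more likely use per-type quotas and thread-safe counters, which this sketch leaves out.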

### 5. Vulnerability Reporting

When a vulnerability is detected:
1. Evidence is collected (tainted data, location, stack trace)
2. Vulnerability is reported via the tracer span
3. Deduplication prevents duplicate reports
4. Data is sent to Datadog backend for analysis
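
The deduplication step can be pictured with a small sketch. The key used here, `(type, file, line)`, is an assumption chosen for illustration, and `Vulnerability` and `Reporter` are hypothetical names. The actual deduplication criteria in ddtrace may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vulnerability:
    type: str
    file: str
    line: int
    evidence: str


class Reporter:
    """Collect vulnerabilities, skipping repeats of the same type and location."""

    def __init__(self):
        self._seen = set()
        self.reported = []

    def report(self, vuln):
        """Return True if the vulnerability was recorded, False if a duplicate."""
        key = (vuln.type, vuln.file, vuln.line)  # assumed dedup key
        if key in self._seen:
            return False
        self._seen.add(key)
        self.reported.append(vuln)
        return True


reporter = Reporter()
first = reporter.report(Vulnerability("SQL_INJECTION", "app.py", 42, "id=1"))
repeat = reporter.report(Vulnerability("SQL_INJECTION", "app.py", 42, "id=2"))
other = reporter.report(Vulnerability("SQL_INJECTION", "app.py", 57, "id=1"))
```

Here the second report is dropped even though its evidence differs, because it points at the same sink location as the first.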

## Key IAST Concepts for Development

### Taint Tracking Terminology

- **Taint Sources** (Origins): Where untrusted data enters the application (HTTP params, headers, body, cookies)
- **Taint Propagation**: How tainted data flows through string operations (concat, slice, replace, format, etc.)
- **Taint Ranges**: Specific byte/character offsets within strings that are tainted (start position + length)
- **Sink Points**: Security-sensitive functions where vulnerabilities are detected (SQL execute, OS commands, file operations, eval/exec)
- **Update Origins**: Adding or modifying taint source information to track data lineage

### Call Site Instrumentation

IAST uses **Call Site Instrumentation** (CSI) instead of traditional callee instrumentation:
- Modifies calls to target functions rather than the functions themselves
- Enables selective instrumentation based on context (e.g., skip internal framework calls)
- Reduces overhead by instrumenting only application code, not low-level library internals

### Tainted Ranges and Offsets

Ranges track which parts of strings contain tainted data:
- **Offset**: Starting position of tainted substring (encoding-dependent: UTF-16, Unicode code points, or bytes)
- **Length**: Size of tainted region
- **Source**: Reference to the origin of the tainted data
- Used in vulnerability evidence to highlight exactly which user input caused the issue
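
To show how offset and length turn into highlighted evidence, the sketch below splits a string into tainted and untainted parts. The function name `split_evidence`, the `(start, length, source)` tuple layout, and the output dictionaries are assumptions for illustration, and the sketch assumes ranges do not overlap.

```python
def split_evidence(value, ranges):
    """Split `value` into parts, tagging the tainted ones with their source.

    `ranges` is a list of (start, length, source) tuples that do not overlap.
    """
    parts = []
    cursor = 0
    for start, length, source in sorted(ranges):
        if start > cursor:
            # Untainted text between the previous range and this one.
            parts.append({"value": value[cursor:start]})
        parts.append({"value": value[start:start + length], "source": source})
        cursor = start + length
    if cursor < len(value):
        parts.append({"value": value[cursor:]})  # trailing untainted text
    return parts


prefix = "SELECT 1 FROM t WHERE id = "
statement = prefix + "42"
parts = split_evidence(statement, [(len(prefix), 2, "http.parameter:id")])
```

The result contains one untainted part holding the query prefix and one tainted part holding `42`, which is the kind of structure a report needs in order to point at the offending input.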

### Security Controls (Validators & Sanitizers)

User-configurable validation/sanitization functions that apply **secure marks** to tainted ranges:
- **Input Validators**: Check if input is safe, apply marks to input arguments
- **Sanitizers**: Transform input to make it safe, apply marks to return value
- **Secure Marks**: Flags indicating a range is safe for specific vulnerability types
- If all ranges reaching a sink have appropriate secure marks, the vulnerability is suppressed
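
The suppression rule in the last bullet can be expressed with bit flags. This is a minimal sketch, assuming marks are modelled as an `IntFlag`. `SecureMark`, `MarkedRange`, `sanitize_for_sql`, and `should_report` are hypothetical names and not the API under `secure_marks/`.

```python
from dataclasses import dataclass
from enum import IntFlag


class SecureMark(IntFlag):
    NONE = 0
    SQL_INJECTION = 1
    COMMAND_INJECTION = 2


@dataclass(frozen=True)
class MarkedRange:
    start: int
    length: int
    marks: SecureMark = SecureMark.NONE


def sanitize_for_sql(rng):
    """Model a sanitizer: the returned range is marked safe for SQL sinks."""
    return MarkedRange(rng.start, rng.length, rng.marks | SecureMark.SQL_INJECTION)


def should_report(ranges, vulnerability):
    """Report unless every tainted range carries the mark for this sink type."""
    return any(not (r.marks & vulnerability) for r in ranges)


raw = MarkedRange(0, 3)
clean = sanitize_for_sql(raw)
```

A range sanitized for SQL is suppressed at a SQL sink but still reported at a command-execution sink, because marks are specific to a vulnerability type.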

### Vulnerability Detection Flow

1. **Taint data at sources** - Mark HTTP request data with origin information
2. **Propagate through operations** - Track tainted ranges through string manipulations via aspects
3. **Check at sink points** - When tainted data reaches a vulnerable function, report if not secured
4. **Apply overhead controls** - Request sampling, vulnerability quotas, and deduplication limit impact

### Implementation References

- **Taint Sinks**: `ddtrace/appsec/_iast/taint_sinks/` - Each file handles a specific vulnerability type
- **Aspects**: `ddtrace/appsec/_iast/_taint_tracking/aspects.py` - Propagation functions for string operations
- **Patch Modules**: `ddtrace/appsec/_iast/_patch_modules.py` - Registry of instrumented sink points
- **Vulnerability Base**: `ddtrace/appsec/_iast/taint_sinks/_base.py` - Base class for all vulnerability types

## Important Technical Details

### Flask Applications

Flask apps need special patching for main module instrumentation:

```python
from flask import Flask

from ddtrace.appsec._iast import ddtrace_iast_flask_patch

app = Flask(__name__)

if __name__ == "__main__":
    ddtrace_iast_flask_patch()  # Call before app.run()
    app.run()
```

This patches the main Flask app file so IAST works on functions defined in `app.py`.

### Gevent Compatibility

IMPORTANT: Avoid top-level `import inspect` in IAST code - it interferes with gevent's monkey patching and causes sporadic worker timeouts in Gunicorn applications.

**Solution**: Import `inspect` locally within functions when needed.
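
A function-local import looks like the sketch below. The helper `calling_function_name` is a hypothetical example and not an existing IAST function. It only demonstrates deferring the `inspect` import until the function is actually called.

```python
def calling_function_name():
    """Return the name of the function that called this helper."""
    # Imported here rather than at module top level, so that importing this
    # module does not load `inspect` as a side effect.
    import inspect

    frame = inspect.currentframe()
    caller = frame.f_back if frame is not None else None
    return caller.f_code.co_name if caller is not None else None


def handler():
    return calling_function_name()


name = handler()
```

The import cost is paid on the first call only, since later calls find the module already cached in `sys.modules`.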

### Native Code Development

When working with IAST's C++ taint tracking code:

1. **Prefer**: Native C++ types (`std::string`, `int`, `char`)
2. **If needed**: CPython API with `PyObject*` (careful with reference counting!)
3. **Last resort**: Pybind11 (adds complexity)

**Build & Test C++ Code**:
```bash
# Configure the native extension in Debug mode (in-source build)
cmake -DCMAKE_BUILD_TYPE=Debug -DPYTHON_EXECUTABLE=python \
    -S ddtrace/appsec/_iast/_taint_tracking \
    -B ddtrace/appsec/_iast/_taint_tracking

# Build the native test binary, then run it
make -f ddtrace/appsec/_iast/_taint_tracking/tests/Makefile native_tests
ddtrace/appsec/_iast/_taint_tracking/tests/native_tests
```

## Testing IAST Code

### Python Tests

```bash
# Run IAST tests
python -m pytest -vv -s --no-cov tests/appsec/iast/

# Run specific vulnerability tests
python -m pytest -vv tests/appsec/iast/taint_sinks/test_sql_injection.py

# Run with IAST enabled
DD_IAST_ENABLED=true python -m pytest tests/appsec/iast/
```

### End-to-End Tests

E2E tests use test servers defined in `tests/appsec/appsec_utils.py`:
- `django_server` - Django test application
- `flask_server` - Flask test application
- `fast_api` - FastAPI test application

Test application location: `tests/appsec/integrations/django_tests/django_app`

**Running E2E tests**:
```bash
# Start testagent
docker compose up -d testagent

# Run E2E tests
python -m pytest tests/appsec/iast/test_integration.py -v
```

### C++ Native Tests

```bash
# Build (after the CMake configure step under "Native Code Development") and run C++ tests
make -f ddtrace/appsec/_iast/_taint_tracking/tests/Makefile native_tests
./ddtrace/appsec/_iast/_taint_tracking/tests/native_tests
```

## Key Files Reference

**Core Implementation**:
- `ddtrace/appsec/_iast/__init__.py` - Entry point, initialization, fork safety
- `ddtrace/appsec/_iast/_overhead_control_engine.py` - Performance control (OCE)
- `ddtrace/appsec/_iast/_patch_modules.py` - Module patching registry

**AST Patching**:
- `ddtrace/appsec/_iast/_ast/ast_patching.py` - AST transformation
- `ddtrace/appsec/_iast/_ast/visitor.py` - AST visitor
- `ddtrace/appsec/_iast/_loader.py` - Patched module execution

**Taint Tracking**:
- `ddtrace/appsec/_iast/_taint_tracking/` - C++ native taint tracking
- `ddtrace/appsec/_iast/_taint_tracking/aspects.py` - Taint propagation API

**Vulnerability Detection**:
- `ddtrace/appsec/_iast/taint_sinks/` - All vulnerability detectors
- `ddtrace/appsec/_iast/taint_sinks/_base.py` - Base vulnerability class

**Security Controls**:
- `ddtrace/appsec/_iast/secure_marks/` - Validators and sanitizers

## Environment Variables

**Public Configuration**: All public IAST environment variables are documented in the [ddtrace Configuration Guide](https://ddtrace.readthedocs.io/en/stable/configuration.html#code-security).

**Private/Internal Environment Variables** (for development and debugging):

```bash
# Enable debug-level taint propagation logging
_DD_IAST_PROPAGATION_DEBUG=true

# Enable IAST internal debug logging
_DD_IAST_DEBUG=true

# Enable specific taint sink detection (comma-separated list)
_DD_IAST_SINK_POINTS_ENABLED=sql_injection,command_injection,path_traversal

# Specify modules to patch for AST instrumentation
_DD_IAST_PATCH_MODULES=benchmarks.,tests.appsec.,scripts.iast.

# Fast build mode - skips some compilation optimizations (development only)
DD_FAST_BUILD=1
```

**Note**: Private environment variables (prefixed with `_DD_`) are not officially supported and may change without notice. They are primarily for internal development and debugging.

.github/CODEOWNERS

Lines changed: 1 addition & 0 deletions
@@ -112,6 +112,7 @@ tests/internal/symbol_db/ @DataDog/debugger-python
 .gitlab/tests/debugging.yml @DataDog/debugger-python
 
 # ASM
+.cursor/rules/iast.mdc @DataDog/asm-python
 .gitlab/tests/appsec.yml @DataDog/asm-python
 benchmarks/appsec* @DataDog/asm-python
 benchmarks/bm/iast_utils* @DataDog/asm-python
