chore(dev): ai development rules #13681


Draft PR, wants to merge 2 commits into base `main`.
74 changes: 74 additions & 0 deletions .cursor/dd-trace-py.mdc
@@ -0,0 +1,74 @@
# AI Guidelines for dd-trace-py

## The Golden Rule
When unsure about implementation details, ALWAYS ask the developer.

## Project Context
`dd-trace-py` is the official Datadog library for tracing and monitoring Python applications. It provides developers with deep visibility into their application's performance by enabling distributed tracing, continuous profiling, and error tracking.

The library offers automatic instrumentation for a wide variety of popular Python frameworks and libraries, such as Django, Flask, Celery, and SQLAlchemy. This allows for quick setup and immediate insights with minimal code changes. For more specific use cases, `dd-trace-py` also provides a comprehensive manual instrumentation API.
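
The manual instrumentation API is organized around a tracer that opens spans as context managers. The toy model below sketches only the shape of that API (`tracer.trace(...)`, `span.set_tag(...)`); it is not the real `ddtrace` implementation, and the class internals here are simplified assumptions.

```python
import time
from contextlib import contextmanager


class Span:
    """Simplified span: a named unit of work with service, resource, tags, and timing."""

    def __init__(self, name, service=None, resource=None):
        self.name, self.service, self.resource = name, service, resource
        self.tags = {}
        self.start = time.monotonic()
        self.duration = None

    def set_tag(self, key, value):
        self.tags[key] = value

    def finish(self):
        self.duration = time.monotonic() - self.start


class Tracer:
    """Simplified tracer: opens spans and collects them when they finish."""

    def __init__(self):
        self.finished = []

    @contextmanager
    def trace(self, name, service=None, resource=None):
        span = Span(name, service=service, resource=resource)
        try:
            yield span
        finally:
            # The span is timed and recorded even if the block raises.
            span.finish()
            self.finished.append(span)


tracer = Tracer()
with tracer.trace("http.request", service="web", resource="GET /users") as span:
    span.set_tag("http.status_code", 200)
```

The real library's `tracer.trace()` follows the same context-manager pattern, which guarantees spans are closed even on error paths.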

Key features include:
- **Distributed Tracing:** Trace requests across service boundaries to understand the full lifecycle of a request in a microservices architecture.
- **Continuous Profiling:** Analyze code-level performance in production to identify and optimize CPU- and memory-intensive operations.
- **Error Tracking:** Capture and aggregate unhandled exceptions and errors, linking them to specific traces for easier debugging.
- **Application Security Monitoring (ASM):** Detect and protect against threats and vulnerabilities within your application.

## Critical Architecture Decisions

### Automatic Instrumentation via Monkey-Patching
`dd-trace-py` heavily relies on monkey-patching for its automatic instrumentation. This allows the library to trace a wide range of standard libraries and frameworks without requiring any code changes from the user. This is a powerful feature that makes the library easy to adopt, but it's also a critical piece of architecture to be aware of when debugging or extending the tracer.
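
The core technique is wrap-and-replace: rebind a library's attribute to a wrapper that records telemetry and then delegates to the original. The sketch below illustrates the general pattern only; `fakelib` and `wrap_with_tracing` are made-up names, not dd-trace-py's actual patching code.

```python
import functools
import types

# Stand-in for a third-party module whose function we want to trace.
fakelib = types.SimpleNamespace(fetch=lambda url: f"response from {url}")


def wrap_with_tracing(func, calls):
    """Return a wrapper that records each call, then delegates to func."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls.append(args)  # in a real tracer: open a span here
        return func(*args, **kwargs)

    return wrapper


calls = []
# The monkey-patch: rebind the module attribute so every caller
# transparently goes through the wrapper.
fakelib.fetch = wrap_with_tracing(fakelib.fetch, calls)

result = fakelib.fetch("https://example.com")
```

Because callers look the attribute up on the module at call time, no user code changes are needed — which is exactly why patch ordering and import timing matter when debugging the tracer.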

### Performance First
The tracer is designed to run in high-throughput production environments. This means performance is a primary concern. Key decisions to achieve this include:
1. **Core components written in C/Cython:** For performance-critical code paths.
2. **Low-overhead sampling:** To reduce the performance impact of tracing.
3. **Efficient serialization:** Using `msgpack` to communicate with the Datadog Agent.
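
One common low-overhead sampling technique is a deterministic rate sampler: hash the trace ID with a large multiplier and keep the trace if the result falls under a threshold. The sketch below shows the general technique (a Knuth-style multiplicative hash); treat the constant and comparison details as illustrative assumptions rather than dd-trace-py's exact sampler.

```python
KNUTH_FACTOR = 1111111111111111111  # large odd constant scatters sequential IDs
MAX_ID = 2**64


def keep_trace(trace_id: int, sample_rate: float) -> bool:
    # Deterministic: the same trace ID always yields the same decision,
    # so every service in a distributed trace agrees without coordination.
    return ((trace_id * KNUTH_FACTOR) % MAX_ID) <= sample_rate * MAX_ID


# Over many trace IDs, roughly sample_rate of traces are kept.
kept = sum(keep_trace(i, 0.25) for i in range(100_000))
```

A single multiply, modulo, and compare per trace is what keeps the sampling decision cheap enough for hot paths.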

### Extensible Integrations
The library is designed to be easily extensible with new "integrations" for libraries that are not yet supported. The integration system provides a clear pattern for adding new instrumentation.

### Configuration via Environment Variables
To make it easy to configure the tracer in various environments (local development, CI, production containers, etc.), the primary method of configuration is through environment variables.
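
Variables such as `DD_SERVICE`, `DD_ENV`, and `DD_VERSION` are documented configuration knobs. The `resolve` helper below is hypothetical, sketching the usual precedence (explicit code value, then environment variable, then default) rather than the library's actual configuration machinery.

```python
import os


def resolve(code_value, env_var, default):
    """Resolve a setting: explicit code value wins, then the env var, then a default."""
    if code_value is not None:
        return code_value
    return os.environ.get(env_var, default)


os.environ["DD_SERVICE"] = "billing-api"  # typically set by the deployment environment
service = resolve(None, "DD_SERVICE", "unnamed-python-service")
env = resolve("staging", "DD_ENV", "none")  # explicit code value overrides the environment
```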

## Code Style and Patterns

### Anchor comments

Add specially formatted comments throughout the codebase, where appropriate, for yourself as inline knowledge that can be easily `grep`ped for.

### Guidelines:

- Use `AIDEV-NOTE:`, `AIDEV-TODO:`, or `AIDEV-QUESTION:` (all-caps prefix) for comments aimed at AI and developers.
- **Important:** Before scanning files, always first try to **grep for existing anchors** `AIDEV-*` in relevant subdirectories.
- **Update relevant anchors** when modifying associated code.
- **Do not remove `AIDEV-NOTE`s** without explicit human instruction.
- Make sure to add relevant anchor comments, whenever a file or piece of code is:
* too complex, or
* very important, or
* confusing, or
* could have a bug
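
Applied to a hypothetical helper, an anchor comment looks like this (the function itself is illustrative, not real dd-trace-py code); anchors are then discoverable with `grep -rn "AIDEV-" ddtrace/`:

```python
# AIDEV-NOTE: perf-sensitive path; called once per span on hot loops,
# so avoid extra allocations here.
def encode_tags(tags):
    # AIDEV-TODO: consider interning common tag keys.
    return ";".join(f"{k}:{v}" for k, v in sorted(tags.items()))
```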

## Domain Glossary

- **Trace**: A representation of a single request or transaction as it moves through a system. It's composed of one or more spans.
- **Span**: The basic building block of a trace. It represents a single unit of work (e.g., a function call, a database query). Spans have a start and end time, a name, a service, a resource, and can contain metadata (tags).
- **Name (`span.name`)**: A string that represents the name of the operation being measured. It should be a general, human-readable identifier for the type of operation, such as `http.request`, `sqlalchemy.query`, or a function name like `auth.login`. This is typically more general than the `resource` name.
- **Service**: A logical grouping of traces that perform the same function. For example, a web application or a database.
- **Resource**: The specific operation being performed within a service that a span represents. For example, a web endpoint (`/users/{id}`) or a database query (`SELECT * FROM users`).
- **Instrumentation**: The process of adding code to an application to generate traces. This can be done automatically by the library (`ddtrace-run`) or manually using the tracer API.
- **Integration**: A component of `dd-trace-py` that provides automatic instrumentation for a specific library or framework (e.g., Django, `requests`, `psycopg2`).
- **Tracer**: The object that creates and manages spans. It's the main entry point for manual instrumentation.
- **Profiler**: A component that collects data on CPU and memory usage at the code level, helping to identify performance bottlenecks.
- **`ddtrace-run`**: A command-line utility that automatically instruments a Python application by monkey-patching supported libraries at runtime.
- **Tag**: A key-value pair of metadata that can be added to a span to provide additional context for filtering, grouping, or analysis.

## What AI Must NEVER Do

1. **Never change public API contracts** - Breaks real applications
2. **Never commit secrets** - Use environment variables
3. **Never assume business logic** - Always ask
4. **Never remove AIDEV- comments** - They're there for a reason

Remember: We optimize for maintainability over cleverness.
When in doubt, choose the boring solution.
3 changes: 0 additions & 3 deletions .gitignore
@@ -134,9 +134,6 @@ ENV/
# VS Code
.vscode/

# Cursor
.cursor/

# Riot
.riot/venv*
.riot/requirements/*.in
74 changes: 74 additions & 0 deletions AI_GUIDELINES.md
74 changes: 74 additions & 0 deletions CLAUDE.md
47 changes: 47 additions & 0 deletions scripts/generate_ai_docs.py
@@ -0,0 +1,47 @@
import argparse
import os


def read_guidelines(filepath="AI_GUIDELINES.md"):
    """Reads the AI_GUIDELINES.md file and returns its content."""
    with open(filepath, "r") as f:
        return f.read()


def write_cursor_file(content):
    """Writes the content to the .cursor/dd-trace-py.mdc file."""
    os.makedirs(".cursor", exist_ok=True)
    with open(".cursor/dd-trace-py.mdc", "w") as f:
        f.write(content)


def write_claude_file(content):
    """Writes the content to the CLAUDE.md file."""
    with open("CLAUDE.md", "w") as f:
        f.write(content)


def main():
    """Main function to generate AI docs."""
    parser = argparse.ArgumentParser(description="Generate AI docs from AI_GUIDELINES.md")
    parser.add_argument(
        "--guidelines-file",
        type=str,
        default="AI_GUIDELINES.md",
        help="Path to the AI_GUIDELINES.md file.",
    )
    args = parser.parse_args()

    content = read_guidelines(args.guidelines_file)

    write_cursor_file(content)
    write_claude_file(content)

    print("Successfully generated .cursor/dd-trace-py.mdc and CLAUDE.md")


if __name__ == "__main__":
    main()