heiaipika
Collaborator

@heiaipika heiaipika commented Sep 4, 2025

Resolves #110

prompt:
Role: You are an expert technical writer and JAX/TPU optimization engineer specializing in high-performance inference systems for large language models. Your task is to analyze the Flash Attention Kernel implementation in SGLang-JAX and generate comprehensive developer-oriented feature documentation in Markdown format.
Generate a Flash Attention Kernel feature document named flash_attention_kernel_genbyclaude.md for the docs/features directory, following the exact format and style of existing feature documentation in this project.
You will be provided with:
The Project: You have full access to the entire codebase. You can read, analyze, and reference any file within it. This is your primary source for generating content.
FIRST: You MUST perform a top-down analysis of the codebase to identify its core functional domains, logical layers, or key service boundaries. These domains will form the architectural groupings for your diagrams and content organization. Then analyze the codebase and create the documentation structure.
The documentation should include these main sections (Goals, Design, Implementation, Usage), but adapt them based on repository content.

  1. Goals

  2. Design (adapt based on repository content):
    Core Concept
    ● High-level solution overview
    ● Problem-solution alignment
    ● Design philosophy
    Architecture
    ● Analyze data dependencies and interaction patterns from source code
    ● Identify key integration points and convergence components
    Key Components

  3. Implementation:
    ● Cover key implementation aspects from developer viewpoint
    ● Must include relevant code snippets (avoid pseudocode)
    ● Choose appropriate logic flow:
    ○ Complex systems: Use execution flow logic
    ○ Simple features: Use component-based logic
    ● Show actual working code examples
    ● Highlight critical integration points

  4. Usage (Brief command-line configuration with essential parameters and use cases)
    Include visual diagrams for every part that would benefit from them, including:
    ● Architecture overviews
    ● Data flow descriptions
    ● Component relationships
    ● Process workflows
    ● State machines
    ● Class hierarchies
    SECOND: Generate Detailed Content for Each Part
    For each part in the structure, you MUST:

  5. Start with Source File Listing
    ○ Begin IMMEDIATELY with a

    block listing ALL relevant source files used
    ○ Use AT LEAST 5 different source files for comprehensive coverage
    ○ Format exactly as specified with proper Markdown links

  6. Introduction (1-2 paragraphs)
    ○ Explain the purpose, scope, and high-level overview of the part topic
    ○ Contextualize within the overall project
    ○ Reference other documentation if information is available in source files
    ○ Base content SOLELY on provided source files

  7. Detailed Sections
    ○ Break down the topic into logical sections using H2 (##) and H3 (###) headings
    ○ For each section:

     ■ Explain architecture, components, data flow, or logic as evidenced in source files
     ■ Identify key functions, classes, data structures, API endpoints, or configuration elements
    

    ○ Structure document logically for easy understanding

  8. Mermaid Diagrams
    ○ Use flowchart TD, sequenceDiagram, classDiagram, erDiagram, graph TD
    ○ Ensure diagrams are accurate and directly derived from source files
    ○ Provide brief context explanations before/after each diagram
    ○ CRITICAL FORMATTING RULES:
    ■ Strict vertical orientation only: Use "graph TD" (never "graph LR")
    ■ Maximum node width: 3-4 words
    ■ For sequence diagrams:

         ■ All labels and subgraph names containing parentheses, brackets, or special characters MUST be wrapped in double quotes: ["Label (with parens)"] or subgraph "Subgraph (Name)"
         ■ Start with "sequenceDiagram" directive on its own line
         ■ Define ALL participants at the beginning
         ■ Use concise participant names
         ■ Correct arrow types: ->> (request/async), -->> (response), -x (failed)
         ■ Include activation boxes using +/- notation
         ■ Add notes using "Note over" or "Note right of"
         ■ Diagram generation priority: Accuracy > Complexity
    
  9. Tables
    ○ Use Markdown tables to summarize:

     ■ Key features/components and descriptions
     ■ API endpoint parameters, types, and descriptions
     ■ Configuration options, types, and default values
     ■ Data model fields, types, constraints, and descriptions
    
  10. Code Snippets (Optional)
    ○ Include short, relevant code snippets from source files
    ○ Use proper language identifiers in code blocks
    ○ Illustrate key implementation details, data structures, or configurations

  11. Source Citations (EXTREMELY IMPORTANT)
    ○ For EVERY significant piece of information, diagram, table entry, or code snippet:

     ■ Cite specific source file(s) and line numbers
     ■ Use format: Sources: [filename.ext:start_line-end_line]() or Sources: [filename.ext:line_number]()
     ■ For broad relevance: Sources: [dir/file3.ext]()
    

    ○ Place citations at end of paragraphs, under diagrams/tables, or after code snippets
    ○ Cite under section headings if overwhelmingly based on 1-2 files
    ○ MUST cite AT LEAST 5 different source files throughout the guide

  12. Technical Accuracy
    ○ Derive ALL information SOLELY from provided source files
    ○ Do not infer, invent, or use external knowledge
    ○ If information is missing from source files, either omit it or explicitly state its absence

  13. Writing Style
    ○ Use clear, professional, concise technical language
    ○ Avoid unnecessary jargon but use correct technical terms
    ○ Suitable for developers learning about the project

  14. Conclusion/Summary
    ○ End with brief summary paragraph reiterating key aspects
    ○ Explain significance within the project context
    CRITICAL REQUIREMENTS:
    ● Start every part with the

    block listing source files
    ● Use vertical-oriented Mermaid diagrams exclusively
    ● Avoid pseudocode
    ● In architecture overview diagrams, using subgraph clusters to represent high-level components and visually group related modules is recommended. (The subgraph clusters MUST represent the key functional domains, logical layers, or service boundaries identified in your initial analysis.)
    ● Cite sources for every significant element using specified format
    ● Maintain absolute technical accuracy based solely on source files
    ● Ensure comprehensive coverage using at least 5 different source files
    Generate a comprehensive docs structure first, then provide detailed content for each identified part based on thorough analysis of the relevant source files, adhering strictly to all formatting and citation requirements.
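
The citation rules above (the `Sources: [file:lines]()` format and the five-file minimum) lend themselves to a mechanical check on a generated draft. Below is a minimal Python sketch, assuming citations appear on lines containing `Sources:`. The helper names `cited_files` and `meets_minimum`, the regex, and the sample file names are illustrative assumptions, not part of SGLang-JAX.

```python
import re

# Matches "[dir/file.ext]()", "[file.ext:12]()" and "[file.ext:12-34]()".
# The pattern is an assumption derived from the format the prompt describes.
CITATION = re.compile(r"\[([^\]\s:]+)(?::(\d+)(?:-(\d+))?)?\]\(\)")


def cited_files(markdown: str) -> set[str]:
    """Return the distinct file names cited on lines containing 'Sources:'."""
    files = set()
    for line in markdown.splitlines():
        _, sep, tail = line.partition("Sources:")
        if sep:  # only look at text that follows a 'Sources:' marker
            files.update(m.group(1) for m in CITATION.finditer(tail))
    return files


def meets_minimum(markdown: str, minimum: int = 5) -> bool:
    """Check the 'at least N different source files' requirement."""
    return len(cited_files(markdown)) >= minimum


# Hypothetical draft text; the file names are placeholders.
sample = """Some paragraph.
Sources: [kernel_a.py:10-42](), [util_b.py:7]()
Sources: [docs/features/example.md]()
"""
print(sorted(cited_files(sample)))
print(meets_minimum(sample))
```

This only checks citation format and count. It does not verify that the cited files or line ranges exist in the repository, so it cannot catch fabricated references on its own.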

@sii-xinglong
Collaborator

It seems like there are a lot of hallucinations, and there is no structure that the document wants to express.
Can we write it according to the same logic for different people, such as developers or users?

@heiaipika
Collaborator Author

It seems like there are a lot of hallucinations, and there is no structure that the document wants to express. Can we write it according to the same logic for different people, such as developers or users?
Got it! Currently, only the "Implementation" section of the prompt is written for developers; I'll try defining the audience globally.

@heiaipika heiaipika force-pushed the docs/flash_attention_kernel branch 2 times, most recently from 3d88bf6 to b0d98db Compare September 12, 2025 09:36
@heiaipika heiaipika force-pushed the docs/flash_attention_kernel branch from b0d98db to 4de98e0 Compare September 17, 2025 10:54
@heiaipika heiaipika closed this Sep 22, 2025
@heiaipika heiaipika deleted the docs/flash_attention_kernel branch September 22, 2025 06:07

Successfully merging this pull request may close these issues.

[Doc] more detail about flash attention kernel
2 participants