[Feat] Add PyTorch Profiler support for performance analysis #193

Open
RagingSilence wants to merge 2 commits into GeeeekExplorer:main from RagingSilence:feat/add-profiler

@RagingSilence

Motivation

Currently, nano-vllm lacks a built-in mechanism for performance analysis. Users must insert profiler code into the source by hand to debug inference bottlenecks (CPU/GPU utilization, kernel execution time). This PR integrates PyTorch Profiler natively so that profiling can be enabled at initialization without modifying the source.

Modifications

  1. ModelRunner: Added enable_profiling and profiling_output_dir arguments.
  2. Profiler Logic: Implemented a non-intrusive profiler wrapper in ModelRunner.
    • Uses manual lifecycle management (start()/step()) to avoid Kineto state errors.
    • Supports multi-GPU (Tensor Parallel) setups by appending rank to trace filenames.
    • Uses a default schedule (wait=1, warmup=1, active=3, repeat=1) to automatically capture the initial steps of inference without generating oversized files.
  3. Docs: Updated README.md with usage instructions.
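
To make the default schedule concrete, the sketch below re-implements the step-to-phase mapping in plain Python so it runs without PyTorch. It is meant to mirror the documented behaviour of `torch.profiler.schedule`; `Phase` and `phase_for_step` are illustrative names, not part of this PR or of PyTorch.

```python
from enum import Enum


class Phase(Enum):
    NONE = "none"                        # profiler idle, nothing collected
    WARMUP = "warmup"                    # tracing on, results discarded
    RECORD = "record"                    # tracing on, results kept
    RECORD_AND_SAVE = "record_and_save"  # last active step, trace is emitted


def phase_for_step(step: int, wait: int = 1, warmup: int = 1,
                   active: int = 3, repeat: int = 1) -> Phase:
    """Map a zero-based step index to the phase the schedule assigns it."""
    cycle = wait + warmup + active
    # After `repeat` full cycles the profiler stays idle (repeat=0 means forever).
    if repeat > 0 and step // cycle >= repeat:
        return Phase.NONE
    pos = step % cycle
    if pos < wait:
        return Phase.NONE
    if pos < wait + warmup:
        return Phase.WARMUP
    if pos < cycle - 1:
        return Phase.RECORD
    return Phase.RECORD_AND_SAVE


if __name__ == "__main__":
    for step in range(7):
        print(step, phase_for_step(step).name)
```

With the defaults, step 0 is skipped, step 1 warms up, steps 2 to 4 are recorded, and nothing is captured afterwards, which keeps the trace size bounded.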

Usage

Enable profiling during LLM initialization:

from nanovllm import LLM
llm = LLM(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    enable_profiling=True,
    profiling_output_dir="./profiler_logs"
)
# Run inference normally
# Profiler will automatically capture traces during the first few 

This pull request adds support for profiling tensor parallel inference with PyTorch's profiler, enabling detailed performance tracing across processes. Profiling can be enabled via a new flag, and traces are saved per process for analysis with TensorBoard or Perfetto. The implementation ensures profiling only captures inference (not initialization), and is integrated into both the `LLMEngine` and `ModelRunner` classes.

**Profiling support and documentation:**

* Added new `enable_profiling` and `profiling_output_dir` parameters to the `LLMEngine` and `ModelRunner` classes, allowing users to enable and configure PyTorch profiler output for each process. [[1]](https://github.com/GeeeekExplorer/nano-vllm/pull/193/files#diff-bd3e2cd29a4c19008001c7cec119f8c19f9a78901ebce5d0212037195a01982eL17-R17) [[2]](https://github.com/GeeeekExplorer/nano-vllm/pull/193/files#diff-bd3e2cd29a4c19008001c7cec119f8c19f9a78901ebce5d0212037195a01982eL26-R30) [[3]](https://github.com/GeeeekExplorer/nano-vllm/pull/193/files#diff-b952226638660ed1016755ade81f50ce6b99c7484815105090b654cf4a62d396L17-R29) [[4]](https://github.com/GeeeekExplorer/nano-vllm/pull/193/files#diff-b952226638660ed1016755ade81f50ce6b99c7484815105090b654cf4a62d396R46-R49)
* Implemented a `_setup_profiler` method in `ModelRunner` to configure and start the PyTorch profiler, saving traces per rank for later analysis. The profiler is started lazily on the first inference call to avoid capturing initialization overhead. [[1]](https://github.com/GeeeekExplorer/nano-vllm/pull/193/files#diff-b952226638660ed1016755ade81f50ce6b99c7484815105090b654cf4a62d396R59-R96) [[2]](https://github.com/GeeeekExplorer/nano-vllm/pull/193/files#diff-b952226638660ed1016755ade81f50ce6b99c7484815105090b654cf4a62d396R255-R278)
* Updated the `exit` method in `ModelRunner` to ensure the profiler is properly stopped when the process exits.
* Added a new "Profiling" section to the `README.md` with usage instructions and analysis tips for the generated traces.
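
The lazy start and shutdown behaviour described in the list above can be sketched with a small wrapper. This is an illustration under stated assumptions, not the PR's `_setup_profiler`: `LazyProfiler`, `trace_path`, and the `trace_rank{rank}.json` naming are hypothetical, and the profiler is injected through a factory so the logic runs without PyTorch.

```python
import os
from typing import Callable, Optional, Protocol


class ProfilerLike(Protocol):
    """Minimal interface the wrapper relies on."""
    def start(self) -> None: ...
    def step(self) -> None: ...
    def stop(self) -> None: ...


def trace_path(output_dir: str, rank: int) -> str:
    """Hypothetical naming scheme: one trace file per tensor-parallel rank."""
    return os.path.join(output_dir, f"trace_rank{rank}.json")


class LazyProfiler:
    """Creates and starts the profiler on the first inference step only."""

    def __init__(self, factory: Callable[[], ProfilerLike]) -> None:
        self._factory = factory
        self._prof: Optional[ProfilerLike] = None
        self._stopped = False

    def before_step(self) -> None:
        # Deferring creation keeps model loading and warmup out of the trace.
        if self._prof is None and not self._stopped:
            self._prof = self._factory()
            self._prof.start()

    def after_step(self) -> None:
        # Advance the profiler's schedule once per inference step.
        if self._prof is not None:
            self._prof.step()

    def stop(self) -> None:
        # Safe to call more than once, e.g. from an exit handler.
        if self._prof is not None:
            self._prof.stop()
            self._prof = None
        self._stopped = True
```

In practice the factory could return a `torch.profiler.profile` instance, which exposes the same `start()`, `step()`, and `stop()` methods.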
