[Feat] Add PyTorch Profiler support for performance analysis by RagingSilence · Pull Request #193 · GeeeekExplorer/nano-vllm

RagingSilence · 2026-03-31T20:32:01Z

Motivation

nano-vllm currently has no built-in mechanism for performance analysis. To debug inference bottlenecks (CPU/GPU utilization, kernel execution time), users have to insert profiler code into the source by hand. This PR integrates PyTorch Profiler directly so that profiling can be switched on without modifying the source.

Modifications

ModelRunner: Added enable_profiling and profiling_output_dir arguments.
Profiler Logic: Implemented a non-intrusive profiler wrapper in ModelRunner.
- Uses manual lifecycle management (start()/step()) to avoid Kineto state errors.
- Supports multi-GPU (Tensor Parallel) setups by appending the rank to trace filenames.
- Uses a default schedule (wait=1, warmup=1, active=3, repeat=1) that automatically captures the initial steps of inference without generating oversized files.
Docs: Updated README.md with usage instructions.
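
To make the default schedule concrete, the sketch below mirrors in plain Python how `torch.profiler.schedule` is documented to map step numbers to phases. It is an illustration only, not code from this PR: `phase_for_step` is a hypothetical helper, it assumes `skip_first=0`, and the real PyTorch function returns `ProfilerAction` enum values rather than strings.

```python
def phase_for_step(step: int, wait: int = 1, warmup: int = 1,
                   active: int = 3, repeat: int = 1) -> str:
    """Return the profiler phase for a zero-based step index.

    Hypothetical helper that mirrors the documented behaviour of
    torch.profiler.schedule (with skip_first=0); it does not call PyTorch.
    """
    cycle = wait + warmup + active
    # With repeat > 0 the profiler goes idle once `repeat` cycles are done.
    if repeat > 0 and step // cycle >= repeat:
        return "none"
    pos = step % cycle
    if pos < wait:
        return "none"    # profiler idle
    if pos < wait + warmup:
        return "warmup"  # tracing is on, but the results are discarded
    return "record"      # events are kept for the saved trace


phases = [phase_for_step(s) for s in range(7)]
print(phases)
```

With the defaults from this PR, step 0 is skipped, step 1 is warmup, steps 2 to 4 are recorded, and every later step is ignored, which is why the trace size stays bounded no matter how long inference runs.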

Usage

Enable profiling during LLM initialization:

```python
from nanovllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    enable_profiling=True,
    profiling_output_dir="./profiler_logs",
)
# Run inference normally
# Profiler will automatically capture traces during the first few
```

This pull request adds support for profiling tensor parallel inference with PyTorch's profiler, enabling detailed performance tracing across processes. Profiling can be enabled via a new flag, and traces are saved per process for analysis with TensorBoard or Perfetto. The implementation ensures profiling only captures inference (not initialization), and is integrated into both the `LLMEngine` and `ModelRunner` classes.
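
Besides opening a trace in TensorBoard or Perfetto, a per-rank trace can be summarized from a script. The sketch below assumes the trace is an uncompressed JSON file in the Chrome trace event format, where complete events carry `"ph": "X"` and a duration `"dur"` in microseconds. `total_time_by_event` is a hypothetical helper and is not part of this PR.

```python
import json
from collections import defaultdict
from pathlib import Path


def total_time_by_event(trace_path, top=5):
    """Sum the durations (microseconds) of complete events per event name."""
    data = json.loads(Path(trace_path).read_text())
    # A Chrome trace is either {"traceEvents": [...]} or a bare list of events.
    events = data["traceEvents"] if isinstance(data, dict) else data
    totals = defaultdict(float)
    for event in events:
        # "X" marks a complete event, which carries its duration in "dur".
        if event.get("ph") == "X":
            totals[event.get("name", "<unnamed>")] += event.get("dur", 0)
    # Largest total first, truncated to the `top` most expensive names.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

Running this on each rank's trace file and comparing the top entries is a quick way to spot a rank that spends noticeably longer in a particular kernel or operator.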

**Profiling support and documentation:**

* Added new `enable_profiling` and `profiling_output_dir` parameters to the `LLMEngine` and `ModelRunner` classes, allowing users to enable and configure PyTorch profiler output for each process. [[1]](https://github.com/GeeeekExplorer/nano-vllm/pull/193/files#diff-bd3e2cd29a4c19008001c7cec119f8c19f9a78901ebce5d0212037195a01982eL17-R17) [[2]](https://github.com/GeeeekExplorer/nano-vllm/pull/193/files#diff-bd3e2cd29a4c19008001c7cec119f8c19f9a78901ebce5d0212037195a01982eL26-R30) [[3]](https://github.com/GeeeekExplorer/nano-vllm/pull/193/files#diff-b952226638660ed1016755ade81f50ce6b99c7484815105090b654cf4a62d396L17-R29) [[4]](https://github.com/GeeeekExplorer/nano-vllm/pull/193/files#diff-b952226638660ed1016755ade81f50ce6b99c7484815105090b654cf4a62d396R46-R49)
* Implemented `_setup_profiler` method in `ModelRunner` to configure and start the PyTorch profiler, saving traces per rank for later analysis. Profiler is started lazily on first inference to avoid capturing initialization overhead. [[1]](https://github.com/GeeeekExplorer/nano-vllm/pull/193/files#diff-b952226638660ed1016755ade81f50ce6b99c7484815105090b654cf4a62d396R59-R96) [[2]](https://github.com/GeeeekExplorer/nano-vllm/pull/193/files#diff-b952226638660ed1016755ade81f50ce6b99c7484815105090b654cf4a62d396R255-R278)
* Updated the `exit` method in `ModelRunner` to ensure the profiler is properly stopped when the process exits.
* Added a new "Profiling" section to the `README.md` with usage instructions and analysis tips for the generated traces.
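
The lifecycle described in the list above (create the profiler, start it lazily on the first inference call, advance it once per step, stop it on exit) can be sketched roughly as follows. This is not the PR's `_setup_profiler`: `build_torch_profiler`, `LazyProfiler`, and `run_step` are hypothetical names, and using `worker_name=f"rank{rank}"` is only one assumed way to keep the traces of different ranks apart.

```python
import os


def build_torch_profiler(output_dir: str, rank: int):
    """Create (but do not start) a torch.profiler.profile for one rank."""
    import torch  # imported lazily so the wrapper below works without PyTorch
    from torch.profiler import (ProfilerActivity, profile, schedule,
                                tensorboard_trace_handler)

    os.makedirs(output_dir, exist_ok=True)
    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        activities.append(ProfilerActivity.CUDA)
    return profile(
        activities=activities,
        schedule=schedule(wait=1, warmup=1, active=3, repeat=1),
        # worker_name becomes part of the trace filename (assumed naming).
        on_trace_ready=tensorboard_trace_handler(output_dir,
                                                 worker_name=f"rank{rank}"),
    )


class LazyProfiler:
    """Starts the profiler on the first step and stops it at most once."""

    def __init__(self, factory):
        self._factory = factory  # zero-argument callable returning a profiler
        self._prof = None
        self._stopped = False

    def run_step(self, fn, *args, **kwargs):
        """Run one inference step under the profiler and return its result."""
        if not self._stopped and self._prof is None:
            # First inference call: start here so initialization is not traced.
            self._prof = self._factory()
            self._prof.start()
        result = fn(*args, **kwargs)
        if self._prof is not None and not self._stopped:
            self._prof.step()  # advance the schedule by one step
        return result

    def stop(self):
        """Stop the profiler if it was started; safe to call repeatedly."""
        if self._prof is not None and not self._stopped:
            self._prof.stop()
        self._stopped = True
```

A runner could hold `LazyProfiler(lambda: build_torch_profiler("./profiler_logs", rank))`, route each model call through `run_step`, and call `stop()` from its exit path. Taking a factory rather than a profiler instance keeps construction out of initialization and lets the wrapper be exercised with a stand-in object.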

RagingSilence added 2 commits April 1, 2026 04:17

* b828847: Feat: Add PyTorch Profiler support for performance analysis
* 66b9e04: Fix README

RagingSilence wants to merge 2 commits into GeeeekExplorer:main from RagingSilence:feat/add-profiler
