23 changes: 23 additions & 0 deletions docs/recommended_models_features.md
@@ -24,3 +24,26 @@ These tables show the models currently tested for accuracy and performance.
This table shows the features currently tested for accuracy and performance.

{{ read_csv('../support_matrices/feature_support_matrix.csv', keep_default_na=False) }}

## Kernel Support

This table shows the current kernel support status.

{{ read_csv('../support_matrices/kernel_support_matrix.csv', keep_default_na=False) }}

## Parallelism Support

This table shows the current parallelism support status.

{{ read_csv('../support_matrices/parallelism_support_matrix.csv', keep_default_na=False) }}

## Quantization Support

This table shows the current quantization support status.

{{ read_csv('../support_matrices/quantization_support_matrix.csv', keep_default_na=False) }}

!!! info "Legend"
    * ✅ Supported
    * 🚧 Coming Soon
    * ❌ Not Supported
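The legend defines three status symbols, and the matrices also use `N/A` for tests that do not apply. (The `keep_default_na=False` argument in the `read_csv` calls keeps pandas from turning those literal `N/A` strings into missing values.) A small check like the one below could flag any status cell outside that set before the tables are rendered. It is a sketch using only Python's standard `csv` module. `ALLOWED` and `invalid_cells` are hypothetical names, and treating `N/A` as a valid value is an assumption based on how the CSVs use it.

```python
import csv
import io

# Legend symbols plus "N/A", which the matrices use for tests that do not apply
# (assumed to be the only other valid status value).
ALLOWED = {"✅", "🚧", "❌", "N/A"}

def invalid_cells(csv_text, status_columns=("CorrectnessTest", "PerformanceTest")):
    """Return (row_number, column, value) for every status cell outside ALLOWED."""
    reader = csv.DictReader(io.StringIO(csv_text))
    bad = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column in status_columns:
            value = row[column].strip()
            if value not in ALLOWED:
                bad.append((row_number, column, value))
    return bad

sample = '''Feature,CorrectnessTest,PerformanceTest
"CP",🚧,🚧
"DP",❌,N/A
"SP",to be added,to be added
'''
print(invalid_cells(sample))
# [(4, 'CorrectnessTest', 'to be added'), (4, 'PerformanceTest', 'to be added')]
```

Only the two test columns are checked by default, so the extra `Recommended TPU Generations` column in the quantization matrix is ignored.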
19 changes: 12 additions & 7 deletions support_matrices/feature_support_matrix.csv
@@ -1,11 +1,16 @@
Feature,CorrectnessTest,PerformanceTest
"Collective Communication Matmul",✅,N/A
"Prefix Caching",✅,✅
"Multimodal Inputs",✅,✅
"Quantized Matmul Attention and KV Cache",✅,✅
"Chunked Prefill",✅,✅
"JAX-Path Qwix Quantization",✅,✅
"DCN-based P/D disaggregation",🚧,🚧
"KV cache host offloading",🚧,🚧
"Llama 4 Maverick",🚧,🚧
"LoRA_Torch",✅,🚧
"Multimodal Inputs",✅,✅
"Out-of-tree model support",✅,✅
"Prefix Caching",✅,✅
"Single Program Multi Data",✅,✅
"Speculative Decoding: Eagle3",✅,✅
"Speculative Decoding: Ngram",✅,✅
"Structured Decoding",✅,N/A
"Ragged Paged Attention V3",✅,✅
"async scheduler",✅,✅
"runai_model_streamer_loader",✅,N/A
"sampling_params",✅,N/A
"structured_decoding",✅,N/A
8 changes: 8 additions & 0 deletions support_matrices/kernel_support_matrix.csv
@@ -0,0 +1,8 @@
Feature,CorrectnessTest,PerformanceTest
"Collective Communication Matmul",✅,🚧
"MLA",🚧,🚧
"MoE",🚧,🚧
"Quantized Attention",🚧,🚧
"Quantized KV Cache",🚧,🚧
"Quantized Matmul",🚧,🚧
"Ragged Paged Attention V3",✅,✅
6 changes: 3 additions & 3 deletions support_matrices/nightly/feature_support_matrix.csv
@@ -1,8 +1,8 @@
Feature,CorrectnessTest,PerformanceTest
"Chunked Prefill",✅,✅
"DCN-based P/D disaggregation",to be added,to be added
"KV cache host offloading",to be added,to be added
"Llama 4 Maverick",to be added,to be added
"DCN-based P/D disaggregation",🚧,🚧
"KV cache host offloading",🚧,🚧
"Llama 4 Maverick",🚧,🚧
"LoRA_Torch",❌,N/A
"Multimodal Inputs",✅,❌
"Out-of-tree model support",✅,❌
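The nightly matrices in this change replace the `to be added` placeholder with the 🚧 symbol from the legend. The substitution can be scripted. The sketch below is an illustration, not the tool used for this change; `migrate` and `quote` are hypothetical helpers. It assumes the quoting convention visible in these files: feature names are quoted in data rows, the header is unquoted, and other cells are quoted only when they contain a comma (as in `"v5, v6"`).

```python
import csv
import io

PLACEHOLDER = "to be added"  # wording used in the nightly matrices before this change
COMING_SOON = "🚧"           # legend symbol that replaces it

def quote(cell, force=False):
    """Quote a cell when forced or when CSV syntax requires it."""
    if force or "," in cell or '"' in cell:
        return '"' + cell.replace('"', '""') + '"'
    return cell

def migrate(csv_text):
    """Replace every PLACEHOLDER cell with COMING_SOON, leaving other cells unchanged."""
    lines = []
    for index, row in enumerate(csv.reader(io.StringIO(csv_text))):
        cells = [COMING_SOON if cell.strip() == PLACEHOLDER else cell for cell in row]
        # Assumed convention: Feature names are quoted in data rows, not in the header.
        lines.append(",".join(
            quote(cell, force=(index > 0 and column == 0))
            for column, cell in enumerate(cells)
        ))
    return "\n".join(lines) + "\n"

old = '''Feature,CorrectnessTest,PerformanceTest
"Chunked Prefill",✅,✅
"KV cache host offloading",to be added,to be added
'''
print(migrate(old), end="")
```

Parsing with `csv.reader` instead of a plain string replace keeps quoted cells such as `"v5, v6"` intact, so the same sketch applies to the quantization matrix.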
12 changes: 6 additions & 6 deletions support_matrices/nightly/kernel_support_matrix.csv
@@ -1,8 +1,8 @@
Feature,CorrectnessTest,PerformanceTest
"Collective Communication Matmul",✅,to be added
"MLA",to be added,to be added
"MoE",to be added,to be added
"Quantized Attention",to be added,to be added
"Quantized KV Cache",to be added,to be added
"Quantized Matmul",to be added,to be added
"Collective Communication Matmul",✅,🚧
"MLA",🚧,🚧
"MoE",🚧,🚧
"Quantized Attention",🚧,🚧
"Quantized KV Cache",🚧,🚧
"Quantized Matmul",🚧,🚧
"Ragged Paged Attention V3",✅,✅
10 changes: 5 additions & 5 deletions support_matrices/nightly/parallelism_support_matrix.csv
@@ -1,7 +1,7 @@
Feature,CorrectnessTest,PerformanceTest
"CP",to be added,to be added
"DP",❌,N/A
"EP",to be added,to be added
"CP",🚧,🚧
"DP",❌,🚧
"EP",🚧,🚧
"PP",✅,✅
"SP",to be added,to be added
"TP",to be added,to be added
"SP",🚧,🚧
"TP",🚧,🚧
12 changes: 6 additions & 6 deletions support_matrices/nightly/quantization_support_matrix.csv
@@ -1,7 +1,7 @@
Feature,Recommended TPU Generations,CorrectnessTest,PerformanceTest
"AWQ INT4","v5, v6",to be added,to be added
"FP4 W4A16",v7,to be added,to be added
"FP8 W8A8",v7,to be added,to be added
"FP8 W8A16",v7,to be added,to be added
"INT4 W4A16","v5, v6",to be added,to be added
"INT8 W8A8","v5, v6",to be added,to be added
"AWQ INT4","v5, v6",🚧,🚧
"FP4 W4A16",v7,🚧,🚧
"FP8 W8A8",v7,🚧,🚧
"FP8 W8A16",v7,🚧,🚧
"INT4 W4A16","v5, v6",🚧,🚧
"INT8 W8A8","v5, v6",🚧,🚧
7 changes: 7 additions & 0 deletions support_matrices/parallelism_support_matrix.csv
@@ -0,0 +1,7 @@
Feature,CorrectnessTest,PerformanceTest
"CP",🚧,🚧
"DP",❌,N/A
"EP",🚧,🚧
"PP",✅,✅
"SP",🚧,🚧
"TP",🚧,🚧
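The top-level `support_matrices/parallelism_support_matrix.csv` (the file the docs page reads) and its nightly counterpart do not fully agree: the top-level file lists DP performance as `N/A`, while the updated nightly file lists 🚧. A comparison keyed on the `Feature` column makes this kind of drift easy to spot. The sketch below uses only the standard library; `load` and `differences` are hypothetical helper names.

```python
import csv
import io

def load(csv_text):
    """Map each Feature to a dict of its remaining columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Feature"]: {k: v for k, v in row.items() if k != "Feature"} for row in reader}

def differences(published, nightly):
    """Return (feature, column, published_value, nightly_value) wherever the matrices disagree.

    A feature or column missing from one side shows up with None on that side.
    """
    a, b = load(published), load(nightly)
    result = []
    for feature in sorted(a.keys() | b.keys()):
        row_a, row_b = a.get(feature, {}), b.get(feature, {})
        for column in sorted(row_a.keys() | row_b.keys()):
            if row_a.get(column) != row_b.get(column):
                result.append((feature, column, row_a.get(column), row_b.get(column)))
    return result

published = '''Feature,CorrectnessTest,PerformanceTest
"CP",🚧,🚧
"DP",❌,N/A
"PP",✅,✅
'''
nightly = '''Feature,CorrectnessTest,PerformanceTest
"CP",🚧,🚧
"DP",❌,🚧
"PP",✅,✅
'''
print(differences(published, nightly))
# [('DP', 'PerformanceTest', 'N/A', '🚧')]
```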
7 changes: 7 additions & 0 deletions support_matrices/quantization_support_matrix.csv
@@ -0,0 +1,7 @@
Feature,Recommended TPU Generations,CorrectnessTest,PerformanceTest
"AWQ INT4","v5, v6",🚧,🚧
"FP4 W4A16",v7,🚧,🚧
"FP8 W8A8",v7,🚧,🚧
"FP8 W8A16",v7,🚧,🚧
"INT4 W4A16","v5, v6",🚧,🚧
"INT8 W8A8","v5, v6",🚧,🚧
5 changes: 3 additions & 2 deletions support_matrices/text_only_model_support_matrix.csv
@@ -1,7 +1,8 @@
Model,UnitTest,IntegrationTest,Benchmark
"meta-llama/Llama-3.3-70B-Instruct",✅,✅,✅
"Qwen/Qwen3-32B",✅,✅,✅
"Qwen/Qwen3-4B",✅,✅,✅
"google/gemma-3-27b-it",✅,✅,✅
"Qwen/Qwen3-32B",✅,✅,✅
"meta-llama/Llama-Guard-4-12B",✅,✅,✅
"meta-llama/Llama-3.1-8B-Instruct",✅,✅,✅
"Qwen/Qwen3-30B-A3B",✅,✅,✅
"Qwen/Qwen3-4B",✅,✅,✅