
Context parallelism with MLA #1552

Open
SuperCB opened this issue Mar 8, 2025 · 4 comments

SuperCB commented Mar 8, 2025

I have a question regarding FusedAttention: why doesn't it support context parallelism with MLA (Multi-head Layer Attention)? What are the technical limitations preventing this compatibility?

xrennvidia (Collaborator) commented Mar 10, 2025

Hi @SuperCB

You mean Multi-head Latent Attention, which is used by DeepSeek? Technically, nothing should stop us from doing it; we just have not done it yet. Considering the popularity of MLA/DeepSeek, we should definitely add this support. We will do it. Thanks for bringing this to our attention.

SuperCB (Author) commented Mar 11, 2025

I am working on it too. I found that the function AttnFuncWithCPAndQKVOA2A seems able to support context parallelism for MLA. Is my conclusion correct, and what are the main reasons currently preventing MLA from supporting context parallelism?

xrennvidia (Collaborator) commented

Yeah, the A2A implementation can probably work with MLA out of the box. AttnFuncWithCPAndKVAllGather might work for MLA as well.
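
To illustrate why the A2A path is agnostic to K/V head_dim, here is a minimal Ulysses-style sketch (my own illustration, not TE code; the `cp_group` handle and the `[s/cp, b, h, d]` layout are assumptions): each of q, k and v is redistributed independently, so K and V never share a communication buffer and their last dimensions are free to differ.

```python
import torch
import torch.distributed as dist

def seq_to_head_shard(x: torch.Tensor, cp_group) -> torch.Tensor:
    """All-to-all that trades a sequence shard for a head shard:
    [s/cp, b, h, d] -> [s, b, h/cp, d]. Called separately on q, k and v,
    so v's head_dim (e.g. 128 in MLA) never has to match k's (e.g. 192)."""
    cp = dist.get_world_size(cp_group)
    s_shard, b, h, d = x.shape
    # one chunk of h/cp heads per destination rank, chunk dim leading
    x = x.reshape(s_shard, b, cp, h // cp, d).permute(2, 0, 1, 3, 4).contiguous()
    out = torch.empty_like(x)
    dist.all_to_all_single(out, x, group=cp_group)
    # the received chunks are sequence shards; flatten back into one full sequence
    return out.reshape(cp * s_shard, b, h // cp, d)
```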

P2P cannot work because it concatenates K and V into a single tensor for communication; the different head_dim of K and V prevents us from doing the concat. But this should be addressable.
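
A toy repro of that shape conflict, using DeepSeek-V2's MLA head dims as an example (this paraphrases the P2P code path rather than quoting it):

```python
import torch

s, b, h = 8, 1, 4              # toy sizes
k = torch.randn(s, b, h, 192)  # MLA qk head_dim: 128 (nope) + 64 (rope)
v = torch.randn(s, b, h, 128)  # MLA v_head_dim: 128

# The P2P ring packs K and V into one buffer before each send/recv,
# conceptually torch.stack([k, v]) -- which requires equal shapes:
try:
    kv = torch.stack([k, v])
except RuntimeError as e:
    print(e)  # "stack expects each tensor to be equal size ..."
```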

As I said, technically there should be no reason preventing MLA+CP; at least I do not know of any right now. I might find something once I start working on this.

SuperCB (Author) commented Mar 11, 2025

I think we can support MLA+CP in P2P by padding the V tensor, which keeps modifications to the original code minimal. I am currently attempting this approach.
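
A minimal single-GPU sketch of the padding idea as I read it (my own illustration, not the actual patch): zero-pad V's head_dim up to K's so the stack for the ring exchange succeeds, then slice the padding off the attention output. The zero columns of V contribute only zero columns to softmax(QK^T/sqrt(d))V, so the sliced result is numerically unchanged.

```python
import torch
import torch.nn.functional as F

def attn_with_padded_v(q, k, v):
    d_k, d_v = k.shape[-1], v.shape[-1]
    v_pad = F.pad(v, (0, d_k - d_v))  # zero-pad last dim: [..., d_v] -> [..., d_k]
    kv = torch.stack([k, v_pad])      # now K and V fit in one P2P buffer
    k_, v_ = kv[0], kv[1]             # (ring send/recv of kv would happen here)
    scores = torch.softmax(q @ k_.transpose(-2, -1) / d_k**0.5, dim=-1)
    return (scores @ v_)[..., :d_v]   # drop the zero columns again

q = torch.randn(2, 4, 16, 192)        # [b, h, s, d_qk], MLA-style dims
k = torch.randn(2, 4, 16, 192)
v = torch.randn(2, 4, 16, 128)
ref = torch.softmax(q @ k.transpose(-2, -1) / 192**0.5, dim=-1) @ v
assert torch.allclose(attn_with_padded_v(q, k, v), ref, atol=1e-6)
```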

[image attached]
