
Attention computation for very long contexts #3

@caijixueIT

Description


In the prefill stage I call the official flash-attention v2 implementation to compute attention with
QKV shape [batch, seq_len, num_heads, head_dim] = [1, 128*1024, 128, 128]. The computation is quite slow. Are there any optimization ideas for long-context (>128K) attention? Any advice would be appreciated.
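
For reference, a minimal sketch of the setup described above, assuming the `flash_attn_func` entry point from the flash-attn v2 package and a causal prefill pass (the original post does not spell out the exact call, so these names and the `causal=True` flag are assumptions):

```python
import torch
from flash_attn import flash_attn_func  # flash-attn v2 package

# Shapes from the post: [batch, seq_len, num_heads, head_dim] = [1, 128*1024, 128, 128]
batch, seq_len, num_heads, head_dim = 1, 128 * 1024, 128, 128
q = torch.randn(batch, seq_len, num_heads, head_dim, dtype=torch.bfloat16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Single full-context prefill attention call; causal=True is assumed here.
out = flash_attn_func(q, k, v, causal=True)  # -> [1, 128*1024, 128, 128]
```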
