https://github.com/hyunwoongko/transformer/blob/0e5ce57589d7307cf76b53241cc523841ff67655/models/layers/scale_dot_product_attention.py#L35C5-L35C57
The q, k, v tensors entering the ScaledDotProductAttention module have shape [batch, head, seq_len, d_tensor] (where d_tensor = d_model // n_head after the heads are split), so the attention score q @ k.T has shape [batch, head, seq_len, seq_len]. The mask, however, lacks the head dimension and is only [batch, seq_len, seq_len]. Could this dimension mismatch in the referenced code cause an error?
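To illustrate the concern, here is a minimal sketch of the broadcasting rules involved (NumPy follows the same right-aligned broadcasting semantics as PyTorch's `masked_fill`); the shapes batch=2, heads=8, seq_len=5 are toy values chosen for the example:

```python
import numpy as np

batch, heads, seq_len = 2, 8, 5
score_shape = (batch, heads, seq_len, seq_len)  # shape of q @ k.T

# A 3-D mask [batch, seq_len, seq_len] does NOT broadcast against the
# 4-D score: dims are aligned from the right, so `batch` lines up with
# `heads` and broadcasting fails (unless batch happens to equal heads).
try:
    np.broadcast_shapes(score_shape, (batch, seq_len, seq_len))
    mask_3d_broadcasts = True
except ValueError:
    mask_3d_broadcasts = False

# Inserting a singleton head axis (mask.unsqueeze(1) in PyTorch) gives
# [batch, 1, seq_len, seq_len], which broadcasts cleanly over heads.
mask_4d_broadcasts = (
    np.broadcast_shapes(score_shape, (batch, 1, seq_len, seq_len)) == score_shape
)
```

So a 3-D mask would indeed raise a shape error in `score.masked_fill(mask == 0, ...)`; the usual remedy is to carry the mask as [batch, 1, seq_len, seq_len] (or [batch, 1, 1, seq_len] for a padding mask) so it broadcasts over the head dimension.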