Step 6 of Algorithm 1 code clarification #2

@johmedina

The implementation applies a second torch.topk when computing m_i^(n), whereas Algorithm 1 in the paper defines m_i^(n) over all i_k, i.e. the full top-k set taken from the final layer. Could you please clarify whether this is intentional or an oversight?

layer_dot_results = F.cosine_similarity(candidate_gradients_expanded, layer_divergence_expanded, dim=2)
layer_topk_values, layer_topk_indices = torch.topk(layer_dot_results, evolution_scale)
layer_topk_topk_indices = topk_indices[layer_topk_indices]
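To make the difference concrete, here is a minimal sketch in plain Python (no PyTorch, so it runs anywhere). It is a 1-D simplification of the snippet above, not the repository's code: the `topk` helper, the toy scores, and `all_candidates` are hypothetical, and the "Algorithm 1" variant reflects only my reading of the paper as stated in this issue.

```python
def topk(values, k):
    """Return (top-k values, their indices), sorted by descending value.

    Hypothetical stand-in for torch.topk on a 1-D input.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order


# Toy final-layer scores over a 10-token vocabulary (made-up numbers).
final_scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.6, 0.4, 0.0]
k = 5
_, topk_indices = topk(final_scores, k)  # vocabulary ids of the k candidates

# Toy per-candidate similarity at one intermediate layer (one value per candidate).
layer_dot_results = [0.2, -0.1, 0.5, 0.4, 0.3]
evolution_scale = 3

# What the implementation does: a second top-k over the k candidates,
# then a mapping from candidate positions back to vocabulary ids.
layer_topk_values, layer_topk_indices = topk(layer_dot_results, evolution_scale)
layer_topk_topk_indices = [topk_indices[j] for j in layer_topk_indices]

# What Algorithm 1 appears to describe (assumption): keep all k candidates.
all_candidates = list(topk_indices)

print(layer_topk_topk_indices)  # only evolution_scale candidates survive
print(all_candidates)           # all k candidates
```

With these toy values, the second top-k keeps 3 of the 5 final-layer candidates, so anything computed per layer afterwards sees a strictly smaller set than the i_k defined from the final layer.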
