Step 6 of Algorithm 1 code clarification

The implementation applies a second `torch.topk` when computing m_i^(n), whereas Algorithm 1 in the paper defines m_i^(n) over all i_k, i.e. the top-k indices selected from the final layer. Could you please clarify whether this is intentional or an oversight?

```python
layer_dot_results = F.cosine_similarity(candidate_gradients_expanded, layer_divergence_expanded, dim=2)
layer_topk_values, layer_topk_indices = torch.topk(layer_dot_results, evolution_scale)
layer_topk_topk_indices = topk_indices[layer_topk_indices]
```
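
To make the question concrete, here is a small toy sketch of the two readings. It uses plain Python instead of PyTorch so it runs anywhere. All values are hypothetical and not taken from the repository, and `topk_indices_of` is a helper invented for illustration that stands in for the indices returned by `torch.topk`. It only shows the index bookkeeping, not the actual gradient or similarity computation.

```python
import heapq


def topk_indices_of(scores, k):
    """Indices of the k largest scores, largest first (stand-in for torch.topk indices)."""
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


# Hypothetical toy values, not taken from the repository.
final_layer_scores = [0.1, 0.9, 0.4, 0.8, 0.3, 0.7]  # one score per candidate
k = 4                # size of the first (final-layer) top-k
evolution_scale = 2  # size of the second, per-layer top-k

# First top-k: the candidate set i_k selected from the final layer.
topk_indices = topk_indices_of(final_layer_scores, k)

# Per-layer similarity of each surviving candidate, aligned by position with topk_indices.
layer_dot_results = [0.2, 0.6, 0.5, 0.1]

# Reading A (Algorithm 1 as I understand it): keep every index in i_k.
all_candidates = list(topk_indices)

# Reading B (the code above): a second top-k, mapped back to original candidate ids.
layer_topk_indices = topk_indices_of(layer_dot_results, evolution_scale)
nested_candidates = [topk_indices[p] for p in layer_topk_indices]

print(all_candidates)     # candidates under reading A
print(nested_candidates)  # candidates under reading B
```

Under reading B the per-layer step only ever sees a subset of i_k of size `evolution_scale`, so the two readings coincide only when `evolution_scale` equals k.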
