[Chapter 8 - video #258] Expand position embedding to match the batch size #259
-
Hi @mrdbourke, I'm going through Chapter 8 and noticed that the class token embedding is expanded to match the batch size, but the position embedding is not. Shouldn't the position embedding also be expanded before it's added to the patch embeddings? Namely, I'd propose the following, where we call `expand` on the position embedding as well:

```python
def forward(self, x: torch.Tensor) -> torch.Tensor:
    batch_size = x.shape[0]
    # expand class token embedding to match the batch size
    class_token = self.class_embedding.expand(batch_size, -1, -1)
    x = self.patch_embedding(x)
    x = torch.cat(tensors=(class_token, x), dim=1)
    x = self.position_embedding.expand(batch_size, -1, -1) + x # expand here too? (the original code has no expand)
    # print(x.shape)
    x = self.embedding_dropout(x)
    x = self.transformer_encoder(x)
    x = self.classifier_head(x[:, 0, :])
    return x
```

If this is not the case, could you please shed light on where I'm wrong?
-
Hi @AlessandroMiola!

Great suggestion! And you'd be right in thinking that, however, due to the nature of addition in PyTorch, the line

```python
x = self.position_embedding + x
```

will add the `position_embedding` across every sample in the batch.

This is from equation 1 in the paper: https://www.learnpytorch.io/08_pytorch_paper_replicating/#47-creating-the-position-embedding

See an example on Google Colab here: https://www.learnpytorch.io/08_pytorch_paper_replicating/#47-creating-the-position-embedding

Let's see an example of creating a batched image tensor of all zeroes and then adding all ones to it:

```python
import torch
from torch import nn

# Set hyperparameters
batch_size = 32
embed_dim = 768
num_patches = 196

# Create batch of zeros
x = torch.zeros([batch_size,
                 num_patches + 1, # +1 is for learnable class_token (not shown here but see: https://www.learnpytorch.io/08_pytorch_paper_replicating/#46-creating-the-class-token-embedding)
                 embed_dim])
print(f"Patch embedding with class token shape: {x.shape} -> [batch_size, patch_embedding + class_token, embedding_dim]")
print(x[0])
```

Out:
Create the position embedding as a learnable tensor:

```python
# Create the position embedding, see here: https://www.learnpytorch.io/08_pytorch_paper_replicating/#47-creating-the-position-embedding
position_embedding = nn.Parameter(torch.ones(1,
                                             num_patches + 1, # +1 is for class_token
                                             embed_dim),
                                  requires_grad=True) # make sure it's learnable

# Show the first 10 sequences and 10 position embedding values and check the shape of the position embedding
print(position_embedding[:, :10, :10])
print(f"Position embedding shape: {position_embedding.shape} -> [batch_size, number_of_patches, embedding_dimension]")
```

Out:
Add the position embedding (ones) to the patched/batched image embedding (all zeroes), across all samples in a batch:

```python
# Add position embedding to x (due to nature of addition in PyTorch, it adds to all tensors in batch)
x_with_position_embedding = x + position_embedding
x_with_position_embedding
```

Out:
Check if the position embedding was added to all samples in the batch (this is known as broadcasting in NumPy/PyTorch):

```python
# Check to see if position embedding added to "x"
x_with_position_embedding == 1
```

Out:
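As one extra sanity check (a minimal sketch, assuming the same `x` and `position_embedding` as above), explicitly expanding the position embedding to the batch size gives exactly the same result as relying on broadcasting, so the `.expand()` call isn't needed:

```python
# Explicitly expand the position embedding to the batch size (as proposed in the question)
expanded_position_embedding = position_embedding.expand(batch_size, -1, -1)

# Broadcasting and explicit expanding produce identical results
print(torch.equal(x + position_embedding, x + expanded_position_embedding)) # True
print((x + expanded_position_embedding).shape) # torch.Size([32, 197, 768])
```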
A fair bit going on here but I hope that clears things up! Let me know if you're unsure of anything!
-
Shouldn't the position embeddings be a combination of sin and cos, as they are in the Attention Is All You Need paper?
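For reference, a minimal sketch of the fixed sin/cos encoding that paper describes (the hyperparameters below are assumed to match the example above; the ViT paper replicated in this chapter instead learns its position embedding as an `nn.Parameter`, as shown earlier in the thread):

```python
import torch

def sinusoidal_position_embedding(num_positions: int, embed_dim: int) -> torch.Tensor:
    """Fixed sin/cos position encoding in the style of "Attention Is All You Need" (illustrative sketch, not course code)."""
    positions = torch.arange(num_positions).unsqueeze(1).float()           # [num_positions, 1]
    div_term = torch.exp(torch.arange(0, embed_dim, 2).float() *
                         (-torch.log(torch.tensor(10000.0)) / embed_dim))  # [embed_dim/2]
    pe = torch.zeros(num_positions, embed_dim)
    pe[:, 0::2] = torch.sin(positions * div_term)  # even dimensions use sin
    pe[:, 1::2] = torch.cos(positions * div_term)  # odd dimensions use cos
    return pe  # [num_positions, embed_dim]

# One encoding per patch + class token (197 positions, 768 dims, matching the example above)
print(sinusoidal_position_embedding(197, 768).shape)  # torch.Size([197, 768])
```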