Add the ops of AoT #70
Conversation
After creating the forward kernels, it outputs the following error. The backward kernels should be updated.
#define NUM_ACTIVATION_TENSORS 23
typedef struct {
    Tensor encoded;          // (B, T, C)
    std::vector<Tensor> ln1; // (L, B, T, C)
Same as above, could we use std::array instead with NUM_ACTIVATION_TENSORS statically allocated?
It may be confusing, but NUM_ACTIVATION_TENSORS is the number of member variables like encoded and ln1, not the size of the vector.
typedef struct {
    Tensor wte;               // (V, C)
    Tensor wpe;               // (maxT, C)
    std::vector<Tensor> ln1w; // (L, C)
Can we use std::array here instead of vector?
This is possible only if we fix the model. The size depends on the number of layers, and since gpt2 comes in variants with different numbers of layers, the size cannot be determined at compile time.
//printf("inputs[0] = %d\n", inputs[0]);
// encoder_forward(ctx, acts.encoded, inputs, params.wte, params.wpe, B, T, C); // encoding goes into residual[0]
{
    std::promise<void> promise;
We'll want to think about a good way to wrap the async state so we don't have to make it explicit at every step, but that doesn't have to be addressed in this PR; we can explore it as a follow-up.
@@ -99,6 +99,10 @@ build/gpt2_webgpu: llm.c gpt2_124M.bin llm.c gpt2_webgpu.cpp ops.cpp
	mkdir -p build
	$(CC) $(CXXFLAGS) -Illm.c $(LDFLAGS) -o $@ gpt2_webgpu.cpp ops.cpp

build/gpt2_webgpu_aot: llm.c gpt2_124M.bin llm.c gpt2_webgpu_aot.cpp ops_aot.cpp
Eventually this will probably be the main gpt2_webgpu implementation, though this is fine for this PR.
LGTM; see the comments below on whether we can make the std::vector members statically allocated std::arrays.
This is a great step! With the resource allocation overhead addressed AoT, we can then iterate on optimizing kernel perf.
Besides that, we'll probably want to iterate a bit on some quality-of-life things like packaging ops to wrap promises/futures so they don't need to be written out every time, but for now this is a good start.
82935a6 to f629a33
@austinvhuang Thank you for your review!
@@ -47,7 +47,6 @@ typedef struct {

// the parameters of the model
#define NUM_PARAMETER_TENSORS 16
#define NUM_PARAMETER_LAYERS 12
Following review, NUM_PARAMETER_LAYERS has been replaced with num_layers.
That makes sense; fine to leave it as a vector for this PR, though a future update may replace vector with a lighter dynamic allocation approach like unique_ptr. We can go ahead and merge to dev. Thanks!
Thank you, too!
This PR implements the forward pass of gpt-2 with AoT.
The backward pass has been disabled as it needs to be fixed to not use atomicAdd.
There was a memory error and it was in draft state for a while.
I added -fsanitize=address to make it possible to detect errors. The memory error has been fixed.
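For reference, a hedged sketch of how such a flag could be wired into the Makefile quoted above as a separate debug target; the target name is hypothetical, and -fsanitize=address/-g are standard gcc/clang options (recipe lines must be tab-indented):

```make
# Hypothetical AddressSanitizer build target (sketch, mirroring the hunk above).
build/gpt2_webgpu_aot_asan: llm.c gpt2_124M.bin gpt2_webgpu_aot.cpp ops_aot.cpp
	mkdir -p build
	$(CC) $(CXXFLAGS) -fsanitize=address -g -Illm.c $(LDFLAGS) -o $@ gpt2_webgpu_aot.cpp ops_aot.cpp
```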