Training classification models on Test set #1102

WissamElJ · 2024-10-02T16:34:23Z

Hi!
First of all, thank you so much for the detailed curriculum that you guys made for PyTorch!

I have a question regarding the .train() stage and .eval() stage.
In the training stage, we output predictions on the test_dataloader.
Then, in inference mode, we also output predictions on the test_dataloader.
Doesn't that give us the same result in the end?

Did you skip the validation set for simplicity (which, in this case, would replace the test data in the training stage)?

Answered by LuluW8071

LuluW8071 · 2024-10-02T19:30:54Z

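In the training stage, predictions are made on the train dataloader, while in the inference stage the predictions are made on the test dataloader.
They are not the same.

model.train() does not initialize or change any weights. It puts the model in training mode: dropout randomly zeroes activations, and batchnorm normalizes with the current batch's statistics while updating its running estimates. The model outputs raw logits on the train dataloader, which are then transformed into probabilities and from there into predictions. The loss computed on those outputs is what updates the weights.

model.eval() keeps the trained weights but switches those layers to inference behavior: dropout is turned off, and batchnorm uses its stored running statistics instead of batch statistics (it is not removed). Layernorm behaves the same in both modes. Predictions are then generated, usually under torch.inference_mode() so that no gradients are tracked.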
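To make the difference between the two modes concrete, here is a minimal sketch. The tiny model, the layer sizes, and the random input are made up for illustration and are not the course's code. It runs the same batch through the same weights twice in each mode.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy model, only for illustrating train() vs eval().
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.BatchNorm1d(8),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(8, 2),
)
x = torch.randn(16, 4)  # one made-up batch of 16 samples

# Training mode: dropout samples a new random mask on every forward pass,
# and batchnorm normalizes with this batch's mean/variance while updating
# its running estimates. No weights are changed by the forward pass itself.
model.train()
train_out_a = model(x)
train_out_b = model(x)

# Evaluation mode: dropout is a no-op and batchnorm uses the stored
# running statistics, so the forward pass is deterministic.
model.eval()
with torch.inference_mode():
    eval_out_a = model(x)
    eval_out_b = model(x)

train_outputs_differ = not torch.equal(train_out_a, train_out_b)
eval_outputs_match = torch.equal(eval_out_a, eval_out_b)
print("train outputs differ between passes:", train_outputs_differ)
print("eval outputs match between passes:", eval_outputs_match)
```

Neither call touches the learned weights. Only an optimizer step does that, which is why the test step in a training loop can evaluate on the test dataloader every epoch without the model learning from it.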

3 replies

WissamElJ Oct 3, 2024
Author

Do dropout, batchnorm and layernorm have a standard placement in a neural network? I can see in the PyTorch docs that we can initialize them.
Should we pay attention to them as we build more complex neural networks, or stick to the way PyTorch uses them?

LuluW8071 Oct 3, 2024

Yes, these layers have conventional placements, and each has its own advantages:

batch normalization speeds up training (the original paper attributes this to reducing internal covariate shift),
normalization adds stability,
dropout helps against overfitting by randomly dropping features,
and many more.

These are still used in SOTA models. Many LLMs use their own normalization variant called RMSNorm.

You can read the batch normalization paper here for starters.
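As a rough illustration of placement, one common ordering in a fully connected block is Linear, then BatchNorm, then the activation, then Dropout. This is a convention rather than a rule, and other orderings are used in practice. The class name, layer sizes, and dropout rate below are hypothetical.

```python
import torch
from torch import nn


class SmallClassifier(nn.Module):
    """Hypothetical classifier showing one common layer ordering."""

    def __init__(self, in_features=20, hidden=64, num_classes=3, p_drop=0.3):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.BatchNorm1d(hidden),  # normalize the linear layer's outputs
            nn.ReLU(),
            nn.Dropout(p_drop),      # only active in training mode
        )
        # The final layer returns raw logits: no normalization or dropout after it.
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):
        return self.head(self.block(x))


model = SmallClassifier()
model.eval()
with torch.inference_mode():
    logits = model(torch.randn(5, 20))    # made-up batch of 5 samples
    probs = torch.softmax(logits, dim=1)  # logits -> probabilities
    preds = probs.argmax(dim=1)           # probabilities -> class labels
print(preds.shape)
```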

WissamElJ Oct 3, 2024
Author

Thank you so much!
