Why do we calculate the loss between logits and targets in CrossEntropyLoss? #166
-
Hello, […] Thank you in advance!
Replies: 1 comment 1 reply
-
Hi @afi1289, good questions!

Yes, you're right! This is in the documentation for torch.nn.CrossEntropyLoss(), with the line: "Note that this case is equivalent to the combination of LogSoftmax and NLLLoss."

In the case of how the PyTorch functions are implemented, generally, yes. If you look under the hood at how the output layers and the loss formulas are implemented, a model's outputs come in the form of "logits" rather than "labels" (the format the ground truth is in). So some conversion needs to happen between the logits and the labels before the two can be properly compared. You can see more of the formulas behind each loss in this article: https://blog.paperspace.com/pytorch-loss-functions/
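As a quick illustration (a minimal sketch, not from the original thread; the tensor values and shapes are made up), you can verify the LogSoftmax + NLLLoss equivalence and the logit → probability → label conversion directly:

```python
import torch
import torch.nn as nn

# Fake model outputs for a batch of 4 samples across 3 classes,
# plus ground truth labels as class indices.
torch.manual_seed(42)
logits = torch.randn(4, 3)            # raw model outputs ("logits")
targets = torch.tensor([0, 2, 1, 2])  # ground truth ("labels")

# nn.CrossEntropyLoss takes the raw logits directly...
ce_loss = nn.CrossEntropyLoss()(logits, targets)

# ...which is equivalent to LogSoftmax followed by NLLLoss.
log_probs = nn.LogSoftmax(dim=1)(logits)
nll_loss = nn.NLLLoss()(log_probs, targets)

print(torch.isclose(ce_loss, nll_loss))  # tensor(True)

# To compare model outputs with the labels yourself, convert
# logits -> prediction probabilities -> predicted labels.
probs = torch.softmax(logits, dim=1)
preds = probs.argmax(dim=1)
print(preds)    # e.g. tensor([1, 0, 2, 1]) — same format as targets
print(targets)  # tensor([0, 2, 1, 2])
```

Note that because nn.CrossEntropyLoss() does the LogSoftmax step internally, you only apply softmax/argmax yourself when you want human-readable predictions, not before computing the loss.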