-
Hi everyone! I'm a bit confused about one point, though. In the tutorial, the classic training pipeline was:
But when I chatted about it with chatGPT, it said:
Can you clarify this point, please? Thank you so much! 👍🏼
-
Hi AloscariS, Both approaches are fine to use. The important thing is to reset the gradients before you calculate the new ones. I don't know why chatGPT considers the placement from the tutorial "unconventional"; perhaps it was simply trained on more examples where the gradients are reset at a different point in the loop (a minimal sketch of both placements is below). Best wishes.
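For illustration, here is a minimal sketch (not the tutorial's exact code; the model, optimizer, data, and loss below are made-up placeholders) showing the two placements. Both clear the old gradients before the next backward(), so they are equivalent in practice:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)

# Placement A: zero the gradients at the top of each iteration.
for _ in range(3):
    optimizer.zero_grad()          # clear gradients left over from the previous step
    loss = loss_fn(model(x), y)    # forward pass + loss
    loss.backward()                # backward pass fills .grad
    optimizer.step()               # parameter update

# Placement B: zero the gradients right before backward().
for _ in range(3):
    loss = loss_fn(model(x), y)    # forward pass + loss (does not touch .grad)
    optimizer.zero_grad()          # clear old gradients just before the new backward pass
    loss.backward()
    optimizer.step()
```

The only thing that matters is that zero_grad() runs at some point after the previous step() and before the next backward().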
-
I think the gradients stay unchanged during the forward pass and the loss calculation; they are only modified during the backward pass. So you only need to place
optimizer.zero_grad()
before the backward pass.
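As a quick check of that point (my own toy example, not code from the tutorial; the model, data, and loss are made up), you can verify that .grad stays untouched by the forward pass and the loss computation, and that backward() accumulates into it unless it is zeroed first:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
loss_fn = nn.MSELoss()
x, y = torch.randn(8, 4), torch.randn(8, 1)

loss = loss_fn(model(x), y)
print(model.weight.grad)           # None: forward pass + loss did not create gradients

loss.backward()
g1 = model.weight.grad.clone()     # gradients exist only after backward()

loss_fn(model(x), y).backward()    # second backward pass without zeroing first
print(torch.allclose(model.weight.grad, 2 * g1))  # True: gradients accumulated

model.zero_grad()                  # resetting before the next backward avoids the accumulation
```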