add gradient accumulation support by jonscheunemann · Pull Request #21 · zincware/perspic

jonscheunemann · 2026-04-14T11:30:20Z

Introduced micro_batch_size and effective_batch_size parameters to the analyzer function.
Implemented gradient accumulation logic, allowing for optimization steps to occur after a specified number of micro-batches.
Added validation for batch size parameters to ensure compatibility.
Updated tests to cover new functionality, including parameter validation, optimizer behavior, and logging behavior during accumulation cycles.

- Introduced `micro_batch_size` and `effective_batch_size` parameters to the analyzer function. - Implemented gradient accumulation logic, allowing for optimization steps to occur after a specified number of micro-batches. - Added validation for batch size parameters to ensure compatibility. - Updated tests to cover new functionality, including parameter validation, optimizer behavior, and logging behavior during accumulation cycles.

Copilot

Pull request overview

Adds gradient accumulation support to perspic.analyzer.analyzer() so optimization steps and analysis/logging can be aligned to an “effective batch size” composed of multiple micro-batches.

Changes:

Introduces micro_batch_size / effective_batch_size parameters, validation, and an internal accumulation_steps computation.
Implements optimizer stepping every accumulation_steps micro-batches and switches scheduling/logging to an effective_step counter.
Adds extensive unit tests for validation, optimizer behavior, scheduling, and metric/logging behavior under accumulation.

Reviewed changes

Copilot reviewed 2 out of 4 changed files in this pull request and generated 3 comments.

File	Description
`perspic/analyzer.py`	Implements gradient accumulation, effective-step scheduling, accumulated metric buffers, and effective batch size logging.
`tests/unit/test_analyzer.py`	Adds a large new test suite for accumulation behavior and updates existing tests to use the new effective-step counter.
`.gitignore`	Expands ignored log/checkpoint/dataset patterns.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-14T15:09:48Z

+            self.model.zero_grad()
+            loss = self.criterion(self.model(x), y)
+            loss.backward()
+
+            loss_val = loss.detach().item()
+            if is_train:
+                self._accum_train_loss += loss_val
+                if self._accum_grad_train is None:
+                    self._accum_grad_train = [
+                        p.grad.clone() if p.grad is not None else None
+                        for p in self.model.parameters()
+                    ]
+                else:
+                    for acc, p in zip(
+                        self._accum_grad_train, self.model.parameters()
+                    ):
+                        if acc is not None and p.grad is not None:
+                            acc.add_(p.grad)
+            else:
+                self._accum_measure_loss += loss_val
+                if self._accum_grad_measure is None:
+                    self._accum_grad_measure = [
+                        p.grad.clone() if p.grad is not None else None
+                        for p in self.model.parameters()
+                    ]
+                else:
+                    for acc, p in zip(
+                        self._accum_grad_measure, self.model.parameters()
+                    ):
+                        if acc is not None and p.grad is not None:
+                            acc.add_(p.grad)
+
+            self.model.zero_grad()
+


With gradient accumulation enabled, _before_training_step() runs before manual_backward() and mid-cycle training gradients are expected to remain on parameters. However, the accumulated-analysis path calls self.sample_calc.compute(...) (which internally calls model.zero_grad()) and _accumulate_linearizer_grads() also does self.model.zero_grad() + loss.backward() + zero_grad() again. This will wipe the already-accumulated training gradients from previous micro-batches, breaking gradient accumulation when the analyzer is enabled. Consider preserving/restoring parameter .grad around analysis, or refactoring calculators to use torch.autograd.grad (no writes to .grad), or running analysis on a separate model copy so training gradients aren’t mutated.

Suggested change

self.model.zero_grad()

loss = self.criterion(self.model(x), y)

loss.backward()

loss_val = loss.detach().item()

if is_train:

self._accum_train_loss += loss_val

if self._accum_grad_train is None:

self._accum_grad_train = [

p.grad.clone() if p.grad is not None else None

for p in self.model.parameters()

]

else:

for acc, p in zip(

self._accum_grad_train, self.model.parameters()

):

if acc is not None and p.grad is not None:

acc.add_(p.grad)

else:

self._accum_measure_loss += loss_val

if self._accum_grad_measure is None:

self._accum_grad_measure = [

p.grad.clone() if p.grad is not None else None

for p in self.model.parameters()

]

else:

for acc, p in zip(

self._accum_grad_measure, self.model.parameters()

):

if acc is not None and p.grad is not None:

acc.add_(p.grad)

self.model.zero_grad()

loss = self.criterion(self.model(x), y)

params = list(self.model.parameters())

grad_params = [p for p in params if p.requires_grad]

grad_values = torch.autograd.grad(

loss,

grad_params,

allow_unused=True,

)

grads = []

grad_iter = iter(grad_values)

for p in params:

if p.requires_grad:

grads.append(next(grad_iter))

else:

grads.append(None)

loss_val = loss.detach().item()

if is_train:

self._accum_train_loss += loss_val

if self._accum_grad_train is None:

self._accum_grad_train = [

g.clone() if g is not None else None for g in grads

]

else:

for acc, g in zip(self._accum_grad_train, grads):

if acc is not None and g is not None:

acc.add_(g)

else:

self._accum_measure_loss += loss_val

if self._accum_grad_measure is None:

self._accum_grad_measure = [

g.clone() if g is not None else None for g in grads

]

else:

for acc, g in zip(self._accum_grad_measure, grads):

if acc is not None and g is not None:

acc.add_(g)

Copilot · 2026-04-14T15:09:48Z

+            # Gradient accumulation setup
+            self.micro_batch_size = micro_batch_size
+            self.effective_batch_size = effective_batch_size
+
+            if (
+                effective_batch_size is not None
+                and micro_batch_size is None
+            ):
+                raise ValueError(
+                    "micro_batch_size must be specified when "
+                    "effective_batch_size is set."
+                )
+
+            if (
+                micro_batch_size is not None
+                and effective_batch_size is not None
+            ):
+                if effective_batch_size < micro_batch_size:
+                    raise ValueError(
+                        f"effective_batch_size "
+                        f"({effective_batch_size}) must be "
+                        f">= micro_batch_size ({micro_batch_size})."
+                    )
+                if effective_batch_size % micro_batch_size != 0:
+                    raise ValueError(
+                        f"effective_batch_size "
+                        f"({effective_batch_size}) must be "
+                        f"divisible by micro_batch_size "
+                        f"({micro_batch_size})."
+                    )
+                self.accumulation_steps = (
+                    effective_batch_size // micro_batch_size
+                )
+            else:
+                self.accumulation_steps = 1


micro_batch_size/effective_batch_size are only validated for presence and divisibility, but not for being positive integers. Passing 0 (or negative values) will lead to ZeroDivisionError/nonsensical accumulation_steps. Add explicit validation that any provided batch-size parameter is an int and > 0 before using it in comparisons/modulus/division.

Copilot · 2026-04-14T15:09:49Z

+                # TODO: How would a measure batchsize different to the effective batch size work here?
+                # !!! We would need to accumulate separately and then combine at the end.
+                if x2 is not None and y2 is not None:
+                    self._accumulate_linearizer_grads(
+                        x2, y2, is_train=False
+                    )


The accumulated-analysis implementation assumes the measure batch (cross-response) can be accumulated with the same accumulation_steps/batch sizing as the train batch, but the new TODO comment indicates this case is not actually handled. To avoid silently incorrect cross metrics, either (a) enforce/validate that train & measure micro-batch sizes match and document the requirement, or (b) implement separate accumulation logic for the measure side as suggested by the comment.

KonstiNik · 2026-04-14T18:24:05Z

Looks nice! I have questions about the general implementation here:

Lightning offers a native batch-accumulation implementation for training, see trainer = Trainer(accumulate_grad_batches=7). What if I would select that? How does it interact with your implementation?
Why can't we build our implementation on top of the built-in one?

KonstiNik requested a review from Copilot April 14, 2026 15:04

Copilot started reviewing on behalf of KonstiNik April 14, 2026 15:05 View session

Copilot AI reviewed Apr 14, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add gradient accumulation support#21

add gradient accumulation support#21
jonscheunemann wants to merge 1 commit into
mainfrom
jonny_batch_accumulation

jonscheunemann commented Apr 14, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

KonstiNik commented Apr 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

jonscheunemann commented Apr 14, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

KonstiNik commented Apr 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants