
Conversation

kctezcan
Contributor

@kctezcan kctezcan commented Sep 25, 2025

Description

Encoding target variables into the latent space, similar to sources.

The changes are made for the forecasting mode and are tested for both training and inference.

Some open points:

  1. How should the variables and functions be named? Currently everything is called "..._srclk", an abbreviation for "source like", indicating that the targets are processed like sources.
  2. In some cases it is indeed possible to reuse the existing source-processing code by looping over the target fsteps and processing each separately (see the sketch after this list). However, this requires significant rewriting of the existing functions, since they do not operate on simple input variables but on input objects whose fields they access directly.
  3. Is this also relevant in the MTM mode? This is not tested.
  4. A test with multiple datasets still needs to be done.
  5. Saving the latent variables is left for future work.
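
A conceptual sketch of the change, to make the "source like" processing concrete; all names here are illustrative stand-ins, not the actual API:

    import torch

    # stand-ins for the real tokenization and embedding steps
    def batchify_source(x: torch.Tensor) -> torch.Tensor:
        return x.reshape(-1, x.shape[-1])

    def embed(x: torch.Tensor) -> torch.Tensor:
        return x @ torch.eye(x.shape[-1])

    source_data = torch.randn(4, 8)
    num_fsteps = 3
    target_data = [torch.randn(4, 8) for _ in range(num_fsteps)]

    # sources are encoded into the latent space as before
    tokens_source = embed(batchify_source(source_data))
    # targets now go through the same source-style path, once per fstep,
    # hence the "..._srclk" ("source like") naming
    tokens_target_srclk = [embed(batchify_source(target_data[f])) for f in range(num_fsteps)]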

Issue Number

Closes #941

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

@kctezcan
Contributor Author

@sophie-xhonneux @tjhunter @clessig Please have a look, thanks

Collaborator

@clessig clessig left a comment

Thanks for the contribution. Maybe best to have a quick call to go through the questions.

)
(tt_cells_srclk, tt_lens_srclk, tt_centroids_srclk) = (
self.tokenizer.batchify_source( # TODO: KCT, check if anything source related is happening in the function
Collaborator

Can we please remove all the KCTs? We missed a few last time...

Also, what's the question here?

Contributor Author

That was a reminder for me, all good, I removed it.

time_win: tuple,
normalizer, # dataset,
use_normalizer: str, # "source" or "target"
Collaborator

Remove whitespace

Contributor Author

Unfortunately, ruff adds that whitespace back :)

time_win: tuple,
normalizer, # dataset
use_normalizer: str, # "source" or "target"
Collaborator

Remove whitespace

Contributor Author

Unfortunately, ruff adds that whitespace back :)

]
)
for s in stl_b
s.target_srclk_tokens_lens[fstep]
Collaborator

Why did this change? Or is this from a different PR?

Contributor Author

Not sure what your question is here

for itype, s in enumerate(sb):
for fstep in range(offsets.shape[0]):
if not (target_srclk_tokens_lens[ib, itype, fstep].sum() == 0): # if not empty
Collaborator

Remove whitespace

Contributor Author

again ruff...

zeros_col = torch.zeros((offsets_base.shape[0], 1), dtype=offsets_base.dtype, device=offsets_base.device)
offsets = torch.cat([zeros_col, offsets_base[:,:-1]], dim=1)
# take offset_base up to last col and append a 0 in the beginning per fstep
zeros_col = torch.zeros(
Collaborator

Can you expand on the comment? It's not clear to me why this is necessary

Contributor Author

I rephrased it to:

    # shift the offsets for each fstep by one to the right and add a zero at the beginning, so the first token starts at 0
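
A small worked example of this shift, with made-up values:

    import torch

    offsets_base = torch.tensor([[3, 5, 9],
                                 [2, 4, 4]])  # cumulative end offsets per fstep
    zeros_col = torch.zeros((offsets_base.shape[0], 1), dtype=offsets_base.dtype)
    offsets = torch.cat([zeros_col, offsets_base[:, :-1]], dim=1)
    # offsets is now [[0, 3, 5], [0, 2, 4]]: per fstep, the first token starts at 0
    # and each subsequent block starts where the previous one ended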

)
tokens_target = self.assimilate_global(model_params, tokens_target)
tokens_target_det = tokens_target.detach() # explicitly detach as well
Collaborator

Remove whitespace

Contributor Author

ruff :(
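
For context on the explicit detach above: assuming the encoded targets serve as regression targets for the latent loss discussed later in this thread (my reading, not confirmed here), detaching stops gradients from flowing back through the target-encoding path. A minimal runnable sketch with made-up names:

    import torch
    import torch.nn.functional as F

    tokens_pred = torch.randn(4, 16, requires_grad=True)    # stand-in for predicted latents
    tokens_target = torch.randn(4, 16, requires_grad=True)  # stand-in for encoded targets

    tokens_target_det = tokens_target.detach()  # no gradients through the target encoding
    loss_latent = F.mse_loss(tokens_pred, tokens_target_det)
    loss_latent.backward()
    assert tokens_target.grad is None  # only the prediction path receives gradients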

num_fsteps = target_srclk_tokens_lens.shape[2] # TODO: KCT, if there are diff no of tokens per fstep, this may fail
num_fsteps = target_srclk_tokens_lens.shape[
2
] # TODO: KCT, if there are diff no of tokens per fstep, this may fail
Collaborator

Can we handle this special case if things might break? When would it be triggered?

Contributor Author

Actually this is OK, it does not break, I checked.
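
For the record, a sketch of why shape[2] is safe, assuming the dimension layout implied by the indexing earlier in the diff ([batch, stream type, fstep, tokens]):

    import torch

    # assumed layout: [batch, num_stream_types, num_fsteps, max_tokens]
    target_srclk_tokens_lens = torch.zeros(2, 3, 5, 7, dtype=torch.long)
    target_srclk_tokens_lens[0, 0, 1, :4] = torch.tensor([1, 2, 0, 3])  # varying counts

    num_fsteps = target_srclk_tokens_lens.shape[2]  # always 5: the fstep axis is fixed;
    # only the length values vary, and empty fsteps simply have all-zero lengths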

@@ -0,0 +1,192 @@
# (C) Copyright 2025 WeatherGenerator contributors.
Collaborator

Why all these changes here?

Contributor Author

Ah, I need this to start the code in debugging mode in VS Code. It slipped into a commit; I have untracked it again.

@kctezcan
Contributor Author

Thanks a lot for the comments, @clessig

I have a question about how to handle empty target fsteps: see src/weathergen/model/model.py line 668.

We can keep the empty fsteps as well, and then we would not have to deal with the fstep shifts between the sources and targets. In the current form, the code needs to introduce some offsetting if there is a forecast_offset, for example.


if rdata.is_empty():
stream_data.add_empty_target(fstep)
stream_data.add_empty_target_srclk(fstep)
Contributor

Please change the name to add_empty_target_source_like or something more readable as a variable/function name.

Contributor Author

"source_like" sounds good. It is a bit long, but des not bother me, if @clessig you are also ok, I can change it.

Collaborator

Yes, srclk wasn't clear to me. I'd rather have it long and explicit.

self.source_tokens_cells = torch.tensor([])
self.source_centroids = torch.tensor([])

# >>>>>>>
Contributor

remove


return tokens_all

def embed_cells_targets_srclk(self, model_params: ModelParams, streams_data) -> torch.Tensor:
Contributor

Why is it necessary to duplicate the function embed_cells? I think ideally we avoid that, because each time the embedding engine changes, this function also needs to change, i.e. it is quite prone to code rot.

Contributor Author

This is one of the discussion points for me as well. If we have a function that we can use both for sources and targets, then we could also call it n times, once per fstep, i.e. taking the loop over fsteps out of the function, as you and Christian suggested earlier.

As the code is currently written, this function does not take variables as input but accesses them directly on the streams_data object. So my suggestion would be:

  1. rewrite the function to take the relevant variables as input instead of the whole object
  2. call it in a for loop over fsteps for the targets.

I will implement it so we can see how it looks and decide what is better.
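
A rough sketch of what steps 1 and 2 could look like; the signature and logic are hypothetical:

    import torch

    # the function takes plain tensors instead of the streams_data object...
    def embed_cells(tokens_cells: torch.Tensor, tokens_lens: torch.Tensor) -> torch.Tensor:
        return tokens_cells * tokens_lens.unsqueeze(-1)  # stand-in for the embedding logic

    num_fsteps = 2
    # ...so the caller can reuse it for the targets in a loop over fsteps
    embedded_per_fstep = [
        embed_cells(torch.randn(5, 8), torch.ones(5)) for _fstep in range(num_fsteps)
    ]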

Contributor

Maybe we can do that change in a separate PR? That way we can merge this one, if it looks good, independently of the latent loss, i.e. faster?

Collaborator

Yes, this should go into a separate PR.

for _, sb in enumerate(streams_data):
for _, (s, embed) in enumerate(zip(sb, self.embeds, strict=False)):
for fstep in range(num_fsteps):
if s.target_source_like_tokens_lens[fstep].sum() != 0:
Contributor Author

@clessig @sophie-xhonneux
What do you think? Should we skip empty fsteps or return an empty tensor for those?

Collaborator

Empty tensor

Contributor Author

Implemented.
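
The resulting pattern, sketched with made-up names and shapes, keeps the fstep indexing aligned:

    import torch

    dim = 16
    # fstep 1 is empty in this made-up example
    tokens_lens = [torch.tensor([3, 2]), torch.tensor([0, 0]), torch.tensor([1, 4])]

    tokens_per_fstep = [
        torch.randn(int(lens.sum()), dim) if lens.sum() != 0 else torch.empty(0, dim)
        for lens in tokens_lens
    ]
    # tokens_per_fstep[1].shape == (0, 16): the empty fstep is still present and
    # indexable, so no shifting between source and target fsteps is needed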

@clessig
Collaborator

clessig commented Sep 29, 2025

Thanks a lot for the comments, @clessig

I have a question about how to handle empty target fsteps: see src/weathergen/model/model.py line 668.

We can keep the empty fsteps as well, and then we would not have to deal with the fstep shifts between the sources and targets. In the current form, the code needs to introduce some offsetting if there is a forecast_offset, for example.

We should remove the special case handling, yes. But can this go to a separate PR?

@kctezcan
Contributor Author

Thanks a lot for the comments, @clessig
I have a question about how to handle empty target fsteps: see src/weathergen/model/model.py line 668.
We can keep the empty fsteps as well, and then we would not have to deal with the fstep shifts between the sources and targets. In the current form, the code needs to introduce some offsetting if there is a forecast_offset, for example.

We should remove the special case handling, yes. But can this go to a separate PR?

Yes, of course.
