
Generalize DotLayer to allow sparse inputs #741


Closed · albertz opened this issue Nov 8, 2021 · 10 comments · Fixed by #895


albertz (Member) commented Nov 8, 2021

This basically would unify GatherLayer and DotLayer.

This is the same logic as for LinearLayer, which also works with either dense or sparse input.

I'm not really suggesting to do this; I just want to raise the question here of whether it makes sense.
Maybe it is cleaner, however, not to do this and to keep it more explicit for the user.
Maybe it also was a mistake that LinearLayer does this. Other frameworks usually have a separate Embedding or EmbeddingLayer for the case of sparse inputs.
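
(For context, the equivalence this would rely on, as a minimal plain-NumPy sketch, not RETURNN code: a dot product against the implicit one-hot expansion of a sparse input is exactly a gather into the weight matrix.)

```python
import numpy as np

# Minimal sketch (plain NumPy, not RETURNN): a sparse input of shape [T] with
# dim F behaves like a one-hot dense [T, F]; a dot product with a weight
# matrix [F, D] is then the same as gathering rows of that matrix.
rng = np.random.default_rng(0)
weights = rng.normal(size=(10, 4))   # F=10 (vocab), D=4
indices = np.array([3, 7, 7, 1])     # sparse input, values in [0, F)

one_hot = np.eye(10)[indices]        # explicit dense [T, F]
via_dot = one_hot @ weights          # what a sparse-aware DotLayer would compute
via_gather = weights[indices]        # what GatherLayer / an embedding lookup computes

assert np.allclose(via_dot, via_gather)
```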

Zettelkasten (Member) commented:

> Maybe it also was a mistake that LinearLayer does this. Other frameworks usually have a separate Embedding or EmbeddingLayer for the case of sparse inputs.

Why do you think it was a mistake to handle both? Did you ever run into issues with this?
Are there any dangerous things that can happen here?
At first I found an Embedding layer a bit weird, mainly because of its name (I think SparseLookup or similar would be a better name; it describes much better what the layer does. That it is an embedding is just one particular interpretation).

If you think about this sparse dim as just another (implicit) dim in the shape, I think it makes a lot of sense. Just like we would probably also add these sparse dims to out_shape in some way (#706).
So I would support sparse inputs whenever possible (so also here for DotLayer). Also for rwth-i6/returnn_common#38 (comment) I would just have one method that accepts both.

albertz (Member, Author) commented Nov 8, 2021

> Maybe it also was a mistake that LinearLayer does this. Other frameworks usually have a separate Embedding or EmbeddingLayer for the case of sparse inputs.

> Why do you think it was a mistake to handle both? Did you ever run into issues with this? Are there any dangerous things that can happen here?

No. I thought it might be unintuitive. But maybe not. Or maybe only for people coming from other frameworks.

> At first I found an Embedding layer a bit weird, mainly because of its name (I think SparseLookup or similar would be a better name; it describes much better what the layer does. That it is an embedding is just one particular interpretation).

Well, I think Embedding makes more sense, because it represents an embedding, it implies that it is trainable, and it contains the embedding matrix.
SparseLookup sounds to me like it would just do a lookup, i.e. an alias to tf.gather. I.e. purely functional, not trainable. It would not contain the embedding matrix.
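
(To illustrate the distinction, a rough plain-TensorFlow sketch; the two readings here are illustrative, not RETURNN classes:)

```python
import tensorflow as tf

indices = tf.constant([3, 7, 7, 1])

# "Embedding": owns a trainable matrix; the lookup is just how it is applied.
embedding_matrix = tf.Variable(tf.random.normal([10, 4]))  # part of the model, trained
embedded = tf.gather(embedding_matrix, indices)

# "SparseLookup" reading: purely functional, no parameters of its own.
some_table = tf.random.normal([10, 4])                      # comes from elsewhere, not trained
looked_up = tf.gather(some_table, indices)
```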

> If you think about this sparse dim as just another (implicit) dim in the shape, I think it makes a lot of sense. Just like we would probably also add these sparse dims to out_shape in some way (#706). So I would support sparse inputs whenever possible (so also here for DotLayer). Also for rwth-i6/returnn_common#38 (comment) I would just have one method that accepts both.

Yes, I see this viewpoint.

But then GatherLayer is also basically obsolete, right?

GatherLayer doesn't require the sparse flag to be set though. But this is a detail, and maybe the sparse flag actually should have been set in the cases where it is not. For those cases, you would just use ReinterpretDataLayer.

Although it might be a bit unintuitive for the user to realize when sparse is actually set and when it is not?
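
(A hedged sketch of what that could look like in a network config; the option names like "set_sparse" / "set_sparse_dim" are from memory and would need to be checked against the current ReinterpretDataLayer docs:)

```python
# Hypothetical network fragment; option names not verified against current RETURNN.
network = {
    # Take a plain int tensor and mark it as sparse with dim 1000, so that layers
    # which interpret the sparse flag (LinearLayer, and the proposed generalized
    # DotLayer) would treat it as an implicit one-hot input.
    "as_sparse": {
        "class": "reinterpret_data", "from": "indices",
        "set_sparse": True, "set_sparse_dim": 1000,
    },
}
```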

Zettelkasten (Member) commented:

Hm okay, I do see some dangers with sparse tensors now. E.g. if you have a sparse [B,T] tensor, which is then virtually a [B,T,F], and you e.g. reduce the T axis, I think it will currently reduce over the vocab ids. But instead it should behave like reducing the dense one-hot [B,T,F] over T, i.e. give a dense [B,F] which is just the sum of the one-hot vectors.
And probably also in many other cases. Like also DotLayer here.
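
(Concretely, in plain NumPy, just to make the concern explicit:)

```python
import numpy as np

# sparse [B=2, T=3] with dim F=4
sparse_bt = np.array([[1, 1, 3],
                      [0, 2, 2]])

# Reducing over T on the raw ids sums the vocab ids themselves:
reduce_on_ids = sparse_bt.sum(axis=1)          # [5, 4] -- probably not what you want

# Under the "implicitly dense" view you would reduce the one-hot [B, T, F]:
one_hot_btf = np.eye(4)[sparse_bt]             # explicit dense [B, T, F]
reduce_on_one_hot = one_hot_btf.sum(axis=1)    # [[0, 2, 0, 1],
                                               #  [1, 0, 2, 0]] -- per-class counts
```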

Zettelkasten (Member) commented:

> But then GatherLayer is also basically obsolete, right?

> GatherLayer doesn't require the sparse flag to be set though. But this is a detail, and maybe the sparse flag actually should have been set in the cases where it is not. For those cases, you would just use ReinterpretDataLayer.

> Although it might be a bit unintuitive for the user to realize when sparse is actually set and when it is not?

Hm, yeah, I don't know.
I usually use the indices I pass to GatherLayer as positions, which have a fixed order. So e.g. I would average and round them.
So here these index tensors need to be dense; otherwise this should operate on one-hot vectors (which I think it currently does not, but I would argue that is broken).
So in my case, I would always need to explicitly "mark it as sparse" if I wanted to gather using DotLayer.
Maybe that could be the main differentiation, that GatherLayer treats the inputs really as "indices", not so much as "one hot vectors" like the sparse flag would do?

(In the code, we can of course clean this up and use the same code for both GatherLayer and DotLayer.)
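
(To make the "positions" vs. "one-hot" reading concrete, a plain-NumPy illustration, not RETURNN code:)

```python
import numpy as np

positions = np.array([2, 6])                  # indices with a meaningful order

# GatherLayer-style reading: these are positions, so averaging + rounding is fine.
mean_position = int(round(positions.mean()))  # 4

# One-hot / sparse-flag reading: averaging should act on the one-hot vectors,
# which puts 0.5 mass on class 2 and class 6, and nothing at class 4.
one_hot = np.eye(8)[positions]
mean_one_hot = one_hot.mean(axis=0)           # [0, 0, .5, 0, 0, 0, .5, 0]
```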

albertz (Member, Author) commented Nov 8, 2021

> Hm okay, I do see some dangers with sparse tensors now. E.g. if you have a sparse [B,T] tensor, which is then virtually a [B,T,F], and you e.g. reduce the T axis, I think it will currently reduce over the vocab ids. But instead it should behave like reducing the dense one-hot [B,T,F] over T, i.e. give a dense [B,F] which is just the sum of the one-hot vectors. And probably also in many other cases. Like also DotLayer here.

Is this only for sparse tensors interpreted as implicit dense tensors, but not for expansions because of implicit dims via dyn_size_ext? (#721)

But it's also a matter of interpretation and definition, what the ReduceLayer should do in this case.

Zettelkasten (Member) commented Nov 8, 2021

> Is this only for sparse tensors interpreted as implicit dense tensors, but not for expansions because of implicit dims via dyn_size_ext? (#721)

Ah true, it's actually quite a similar issue to #721. In both cases you need to make all those implicit dims explicit.

> But it's also a matter of interpretation and definition, what the ReduceLayer should do in this case.

My interpretation was that sparse dims should always behave as if they were implicitly dense.
But I have also never used sparse dims before, apart from embeddings / specifying loss targets. We can discuss this (maybe in another issue; this is now much more general than DotLayer).

albertz (Member, Author) commented Nov 8, 2021

> So in my case, I would always need to explicitly "mark it as sparse" if I wanted to gather using DotLayer.
> Maybe that could be the main differentiation, that GatherLayer treats the inputs really as "indices", not so much as "one hot vectors" like the sparse flag would do?

There is some danger in this for DotLayer in situations where it is not obvious to the user whether some output is sparse or dense (whether the sparse flag is set or not). Although when reduce is explicitly specified, maybe not so much. And multiplying int with float would also throw an error.
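
(For reference, a plain-TensorFlow illustration of that last point, not RETURNN-specific: a mixed-dtype matmul fails loudly rather than silently.)

```python
import tensorflow as tf

ids = tf.constant([[1, 3, 2]], dtype=tf.int32)   # sparse-style int input
weights = tf.random.normal([3, 4])               # float32

try:
    tf.matmul(ids, weights)                      # int32 vs float32 is rejected
except Exception as exc:
    print(type(exc).__name__)                    # dtype mismatch raises an error
```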

Note that many (almost all?) layers really just ignore the sparse flag. So layers are not at all consistent in correctly interpreting it as a one-hot dense tensor. LinearLayer is one of the rare exceptions. And the losses (CrossEntropyLoss etc.) also handle it, because sparse targets are very common there.

So regarding the viewpoint of interpreting it as a one-hot dense tensor: if we consider all the layers that don't do this correctly as buggy, and if we were to "fix" them, this could potentially break a lot of configs. And maybe this viewpoint is not always useful, except in some cases. I'm not sure...

When we have #706 and the user somehow specifies the implicit dimensions more explicitly, then this is maybe less of an issue.

albertz (Member, Author) commented Nov 8, 2021

Although, to add, it is also rare that we actually have tensors with sparse=True at all, except for targets. So maybe by now, fixing or changing some layers' behavior on this would not really break anything.

albertz (Member, Author) commented Nov 9, 2021

> Although, to add, it is also rare that we actually have tensors with sparse=True at all, except for targets. So maybe by now, fixing or changing some layers' behavior on this would not really break anything.

Extending this thought, and getting back to the issue: so you say we should do this? I tend to agree.

albertz (Member, Author) commented Nov 21, 2021

Ok, no further comments here, so the conclusion: This should be implemented.

albertz changed the title from "Generalize DotLayer to allow sparse inputs?" to "Generalize DotLayer to allow sparse inputs" on Nov 21, 2021
albertz added commits that referenced this issue Jan 4, 2022