Improvements for sparse modules from #4951 #4955

HechtiDerLachs · 2025-06-04T09:09:19Z

This turned out to be a bottleneck in the buildup to #4951 . To demonstrate the effect, try the following:

R, (x, y) = QQ[:x, :y]
F = free_module(R, 1)
@time G, _ = direct_sum([F for _ in 1:10000]);

Before:

julia> @time G, _ = direct_sum([F for _ in 1:10000]);
  2.458533 seconds (1.01 M allocations: 2.785 GiB, 52.62% gc time, 1.06% compilation time)

After:

julia> @time G, _ = direct_sum([F for _ in 1:10000]);
  0.043270 seconds (627.92 k allocations: 31.499 MiB, 61.40% compilation time)

Edit: This now contains further patches for bottlenecks snooped up during work on #4951 . Some of these are rather hotfixes and should still be pulled straight at some point. But everything works for the moment!

Controversial things to be discussed:

So far, symbols decided over equality of modules and were included in hashing. ~~This would have to go if we lazify them.~~ Changed back to using symbols for comparison and hashing. This can probably still be sped up according to @fingolfin 's comment below.
~~generator_symbols was replaced by symbols. This is not super necessary, but I was unaware of the existence of generator_symbols and started working with the other function.~~ generator_symbols is now left as an alias for symbols.
Use of sparse matrices for the transformation matrices in lift_std (see below and lift_std should use sparse matrices #4925 ).

fingolfin · 2025-06-04T10:13:49Z

Ugh. I think using Symbol instead of String is biting us heavily here. Compare:

julia> f(x,n) = [vcat(x,string(i)) for i in 1:n];

julia> g(x,n) = [Symbol(x,i) for i in 1:n];

julia> @b f("foobar", 500)
15.125 μs (2003 allocs: 70.469 KiB)

julia> @b g("foobar", 500)
110.125 μs (2503 allocs: 93.836 KiB)

So turning everything into a symbol make this quite a bit slower (because those symbols are stored in a global table, and are made "unique" by doing a lookup there).

HechtiDerLachs · 2025-06-04T10:25:21Z

Triage suggested that we keep the comparison of symbols to check equality (and in hashing).

The current implementation can cater for that, but a decision needs to be made. @fingolfin , @fieker ...

src/Modules/UngradedModules/Methods.jl

HechtiDerLachs · 2025-06-05T06:23:08Z

src/Modules/UngradedModules/ModuleGens.jl

+  SA = get_attribute!(generators, :sparse_transformation_matrix) do
+    A = get_attribute(generators, :transformation_matrix)
+    A === nothing && error("No transformation matrix in the Gröbner basis.")
+    sparse_matrix(A)
  end


@hannes14 : This is a temporary workaround to do things with sparse matrices for the lifting. Did you already look into providing methods to get a sparse data structure from Singular directly? Then I could adapt this PR to use that one directly.

HechtiDerLachs · 2025-06-05T06:39:22Z

I think, it's still quite annoying that we loose so much time in computing and comparing monomial orderings. Look at this:

R, (x, y) = QQ[:x, :y]
F = free_module(R, 10000)
I, _ = sub(F, [x*F[1]])
II = I.sub
@time y*F[1] in II

For the last line I get

julia> @time y*F[1] in II
  5.605941 seconds (90.35 M allocations: 3.587 GiB, 13.49% gc time, 1.32% compilation time)
false

3.5G for a trivial computation! Using ProfileView indeed reveals that the time is spent almost exclusively on canonical_matrix again.

This PR contains some changes so that at least canonical_matrix is not computed again and again all over the place. I will try to come up with a minimal example to demonstrate the glitch which was there before.

src/Modules/UngradedModules/Methods.jl

HechtiDerLachs · 2025-06-05T07:10:02Z

src/Modules/UngradedModules/SubModuleOfFreeModule.jl

 function in(a::FreeModElem, M::SubModuleOfFreeModule)
  F = ambient_free_module(M)
-  return iszero(reduce(a, standard_basis(M, ordering=default_ordering(F))))
+  return iszero(reduce(a, standard_basis(M, ordering=nothing)))


So I think, the glitch was here. Look at this:

R, (x, y) = QQ[:x, :y] F = free_module(R, 10000) I, _ = sub(F, [x*F[1]]) d = Dict{ModuleOrdering, Any}() ord = default_ordering(F); d[ord] = standard_basis(I, ordering=ord); @time d[ord]

I repeatedly get

julia> @time d[ord]; 0.447075 seconds

probably because of repeated hashing/canonical_matrix. The changes here do some dirty short-wiring of the dictionary lookup by checking whether any of the keys is === to ord. But that's O(n) and not O(1) as usual lookup of keys in dictionaries. So far we hardly ever have more than one groebner basis in any module, but still, it's not elegant what we do.

From the hacky changes here we get the following speedups.

R, (x, y) = QQ[:x, :y] F = free_module(R, 10000) I, _ = sub(F, [x*F[1]]) II = I.sub @time coordinates(x*y*F[1], I)

Before this PR:

julia> @time coordinates(x*y*F[1], I) 0.883578 seconds (10.29 k allocations: 400.250 KiB) Sparse row with positions [1] and values QQMPolyRingElem[y]

With this PR:

julia> @time coordinates(x*y*F[1], I) 0.000574 seconds (10.21 k allocations: 397.438 KiB) Sparse row with positions [1] and values QQMPolyRingElem[y]

Edit: It turns out that the glitch was not that canonical_matrix was computed over and over again, but that the already cached canonical_matrix was hashed over and over again (which takes ~0.44s). See the timings of the other post below for details.

codecov · 2025-06-05T07:12:47Z

Codecov Report

Attention: Patch coverage is 84.70588% with 13 lines in your changes missing coverage. Please review.

Project coverage is 84.81%. Comparing base (c4a6aae) to head (2147be4).
Report is 12 commits behind head on master.

Files with missing lines	Patch %	Lines
src/Modules/UngradedModules/SubquoModule.jl	30.00%	7 Missing ⚠️
...c/Modules/UngradedModules/SubModuleOfFreeModule.jl	90.47%	2 Missing ⚠️
src/Modules/ModulesGraded.jl	50.00%	1 Missing ⚠️
src/Modules/UngradedModules/FreeMod.jl	92.30%	1 Missing ⚠️
src/Modules/UngradedModules/ModuleGens.jl	75.00%	1 Missing ⚠️
src/Rings/MPolyMap/flattenings.jl	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4955      +/-   ##
==========================================
- Coverage   84.88%   84.81%   -0.08%     
==========================================
  Files         697      698       +1     
  Lines       94190    94291     +101     
==========================================
+ Hits        79956    79971      +15     
- Misses      14234    14320      +86

Files with missing lines	Coverage Δ
src/Modules/ExteriorPowers/FreeModules.jl	`98.36% <100.00%> (-0.02%)`	⬇️
src/Modules/ExteriorPowers/SubQuo.jl	`96.55% <100.00%> (+0.06%)`	⬆️
src/Modules/FreeModElem-orderings.jl	`96.46% <100.00%> (ø)`
src/Modules/ModuleTypes.jl	`79.25% <100.00%> (+0.17%)`	⬆️
src/Modules/UngradedModules/DirectSum.jl	`86.04% <100.00%> (+0.10%)`	⬆️
src/Modules/UngradedModules/FreeModElem.jl	`85.43% <100.00%> (-0.28%)`	⬇️
src/Modules/UngradedModules/FreeModuleHom.jl	`93.82% <100.00%> (+0.03%)`	⬆️
src/Modules/UngradedModules/Methods.jl	`87.82% <100.00%> (ø)`
src/Modules/UngradedModules/SubQuoHom.jl	`76.55% <100.00%> (ø)`
src/Modules/UngradedModules/SubquoModuleElem.jl	`84.50% <100.00%> (ø)`
... and 8 more

... and 23 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

HechtiDerLachs · 2025-06-05T09:33:49Z

Tests pass. From my side, this is good to go for the moment (if people are OK with switching from generator_symbols to symbols). If this was merged now, then the stuff with the sparse matrices in lift_std can be addressed later. The issue for that already exists and no harm would be done with these changes here.

lgoettgens · 2025-06-05T09:42:13Z

This PR now seems to contain several other optimizations that are not related to lazyness of the symbols anymore. Could you try to split them into a different PR? I think most of the other optimizations should be a clear gain and thus could be merged independent of the discussion around symbols in this PR. And it would make it a lot easier to review these two different things independently of each other.

HechtiDerLachs · 2025-06-05T09:44:15Z

As remarked in my edit at the beginning: This contains the changes and improvements for the modules from #4951, so this is already split off from something. Should I really chop it into even smaller pieces?

Edit: And I think, there's not much to be discussed anymore about the symbols, is there? I mean: Now this does not change anything about the user-facing functionality anymore (comparison and hashing work as before), but things are a lot faster. So controversies should be over (hopefully).

src/Modules/ExteriorPowers/FreeModules.jl

src/Modules/ExteriorPowers/SubQuo.jl

src/Modules/ModuleTypes.jl

src/Modules/UngradedModules/FreeMod.jl

src/Modules/ExteriorPowers/FreeModules.jl

src/Modules/UngradedModules/ModuleGens.jl

afkafkafk13

This looks fine to me now.

@fingolfin : All your objectons and nitpicks have been dealt with, as far as I see.

HechtiDerLachs · 2025-06-12T10:35:31Z

Thanks for the review, @afkafkafk13 ! As discussed with @jankoboehm yesterday, this should not be merged before #4915 and rebased after that.

afkafkafk13 · 2025-06-12T11:25:47Z

Thank you for making this dependency explicit and setting the label 'DO NOT MERGE'.

Co-authored-by: Max Horn <[email protected]>

HechtiDerLachs · 2025-06-20T09:01:21Z

@jankoboehm : This is now rebased and ready to merge from my side.

I remember, you had some concerns about the use of orderings. Let me summarize what I did here.

The hashing of monomial orderings for modules is still quite expensive. By now, we store the canonical_matrix in an attribute, which makes things slightly better, but still way too slow for lookups in dictionaries. An example is given above and I repeat it here:

R, (x, y) = QQ[:x, :y]
F = free_module(R, 10000)
I, _ = sub(F, [x*F[1]])
II = I.sub

Now the first call for hashing the default ordering:

julia> @time hash(default_ordering(I))
  5.390386 seconds (90.44 M allocations: 3.588 GiB, 11.91% gc time)

This needs to compute the canonical_matrix, too. So what about the second call?

julia> @time hash(default_ordering(I))
  0.450113 seconds (1 allocation: 16 bytes)

Still almost half a second! If we do this to look up a groebner basis for a lift_std on a vector of 100 elements, we loose almost a minute of computation time for nothing!

So I thought: If lookups of groebner bases or any creation and comparison of monomial orderings can possibly be avoided, we should do it. Many functions involving the ModuleGens take a kwarg ordering which is set to default_ordering of some module involved. I occasionally widened the signature to also allow nothing here which then internally falls back to "just use the default ordering or anything better that's already there, but avoid nasty lookups and comparisons at any cost".

I think, this does not need to interfere with any foundational design decisions. But please, let me know if you think differently!

Either way: We should really do something about not using module orderings in dictionaries. But ~~given that these things tend to take years and years to be discussed,~~ probably in a different PR... (Edit: I would like to apologize for the tonality. While it is true that these problems were first noticed already years ago, I acknowledge that such processes may require time. Improvements have been made in the meantime, but ihmo still not to a satisfactory extent, as I keep encountering these bottlenecks in my computations and applications. I also acknowledge that meaningful minimal examples exhibiting those bottlenecks have not always been provided and that it is then difficult to discuss architectures for improvements. But I hope that the material given here can take us a step forward.)

jankoboehm · 2025-06-25T09:26:18Z

I am ok with the nothing solution, if we cannot come up with something even better.

HechtiDerLachs · 2025-06-25T13:09:50Z

I introduced now two more fields in the SubModuleOfFreeModule struct: One for a groebner basis and one for a groebner basis with transition matrix. Both are used to avoid iterating through a dictionary in a randomized order. @benlorenz : Does this address your concerns?

jankoboehm · 2025-06-25T15:01:16Z

Thank you! Relative to our discussion earlier today: Do we really need the nothing option? Could we not just use the new fields (and also name them like that) for the Gröbner basis wrt the default ordering. The default ordering is stored in the ambient free module, so could just do a === check in the standard_basis command? That could then give an improvement also in many other situations. Or do I miss something?

benlorenz · 2025-06-25T17:34:53Z

I introduced now two more fields in the SubModuleOfFreeModule struct: One for a groebner basis and one for a groebner basis with transition matrix. Both are used to avoid iterating through a dictionary in a randomized order. @benlorenz : Does this address your concerns?

I did not follow all the details here but if it makes it stable across version this should be fine.

jankoboehm · 2025-06-25T18:12:09Z

Thank you! Yes it should make it stable. Shall we rename dummy into default?

HechtiDerLachs · 2025-06-26T05:53:36Z

Shall we rename dummy into default?

No. That would suggest that we find the gb for the default_ordering in that field which is not necessarily the case.

HechtiDerLachs · 2025-06-26T08:48:20Z

Good to go from my side.

jankoboehm · 2025-06-26T10:42:04Z

Thanks, any sounds better than dummy. I would indeed have used it only for the default ordering, but this way is also fine. Without interference it will only be used for the default. So this is good to go from my side.

HechtiDerLachs added topic: commutative algebra triage labels Jun 4, 2025

HechtiDerLachs force-pushed the module_improvements branch from 339c818 to 8d585af Compare June 4, 2025 10:36