Add std.math.egcd (Extended GCD). Micro-optimize GCD. #25533
base: master
Conversation
Force-pushed b8e054b to fffe0c7
For cryptographic purposes, the Bernstein–Yang algorithm is a better choice than EGCD. The Zig standard library uses it for field inversion in the NIST curves.
The egcd is not fully fleshed out yet; for example, there are still possibilities for overflow errors. Thanks for your comment, @jedisct1. By the way, regarding finding inverses with a cryptographic algorithm, there is also (imo) some improvement to be done in the standard library. A lot of code looks like it's been blindly ported from some other language, but Zig has capabilities that let you express it more elegantly, for example inline for loops. I am not sure how welcome it is to let a draft develop into a bigger PR, but I wouldn't mind adding cryptographic counterparts of the gcd algorithms to the standard library too, which could be used uniformly by all crypto functions.
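To illustrate what I mean by that, a toy sketch (hypothetical, not stdlib code) of how an inline for unrolls a comptime-known square-and-multiply ladder, the kind of thing Fermat-style field inversion does:

const std = @import("std");

// Toy sketch: `inline for` unrolls the loop at compile time when the
// sequence is comptime-known, e.g. a fixed exponentiation ladder.
fn powModUnrolled(comptime exp_bits: []const u1, base: u64, m: u64) u64 {
    var acc: u64 = 1 % m;
    inline for (exp_bits) |bit| {
        acc = @intCast((@as(u128, acc) * acc) % m); // square
        if (bit == 1) acc = @intCast((@as(u128, acc) * base) % m); // multiply
    }
    return acc;
}

test "powModUnrolled: 3^5 mod 7 == 5" {
    // 5 = 0b101, bits given MSB-first.
    try std.testing.expectEqual(@as(u64, 5), powModUnrolled(&.{ 1, 0, 1 }, 3, 7));
}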
A bit insulting, but generally, that's not the case. In fact, both libsodium and Go benefited from some of the tricks that were originally implemented in the Zig standard library. Regarding inversions: when constant-time behavior is required, they currently use either addition chains (for "nice" primes) or the Bernstein–Yang inversion implemented in native Zig code generated by fiat-crypto, which ensures correctness. So please don't change that. That said, a generic EEA implementation is still useful for other scenarios; thanks for adding it!
I don't think I was, and I am sorry if I offended you in any way, but taking a quick look at the code reveals this:

// Autogenerated: 'src/ExtractionOCaml/word_by_word_montgomery' --lang Zig --internal-static --public-function-case camelCase --private-function-case camelCase --public-type-case UpperCamelCase --private-type-case UpperCamelCase --no-prefix-fiat --package-name p384 '' 64 '2^384 - 2^128 - 2^96 + 2^32 - 1' mul square add sub opp from_montgomery to_montgomery nonzero selectznz to_bytes from_bytes one msat divstep divstep_precomp

That being said, I cannot claim that I am able to write something better than what already exists, so maybe this is the only way to get the compiler to do what it is supposed to do?
As mentioned above, this code is generated by fiat-crypto, and it is done so in order to ensure correctness. I'm not really sure what other language this could have been copied from, when it's generated.
@Rexicon226, very kind of you to join this conversation and confirm the very thing I am saying: there has been code that's been "blindly copied over from some other language".
Auto-generation has its perks: you can quickly produce many variants of basically the same thing, and you remove the human element of errors. What you ultimately sacrifice, however, is readability, including clarity of intent, which helps users of the library understand what is implemented and why. It took me a while, but I found the original formulation, or template, for this auto-generated code in the standard library: https://github.com/mit-plv/fiat-crypto/blob/master/src/Arithmetic/BYInv.v

Now here are some of the key issues I have as a simple user of the standard library. These are my opinions and might be shared by others or not; nonetheless I want to voice them. If healthy communication is not welcome, I can respect that decision of yours.
Having written out my opinions and complaints more elaborately, the only logical next step for me is to provide some suggestions regarding each point:
In conclusion, I can only reiterate: do not take my critique as an insult. I am trying to be constructive, or at least to voice a (perceived) issue in a neutral and professional manner. I never expected to have to write such a long explanation for a very simple (admittedly spicy) remark, but in my opinion it was justified, as you might now agree. I am happy to hear your perspectives on this matter and hope we can stay respectful to each other.
This is true. The use case we need for this code is to be correct, not readable.
The problem here is the lack of constant-time-ness verification on the Zig AST, rather than any potential input sources for the Zig code being generated. This will be addressed in the future with #1776. It is not smarter to inline the assembler code because this
We require it to be correct. Not fast (fiat-crypto generates some pretty bad code all things considered), nor readable (that already exists in the form of
The same way you trust other cryptographic code in the standard library written and/or accepted by @jedisct1. If the level of trust you require is greater (for me personally, at work, we have some algorithms which we've written ourselves, with further testing and verification of correctness), then the Zig standard library isn't for you. And that's totally ok. I would also like to add that the underlying implementation of a finite field isn't really something that can be a source of vulnerability. Perhaps it could be written in a way that causes a program to panic, but if it were implemented wrong in a malicious way, it would just cause most algorithms written on top to not work. But that is a subjective argument, so take it as you will. These generated chunks of code are also pretty much never updated, so a supply-chain-based attack is unlikely to happen.
This defeats the entire purpose of using a formally verified generator for the implementation. And as mentioned before, we already have something like it in
I personally wouldn't be against adding this link in the doc comment above (zig/lib/std/crypto/pcurves/common.zig, line 199 in 958faa7),
but keep in mind that I am not a core member, so my opinions are my own :).
I guess this could make sense if we were actually re-generating these files at some point in time; however, its purpose is defeated by just the fact that the code is looked at and reviewed by Frank. If you are this concerned about correctness, I think the finite field implementations are the last thing you should be worrying about. Understand that the Zig stdlib cryptography code is not audited. If your use cases require the utmost correctness, I do not believe it is the correct library for you to be using.
OpenSSL has experienced multiple severe carry propagation bugs in its finite field implementations (e.g. CVE-2017-3732, CVE-2017-3736 and CVE-2021-4160). Bugs can happen everywhere, including in hardware, but nowadays tools that guarantee correctness of the original source code exist, and Zig has the privilege of being a supported target of one of these tools. Not taking advantage of that would be a step back into the past. Using them gives us well-trusted implementations, one common representation, and one less thing to worry about. fiat-crypto can also generate platform-specific assembly code that retains the same public API but helps a lot to ensure that the code runs in constant time. This is something we can eventually take advantage of as well. But maybe we should get back to the PR. EGCD in
Ran a quick GCD benchmark: https://gist.github.com/jedisct1/0ddfe5484c6ea273c32efd73b40924c0
The improvement seems to be the usage of the exact shift builtins:

diff --git a/lib/std/math/gcd.zig b/lib/std/math/gcd.zig
index 16ca7846f19a..minimal_exact_shifts 100644
--- a/lib/std/math/gcd.zig
+++ b/lib/std/math/gcd.zig
@@ -26,16 +26,16 @@ pub fn gcd(a: anytype, b: anytype) @TypeOf(a, b) {
     const xz = @ctz(x);
     const yz = @ctz(y);
     const shift = @min(xz, yz);
-    x >>= @intCast(xz);
-    y >>= @intCast(yz);
+    x = @shrExact(x, @intCast(xz));
+    y = @shrExact(y, @intCast(yz));
     var diff = y -% x;
     while (diff != 0) : (diff = y -% x) {
         const zeros = @ctz(diff);
         if (x > y) diff = -%diff;
         y = @min(x, y);
-        x = diff >> @intCast(zeros);
+        x = @shrExact(diff, @intCast(zeros));
     }
-    return y << @intCast(shift);
+    return @shlExact(y, @intCast(shift));
 }
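For context on what the exact variants buy: @shrExact asserts that no set bits are shifted out, turning knowledge we already have (we always shift by the trailing-zero count) into something the optimizer may rely on. A minimal check of that property (my own sketch, not PR code):

const std = @import("std");

test "exact shifts encode the no-lost-bits guarantee" {
    const x: u64 = 0b1011_0000;
    const tz = @ctz(x); // 4 trailing zero bits
    // Shifting right by @ctz(x) can never discard a set bit, so the exact
    // variant is valid; in safe builds it would trap if that ever broke.
    try std.testing.expectEqual(@as(u64, 0b1011), @shrExact(x, @intCast(tz)));
}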
@jedisct1, I don't think it is: Compiler Explorer shows that changing the shifts into builtin calls for the version currently on master does not affect the generated machine code. I am in favour of changing the shifts into builtin calls regardless of whether that affects the output: they simply better encode the knowledge we have.
Yep. Doesn't hurt. |
I am not happy with it right now, because it might overflow, hence it being only a draft. Still, thank you all for taking the time.
I have spent a lot of time hand-optimizing the code for 64-bit words on x86 (on my Zen 2 architecture). It is crucial to calculate every step in the exact order I have written down to achieve the optimal assembler output, which I have written by hand:

.text
.globl gcd_hand_optimized
.type gcd_hand_optimized, @function
gcd_hand_optimized:
test %rdi, %rdi
je .EARLY_RET_Y # if zero, early return
test %rsi, %rsi
je .EARLY_RET_X # if zero, early return
tzcntq %rsi, %rcx # remove tz from second input (ctz)
tzcntq %rdi, %rdx # remove tz from first input (ctz)
shrxq %rdx, %rdi, %rdi # remove tz from first input (shift)
shrxq %rcx, %rsi, %rsi # remove tz from second input (shift)
cmpq %rcx, %rdx # save minimum of shifts (compare)
cmovbq %rdx, %rcx # save minimum of shifts (move)
movq %rsi, %r8 # create copy of y
subq %rdi, %r8 # calculate y - x
.LOOP:
movq %rdi, %r9 # create copy of x
tzcntq %r8, %rdx # saving zero count in dx
subq %rsi, %rdi # subtract y from x
cmovbq %r8, %rdi # move y - x to x if carry was set
cmovbq %r9, %rsi # replace y with x if carry was set
shrxq %rdx, %rdi, %rdi # remove tz from first input (shift)
movq %rsi, %r8 # create copy of y
subq %rdi, %r8 # calculate y - x
jne .LOOP
shlxq %rcx, %rsi, %rax # return y with appropriate shift
ret
.EARLY_RET_X:
movq %rdi, %rax
ret
.EARLY_RET_Y:
movq %rsi, %rax
ret
.size gcd_hand_optimized, .-gcd_hand_optimized
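Independent of the exact instruction sequence, a cheap way to build confidence in the binary version is a differential test against the naive Euclidean algorithm. A sketch (assumes a recent Zig where the PRNG lives at std.Random, and nonzero inputs):

const std = @import("std");

// Naive Euclidean reference, for differential testing only.
fn gcdRef(a0: u64, b0: u64) u64 {
    var a = a0;
    var b = b0;
    while (b != 0) {
        const t = a % b;
        a = b;
        b = t;
    }
    return a;
}

test "std.math.gcd agrees with the Euclidean reference" {
    var prng = std.Random.DefaultPrng.init(42);
    const rand = prng.random();
    for (0..10_000) |_| {
        const a = rand.intRangeAtMost(u64, 1, std.math.maxInt(u64));
        const b = rand.intRangeAtMost(u64, 1, std.math.maxInt(u64));
        try std.testing.expectEqual(gcdRef(a, b), std.math.gcd(a, b));
    }
}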
Binary appears to be much faster. I use this naive implementation:

const std = @import("std");
/// (n / 2) mod m
fn halveMod(comptime T: type, n: T, m: T) T {
if (n & 1 == 0) {
return n >> 1;
} else {
const WideT = std.meta.Int(.unsigned, @bitSizeOf(T) + 1);
const n_wide: WideT = n;
const m_wide: WideT = m;
const result = (n_wide + m_wide) >> 1;
return @intCast(result);
}
}
/// (a - b) mod m
fn subMod(comptime T: type, a: T, b: T, m: T) T {
if (a >= b) {
return a - b;
} else {
return m - (b - a);
}
}
/// Returns the modular inverse of y modulo m, or 0 if it does not exist.
/// Requires m to be an odd integer >= 3, and 0 <= y < m
pub fn modInverse(comptime T: type, y: T, m: T) T {
std.debug.assert(m >= 3);
std.debug.assert(m & 1 == 1); // m must be odd
std.debug.assert(y < m);
var a: T = y;
var u: T = 1;
var b: T = m;
var v: T = 0;
while (a != 0) {
if (a & 1 == 0) {
a = a >> 1;
u = halveMod(T, u, m);
} else {
if (a < b) {
// Swap (a, u, b, v) ← (b, v, a, u)
const temp_a = a;
const temp_u = u;
a = b;
u = v;
b = temp_a;
v = temp_u;
}
// Now a >= b and both are odd
// a ← (a − b)/2
a = (a - b) >> 1;
// u ← (u − v)/2 mod m
u = halveMod(T, subMod(T, u, v, m), m);
}
}
if (b != 1) return 0; // Inverse does not exist
return v;
}

Needs to use signed integers if we really want both Bézout coefficients. Not sure that these micro-optimizations will make any practical difference, though.
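A quick sanity check for the snippet above, assuming modInverse is in scope:

test "modInverse sanity check" {
    // 3 * 5 == 15 ≡ 1 (mod 7), so 5 is the inverse of 3 mod 7.
    try std.testing.expectEqual(@as(u64, 5), modInverse(u64, 3, 7));
    // gcd(3, 9) != 1, so no inverse exists and 0 is returned.
    try std.testing.expectEqual(@as(u64, 0), modInverse(u64, 3, 9));
}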
Force-pushed 6da3f5c to 7a3d838
y = @shlExact(y, @intCast(shift));
s = @shlExact(s, @intCast(shift));
// Using integer widening is only a temporary solution.
Is this still a work in progress?
One can avoid integer widening by implementing a custom function for terms of the form (a - b * c) / d. Something like this:
const std = @import("std");
const sign = std.math.sign; // assuming std.math.sign here

inline fn mul_div_thing(a: anytype, b: anytype, c: anytype, d: anytype) @TypeOf(b, c, d) {
@setEvalBranchQuota(10_000_000);
const S = @TypeOf(b, c, d);
const sign_ = -1 * sign(b) * sign(c) * sign(d);
var x = @abs(b);
const y = @abs(c);
const z = @abs(d);
var acc = a;
var counter: S = 0;
if (x == 0) return sign_;
while (x > 0) : (x -= 1) {
acc += y;
while (acc >= z) {
acc -= z;
counter += 1;
}
}
return counter * sign_;
}

As you can see, this solution is highly inefficient because I am not doing long multiplication / division. If I implemented long multiplication / division, this algorithm would become much more complicated, so I didn't even bother, really. Still, I want to encourage looking for something that doesn't require widening the integer type; that's why I left the comment.
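For contrast, the integer-widening approach that the temporary-solution comment refers to could look roughly like this (a sketch; the mulDivWidening name and the fixed i64 types are mine, and it assumes the final quotient fits back into i64):

const std = @import("std");

// Compute (a - b * c) / d with a double-width intermediate so the
// product cannot overflow.
fn mulDivWidening(a: i64, b: i64, c: i64, d: i64) i64 {
    const prod: i128 = std.math.mulWide(i64, b, c);
    const num = @as(i128, a) - prod;
    return @intCast(@divTrunc(num, d));
}

test "mulDivWidening: (2 - 3*4) / 2 == -5" {
    try std.testing.expectEqual(@as(i64, -5), mulDivWidening(2, 3, 4, 2));
}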
const a: i8 = 127;
const b: i8 = 126;
const c = egcd(a, b);

=> likely to crash with an integer overflow because
Even the stdlib tests don't pass:
Force-pushed 7000c52 to 3c55282
My mistake for not running the tests locally first. I fixed the issue and added another test.
The GCD changes are good and should be merged. However, the EGCD code doesn't seem to be quite ready.

The returned Bézout coefficients are off by a factor of

Example:

The code also crashes when the minimum value of a type is used for both operands. Also, while the GCD properly works with all the regular scalar types supported by the language, the EGCD doesn't currently compile with types > i32767. Consistency between both would be good.

I'd recommend closing this PR and opening a new one focused solely on the GCD changes, which are solid and can be merged. EGCD can be implemented and tested separately, then proposed in a separate PR. The first implementation can be simple and conventional; optimizations can always be added later.
Thanks for reviewing again. I didn't plan to see this PR merged right away, so feel free to review whenever you like. The algorithm for the egcd is correct; I was simply too stupid to fix the little artifacts I left when I translated the code and played around a little bit. If you want, I can convert this PR into a draft again. The power-of-2 bug was caused by literally one line, which I just fixed.
Force-pushed 97b4cd6 to fc36653
Fixed the issue related to the min value. I've removed the helper function because interacting with its result types was a bit bothersome.
With

And all unsigned types return an error.

Benchmarks also show that, depending on the size, the classic iterative method can be up to 2x faster. Modern CPUs have fast 64-bit division (which works well for sizes such as 64 and 512 bits), and the classic algorithm has fewer iterations, is simpler, and has more predictable memory access. So there's no one-size-fits-all method.

It is useful to have EGCD in std.math. Optimizations are great, and we definitely want them. However, especially if this is something that eventually gets used by crypto functions, correctness is critical. And a classic iterative implementation is already extremely fast.

So let's start with a classic implementation that works for all inputs: #25642

And then take the time to test optimizations before incrementally adding them for the cases where they are the most effective.
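For reference, the classic iterative implementation is only a few lines. A sketch (fixed i64 type for brevity; overflow on extreme inputs not addressed, and not the code from either PR):

const std = @import("std");

// Classic iterative extended Euclidean algorithm.
// Returns g, x, y with a*x + b*y == g.
fn egcdClassic(a: i64, b: i64) struct { g: i64, x: i64, y: i64 } {
    var old_r: i64 = a;
    var r: i64 = b;
    var old_s: i64 = 1;
    var s: i64 = 0;
    var old_t: i64 = 0;
    var t: i64 = 1;
    while (r != 0) {
        const q = @divTrunc(old_r, r);
        const tmp_r = old_r - q * r;
        old_r = r;
        r = tmp_r;
        const tmp_s = old_s - q * s;
        old_s = s;
        s = tmp_s;
        const tmp_t = old_t - q * t;
        old_t = t;
        t = tmp_t;
    }
    return .{ .g = old_r, .x = old_s, .y = old_t };
}

test "egcdClassic: 240 and 46" {
    const res = egcdClassic(240, 46);
    try std.testing.expectEqual(@as(i64, 2), res.g);
    try std.testing.expectEqual(res.g, 240 * res.x + 46 * res.y);
}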
@jedisct1 The recent commit fixed that issue regarding the divFloor operator; I simply switched to an explicit shift. All checks pass now, and I have double-checked this time that all the temporary results indeed do not overflow. Also, I don't know why you suddenly tried to re-introduce my scrapped version of the EGCD, because as I told you before: I think integer widening is bad, which was also the motivation behind my implementation of this binary version.
There is a reason for that: the egcd is defined for signed integers. If you want to calculate the egcd of your unsigned inputs, you can simply upcast them. The result type is always unsigned, the coefficients always signed. This is also a very handy feature, because now you don't have to add a single bit to the result type's width, and the coefficients are of the same width as the inputs (also nice when you calculate the inverse of a number in a field: no need to cast).
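Concretely, here is the bit this saves (my own illustration, not the PR code):

const std = @import("std");

test "unsigned gcd result avoids widening" {
    // gcd(minInt(i8), minInt(i8)) == 128, which fits u8 but not i8, so an
    // unsigned result type keeps the gcd at the input width.
    const m: i8 = std.math.minInt(i8);
    const g: u8 = @abs(m); // @abs of an i8 yields a u8
    try std.testing.expectEqual(@as(u8, 128), g);
}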
I am not sure about this claim, but if you can back it up, I can and will accept this result of yours. Again, I was never here to merge my code ASAP. I opened this PR to be transparent in my (slow) progress. I am busy with different things too, and this PR doesn't enjoy much priority right now. So it feels very odd if someone suddenly decides to create his own PR. I think, if adding an egcd function to the std library was really that important to you, you would've done it much earlier, so I am not sure what message you are trying to convey. Funnily enough, even with all the proving, you still ran into the same issue as me: it doesn't even pass the standard library tests. So let's not pretend that overlooking a couple of overflows in my draft is a reason to scrap my attempt completely.
The extended Euclidean algorithm isn't defined for signed integers in the sense of restricting the domain. It's defined over the integers ℤ, and in practice we almost always call it with non-negative (unsigned) inputs. Plenty of implementations (in GMP, Python, Rust, etc.) accept unsigned inputs and simply return signed coefficients. We might reasonably want egcd to work without forcing conversions to signed types.
I am not sure what point you're trying to make, but ℤ is the signed integers, or simply the integers. The egcd that I have replaced in this PR (used by ml_kem) was also written for signed integers. There is no problem in making the egcd work with unsigned integers too; I can take care of that. This would basically mean doing the up-casting inside the egcd function.

For anyone interested, here are some points that I still wonder about in my current implementation: ... it is still not quite where I want it yet. To compare, the fastest but not overflow-safe version of the EEA takes only 450 ms. So these are the two issues I am looking to solve next.

PS: Regarding performance, the version Frank Denis linked in his own PR needs
Fix range obscurity and compile error.
Revert change for converting comptime int.
Fix some comments in GCD.
Make ml_kem use lcm and egcd from std/math.
Fix name.
Add egcd function.
Don't destructure.
Use binary gcd and make overflow safe.
Force inlining, use ctz to reduce dependency in loop.
Avoid integer overflow for temporary value.
Add test against previous overflow capability.
More optimization friendly expression.
Fix egcd for even numbers.
Minvalue causes crash.
Remove helper function.
Fix casting issues.
Use shift instead division (to support i2) and avoid overflow of temp results.
Force-pushed 6c9ab98 to ba5f824
The EGCD is used for cryptographic purposes, for example to find the inverse of a number mod a prime.

I have implemented an iterative version of this algorithm.

The existing GCD has been micro-optimized to gain some extra speed (3-5% faster).

I purposefully implemented the EGCD with the Euclidean method instead of the binary GCD, because whenever you'd shift off n zeros inside the binary GCD, the coefficients would have needed to be multiplied by (2^-n) mod g (g being the gcd of the two inputs), or you'd have to do some other funky stuff that's costly for performance.
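To make that coefficient fix-up concrete, here is roughly what the halving step in a binary EGCD would have to do over the plain integers, maintaining a == u*x0 + v*y0 for the odd reduced inputs x0 and y0 (a sketch, not this PR's code):

// Binary-EGCD coefficient fix-up ("the funky stuff"): when a is halved,
// (u, v) must follow. Assumes x0 and y0 odd and a even, so the invariant
// a == u*x0 + v*y0 forces u and v to share parity.
fn halveStep(a: *u64, u: *i64, v: *i64, x0: i64, y0: i64) void {
    a.* >>= 1;
    if (u.* & 1 == 0 and v.* & 1 == 0) {
        u.* = @divExact(u.*, 2); // both even: halve directly
        v.* = @divExact(v.*, 2);
    } else {
        // Both odd: adding y0*x0 - x0*y0 == 0 to u*x0 + v*y0 changes
        // nothing, but makes both coefficients even before halving.
        u.* = @divExact(u.* + y0, 2);
        v.* = @divExact(v.* - x0, 2);
    }
}

Each of those halvings is the integer-domain analogue of the (2^-1 mod m) halveMod trick in the modular-inverse snippet earlier in this thread.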