[BUG] builds fail with sending never-build-twice request: mismatched build ID for ... #507

Open
sjanel opened this issue Jan 24, 2025 · 12 comments · May be fixed by #522

@sjanel

sjanel commented Jan 24, 2025

Important

Maintainer note:
If you are encountering this issue, and use orchestrion by specifying the -toolexec='orchestrion toolexec' argument to go (either via command line arguments, or through GOFLAGS), the issue is likely caused by:

  1. You have installed some version of orchestrion (go install github.com/DataDog/orchestrion@<some-version>)
  2. Your go.mod file lists another version of orchestrion

In this case, making sure both versions are identical should fix the issue (if not, please tell us in this issue!).

If you are facing the same problem, but your situation does not match the above description, please let us know by filing a separate issue.
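
To make the check described in this note concrete, here is a minimal Go sketch that compares the version required in go.mod with the version that was installed. The helper `pinnedVersion`, the sample go.mod content, and the hard-coded `installed` value are all hypothetical illustrations, not orchestrion's own logic; the scanner is deliberately naive and only looks at plain requirement lines.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// orchestrionModule is the module path used throughout this issue.
const orchestrionModule = "github.com/DataDog/orchestrion"

// pinnedVersion scans go.mod content for a requirement on modulePath and
// returns its version. Hypothetical helper: a simple line scanner that does
// not understand replace or exclude directives.
func pinnedVersion(goMod, modulePath string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(goMod))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Accept both `require path version` and `path version` inside a block.
		if len(fields) > 0 && fields[0] == "require" {
			fields = fields[1:]
		}
		if len(fields) >= 2 && fields[0] == modulePath && strings.HasPrefix(fields[1], "v") {
			return fields[1], true
		}
	}
	return "", false
}

func main() {
	// Sample go.mod modeled on the one in this report.
	goMod := `module foo

go 1.23

require (
	github.com/DataDog/orchestrion v1.0.2
	github.com/gorilla/mux v1.8.1
)
`
	// Assumption: the version that `go install` placed on PATH.
	installed := "v1.0.3"

	pinned, ok := pinnedVersion(goMod, orchestrionModule)
	switch {
	case !ok:
		fmt.Println("go.mod does not require", orchestrionModule)
	case pinned != installed:
		fmt.Printf("mismatch: go.mod pins %s but %s is installed\n", pinned, installed)
	default:
		fmt.Println("versions match:", pinned)
	}
}
```

In a real build, the two values would come from the go.mod file on disk and from the installed binary rather than from literals.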

Version of orchestrion
1.0.3

Describe what happened:
Since today, I cannot build some of our micro-services with orchestrion. Here is an extract of the logs I get from the go build command:

#16 89.36 sending never-build-twice request: mismatched build ID for "sync/atomic": "3A1CLLn9Q6jnXEL4WfTC/3A1CLLn9Q6jnXEL4WfTC" != "PxIKMSNesE1tGk6jxW68/PxIKMSNesE1tGk6jxW68"
#16 89.36 -: # internal/godebugs
#16 89.36 sending never-build-twice request: mismatched build ID for "internal/godebugs": "4gNYbdB7k-Lxicu-bDPb/4gNYbdB7k-Lxicu-bDPb" != "3iGATNQByhC2W3eE5YoK/3iGATNQByhC2W3eE5YoK"
#16 89.36 -: # internal/race
#16 89.36 sending never-build-twice request: mismatched build ID for "internal/race": "ntwMDWhcLGfGB-u85-hr/ntwMDWhcLGfGB-u85-hr" != "Cs-OKq4eFdzAa-H6E2Sr/Cs-OKq4eFdzAa-H6E2Sr"
#16 89.36 -: # internal/goexperiment
#16 89.36 sending never-build-twice request: mismatched build ID for "internal/goexperiment": "bcDhOEOGMfS059ucNwlc/bcDhOEOGMfS059ucNwlc" != "DX1Zfyf9VWPnJnMlVYmN/DX1Zfyf9VWPnJnMlVYmN"
#16 89.36 -: # internal/goos
#16 89.36 sending never-build-twice request: mismatched build ID for "internal/goos": "zRhU4mh_wktcsl0M_szK/zRhU4mh_wktcsl0M_szK" != "fbhkRpj4jP_3rgmYeY1W/fbhkRpj4jP_3rgmYeY1W"
#16 89.36 -: # internal/cpu
#16 89.36 sending never-build-twice request: mismatched build ID for "internal/cpu": "BBZvZV26hr-0JI6Lo1PT/BBZvZV26hr-0JI6Lo1PT" != "x1SG1wxC4tQEQhm1cs7F/x1SG1wxC4tQEQhm1cs7F"
#16 89.36 -: # internal/goarch
#16 89.36 sending never-build-twice request: mismatched build ID for "internal/goarch": "gwQ89hWjJOOIkI6ho_gc/gwQ89hWjJOOIkI6ho_gc" != "g9PlaGbp4lE0n0WKGnZT/g9PlaGbp4lE0n0WKGnZT"
#16 89.36 -: # internal/cpu
#16 89.36 sending never-build-twice request: mismatched build ID for "internal/cpu": "BBZvZV26hr-0JI6Lo1PT/BBZvZV26hr-0JI6Lo1PT" != "x1SG1wxC4tQEQhm1cs7F/x1SG1wxC4tQEQhm1cs7F"
#16 89.36 -: # internal/runtime/atomic
#16 89.36 sending never-build-twice request: mismatched build ID for "internal/runtime/atomic": "zwhihA7OyTql8MynRG1k/zwhihA7OyTql8MynRG1k" != "tihqsHspYa5l3KybMTIU/tihqsHspYa5l3KybMTIU"
#16 89.36 -: # internal/byteorder

Describe what you expected:
Build success

Steps to reproduce the issue:
I don't have a minimal reproducible example at hand, sorry. But I know that it works with v1.0.2, so it seems to be a regression in v1.0.3.

I can share my build command and an extract of my go.mod, though:

CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -toolexec="orchestrion toolexec" -ldflags='-w -s -extldflags "-static"' -a -o /bin/foo .

main go.mod

module foo

go 1.23

replace foo/common => ../common

require (
	foo/common v0.0.0-00010101000000-000000000000
	github.com/DataDog/orchestrion v1.0.2
	github.com/gorilla/mux v1.8.1
	github.com/gorilla/schema v1.4.1
	github.com/jinzhu/gorm v1.9.16
)

common go.mod:

module foo/common

go 1.23

Additional environment details (Version of Go, Operating System, etc.):
Compiling in amd64, inside an official golang 1.23.5 Debian bookworm based docker image.

@sjanel sjanel added the bug Something isn't working label Jan 24, 2025
@RomainMuller
Contributor

Hey! Thanks for reporting.

It does look like your build ends up producing multiple different versions of the same packages (different build IDs). I expected this not to be intended behavior, but since your builds succeeded with v1.0.2, it is most likely a wrong assumption on my part.

I have an idea for a small enough fix (basically removing this particular assumption), let me give this a shot.

RomainMuller added a commit that referenced this issue Jan 24, 2025
The never-build-twice feature assumed that it is not, ever, possible for
a single build to result in multiple different builds of the same
package with different build IDs. #507 suggests this might actually be a
valid outcome (the exact details of why/how are not clear yet).

This PR removes that assumption and keeps separate cache entries for
each individual build ID in order to reduce the risk of breaking the
build.

Fixes #507
@RomainMuller
Contributor

Alright, I have a candidate fix, would you be able to try with:

$ go get github.com/DataDog/orchestrion@b50f449

And then let me know whether this fixes your builds?

@RomainMuller RomainMuller self-assigned this Jan 24, 2025
@sjanel
Author

sjanel commented Jan 24, 2025

Yes sure @RomainMuller, let me try!

@sjanel
Author

sjanel commented Jan 24, 2025

I am trying your fix, but after 35 minutes the build seems frozen and nothing appears to be happening (the CPU is idle). Maybe there is a deadlock or something? I will update my message if it finishes, but with v1.0.2 it normally does not take this long.

@RomainMuller
Contributor

Alright yeah I think you are probably correct... There is probably a deadlock and that might be related to how you ended up with multiple builds of the same package (just different build IDs)... It seems like my assumption that this is not "expected behavior" might actually have been correct.

I suspect a child build somehow results in different build configuration being used in v1.0.3 while it did not do so in v1.0.2... I'm going to see if I can set up a similar build to yours to trigger the v1.0.3 bug... That'd make things a lot easier for me to investigate 😅

@sjanel
Author

sjanel commented Jan 24, 2025

@RomainMuller
I have created a reproducible example. Just launch foo/build.sh from the archive

foo.tar.gz

@RomainMuller
Contributor

RomainMuller commented Jan 24, 2025

> I have created a reproducible example. Just launch foo/build.sh from the archive

Haha I managed to get a reproduction on my side about the same time. Yours is a little smaller than mine though 🫣 Thanks nonetheless!


Alright so... this is a little surprising, but it would appear that the issue manifests itself ONLY if your go.mod has a dependency on github.com/DataDog/orchestrion v1.0.2, but you are installing (and hence, executing) github.com/DataDog/orchestrion@v1.0.3. The use of CGO_ENABLED, GOOS and GOARCH does not influence this, and I can reproduce it on macOS (and also Linux, but this is not a surprise).

What happens in this case is that the -toolexec initially spawns v1.0.3, which detects you're pinned to v1.0.2 and re-spawns that version (using go run github.com/DataDog/orchestrion <argv...>).

I have a theory on why this causes an issue which I'm going to test out...

@sjanel
Author

sjanel commented Jan 24, 2025

Oh yes, nice catch! Indeed, the version downloaded in the Dockerfile and the one in the go.mod do not match. Previously I had @latest in the Dockerfile go install command and v1.0.2 hardcoded in the go.mod, which was bound to lead to conflicts eventually. I guess this bug report is not one in the end, sorry for the inconvenience!

@RomainMuller
Contributor

Well I think there is an issue... Even though there is an "easy" workaround for you here.

Here's the culprit if you're interested (beware, lots of information ahead):

  1. In order to collaborate with the go toolchain's build cache, orchestrion appends information to the outputs of compile -V=full and link -V=full that are called by the toolchain to determine the toolchain versions.
  2. The data that orchestrion appends to these contains:
    • The version of orchestrion itself
    • A hash of the injection configuration
  3. The <tool> -V=full commands are invoked using a working directory that does not allow orchestrion to perform version pinning checks; so the "root" build uses whatever version is running regardless of go.mod (in our case, v1.0.3)
  4. Child builds, however, have a job server available (this would always be the case if you did orchestrion go build instead of go build -toolexec='orchestrion toolexec'), which allows the version check to happen, so these get v1.0.2 and NOT v1.0.3
  5. The result is that child builds use a different logical toolchain, meaning different build IDs, which results in the error you observed (and otherwise could result in a deadlock or infinite loop; or a link-time fingerprint mismatch error).
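
The chain of events above can be illustrated with a small Go sketch. It assumes that a build ID behaves like a hash covering the tool's version string; the functions `toolVersion` and `fakeBuildID`, the layout of the appended data, and the sample Go version and configuration hash are all hypothetical and are not the formats used by the Go toolchain or by orchestrion.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// toolVersion mimics the output of `compile -V=full` with extra data
// appended. The layout of the appended part is an assumption.
func toolVersion(goVersion, orchestrionVersion, configHash string) string {
	return fmt.Sprintf("compile version %s +orchestrion@%s;injection=%s",
		goVersion, orchestrionVersion, configHash)
}

// fakeBuildID stands in for the toolchain's build ID: here, simply a hash
// over the tool version string and the package import path.
func fakeBuildID(toolVer, importPath string) string {
	sum := sha256.Sum256([]byte(toolVer + "\x00" + importPath))
	return hex.EncodeToString(sum[:10])
}

func main() {
	const configHash = "abc123" // hypothetical injection configuration hash

	// The root build reports the running version (v1.0.3), while child
	// builds are re-spawned with the version pinned in go.mod (v1.0.2).
	root := toolVersion("go1.23.5", "v1.0.3", configHash)
	child := toolVersion("go1.23.5", "v1.0.2", configHash)

	rootID := fakeBuildID(root, "sync/atomic")
	childID := fakeBuildID(child, "sync/atomic")

	fmt.Println("root build ID: ", rootID)
	fmt.Println("child build ID:", childID)
	fmt.Println("same logical toolchain:", rootID == childID)
}
```

Because the orchestrion version is part of the hashed input, the same package built by the root and by a child ends up with two different IDs, which is the situation the never-build-twice check rejects.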

I reckon this teaches us that we probably shouldn't do the version check + respawn... Maybe instead what we want to do is:

  • If $CI is false-ish, emit a stark warning if the running version of orchestrion isn't the pinned one
  • If $CI is true-ish, turn the above warning into an error (what is running is not what is in your go.mod, so your build cannot be guaranteed to be reproducible, and reproducibility is usually desirable in CI).

We might want to preserve the "auto restart" feature, but only when running in what we call "driver mode", which is when you do orchestrion go ... and NOT when you're using it in "toolexec mode", which is when you specify the -toolexec argument on your own (like you do).
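
The proposal above could be expressed as a small decision function. The following Go sketch is only one possible reading of it: the `action` type, the `decide` function, and the use of `strconv.ParseBool` to interpret "true-ish" are assumptions made for illustration, not what orchestrion implements.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// action is what to do once the running and pinned versions are known.
type action int

const (
	proceed action = iota // versions match, nothing to do
	warn                  // mismatch outside CI: emit a stark warning
	fail                  // mismatch in CI: refuse to build
	respawn               // mismatch in driver mode: restart with pinned version
)

func (a action) String() string {
	return [...]string{"proceed", "warn", "fail", "respawn"}[a]
}

// isTruthy interprets "true-ish" via strconv.ParseBool (an assumption);
// empty or unparsable values count as false.
func isTruthy(v string) bool {
	b, err := strconv.ParseBool(v)
	return err == nil && b
}

// decide applies the proposed policy. driverMode is true for
// `orchestrion go ...` and false when -toolexec is passed manually.
func decide(running, pinned string, driverMode bool, ci string) action {
	switch {
	case running == pinned:
		return proceed
	case driverMode:
		return respawn
	case isTruthy(ci):
		return fail
	default:
		return warn
	}
}

func main() {
	ci := os.Getenv("CI")
	fmt.Println("toolexec mode:", decide("v1.0.3", "v1.0.2", false, ci))
	fmt.Println("driver mode:  ", decide("v1.0.3", "v1.0.2", true, ci))
}
```

The ordering of the cases encodes the design choice discussed here: the automatic restart is only considered in driver mode, where the version check can always run, and the toolexec path never re-spawns.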

@RomainMuller
Contributor

By the way (and I mean to document this somewhere), in your Dockerfile, I would recommend using the following commands instead (from your reproduction):

FROM golang:1.23-bookworm AS base

# 🚮 Don't install a hard-coded orchestrion release...
# RUN go install github.com/DataDog/orchestrion@v1.0.3

WORKDIR /bin

WORKDIR /go/src

COPY common ./common

ARG MODULE="bar"

COPY ${MODULE} ./${MODULE}

WORKDIR /go/src/${MODULE}

# Fetch dependencies.
RUN go vet && \
    go get -v

# 🆕 Installs whatever version of orchestrion is in `go.mod`
RUN go install github.com/DataDog/orchestrion

RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -toolexec="orchestrion toolexec" \
    -ldflags='-w -s -extldflags "-static"' -a -o /bin/foo . && \
    DD_TRACE_STARTUP_LOGS="false" go test -toolexec="orchestrion toolexec";

If you prefer to go install github.com/DataDog/orchestrion@latest, then you probably want to run go get github.com/DataDog/orchestrion@latest && go install github.com/DataDog/orchestrion instead, or run orchestrion pin before running your build (this would "upgrade" your go.mod to the currently running orchestrion version if it's newer).

@RomainMuller RomainMuller changed the title [BUG] Build failed with latest version 1.0.3 "sending never-build-twice" [BUG] v1.0.3 builds fail with "sending never-build-twice request: mismatched build ID for ..." Jan 24, 2025
@RomainMuller RomainMuller changed the title [BUG] v1.0.3 builds fail with "sending never-build-twice request: mismatched build ID for ..." [BUG] v1.0.3 builds fail with sending never-build-twice request: mismatched build ID for ... Jan 24, 2025
@RomainMuller RomainMuller changed the title [BUG] v1.0.3 builds fail with sending never-build-twice request: mismatched build ID for ... [BUG] builds fail with sending never-build-twice request: mismatched build ID for ... Jan 24, 2025
@RomainMuller RomainMuller pinned this issue Jan 24, 2025
@RomainMuller
Contributor

I've re-titled & pinned this issue as we need to decide how exactly to fix this situation permanently. I believe this isn't specific to v1.0.3, though (it just manifests differently on v1.0.3).

@chussenot-believe

chussenot-believe commented Jan 24, 2025 via email

RomainMuller added a commit that referenced this issue Jan 31, 2025
The automatic version check incurring a re-spawn of orchestrion with the
`go.mod` version cannot happen in all situations, and this results in
inconsistent build IDs being generated when manually specifying
`-toolexec` (instead of using `orchestrion go ...`).

Since we cannot always do the version check, we cannot re-spawn in all
situations. For now, we decided to no longer re-spawn but instead
produce specific error messages when the running version is not the one
required by `go.mod`, instructing users what to do to fix that.

Fixes #507