debug: Investigate broken releases #170

Closed
wants to merge 10 commits into from

@RemiBardon RemiBardon commented Jan 25, 2025

This PR's sole purpose is to investigate broken release builds (see #160).

It will start by trying to rebuild the last successfully built image, to make sure we can go from green to red (otherwise we'd be looking in the wrong direction).

See also #169.

@RemiBardon RemiBardon added the fix Fixes a bug label Jan 25, 2025
@RemiBardon RemiBardon self-assigned this Jan 25, 2025
@RemiBardon RemiBardon force-pushed the investigate-broken-release-builds branch from 4eedbd2 to e23ece6 Compare January 25, 2025 23:58
@RemiBardon
Member Author

Release & Ship #96 reveals the issue wasn't introduced by us, because the last commit that successfully built (see Release & Ship #79) cannot build anymore.

Once again, a dependency broke our builds and I'm losing a ton of time trying to find out where it comes from 😠

@RemiBardon
Member Author

The issue cannot come from the Docker image we use when building since we've pinned it to a fixed version (lukemathwalker/cargo-chef:0.1.68-rust-alpine) in 658e25d.

@RemiBardon
Member Author

From now on, we will use locked versions, to at least have reproducible builds. I already committed what we'd need to change in another PR (see 37a59a6).

@RemiBardon
Member Author

I just reapplied some commits to speed up the build but also to build using locked versions. It shouldn't fail since this lock file successfully built when the commit updating it was merged. If dependencies have been updated since then, they will be reverted to their older version.
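
A minimal sketch of what a locked release build could look like. The function name `locked_release_build` is hypothetical, and the commands actually used in this repository's Dockerfile may differ; `--locked` is Cargo's flag that makes the build fail rather than rewrite `Cargo.lock`.

```shell
# Hypothetical helper: build in release mode with exactly the versions in Cargo.lock.
locked_release_build() {
  # Without a committed lock file there is nothing to pin against.
  if [ ! -f Cargo.lock ]; then
    echo "Cargo.lock not found; commit it first" >&2
    return 1
  fi
  # --locked: Cargo errors out instead of silently updating Cargo.lock.
  cargo build --release --locked
}

# Usage, from the workspace root:
#   locked_release_build
```

With this, a dependency published after the lock file was committed can no longer slip into the build unnoticed.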

@RemiBardon
Member Author

RemiBardon commented Jan 26, 2025

To identify which dependency caused our issue, I downloaded the logs from Release & Ship #79 (passing) and Release & Ship #80 (broken). I will filter rustc’s logs and isolate every dependency version change. Then, by bumping the versions one by one, it should reveal the problematic crate.
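
The "bump one by one" idea could be sketched roughly as follows. This is an illustration under assumptions, not the script used in this PR: `bisect_bumps` and `BUILD_CMD` are hypothetical names, and the input is assumed to be `name version` pairs on stdin. `cargo update -p <crate> --precise <version>` is the Cargo command that sets a single crate to an exact version in `Cargo.lock`.

```shell
# Hypothetical helper: bump crates one at a time, stop at the first bump that breaks the build.
# Reads "name version" pairs from stdin, e.g. "unicode-ident v1.0.15".
# BUILD_CMD is an assumption: set it to whatever reproduces the failing build.
bisect_bumps() {
  build_cmd=${BUILD_CMD:-cargo build --release --locked}
  while read -r name version; do
    [ -n "$name" ] || continue
    version=${version#v}  # rustc logs print "v1.0.15"; --precise expects "1.0.15"
    # Move only this crate to the newer version in Cargo.lock.
    # Note: if several versions of a crate are locked, -p needs "name@oldversion".
    cargo update -p "$name" --precise "$version" </dev/null || return 2
    if ! $build_cmd </dev/null; then
      echo "build breaks after bumping $name to $version"
      return 1
    fi
  done
  echo "all bumps built successfully"
}

# Usage:
#   printf 'unicode-ident v1.0.15\n' | bisect_bumps
```

Because each bump is kept before the next one is tried, the first failing crate is only the culprit in combination with the earlier bumps; resetting `Cargo.lock` between attempts would isolate crates more strictly.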

@RemiBardon
Member Author

RemiBardon commented Jan 26, 2025

  1. Replace all occurrences of .+(Compiling\s) with $1
  2. Sort lines (case-sensitive or not, it doesn't matter)
  3. Remove every line that does not start with Compiling
  4. Remove all occurrences of Compiling\s
  5. Select whole file
  6. Remove duplicate lines

In Bash, this means:

gsed -E 's/.+(Compiling\s)/\1/' "$FILE" | sort | awk '/^Compiling /{print}' | gsed 's/Compiling\s//g' | uniq

Then, do the same with the second file.

diff ship-docker-image-79-filtered.txt ship-docker-image-80-filtered.txt
137a138
> unicode-ident v1.0.15
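
For machines without GNU sed, the same filtering can probably be done with plain `sed` and `sort -u`. This is a sketch, not the command used above: `compiled_crates` is a hypothetical name, and the unfiltered log file names in the usage comment are assumptions.

```shell
# Hypothetical helper: list the unique "crate vX.Y.Z" entries Cargo compiled, from a CI log.
compiled_crates() {
  # -n with the p flag prints only lines where the substitution happened,
  # i.e. lines containing "Compiling "; everything up to it (timestamps,
  # Docker step prefixes) is dropped. sort -u sorts and removes duplicates.
  sed -n 's/^.*Compiling  *//p' "$1" | sort -u
}

# Usage (input file names are examples):
#   compiled_crates ship-docker-image-79.txt > ship-docker-image-79-filtered.txt
#   compiled_crates ship-docker-image-80.txt > ship-docker-image-80-filtered.txt
#   diff ship-docker-image-79-filtered.txt ship-docker-image-80-filtered.txt
```

Sorting before diffing matters because Cargo compiles crates in parallel, so the order of `Compiling` lines differs between runs even when the versions are identical.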

@RemiBardon
Member Author

Release 1.0.15 · dtolnay/unicode-ident was published 3 days ago; that seems like a good lead.

@RemiBardon
Member Author

I made myself a little script (that I will share soon). Regarding the last failure, here is the diff:

./diff.sh 79 97
0a1,2
> addr2line v0.24.2
> adler2 v2.0.0
1a4,5
> aliasable v0.1.3
> allocator-api2 v0.2.21
3d6
< async-stream v0.3.6
5,14c8,22
< async-trait v0.1.85
< aws-lc-rs v1.12.2
< aws-lc-sys v0.25.0
< axum v0.8.1
< axum-core v0.5.0
< axum-extra v0.10.0
< axum-macros v0.5.0
< backtrace v0.3.74
< bitflags v2.8.0
< cc v1.2.10
---
> async-trait v0.1.84
> atoi v2.0.0
> atomic-waker v1.1.2
> autocfg v1.4.0
> aws-lc-rs v1.12.0
> aws-lc-sys v0.24.1
> base64 v0.22.1
> beef v0.5.2
> blake2 v0.10.6
> block-buffer v0.10.4
> byteorder v1.5.0
> bytes v1.9.0
> castaway v0.2.3
> cc v1.2.7
> cfg-if v1.0.0
16d23
< chumsky v0.9.3
17a25,32
> compact_str v0.7.1
> concurrent-queue v2.5.0
> cpufeatures v0.2.16
> crc v3.2.1
> crc-catalog v2.4.0
> crossbeam-queue v0.3.12
> crossbeam-utils v0.8.21
> crypto-common v0.1.6
21c36,38
< data-encoding v2.7.0
---
> data-encoding v2.6.0
> deranged v0.3.11
> digest v0.10.7
22a40
> dunce v1.0.5
24d41
< email_address v0.2.9
26,27c43,49
< event-listener v5.4.0
< figment v0.10.19
---
> equivalent v1.0.1
> event-listener v5.3.1
> flume v0.11.1
> fnv v1.0.7
> foldhash v0.1.4
> form_urlencoded v1.2.1
> fs_extra v1.3.0
28a51,52
> futures-channel v0.3.31
> futures-core v0.3.31
29a54,55
> futures-intrusive v0.5.0
> futures-io v0.3.31
30a57,58
> futures-sink v0.3.31
> futures-task v0.3.31
32c60,62
< generic-array v1.2.0
---
> generic-array v0.14.7
> getrandom v0.2.15
> gimli v0.31.1
34c64,67
< hashbrown v0.14.5
---
> hashbrown v0.15.2
> hashlink v0.10.0
> heck v0.4.1
> heck v0.5.0
36c69,72
< hickory-resolver v0.24.2
---
> hmac v0.12.1
> hostname v0.3.1
> http v1.2.0
> http-body v1.0.1
37a74,75
> httparse v1.9.5
> httpdate v1.0.3
39d76
< hyper-rustls v0.27.5
40a78
> iana-time-zone v0.1.61
43a82
> icu_locid_transform_data v1.5.0
44a84
> icu_normalizer_data v1.5.0
45a86
> icu_properties_data v1.5.0
47a89,90
> ident_case v1.0.1
> idna v0.1.5
50c93
< indexmap v2.7.1
---
> indexmap v2.7.0
52,54c95,97
< ipnet v2.11.0
< iso8601-duration v0.2.0
< iso8601-timestamp v0.3.3
---
> ipnet v2.10.1
> itertools v0.12.1
> itoa v1.0.14
56c99,102
< lettre v0.11.11
---
> jobserver v0.1.32
> keccak v0.1.5
> lazy_static v1.5.0
> libc v0.2.169
58,60c104,107
< linked_hash_set v0.1.5
< log v0.4.25
< logos v0.15.0
---
> linked-hash-map v0.5.6
> litemap v0.7.4
> lock_api v0.4.12
> log v0.4.22
62c109,113
< logos-derive v0.15.0
---
> lru-cache v0.1.2
> match_cfg v0.1.0
> matches v0.1.10
> memchr v2.7.4
> mime v0.3.17
64,69c115,134
< miniz_oxide v0.8.3
< ouroboros v0.18.5
< ouroboros_macro v0.18.5
< pear v0.2.9
< pear_codegen v0.2.9
< pin-project-lite v0.2.16
---
> minimal-lexical v0.2.1
> miniz_oxide v0.8.2
> mio v1.0.3
> num-conv v0.1.0
> num-traits v0.2.19
> object v0.36.7
> once_cell v1.20.2
> ordered-float v3.9.2
> ouroboros_macro v0.18.4
> overload v0.1.1
> parking v2.2.1
> parking_lot v0.12.3
> parking_lot_core v0.9.10
> paste v1.0.15
> pbkdf2 v0.12.2
> percent-encoding v2.3.1
> pin-project-lite v0.2.15
> pin-utils v0.1.0
> pkg-config v0.3.31
> powerfmt v0.2.0
73c138
< proc-macro2 v1.0.93
---
> proc-macro2 v1.0.92
75,79d139
< prose-pod-api v0.7.0 (/usr/src/prose-pod-api/crates/rest-api)
< prose-proc-macros v0.1.0 (https://github.com/prose-im/prose-core-client.git?tag=0.1.99#ca865789)
< prose-wasm-utils v0.1.0 (https://github.com/prose-im/prose-core-client.git?tag=0.1.99#ca865789)
< prose-xmpp v0.1.0 (https://github.com/prose-im/prose-core-client.git?tag=0.1.99#ca865789)
< prosody-config v0.1.0 (/usr/src/prose-pod-api/crates/prosody-config)
80a141
> quick-error v1.2.3
84c145,150
< reqwest v0.12.12
---
> rand_core v0.6.4
> regex-automata v0.1.10
> regex-automata v0.4.9
> regex-syntax v0.6.29
> regex-syntax v0.8.5
> resolv-conf v0.7.0
85a152
> rustc-demangle v0.1.24
87,88c154,157
< rustls v0.23.21
< rustls-webpki v0.102.8
---
> rustls v0.23.20
> rustls-pemfile v2.2.0
> rustls-pki-types v1.10.1
> rustversion v1.0.19
89a159,161
> rxml_validation v0.11.0
> ryu v1.0.18
> scopeguard v1.2.0
91,93d162
< sea-orm v1.1.4
< sea-orm-macros v1.1.4
< sea-orm-migration v1.1.4
95d163
< sea-query-binder v0.7.0
97,100c165
< sea-schema v0.16.1
< sea-schema-derive v0.3.0
< secrecy v0.8.0
< semver v1.0.25
---
> semver v1.0.24
103,105c168
< serde_html_form v0.2.7
< serde_json v1.0.137
< serde_path_to_error v0.1.16
---
> serde_json v1.0.134
108,113c171,180
< serde_with v3.12.0
< serde_with_macros v3.12.0
< service v0.7.0 (/usr/src/prose-pod-api/crates/service)
< sqlx v0.8.3
< sqlx-core v0.8.3
< sqlx-sqlite v0.8.3
---
> sha1 v0.10.6
> sha2 v0.10.8
> sha3 v0.10.8
> shlex v1.3.0
> signal-hook-registry v1.4.2
> slab v0.4.9
> smallvec v1.13.2
> socket2 v0.5.8
> spin v0.9.8
> stable_deref_trait v1.2.0
114a182,184
> static_assertions v1.1.0
> stringprep v0.1.5
> strsim v0.11.1
117c187,189
< syn v2.0.96
---
> subtle v2.6.1
> syn v2.0.94
> sync_wrapper v1.0.2
120c192
< thiserror v2.0.11
---
> thiserror v2.0.9
122c194,197
< thiserror-impl v2.0.11
---
> thiserror-impl v2.0.9
> time v0.3.37
> time-core v0.1.2
> time-macros v0.2.19
123a199,200
> tinyvec v1.8.1
> tinyvec_macros v0.1.1
126d202
< tokio-rustls v0.26.1
129,130d204
< tokio-xmpp v4.0.0
< toml v0.8.19
132d205
< toml_edit v0.22.22
134c207,208
< tower-http v0.5.2
---
> tower-layer v0.3.3
> tower-service v0.3.3
137c211,221
< tracing-subscriber v0.3.19
---
> tracing-core v0.1.33
> try-lock v0.2.5
> typenum v1.17.0
> uncased v0.9.10
> unicase v2.8.1
> unicode-bidi v0.3.18
> unicode-ident v1.0.14
> unicode-normalization v0.1.24
> unicode-properties v0.1.3
> unicode-segmentation v1.12.0
> untrusted v0.9.0
139,144c223,233
< url_serde v0.2.0
< uuid v1.12.1
< uuid-macro-internal v1.12.1
< vcard4 v0.7.1
< winnow v0.6.24
< xmpp-parsers v0.21.0
---
> utf16_iter v1.0.5
> utf8_iter v1.0.4
> uuid v1.11.0
> uuid-macro-internal v1.11.0
> vcpkg v0.2.15
> version_check v0.9.5
> want v0.3.1
> webpki-roots v0.26.7
> winnow v0.6.22
> write16 v1.0.0
> writeable v0.5.5
146a236
> yansi v1.0.1

We should force-pin versions, I suppose.

@RemiBardon
Member Author

The last failure took 19 minutes, so I am trying to replicate the issue in tests to get faster failures. Hopefully the issue comes from ARM64 and not from release builds in particular.

@RemiBardon
Member Author

Unfortunately the issue is only on release builds on ARM…

@RemiBardon RemiBardon force-pushed the investigate-broken-release-builds branch from 2e71382 to d1dec3a Compare January 26, 2025 13:32
@RemiBardon
Member Author

RemiBardon commented Jan 26, 2025

The build passed in 4 minutes on ubuntu-24.04-arm (see https://github.com/prose-im/prose-pod-api/pull/170/commits/d1dec3a5dca6f3ee8a71986b99b96173a0664534)… Does this mean the issue is with Docker buildx?

I'm going to retry building on ubuntu-latest just to make sure.

@RemiBardon
Member Author

If the issue arises again with the last commit, I will try building the image for both AMD and ARM on an ARM runner. Release builds might even be faster (see Linux arm64 hosted runners now available for free in public repositories (Public Preview) 🎉 · community · Discussion #148648)?

@RemiBardon
Member Author

I just stumbled upon arm64: g++: internal compiler error: Segmentation fault signal terminated program cc1plus (#16864) · Tickets · alpine / aports · GitLab. The answer suggests it's caused by the update of GitHub’s ubuntu-latest runners from Ubuntu 22.04 to 24.04. Downgrading to Ubuntu 22.04 should fix it (see Ubuntu-latest workflows will use Ubuntu-24.04 image · Issue #10636 · actions/runner-images). I'm testing it.

@RemiBardon
Member Author

The release build succeeded on ubuntu-22.04 on both AMD and ARM. That was it. I'm effing pissed.
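
To avoid being caught by the next runner image migration, one option is to look for workflows that still use the floating `ubuntu-latest` label. A rough sketch, assuming workflows live in GitHub's conventional `.github/workflows` directory; `find_floating_runners` is a hypothetical name.

```shell
# Hypothetical helper: list workflow lines that still use the floating ubuntu-latest label.
find_floating_runners() {
  # $1 is the workflows directory; defaults to GitHub's conventional location.
  # Output format is grep's usual "path:line:content".
  grep -rn 'runs-on:.*ubuntu-latest' "${1:-.github/workflows}"
}

# Usage, from the repository root:
#   find_floating_runners
```

Each hit is a job whose runner image can change without any commit in the repository; replacing the label with a fixed one such as `ubuntu-22.04` makes that change explicit.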

RemiBardon added a commit that referenced this pull request Jan 26, 2025
See [debug: Investigate broken releases](#170)
and [Ubuntu-latest workflows will use Ubuntu-24.04 image · Issue #10636 · actions/runner-images](actions/runner-images#10636 (comment)).
@RemiBardon
Member Author

Closing this since we found and fixed the cause.

@RemiBardon RemiBardon closed this Jan 26, 2025
@RemiBardon RemiBardon deleted the investigate-broken-release-builds branch January 26, 2025 15:27
Labels
fix Fixes a bug
Projects
Status: Done ✅