Tests define custom cluster replica sizes #30459
Nightly run triggered, I'd suggest waiting for it: https://buildkite.com/materialize/nightly/builds/10370
Force-pushed from d7128b3 to 084b028.
I've given up trying to support both the default value and the value we'd like to use in tests. Too many places relied on the default value, which caused hard-to-debug test failures: some code constructs a default replica size map that then gets passed around, so the point where the problem is introduced is far from where the error actually surfaces. Instead, I think we should never rely on the default encoded in environmentd and always pass a replica size map. This makes it significantly clearer which values are expected, but it requires a bunch of code changes. I updated the PR to that effect; let's see what the tests say.
If you want, I can also take over cleaning up the test failures.
Force-pushed from d4ae851 to 6183279.
Another nightly run: https://buildkite.com/materialize/nightly/builds/10390
@@ -775,7 +778,7 @@ def run(
        return self.invoke(
            "run",
            *(["--entrypoint", entrypoint] if entrypoint else []),
            *(f"-e{k}={v}" for k, v in env_extra.items()),
We should consider printing env_extra to make it easier to debug. Including the values in the command spams the terminal, so I changed it to pass only the keys here and to pass the environment separately.
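A minimal sketch of the idea in this comment, assuming a hypothetical helper `build_run_invocation` (this is not the actual mzcompose code): only the variable names go on the command line as `-eKEY`, and the values travel in the child process environment, so they can be printed separately. It relies on Docker's `-e KEY` form, which takes the value from the caller's environment when no `=value` is given.

```python
import os
from typing import Optional


def build_run_invocation(
    service: str,
    env_extra: dict[str, str],
    entrypoint: Optional[str] = None,
) -> tuple[list[str], dict[str, str]]:
    """Hypothetical helper: return (argv, env) for a `run` invocation."""
    argv = [
        "run",
        *(["--entrypoint", entrypoint] if entrypoint else []),
        # Keys only: the values are looked up in the process environment.
        *(f"-e{k}" for k in env_extra),
        service,
    ]
    # Values are passed out of band, layered on top of the current environment.
    env = {**os.environ, **env_extra}
    return argv, env


# Illustrative service name and variable; both are assumptions.
argv, env = build_run_invocation("testdrive", {"EXAMPLE_VAR": "value"})
print(" ".join(argv))
# Print the extra environment once, separately, to aid debugging.
print({k: env[k] for k in ("EXAMPLE_VAR",)})
```

The returned `env` would then be handed to whatever spawns the process (for example the `env=` argument of `subprocess.run`), keeping the printed command line short.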
impl Default for ClusterReplicaSizeMap {
    // Used for testing and local purposes. This default value should not be used in production.
    //
    // Credits per hour are calculated as being equal to scale. This is not necessarily how the
    // value is computed in production.
    fn default() -> Self {
    /// Used for testing and local purposes. This default value should not be used in production.
    ///
    /// Credits per hour are calculated as being equal to scale. This is not necessarily how the
    /// value is computed in production.
    pub fn for_tests() -> Self {
This change is load-bearing because we remove the Default implementation. Code that depends on a value needs to use for_tests, which should make it sufficiently clear that this value isn't meant to be used in many situations.
def bootstrap_cluster_replica_size() -> str:
    return "bootstrap"
We're using the size bootstrap for all replicas created with the bootstrap size. It's defined to be equivalent to the size 1. This allows us to detect situations where the builtin replica size map is used instead, because the builtin map doesn't define the bootstrap replica size.
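The detection trick described here can be sketched as follows. This is an illustration under assumptions, not the repository's code: `BUILTIN_SIZES`, `replica_sizes_for_tests`, `resolve`, and the `workers` field are hypothetical stand-ins, and the numbers are made up.

```python
import copy

# Stand-in for the sizes compiled into the binary (illustrative values only).
BUILTIN_SIZES = {
    "1": {"workers": 1, "scale": 1},
    "2": {"workers": 2, "scale": 1},
}


def bootstrap_cluster_replica_size() -> str:
    return "bootstrap"


def replica_sizes_for_tests() -> dict:
    """Hypothetical test map: builtin-like sizes plus a `bootstrap` alias of size `1`."""
    sizes = copy.deepcopy(BUILTIN_SIZES)
    sizes[bootstrap_cluster_replica_size()] = dict(sizes["1"])
    return sizes


def resolve(size_map: dict, size: str) -> dict:
    """Look up a size, failing loudly when the map does not define it."""
    if size not in size_map:
        raise KeyError(f"unknown cluster replica size: {size!r}")
    return size_map[size]


# With the test map, `bootstrap` resolves to the same definition as `1`.
print(resolve(replica_sizes_for_tests(), bootstrap_cluster_replica_size()))
```

Because only the test map knows the name `bootstrap`, any code path that silently falls back to the builtin map fails at lookup time instead of producing a subtly different configuration.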
This probably breaks the emulator! I'll have a change for that in a jiffy.
Test part LGTM; nightly will need a bunch of cloudtest fixes too. See misc/python/materialize/cloudtest/k8s/environmentd.py
I'm out for the rest of the day, so if you, @def-, want to take over, I'd be happy!
Force-pushed from b9969b1 to ef1cfd9.
LGTM!
Signed-off-by: Moritz Hoffmann <[email protected]> fixup Signed-off-by: Moritz Hoffmann <[email protected]>
Force-pushed from ef1cfd9 to de02135.
I have fixed up the remaining cloudtest failure.
Tests define their cluster replica sizes without relying on the builtin defaults.
Motivation
This decouples the builtin sizes in the materialize binary from what tests require. It's a step towards a better emulator experience.
This PR turned out bigger than expected because we used the builtin defaults in places where we shouldn't. Tracing those places is hard, often because the place where the defaults are used is separated from where we encounter potential inconsistencies. To avoid these issues going forward, this PR changes the way we handle default cluster replica sizes: the size map is no longer an optional parameter; all callers need to provide it to environmentd and testdrive. Only sqllogictest still uses the default.
The PR adjusts all entrypoints within the materialize repo to supply a cluster replica size map, and all uses outside the repo should do so already.
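As a rough sketch of what "not an optional parameter anymore" means for a caller, assuming a hypothetical Python helper `environmentd_args` and assuming the size map is handed over as JSON through a `--cluster-replica-sizes` flag (the flag name and the map contents are assumptions for illustration):

```python
import json


def environmentd_args(cluster_replica_sizes: dict) -> list[str]:
    """Hypothetical helper: the size map is a required argument with no default."""
    if not cluster_replica_sizes:
        raise ValueError("a non-empty cluster replica size map is required")
    # Serialize deterministically so the generated command line is stable.
    encoded = json.dumps(cluster_replica_sizes, sort_keys=True)
    return [f"--cluster-replica-sizes={encoded}"]


# Every entrypoint has to state the sizes it expects.
print(environmentd_args({"bootstrap": {"scale": 1}}))
```

Omitting the argument fails at the call site with a `TypeError`, which is the point of the change: the mistake surfaces where it is made rather than later, far from its cause.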