
storage: allow kafka/loadgen sources on multi-replica clusters, gated by dyncfg #31227


Merged: 9 commits into MaterializeInc:main, Feb 17, 2025

Conversation

@aljoscha (Contributor) opened this pull request:

Version of #30003 with a dyncfg for enabling/disabling multi-replica sources

Motivation

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@aljoscha aljoscha requested review from a team as code owners January 29, 2025 14:08
@aljoscha aljoscha requested a review from ParkMyCar January 29, 2025 14:08
@aljoscha aljoscha force-pushed the active-replication-with-cfg branch 2 times, most recently from f8c49f7 to 3e761d9 on February 3, 2025 16:06
@ParkMyCar (Contributor) left a comment:

Woohoo!

@def- (Contributor) left a comment:

Test changes lgtm, I triggered a nightly run: https://buildkite.com/materialize/nightly/builds/11070

@def- (Contributor) commented Feb 6, 2025:

I'm a bit confused by the data ingest error: https://buildkite.com/materialize/nightly/builds/11070#0194dc61-63a3-4e71-b78a-7dad5aa1bac1

psycopg.errors.InternalError_: cannot create source in cluster with more than one replica

I thought that shouldn't happen anymore since we enable multi-cluster replication now?

@aljoscha (Contributor, Author) commented:

@def- This only enables multi-replica sources for Kafka sources; the other source types are still an open TODO after this. And yeah, sorry, the issue title is confusing.

Fwiw, here's the overall EPIC: https://github.com/MaterializeInc/database-issues/issues/5051

@aljoscha aljoscha changed the title storage: enable source replication, make it configurable storage: allow kafka/loadgen sources on multi-replica clusters, gated by dyncfg Feb 10, 2025
@aljoscha aljoscha force-pushed the active-replication-with-cfg branch from 3e761d9 to c1cf70e on February 10, 2025 11:30
@def- (Contributor) left a comment:

Another nightly run: https://buildkite.com/materialize/nightly/builds/11102. Edit: I'm fixing up data-ingest. Edit 2: New run with just data-ingest: https://buildkite.com/materialize/nightly/builds/11104

I also want to write some more tests, but that can happen after this is merged, too.

@def- def- force-pushed the active-replication-with-cfg branch from 8265d81 to eae3e49 on February 10, 2025 18:03
@def- (Contributor) commented Feb 10, 2025:

Something interesting seems to have happened in the data ingest test: https://buildkite.com/materialize/nightly/builds/11104#_

data-ingest-materialized-1     | 2025-02-10T18:41:59.892749Z  thread 'main' panicked at src/adapter/src/catalog/apply.rs:1024:33: PlanError(Unstructured("cannot create source in cluster with more than one replica")): invalid persisted SQL: CREATE SOURCE "materialize"."public"."kafka_table0" IN CLUSTER [u2] FROM KAFKA CONNECTION [u1 AS "materialize"."public"."kafka_conn"] (TOPIC = 'data-ingest-0') FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY CONNECTION [u2 AS "materialize"."public"."csr_conn"] SEED KEY SCHEMA '{"type":"record","name":"key","fields":[{"name":"key0","type":"long"},{"name":"key1","type":"int"}]}' VALUE SCHEMA '{"type":"record","name":"value","fields":[{"name":"value0","type":"int"},{"name":"value1","type":"double"},{"name":"value2","type":"double"},{"name":"value3","type":"string"},{"name":"value4","type":"double"},{"name":"value5","type":"int"},{"name":"value6","type":"double"},{"name":"value7","type":"int"},{"name":"value8","type":"double"}]}' ENVELOPE UPSERT EXPOSE PROGRESS AS [u17 AS "materialize"."public"."kafka_table0_progress"]

@teskje (Contributor) left a comment:

Looks mostly good, but I'm suspicious that the active_copies tracking might not always be correct. At least I don't understand why we can unconditionally initialize the number of active copies to 1 when a collection is created, and I think we can leak CollectionState when a replica is dropped at the wrong moment.

Comment on (removed) lines 1918 to 1963:

    } else {
        soft_panic_or_log!(
            "DroppedId for ID {id} but we have neither ingestion nor export \
            under that ID"
        );

I think that soft panic should stay? Since we're doing the active_copies tracking there isn't a reason we should ever get here, right?

    for id in instance.active_ingestions() {
        self.collections
            .get_mut(id)
            .expect("instance contains unknown ingestion")

Can we make this a soft panic instead? I'm wary of panics that could bring down envd, if it's not immediately obvious that we can't hit them. And I don't think that's obvious here.
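For illustration, a minimal sketch of the suggested alternative, reusing the soft_panic_or_log! macro from the hunk above; the surrounding loop shape is assumed from the diff, not the actual controller code:

```rust
// Sketch only: a soft panic logs in release builds instead of aborting,
// so a missing entry cannot bring down envd.
for id in instance.active_ingestions() {
    let Some(collection) = self.collections.get_mut(id) else {
        soft_panic_or_log!("instance contains unknown ingestion {id}");
        continue;
    };
    // ... continue with `collection` as before ...
}
```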

@@ -948,6 +963,7 @@ where
     data_source,
     collection_metadata: metadata,
     extra_state,
+    active_copies: 1,

How do we know that there is one active copy when a collection is created? Don't we need to look at the number of replicas for that?


If this ends up being an ingestion, then run_ingestions sets the active_copies to the number of instance replicas. Otherwise it is set to one for tables.


I see. Would be great to have that context in a comment too!
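To make that context concrete, here is a minimal, self-contained sketch of the flow described in this thread; everything except active_copies, run_ingestions, and CollectionState is a simplified stand-in, not the actual controller code:

```rust
use std::collections::BTreeMap;

type GlobalId = u64; // stand-in for the real GlobalId type

struct CollectionState {
    // Number of replicas currently running a copy of this collection.
    active_copies: usize,
}

struct Controller {
    collections: BTreeMap<GlobalId, CollectionState>,
}

impl Controller {
    fn create_collection(&mut self, id: GlobalId) {
        // Tables only ever have one copy, so 1 is already final for them.
        self.collections
            .insert(id, CollectionState { active_copies: 1 });
    }

    fn run_ingestions(&mut self, id: GlobalId, num_replicas: usize) {
        // Ingestions run on every replica of their cluster, so the initial
        // value of 1 is overwritten with the replica count here.
        self.collections
            .get_mut(&id)
            .expect("collection must exist")
            .active_copies = num_replicas;
    }
}
```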

    self.collections
        .get_mut(id)
        .expect("instance contains unknown ingestion")
        .active_copies -= 1;

We might have to remove collection state if this value drops to zero. For example, consider the case where a source is dropped but before we receive the DroppedId response from the replica we drop the replica. With the current code I think we would just leak the CollectionState for the dropped collection.

This is tricky because we can't always drop a collection when this counter goes to zero. All replicas in a cluster can be dropped without the sources on the same cluster being dropped as well.
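A minimal sketch of the decrement-and-reap logic this comment asks for; the dropped flag and function shape are hypothetical, only active_copies and CollectionState come from the diff:

```rust
use std::collections::BTreeMap;

type GlobalId = u64; // stand-in for the real GlobalId type

struct CollectionState {
    active_copies: usize,
    // Hypothetical flag: the collection itself has been dropped and we are
    // only waiting for replicas to acknowledge with DroppedId.
    dropped: bool,
}

// Called when a replica stops running a copy (DroppedId or replica drop).
fn on_copy_gone(collections: &mut BTreeMap<GlobalId, CollectionState>, id: GlobalId) {
    let collection = collections
        .get_mut(&id)
        .expect("instance contains unknown ingestion");
    collection.active_copies -= 1;
    // Reaching zero is not sufficient on its own: all replicas of a cluster
    // can be dropped while its sources live on. Only reap the state when the
    // collection itself was dropped too, avoiding the leak described above.
    let reap = collection.active_copies == 0 && collection.dropped;
    if reap {
        collections.remove(&id);
    }
}
```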

@aljoscha (Contributor, Author) commented:

> Looks mostly good, but I'm suspicious that the active_copies tracking might not always be correct. At least I don't understand why we can unconditionally initialize the number of active copies to 1 when a collection is created, and I think we can leak CollectionState when a replica is dropped at the wrong moment.

@petrosagg Could you please look at the questions about active_copies and the changes to reclock (basically all the questions above 😅)?

@aljoscha (Contributor, Author) commented:

@teskje and @petrosagg For some of the trickier questions around collection state, active_copies, concurrency, assertions, and leaking state, I think you already know my general stance 😅

@aljoscha (Contributor, Author) commented:

@def- the nightly is failing with:

data-ingest-materialized-1     | 2025-02-10T18:41:59.892749Z  thread 'main' panicked at src/adapter/src/catalog/apply.rs:1024:33: PlanError(Unstructured("cannot create source in cluster with more than one replica")): invalid persisted SQL: CREATE SOURCE "materialize"."public"."kafka_table0" IN CLUSTER [u2] FROM KAFKA CONNECTION [u1 AS "materialize"."public"."kafka_conn"] (TOPIC = 'data-ingest-0') FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY CONNECTION [u2 AS "materialize"."public"."csr_conn"] SEED KEY SCHEMA '{"type":"record","name":"key","fields":[{"name":"key0","type":"long"},{"name":"key1","type":"int"}]}' VALUE SCHEMA '{"type":"record","name":"value","fields":[{"name":"value0","type":"int"},{"name":"value1","type":"double"},{"name":"value2","type":"double"},{"name":"value3","type":"string"},{"name":"value4","type":"double"},{"name":"value5","type":"int"},{"name":"value6","type":"double"},{"name":"value7","type":"int"},{"name":"value8","type":"double"}]}' ENVELOPE UPSERT EXPOSE PROGRESS AS [u17 AS "materialize"."public"."kafka_table0_progress"]

This happens when enable_multi_replica_sources is disabled but was enabled before, and multi-replica clusters with sources already exist. We currently can't go back, because we would have to delete either replicas or sources to get back to a valid state with the flag disabled.

@aljoscha aljoscha force-pushed the active-replication-with-cfg branch from eae3e49 to 7f1aabf on February 14, 2025 10:15
@aljoscha (Contributor, Author) commented:

Rebased and fixed. @def- we still need to get to the bottom of ☝️: are the tests flipping the dyncfg on and off?

@def- (Contributor) commented Feb 14, 2025:

No, they are not. The dyncfg should always be on in the Data Ingest test.

@def- (Contributor) commented Feb 14, 2025:

What might be happening is that during bootstrapping we parse the SQL without respecting the dyncfg setting?

@aljoscha aljoscha force-pushed the active-replication-with-cfg branch from 7f1aabf to 0882f9c on February 17, 2025 12:42
@aljoscha (Contributor, Author) commented:

> What might be happening is that during bootstrapping we parse the SQL without respecting the dyncfg setting?

Good catch! I pushed a commit that should fix this.

petrosagg and others added 4 commits February 17, 2025 14:00
We want the mint operation to be a no-op if either new_from_upper or binding_ts is not beyond the current source_upper and upper, respectively.

Before, we only checked the former, which meant that if the requested binding_ts was in the past we would attempt to write updates to the remap shard that are not beyond the upper.

Co-authored-by: Petros Angelatos <[email protected]>
Signed-off-by: Petros Angelatos <[email protected]>
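A minimal sketch of the guard this commit message describes; the integer timestamps and struct shape are simplifications, not the real reclock types, and "beyond" is reduced to ordinary comparisons:

```rust
// Sketch of the mint no-op guard. The real code works with frontiers, not
// plain integers.
type Timestamp = u64;

struct RemapState {
    source_upper: Timestamp, // upper of the source (FromTime) frontier
    upper: Timestamp,        // upper of the remap shard (IntoTime) frontier
}

impl RemapState {
    fn mint(&mut self, binding_ts: Timestamp, new_from_upper: Timestamp) {
        // The fix: check BOTH conditions. Previously only the first was
        // checked, so a binding_ts in the past could produce remap-shard
        // updates that are not beyond the shard's upper.
        if new_from_upper <= self.source_upper || binding_ts < self.upper {
            return; // no-op
        }
        self.source_upper = new_from_upper;
        self.upper = binding_ts + 1; // shard upper advances past the binding
    }
}
```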
aljoscha and others added 4 commits February 17, 2025 14:00
This will lead to panics when bootstrapping from a catalog that has these allowed, largely, it seems, because we don't yet have the proper dyncfg value that would allow this.

We still have checks that disallow sources on multi-replica clusters at the sequencer level, so this should be fine.
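A minimal sketch of the gating this commit and the thread describe; the function and its parameters are hypothetical, only the dyncfg name and the error message come from the discussion:

```rust
#[derive(Debug)]
struct PlanError(String);

// Sketch: enforce the restriction when sequencing new CREATE SOURCE
// statements, but not when replaying already-persisted SQL at bootstrap,
// where rejecting a statement would panic envd (see the nightly failure).
fn check_source_allowed(
    replica_count: usize,
    enable_multi_replica_sources: bool, // the dyncfg introduced in this PR
    bootstrapping: bool,                // replaying catalog SQL at startup?
) -> Result<(), PlanError> {
    if bootstrapping {
        // The sequencer-level check already vetted this statement when it
        // was first created; the dyncfg may not even be loaded yet here.
        return Ok(());
    }
    if replica_count > 1 && !enable_multi_replica_sources {
        return Err(PlanError(
            "cannot create source in cluster with more than one replica".into(),
        ));
    }
    Ok(())
}
```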
@aljoscha aljoscha force-pushed the active-replication-with-cfg branch from 0882f9c to 7ed362c on February 17, 2025 13:01
@aljoscha (Contributor, Author) commented:

@def- what about that remaining cloudtest failure here: https://buildkite.com/materialize/nightly/builds/11180?

@def- (Contributor) commented Feb 17, 2025:

I have pushed a fix: 2e0132c. @jubrad does this make sense?

@aljoscha (Contributor, Author) commented:

> I have pushed a fix: 2e0132c. @jubrad does this make sense?

doh! tyty! 🙇‍♂️

@aljoscha aljoscha force-pushed the active-replication-with-cfg branch from 2e0132c to a2fe527 on February 17, 2025 16:04
@aljoscha (Contributor, Author) commented:

@def- I pushed a2fe527 because this was yielding:

error: expected error containing "creating cluster replica would violate max_replicas_per_cluster limit", got "unknown cluster replica size 2000"

@def- def- force-pushed the active-replication-with-cfg branch from a2fe527 to 08db2f5 on February 17, 2025 16:06
@aljoscha (Contributor, Author) commented:

@def- I think you pushed a commit that changes the expected error message, on top of mine, which changes size to replication factor 😅

@def- (Contributor) commented Feb 17, 2025:

Oops, didn't see your comment and just pushed something similar, sorry!

@def- def- force-pushed the active-replication-with-cfg branch from 08db2f5 to a2fe527 on February 17, 2025 16:26
@aljoscha (Contributor, Author) commented:

> Oops, didn't see your comment and just pushed something similar, sorry!

No worries at all! ☺️

@aljoscha aljoscha merged commit 20fabdd into MaterializeInc:main Feb 17, 2025
237 of 241 checks passed
@aljoscha aljoscha deleted the active-replication-with-cfg branch February 17, 2025 20:46
def- added a commit to def-/materialize that referenced this pull request Feb 19, 2025