Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

persist: disable compaction in read-only mode #30725

Conversation

ParkMyCar
Copy link
Member

This PR disables Persist's compaction when environmentd or clusterd are in read-only mode. As part of #30205 we discovered that during a 0dt deployment the read-only instance of Materialize was scheduling compaction requests which was causing it to write data.

We disable compaction by adding a process_compaction_requests: Arc<AtomicBool> to the PersistConfig, and when setting it to true when clusterd receives the already existing AllowWrites command. Also included is a CYA dyncfg that gates whether or not we check the flag, it's set to true by default.

I also added a Prometheus metric to count the number of requests dropped because compaction is disabled, a unit test to exercise the basic enable/disable behavior, and manually checked when running a 0dt test that compaction was successfully disabled and then enabled across environmentd and all clusterds when a deployment was promoted.

Motivation

Fix issue found in #30205

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

…ly mode

* add an Arc<AtomicBool> to PersistConfig that determines whether or not we process a compaction request
* add a dyncfg to CYA that determines whether or not we check the Arc<AtomicBool>
* add a test
@ParkMyCar ParkMyCar requested review from a team as code owners December 4, 2024 17:47
@ParkMyCar ParkMyCar requested review from jkosh44 and bkirwi December 4, 2024 17:47
Copy link
Contributor

@benesch benesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Deferring the real review to @bkirwi but no qualms from me.

Copy link
Contributor

@bkirwi bkirwi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Made a suggestion on the doc to make it more clear what the default behaviour will be when the dyncfg is disabled.

@ParkMyCar ParkMyCar enabled auto-merge (squash) December 4, 2024 20:24
@ParkMyCar
Copy link
Member Author

TFTR!

@ParkMyCar ParkMyCar merged commit d101d9e into MaterializeInc:main Dec 4, 2024
79 checks passed
ParkMyCar added a commit that referenced this pull request Dec 9, 2024
Requires #30725 to be
merged.

This PR stabilizes Schema Evolution in Persist which unblocks `ALTER
TABLE` work. There are a few changes in this one PR, they are all geared
around handling the slight instability we have with the nullability of
columns in Materialized Views.

1. Internally in Persist, all columns are marked as nullable at the
Arrow/Parquet level.
2. A one-time migration of durably persisted `arrow::DataType`s in
Persist's Schema Registery that allows them to be more nullable to
account for the changes from [1].
3. Deprecation of the existing `SchemaId`s in `Part` and `Run` metadata.
We do this by renaming the existing `schema_id` fields to
`deprecated_schema_id` and introducing a `schema_id` field with a new
tag. Also a dyncfg is added so we can turn off writing to the new
`schema_id` field until we're confident all nodes have rolled to a new
version.
4. During bootstrapping of the Coordinator, if the nullability of
columns for a MatView have changed, we compare and evolve the new schema
in Persist.
5. Changed the upgrade check in `catalog-debug` to validate that the
`RelationDesc`s as planned by the new version of MZ are compatible with
the ones durably recorded in Persist.

Also included in this PR are two feature gate changes to disable new
features in our tests until > `v0.126`.

1. Builtin Continual Tasks
2. 0dt Enabled Sources

Both of these features cause the new version of MZ to issue writes
during a 0dt upgrade when it's supposed to be in read-only mode. Because
this PR changes the Arrow datatypes in a non-forward compatible way,
this causes the 0dt tests to fail since the old version panics when it
sees the new batches.

### Motivation

Fixes MaterializeInc/database-issues#8660

### Tips for reviewer

All of these changes have been made in separate commits of the PR which
ideally makes reviewing easier.

### Checklist

- [x] This PR has adequate test coverage / QA involvement has been duly
considered. ([trigger-ci for additional test/nightly
runs](https://trigger-ci.dev.materialize.com/))
- [x] This PR has an associated up-to-date [design
doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md),
is a design doc
([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)),
or is sufficiently small to not require a design.
  <!-- Reference the design in the description. -->
- [x] If this PR evolves [an existing `$T ⇔ Proto$T`
mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md)
(possibly in a backwards-incompatible way), then it is tagged with a
`T-proto` label.
- [x] If this PR will require changes to cloud orchestration or tests,
there is a companion cloud PR to account for those changes that is
tagged with the release-blocker label
([example](MaterializeInc/cloud#5021)).
<!-- Ask in #team-cloud on Slack if you need help preparing the cloud
PR. -->
- [x] If this PR includes major [user-facing behavior
changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note),
I have pinged the relevant PM to schedule a changelog post.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants