@chrisstaite
Contributor

@chrisstaite chrisstaite commented Feb 8, 2025

Description

There are multiple use cases where we don't want a fast-slow store to persist to one of the stores in some direction. For example, worker nodes do not want to store build results on the local filesystem, just with the upstream CAS. Another case would be the re-use of prod action cache in a dev environment, but not vice-versa.

This PR introduces options to the fast-slow store which default to the existing behaviour, but allow each side of the fast-slow store to be customised so that it persists only on get operations, only on put operations, or is read only.
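
As a rough illustration of the per-side option described above, here is a minimal sketch. It is not the code merged in this PR: the enum name `StoreDirection`, the variants `Both`, `Update` and `Get`, and both helper methods are assumptions made for the example. Only `ReadOnly` is named elsewhere in this conversation.

```rust
/// Hypothetical per-side policy for a fast-slow store.
/// Only `ReadOnly` is named in this thread; the other names are assumed.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum StoreDirection {
    /// Existing behaviour: this side persists on both put and get operations.
    #[default]
    Both,
    /// This side persists on put (update) operations only.
    Update,
    /// This side persists on get operations only, i.e. when data fetched
    /// from the other side would normally be copied into it.
    Get,
    /// This side is never written through the fast-slow store.
    ReadOnly,
}

impl StoreDirection {
    /// Whether a put should be written to this side.
    pub fn persists_on_put(self) -> bool {
        matches!(self, Self::Both | Self::Update)
    }

    /// Whether data served by a get should be copied into this side.
    pub fn persists_on_get(self) -> bool {
        matches!(self, Self::Both | Self::Get)
    }
}
```

Under these assumed names, the worker case from the description would likely set the fast (local filesystem) side to `Get` and leave the slow (upstream CAS) side at `Both`, so build results are uploaded upstream without also being written locally.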

Fixes #1577

Type of change

Please delete options that aren't relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

Added new tests

Checklist

  • Updated documentation if needed
  • Tests added/amended
  • bazel test //... passes locally
  • PR is contained in a single commit, using git amend (see some docs)

@CLAassistant

CLAassistant commented Feb 8, 2025

CLA assistant check
All committers have signed the CLA.

@MarcusSorealheis MarcusSorealheis requested review from aaronmondal and allada and removed request for aaronmondal February 17, 2025 22:23
@MarcusSorealheis MarcusSorealheis removed the request for review from allada April 20, 2025 00:33
@MarcusSorealheis
Collaborator

Hi @chrisstaite, not sure if I screwed anything up here yet, but I did just fix the merge conflicts to make it easier for @aaronmondal to review the PR and get it merged. It's a powerful new feature. I did this from the web browser so I may have screwed something up.

It's clear that the PR needs:

  1. Rust format
  2. and probably a sanity check against all the new linting that was added.

There are a couple things missing in the PR that I'd like to see that are probably separate from what Aaron's going to ask about:

One, an example of how to use it. A short guide would be awesome. #1577 makes it obvious why someone would want this feature (e.g., flow from Prod to Dev but not vice versa), but for other people to use it, it would be great for us to document it. Aaron and I can also help there if time constraints are an issue. We are simply trying to graduate Nativelink from the era where users who want to use it need to look at the code, and open it up to more users.

This is really a fantastic feature. Thanks for the contribution!

@MarcusSorealheis
Collaborator

I'll do the whole amend if my changes fixed the Rust formatting issues, and I think they will.

There are multiple use cases where we don't want a fast-slow store to
persist to one of the stores in some direction.  For example, worker
nodes do not want to store build results on the local filesystem, just
with the upstream CAS.  Another case would be the re-use of prod action
cache in a dev environment, but not vice-versa.

This PR introduces options to the fast-slow store which default to the
existing behaviour, but allows customisation of each side of the fast
slow store to either persist in the case of get operations, put
operations or to make them read only.

Fixes TraceMachina#1577
@chrisstaite
Contributor Author

@MarcusSorealheis I've brought this back up to date.

@chrisstaite-menlo chrisstaite-menlo requested review from palfrey and removed request for aaronmondal October 17, 2025 16:06
@chrisstaite-menlo chrisstaite-menlo enabled auto-merge (squash) October 21, 2025 17:17
@chrisstaite
Contributor Author

Would be nice to get this into the next release, as it opens up a lot of opportunity for cluster cost savings by avoiding extra GCS operations.

@MarcusSorealheis
Collaborator

@chrisstaite I've just updated the branch. Will try it out.

Collaborator

@MarcusSorealheis MarcusSorealheis left a comment

I think this is helpful. One question, though. If both directions are ReadOnly and a put arrives, it drains but returns Ok(())—is this correct? It discards data silently. Maybe log a warn! or error if unintended.

I don't want to fail totally silently.
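
To make the concern concrete, here is a minimal sketch of one way a put could drain its input and still leave a trace when neither side persists it. This is not the merged implementation: `put`, `PutOutcome` and the two boolean flags are hypothetical names for illustration, the "write" is simulated, and `eprintln!` stands in for the `warn!` macro suggested above.

```rust
use std::io::{self, Read};

/// Hypothetical outcome type so callers can tell a discarded upload
/// from a stored one instead of only seeing `Ok(())`.
#[derive(Debug, PartialEq, Eq)]
pub enum PutOutcome {
    Stored { bytes: u64 },
    Discarded { bytes: u64 },
}

/// Simulated put: `fast_writable` and `slow_writable` say whether each
/// side accepts put operations (assumed flags, not the real API).
pub fn put<R: Read>(
    mut data: R,
    fast_writable: bool,
    slow_writable: bool,
) -> io::Result<PutOutcome> {
    if !fast_writable && !slow_writable {
        // Still drain the reader so the sender is not left blocked,
        // but make the discard visible rather than silent.
        let bytes = io::copy(&mut data, &mut io::sink())?;
        eprintln!("warning: put discarded {bytes} bytes, no side accepts puts");
        return Ok(PutOutcome::Discarded { bytes });
    }
    // Stand-in for writing to whichever sides accept puts.
    let bytes = io::copy(&mut data, &mut io::sink())?;
    Ok(PutOutcome::Stored { bytes })
}
```

Whether a discarded put should be a warning, an error, or a distinct return value is a design choice for the maintainers; the sketch only shows that the drain and the success result do not have to be silent.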

@chrisstaite-menlo chrisstaite-menlo merged commit 6d867c9 into TraceMachina:main Oct 28, 2025
32 of 33 checks passed
@MarcusSorealheis
Collaborator

@amankrx do you have any thoughts here?

@amankrx
Collaborator

amankrx commented Oct 28, 2025

I'll test out the branch, but the CI is failing right now. Maybe we should be investigating the cause here.

@MarcusSorealheis
Collaborator

@amankrx CI did not fail here, did it?

@amankrx
Collaborator

amankrx commented Oct 28, 2025

It failed after it was merged to main. Might not be related to this PR directly, but we should investigate why we get these errors: https://github.com/TraceMachina/nativelink/actions/runs/18865879330/job/53833330211

Development

Successfully merging this pull request may close these issues.

Create a read-write store

5 participants