
Make data stream lifecycle project-aware #125476

Open
wants to merge 7 commits into
base: main

Conversation

nielsbauman (Contributor)

Now that all actions that DLM depends on are project-aware, we can make DLM itself project-aware.
There is still only one instance of DataStreamLifecycleService; it simply loops over all the projects, which matches the approach we've taken for similar scenarios so far.

nielsbauman added the >non-issue, :Data Management/Data streams, Team:Data Management, and v9.1.0 labels on Mar 24, 2025
nielsbauman requested a review from a team on March 24, 2025 08:10
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-data-management (Team:Data Management)

@nielsbauman (Contributor Author)

FTR, DLM mostly relies on internal cluster tests. Currently, we don't have a way to run internal cluster tests in MP mode. I made some local hacks to run the DLM internal cluster tests in MP mode anyway, and they all passed. Those hacks aren't meant to be committed, but the MP team is working with ES Delivery to make proper changes to the testing infrastructure to allow running all kinds of tests in MP mode.

@ywangd (Member) left a comment

I have some comments

try {
    projectResolver.executeOnProject(projectId, () -> run(state.projectState(projectId)));
} catch (Exception e) {
    logger.error(Strings.format("Data stream lifecycle failed to run on project [%s]", projectId), e);
Member

I wonder if it is worthwhile to use ExceptionsHelper#useOrSuppress and throw the final exception (if any), which seems to match the existing behaviour.
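
For reference, a rough sketch of what that suggestion could look like, reusing the loop shape from the diff above. This is not code from the PR, and the enclosing method would need to declare or otherwise handle the rethrown checked exception:

// Sketch only: collect per-project failures with ExceptionsHelper.useOrSuppress,
// then rethrow once all projects have been attempted. The first failure is kept
// and later ones are attached to it as suppressed exceptions.
Exception failure = null;
for (var projectId : state.metadata().projects().keySet()) {
    try {
        projectResolver.executeOnProject(projectId, () -> run(state.projectState(projectId)));
    } catch (Exception e) {
        failure = ExceptionsHelper.useOrSuppress(failure, e);
    }
}
if (failure != null) {
    throw failure;
}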

Contributor Author

The body of run(ProjectState projectState) already catches a few generic exceptions, so it seemed to me that we don't really expect any exception to be thrown from there. In other words, no code that calls run(ClusterState state) has any specific exception-handling logic. So I don't think it would make much of a difference here.

for (var projectId : state.metadata().projects().keySet()) {
    // We catch inside the loop to avoid one broken project preventing DLM from running on other projects.
    try {
        projectResolver.executeOnProject(projectId, () -> run(state.projectState(projectId)));
Member

Since we also pass projectId to all the downstream methods, I wonder whether the effect of using executeOnProject here is a bit too far away from the code it affects, i.e. it was not clear to me that the later client.execute calls are going to do the right thing. An alternative would be using executeOnProject at the spots where the client is invoked. We could have a method that returns a dedicated client, similar to the following:

Client projectClient(ProjectId projectId) {
    return new FilterClient(client) {
        @Override
        protected <Request extends ActionRequest, Response extends ActionResponse> void doExecute(
            ActionType<Response> action,
            Request request,
            ActionListener<Response> listener
        ) {
            projectResolver.executeOnProject(projectId, () -> super.doExecute(action, request, listener));
        }
    };
}

and use it like

projectClient(projectId).admin().indices()...

The downside is creating a client for each request. But that's probably not too bad since we create many other objects in request/response situations.

Contributor Author

Hm I think a FilterClient like that is interesting, but I'm also worried about the overhead. I could just wrap the current client calls in this class with executeOnProject, but I intentionally didn't do that to reduce the changes. If we do go for the FilterClient, I would like to try to add it in a generic place (e.g. ProjectResolver), as other places will benefit from it. Maybe we could even store a map of project clients to avoid re-creating them?

Member

How often does this code run? I'd avoid creating a map since that brings in new questions about expiration, size control, and memory consumption.

Overall I don't feel strongly between your change and my suggestion. I am OK with it as is if that is your preference.

Contributor Author

DataStreamLifecycle#run(ClusterState) is invoked every 5 minutes (by default). I was suggesting creating a map in some generic place that lives throughout the entire life of the node/cluster. Having one Map<ProjectId, Client> in the MultiProjectProjectResolver sounds pretty manageable to me.
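
For illustration only, the cached-client idea could look roughly like this; the field and method names are made up, and createProjectFilterClient stands in for the FilterClient wrapper sketched earlier:

// Hypothetical sketch, not code from this PR: one long-lived, lazily created,
// project-scoped client per project, reused for the lifetime of the node.
// createProjectFilterClient is a stand-in for the FilterClient wrapper above.
private final Map<ProjectId, Client> projectClients = new ConcurrentHashMap<>();

Client projectClient(Client client, ProjectId projectId) {
    return projectClients.computeIfAbsent(projectId, id -> createProjectFilterClient(client, id));
}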

Member

I think having a method like projectClient(ProjectId) on the ProjectResolver sounds like a good idea. I'd start out without maintaining a map, not because I think it is problematic to maintain such a map, but because it is not obviously a meaningful optimization without more data. We can always add the map when it becomes more obviously necessary in the future.

Contributor Author

I added a projectClient method in 10758c2. The extra object creation makes me feel a little bit uneasy. How do you feel about it?

Member

I think we don't want the default method to always create a new client, since that is not necessary for DefaultProjectResolver. My original suggestion of having a local projectClient method had the same issue. A quick fix might be checking supportsMultipleProjects() and returning the client as-is if it is false, e.g. something like:

if (supportsMultipleProjects() == false && projectId.equals(getProjectId())) {
    return client;
} else {
    // return a new filterClient
}

This will bypass DefaultProjectResolver#executeOnProject and its projectId check. I think that's OK since the new method provides the same check and error cases will still go through the original executeOnProject.
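
Putting the two pieces together, a default method on ProjectResolver might look roughly like the following. It is only a sketch under the assumptions above, with wrapWithProjectContext standing in for the FilterClient wrapper sketched earlier, not necessarily the committed code:

// Sketch only: combine the supportsMultipleProjects() shortcut with per-request
// project wrapping; error cases still go through the original executeOnProject.
// wrapWithProjectContext is a stand-in for the FilterClient wrapper above.
default Client projectClient(Client client, ProjectId projectId) {
    if (supportsMultipleProjects() == false && projectId.equals(getProjectId())) {
        return client; // single-project mode: the project is already implied
    }
    return wrapWithProjectContext(client, projectId);
}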

Ideally, it would be great if the method took only a projectId and the client were provided internally by the ProjectResolver. This would require ProjectResolver to hold a reference to the client. That is not an issue for the actual ProjectResolver implementations, but maybe a bit fiddly for the TestProjectResolvers, unless we are happy to use NoOpClient by default. These can be tackled separately if deemed worthwhile.

Contributor Author

I added the shortcut to the projectClient method. I thought some more about how we can deal with that client, but having a map of clients or passing a reference of the client to ProjectResolver itself might be difficult, as some places make use of OriginSettingClient - not in DataStreamLifecycleService though.

I think I do prefer this approach over my initial approach, as this specifies the project ID way more explicitly, but I do still feel like there is room for improvement. We can improve later/separately.
