
Make data stream lifecycle project-aware #125476

Open
wants to merge 7 commits into
base: main

Conversation

nielsbauman (Contributor)

Now that all actions that DLM depends on are project-aware, we can make DLM itself project-aware.
There is still only one instance of DataStreamLifecycleService; it simply loops over all the projects, which matches the approach we've taken for similar scenarios so far.

nielsbauman added the >non-issue, :Data Management/Data streams, Team:Data Management, and v9.1.0 labels on Mar 24, 2025
nielsbauman requested a review from a team on March 24, 2025 08:10
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-data-management (Team:Data Management)

@nielsbauman (Contributor Author)

FTR, DLM mostly relies on internal cluster tests. Currently, we don't have a way to run internal cluster tests in MP mode. I made some local hacks to run the DLM internal cluster tests in MP mode anyway, and they all passed. Those hacks aren't meant to be committed, but the MP team is working with ES Delivery to make proper changes to the testing infrastructure to allow running all kinds of tests in MP mode.

@ywangd (Member) left a comment

I have some comments

try {
    projectResolver.executeOnProject(projectId, () -> run(state.projectState(projectId)));
} catch (Exception e) {
    logger.error(Strings.format("Data stream lifecycle failed to run on project [%s]", projectId), e);
Member

I wonder if it is worthwhile to use ExceptionsHelper#useOrSuppress and throw the final exception (if any), which seems to match the existing behaviour.
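
For reference, a rough sketch of what that suggestion could look like, reusing the loop shape from the diff above. This is not code from the PR, and the enclosing method would need to declare or otherwise handle the rethrown checked exception:

// Sketch only: collect per-project failures with ExceptionsHelper.useOrSuppress,
// then rethrow once all projects have been attempted. The first failure is kept
// and later ones are attached to it as suppressed exceptions.
Exception failure = null;
for (var projectId : state.metadata().projects().keySet()) {
    try {
        projectResolver.executeOnProject(projectId, () -> run(state.projectState(projectId)));
    } catch (Exception e) {
        failure = ExceptionsHelper.useOrSuppress(failure, e);
    }
}
if (failure != null) {
    throw failure;
}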

Contributor Author

The body of run(ProjectState projectState) already catches a few generic exceptions, so it seemed to me that we don't really expect any exception to be thrown from there. In other words, no code that calls run(ClusterState state) has any specific exception-handling logic. So I don't think it would make much of a difference here.

for (var projectId : state.metadata().projects().keySet()) {
    // We catch inside the loop to avoid one broken project preventing DLM from running on other projects.
    try {
        projectResolver.executeOnProject(projectId, () -> run(state.projectState(projectId)));
Member

Since we also pass projectId to all the downstream methods, I wonder whether the effect of using executeOnProject here is a bit too far away from the code it affects, i.e. it was not clear to me that the later client.execute calls are going to do the right thing. An alternative would be using executeOnProject at the spots where the client is invoked. We could have a method that returns a dedicated client, similar to the following:

Client projectClient(ProjectId projectId) {
    return new FilterClient(client) {
        @Override
        protected <Request extends ActionRequest, Response extends ActionResponse> void doExecute(
            ActionType<Response> action,
            Request request,
            ActionListener<Response> listener
        ) {
            projectResolver.executeOnProject(projectId, () -> super.doExecute(action, request, listener));
        }
    };
}

and use it like

projectClient(projectId).admin().indices()...

The downside is creating a client for each request. But that's probably not too bad since we create many other objects in request/response situations.

Contributor Author

Hm I think a FilterClient like that is interesting, but I'm also worried about the overhead. I could just wrap the current client calls in this class with executeOnProject, but I intentionally didn't do that to reduce the changes. If we do go for the FilterClient, I would like to try to add it in a generic place (e.g. ProjectResolver), as other places will benefit from it. Maybe we could even store a map of project clients to avoid re-creating them?

Member

How often does this code run? I'd avoid creating a map since that brings in new questions about expiration, size control, and memory consumption.

Overall I don't feel strongly between your change and my suggestion. I am OK with it as is if that is your preference.

Contributor Author

DataStreamLifecycle#run(ClusterState) is invoked every 5 minutes (by default). I was suggesting creating a map in some generic place that lives throughout the entire life of the node/cluster. Having one Map<ProjectId, Client> in the MultiProjectProjectResolver sounds pretty manageable to me.
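
For illustration only, the cached-client idea could look roughly like this; the field and method names are made up, and createProjectFilterClient stands in for the FilterClient wrapper sketched earlier:

// Hypothetical sketch, not code from this PR: one long-lived, lazily created,
// project-scoped client per project, reused for the lifetime of the node.
// createProjectFilterClient is a stand-in for the FilterClient wrapper above.
private final Map<ProjectId, Client> projectClients = new ConcurrentHashMap<>();

Client projectClient(Client client, ProjectId projectId) {
    return projectClients.computeIfAbsent(projectId, id -> createProjectFilterClient(client, id));
}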

Member

I think having a method like projectClient(ProjectId) on the ProjectResolver sounds like a good idea. I'd start out without maintaining a map, not because I think it is problematic to maintain such a map, but because it is not obviously a meaningful optimization without more data. We can always add the map when it becomes more obviously necessary in the future.

Contributor Author

I added a projectClient method in 10758c2. The extra object creation makes me feel a little bit uneasy. How do you feel about it?

Member

I think we don't want the default method to always create a new client, since that is not necessary for DefaultProjectResolver. My original suggestion of having a local projectClient method had the same issue. A quick fix might be checking supportsMultipleProjects() and returning the client as-is if it is false, e.g. something like:

if (supportsMultipleProjects() == false && projectId.equals(getProjectId())) {
    return client;
} else {
    // return a new filterClient
}

This will bypass DefaultProjectResolver#executeOnProject and its projectId check. I think that's OK since the new method provides the same check and error cases will still go through the original executeOnProject.
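
Putting the two pieces together, a default method on ProjectResolver might look roughly like the following. It is only a sketch under the assumptions above, with wrapWithProjectContext standing in for the FilterClient wrapper sketched earlier, not necessarily the committed code:

// Sketch only: combine the supportsMultipleProjects() shortcut with per-request
// project wrapping; error cases still go through the original executeOnProject.
// wrapWithProjectContext is a stand-in for the FilterClient wrapper above.
default Client projectClient(Client client, ProjectId projectId) {
    if (supportsMultipleProjects() == false && projectId.equals(getProjectId())) {
        return client; // single-project mode: the project is already implied
    }
    return wrapWithProjectContext(client, projectId);
}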

Ideally, it would be great if the method took only a projectId and the client were provided internally by the ProjectResolver. This would require ProjectResolver to hold a reference to the client. That is not an issue for the actual ProjectResolver implementations, but maybe a bit fiddly for the TestProjectResolvers, unless we are happy to use NoOpClient by default. These can be tackled separately if deemed worthwhile.

Contributor Author

I added the shortcut to the projectClient method. I thought some more about how we can deal with that client, but having a map of clients or passing a reference of the client to ProjectResolver itself might be difficult, as some places make use of OriginSettingClient - not in DataStreamLifecycleService though.

I think I do prefer this approach over my initial approach, as this specifies the project ID way more explicitly, but I do still feel like there is room for improvement. We can improve later/separately.
