Make data stream lifecycle project-aware #125476
base: main
Conversation
Pinging @elastic/es-data-management (Team:Data Management)
FTR, DLM mostly relies on internal cluster tests. Currently, we don't have a way to run internal cluster tests in MP mode. I made some local hacks to be able to run the DLM internal cluster tests in MP mode anyway, and they all passed. Those hacks aren't for committing purposes, but the MP team is working with ES Delivery to make proper changes to the testing infrastructure to allow running all kinds of tests in MP mode.
I have some comments
try {
    projectResolver.executeOnProject(projectId, () -> run(state.projectState(projectId)));
} catch (Exception e) {
    logger.error(Strings.format("Data stream lifecycle failed to run on project [%s]", projectId), e);
I wonder if it is worthwhile to use ExceptionsHelper#useOrSuppress and throw the final exception (if any), which seems to match the existing behaviour.
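For illustration, a rough sketch (not code from this PR) of how the project loop could collect failures with ExceptionsHelper#useOrSuppress and rethrow once all projects have been visited; the failure variable and the final throw are assumptions:
RuntimeException failure = null;
for (var projectId : state.metadata().projects().keySet()) {
    try {
        projectResolver.executeOnProject(projectId, () -> run(state.projectState(projectId)));
    } catch (RuntimeException e) {
        // Keep iterating so one broken project does not block the others, but remember the failure.
        failure = ExceptionsHelper.useOrSuppress(failure, e);
    }
}
if (failure != null) {
    throw failure; // the first failure, with later ones attached as suppressed exceptions
}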
The body of run(ProjectState projectState) already catches a few generic exceptions, so to me it seemed like we don't really expect any exception to be thrown in there. As in, no code that calls run(ClusterState state) has any specific code for handling exceptions, so I don't think it would make much of a difference here.
for (var projectId : state.metadata().projects().keySet()) {
    // We catch inside the loop to avoid one broken project preventing DLM from running on other projects.
    try {
        projectResolver.executeOnProject(projectId, () -> run(state.projectState(projectId)));
Since we also pass projectId to all the downstream methods, I wonder whether the effect of using executeOnProject is a bit too far away for understanding the code, i.e. it was not clear to me that the later client.execute is going to do the right thing. An alternative would be using executeOnProject at the spots where the client is invoked. We could have a method that returns a dedicated client, similar to the following:
Client projectClient(ProjectId projectId) {
    return new FilterClient(client) {
        @Override
        protected <Request extends ActionRequest, Response extends ActionResponse> void doExecute(
            ActionType<Response> action,
            Request request,
            ActionListener<Response> listener
        ) {
            projectResolver.executeOnProject(projectId, () -> super.doExecute(action, request, listener));
        }
    };
}
and use it like:
projectClient(projectId).admin().indices()...
The downside is creating a client for each request, but that's probably not too bad since we create many other objects in request/response situations.
Hm, I think a FilterClient like that is interesting, but I'm also worried about the overhead. I could just wrap the current client calls in this class with executeOnProject, but I intentionally didn't do that to reduce the changes. If we do go for the FilterClient, I would like to try to add it in a generic place (e.g. ProjectResolver), as other places will benefit from it. Maybe we could even store a map of project clients to avoid re-creating them?
How often does this code run? I'd avoid creating a map, since that brings in new questions about expiration, size control, and memory consumption.
Overall I don't feel strongly between your change and my suggestion. I am OK with it as-is if that is your preference.
DataStreamLifecycle#run(ClusterState) is invoked every 5 minutes (by default). I was suggesting creating the map in some generic place that lives throughout the entire life of the node/cluster. Having one Map<ProjectId, Client> in the MultiProjectProjectResolver sounds pretty manageable to me.
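As a rough sketch of what such a cache could look like (the field name and the wrapClient helper are hypothetical, not code from this PR):
private final Map<ProjectId, Client> projectClients = new ConcurrentHashMap<>();

Client projectClient(ProjectId projectId) {
    // Lazily create and cache one project-scoped client per project;
    // wrapClient is a hypothetical helper returning a FilterClient like the one sketched above.
    return projectClients.computeIfAbsent(projectId, id -> wrapClient(client, id));
}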
I think having a method like projectClient(ProjectId) on the ProjectResolver sounds like a good idea. I'd start out without maintaining a map; not because I think it is problematic to maintain such a map, but because it is not obviously a meaningful optimization without more data. We can always add the map when it becomes more obviously necessary in the future.
I added a projectClient method in 10758c2. The extra object creation makes me feel a little bit uneasy. How do you feel about it?
I think we don't want the default method to always create a new client, since that is not necessary for DefaultProjectResolver. My original suggestion of having a local projectClient method had the same issue. A quick fix might be checking supportsMultipleProjects() and returning the client as-is if it is false, e.g. something like:
if (supportsMultipleProjects() == false && projectId.equals(getProjectId())) {
    return client;
} else {
    // return a new FilterClient
}
This will bypass DefaultProjectResolver#executeOnProject and its projectId check. I think that's OK since the new method provides the same check, and error cases will still go through the original executeOnProject.
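Putting the two pieces together, a hedged sketch of what such a default method on ProjectResolver might look like; the method name, parameters, and placement are illustrative assumptions, not the code added in this PR:
default Client projectClient(Client client, ProjectId projectId) {
    if (supportsMultipleProjects() == false && projectId.equals(getProjectId())) {
        // Single-project deployments can use the injected client directly.
        return client;
    }
    final ProjectResolver resolver = this;
    return new FilterClient(client) {
        @Override
        protected <Request extends ActionRequest, Response extends ActionResponse> void doExecute(
            ActionType<Response> action,
            Request request,
            ActionListener<Response> listener
        ) {
            // Ensure every request issued through this client runs in the project's context.
            resolver.executeOnProject(projectId, () -> super.doExecute(action, request, listener));
        }
    };
}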
Ideally it would be great if the method took only a projectId and the client were provided internally by the ProjectResolver. This would require ProjectResolver to hold a reference to the client. That is not an issue for the actual ProjectResolver implementations, but it is maybe a bit fiddly for the TestProjectResolvers, unless we are happy to use NoOpClient by default. These can be tackled separately if deemed worthwhile.
I added the shortcut to the projectClient method. I thought some more about how we can deal with that client, but having a map of clients or passing a reference to the client to ProjectResolver itself might be difficult, as some places make use of OriginSettingClient (not in DataStreamLifecycleService, though).
I think I do prefer this approach over my initial approach, as it specifies the project ID much more explicitly, but I do still feel there is room for improvement. We can improve later/separately.
Now that all actions that DLM depends on are project-aware, we can make DLM itself project-aware.
There still exists only one instance of DataStreamLifecycleService; it just loops over all the projects, which matches the approach we've taken for similar scenarios thus far.