Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#6562] improvement(catalogs): Optimize the code path in createFileset and optimize path #6689

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

Abyss-lord
Copy link
Contributor

What changes were proposed in this pull request?

Optimize the code path in createFileset and optimize path by using future.

Why are the changes needed?

Fix: #6562

Does this PR introduce any user-facing change?

No

How was this patch tested?

No, this is sequence chart of the CreateFileset.
image

@Abyss-lord
Copy link
Contributor Author

@yuqi1129 could you please review this PR when you have time? I’d really appreciate your feedback.

@yuqi1129
Copy link
Contributor

@yuqi1129 could you please review this PR when you have time? I’d really appreciate your feedback.

Sure

@Abyss-lord
Copy link
Contributor Author

@yuqi1129 I’ve finished updating the code. Please take a look at the PR again when you have time. I make thread pool static and remove shutdown logic in finally block.

@Abyss-lord Abyss-lord requested a review from yuqi1129 March 14, 2025 07:42
Copy link
Contributor

@yuqi1129 yuqi1129 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor problems, others LGTM

@Abyss-lord
Copy link
Contributor Author

@yuqi1129 I’ve finished updating the code. Please take a look at the PR again when you have time. My update is as follows

  1. Set the number of core threads and queue size.
  2. Set the idle time.

…Fileset and optimize path

Optimize the code path in createFileset and optimize path by using future.
…Fileset and optimize path

make thread pool static remove shutdown logic in finally block.
…Fileset and optimize path

Set the number of core threads and queue size, and set the idle time.
@Abyss-lord Abyss-lord force-pushed the optimize-create-fileset branch from 6a29976 to a54087b Compare March 14, 2025 09:44
@Abyss-lord Abyss-lord requested a review from yuqi1129 March 14, 2025 09:44
…Fileset and optimize path

set queue size from 100 to 500
@Abyss-lord
Copy link
Contributor Author

@yuqi1129 I’ve finished updating the code. Please take a look at the PR again when you have time. My update is as follows

  1. set queue size from 100 to 500

@Abyss-lord Abyss-lord requested a review from yuqi1129 March 17, 2025 01:22
…Fileset and optimize path

fix some bugs in the code.
1. set min thread to Math.max(Math.min(Runtime.getRuntime().availableProcessors() * 2, 100), 4)
2. set max thread to  Math.max(Runtime.getRuntime().availableProcessors() * 4, 400).
@Abyss-lord
Copy link
Contributor Author

@jerqi I’ve finished updating the code. Please take a look at the PR again when you have time. My update is as follows

  1. set min thread to Math.max(Math.min(Runtime.getRuntime().availableProcessors() * 2, 100), 4)
  2. set max thread to Math.max(Runtime.getRuntime().availableProcessors() * 4, 400).

@yuqi1129
Copy link
Contributor

Let's wait for the CI to pass before proceeding with LGTM.

@yuqi1129 yuqi1129 closed this Mar 17, 2025
@yuqi1129 yuqi1129 reopened this Mar 17, 2025
@yuqi1129
Copy link
Contributor

@Abyss-lord
There seems to be an issue with the CI. Please take some time to resolve it.

@Abyss-lord
Copy link
Contributor Author

@Abyss-lord There seems to be an issue with the CI. Please take some time to resolve it.

@yuqi1129 This problem is probably not caused by this change. The same situation occurred before, and it can be solved by running again.

@Abyss-lord Abyss-lord requested a review from yuqi1129 March 18, 2025 00:58
@yuqi1129
Copy link
Contributor

@Abyss-lord There seems to be an issue with the CI. Please take some time to resolve it.

@yuqi1129 This problem is probably not caused by this change. The same situation occurred before, and it can be solved by running again.

Although things may not be as they seem, I have noticed that no similar problem has been found in other PRs.

@yuqi1129
Copy link
Contributor

yuqi1129 commented Mar 18, 2025

@Abyss-lord There seems to be an issue with the CI. Please take some time to resolve it.

@yuqi1129 This problem is probably not caused by this change. The same situation occurred before, and it can be solved by running again.

Although things may not be as they seem, I have noticed that no similar problem has been found in other PRs.

@Abyss-lord
There are some test errors that can be reproduced locally. Please try to resolve them.

@jerryshao
Copy link
Contributor

jerryshao commented Mar 18, 2025

What is the benefit of this change? I think using threadpool may not benefit a lot, instead it makes the code very complex. Also, we may potentially have the classloader issue since we're using the different thread to load the FS.

@yuqi1129
Copy link
Contributor

What is the benefit of this change? I think using threadpool may not benefit a lot, instead it makes the code very complex. Also, we may potentially have the classloader issue since we're using the different thread to load the FS.

The awaitiblity library will periodically poll the result and will not return immediately when the condition is satisfied.

Awaitility.await()
          .atMost(timeoutSeconds, TimeUnit.SECONDS)
          .pollInterval(TimeUnits.MilliSeconds, 1)
          .until(
				() -> {
                            fileSystem.set(provider.getFileSystem(path, config));
                            return true;
                         }
                  )

Even though fileSystem.set(provider.getFileSystem(path, config)); takes 0ms, the whole code will mostly consume 1ms at least to get the result. Java Future will not have this problem.

Actually, introducing thread pools will make things a bit complicated. If the issue mentioned above is not a big problem, we can omit this PR.

@Abyss-lord
Copy link
Contributor Author

@yuqi1129 @jerryshao Should I continue with this pr, or should we pause to discuss?

@yuqi1129
Copy link
Contributor

@yuqi1129 @jerryshao Should I continue with this pr, or should we pause to discuss?

OK, please revert the code about thread-pool and only change the poll interval of awaitibility

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize the code path in createFileset and optimize path.
3 participants