Optimize the code path in createFileset and optimize path. #6562

Open
yuqi1129 opened this issue Feb 27, 2025 · 4 comments · May be fixed by #6689
@yuqi1129
Contributor

yuqi1129 commented Feb 27, 2025

Code in HadoopCatalogOperations#createFileset

    try {
      // formalize the path to avoid path without scheme, uri, authority, etc.
      filesetPath = formalizePath(filesetPath, conf);

      FileSystem fs = getFileSystem(filesetPath, conf);
      if (!fs.exists(filesetPath)) {
        if (!fs.mkdirs(filesetPath)) {
          throw new RuntimeException(
              "Failed to create fileset " + ident + " location " + filesetPath);
        }

        LOG.info("Created fileset {} location {}", ident, filesetPath);
      } else {
        LOG.info("Fileset {} manages the existing location {}", ident, filesetPath);
      }

    } catch (IOException ioe) {
      throw new RuntimeException(
          "Failed to create fileset " + ident + " location " + filesetPath, ioe);
    }

  • filesetPath = formalizePath(filesetPath, conf);
  • FileSystem fs = getFileSystem(filesetPath, conf);

Each of these two lines gets and initializes the file system, so the work is done twice; they can be merged into a single step.

      AtomicReference<FileSystem> fileSystem = new AtomicReference<>();
      Awaitility.await()
          .atMost(timeoutSeconds, TimeUnit.SECONDS)
          .until(
              () -> {
                fileSystem.set(provider.getFileSystem(path, config));
                return true;
              });
      return fileSystem.get();

This code can be replaced with Java's Future mechanism to reduce the time spent polling for status.
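To illustrate the suggestion, here is a minimal sketch of a timeout built on `Future.get(timeout, unit)` instead of a polling loop. It is not the project's implementation: the class name `FutureTimeoutSketch`, the helper `callWithTimeout`, and the cached daemon-thread pool are all hypothetical, and the `Callable` stands in for the `provider.getFileSystem(path, config)` call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class FutureTimeoutSketch {
  // Hypothetical shared pool (an assumption for this sketch). Daemon threads
  // are used so that idle workers never keep the JVM alive.
  private static final ExecutorService POOL =
      Executors.newCachedThreadPool(
          runnable -> {
            Thread thread = new Thread(runnable, "fs-init-sketch");
            thread.setDaemon(true);
            return thread;
          });

  // Runs the task on the pool and waits at most timeoutSeconds for its result.
  static <T> T callWithTimeout(Callable<T> task, long timeoutSeconds) {
    Future<T> future = POOL.submit(task);
    try {
      // Blocks until the task completes or the timeout elapses; the caller is
      // woken on completion rather than re-checking a condition periodically.
      return future.get(timeoutSeconds, TimeUnit.SECONDS);
    } catch (TimeoutException e) {
      // Ask the still-running task to stop by interrupting its thread.
      future.cancel(true);
      throw new RuntimeException("Timed out after " + timeoutSeconds + " seconds", e);
    } catch (InterruptedException e) {
      // Restore the interrupt flag so callers can observe the interruption.
      Thread.currentThread().interrupt();
      future.cancel(true);
      throw new RuntimeException("Interrupted while waiting for the task", e);
    } catch (ExecutionException e) {
      // Unwrap the failure thrown inside the task itself.
      throw new RuntimeException("Task failed", e.getCause());
    }
  }
}
```

Under these assumptions, the snippet above would reduce to something like `callWithTimeout(() -> provider.getFileSystem(path, config), timeoutSeconds)`. Note that `cancel(true)` only interrupts the worker thread, so a task that ignores interruption may keep running in the background after the timeout.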

There may be other minor points to improve.

@Abyss-lord
Contributor

I would like to work on it.

@yuqi1129
Contributor Author

OK, just go ahead.

@yuqi1129
Contributor Author

@Abyss-lord
Do you have time to work on this issue?

@Abyss-lord
Contributor

@yuqi1129 Yes, I can finish it in two days

Abyss-lord added a commit to Abyss-lord/gravitino that referenced this issue Mar 14, 2025
…Fileset and optimize path

Optimize the code path in createFileset and optimize path by using future.
Abyss-lord added a commit to Abyss-lord/gravitino that referenced this issue Mar 14, 2025
…Fileset and optimize path

make thread pool static remove shutdown logic in finally block.
Abyss-lord added a commit to Abyss-lord/gravitino that referenced this issue Mar 14, 2025
…Fileset and optimize path

Set the number of core threads and queue size, and set the idle time.
Abyss-lord added a commit to Abyss-lord/gravitino that referenced this issue Mar 17, 2025
…Fileset and optimize path

set queue size from 100 to 500
Abyss-lord added a commit to Abyss-lord/gravitino that referenced this issue Mar 17, 2025
…Fileset and optimize path

fix some bugs in the code.
1. set min thread to Math.max(Math.min(Runtime.getRuntime().availableProcessors() * 2, 100), 4)
2. set max thread to  Math.max(Runtime.getRuntime().availableProcessors() * 4, 400).
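
The sizing described in the commit messages above can be sketched as a plain `ThreadPoolExecutor`. This is only an illustration under stated assumptions, not the code from the referenced commits: the class and method names are hypothetical, the 60-second idle time is an assumed value (the commits do not state one), and the choice of `ArrayBlockingQueue` for the 500-slot queue is also an assumption.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class FilesetPoolSketch {
  // Core size from the commit message: twice the CPU count, capped at 100,
  // but never fewer than 4 threads.
  static int coreThreads(int cpus) {
    return Math.max(Math.min(cpus * 2, 100), 4);
  }

  // Maximum size from the commit message. As written, the expression yields
  // at least 400, since Math.max picks the larger of the two values.
  static int maxThreads(int cpus) {
    return Math.max(cpus * 4, 400);
  }

  static ThreadPoolExecutor newPool() {
    int cpus = Runtime.getRuntime().availableProcessors();
    return new ThreadPoolExecutor(
        coreThreads(cpus),
        maxThreads(cpus),
        60L, // assumed idle time; not specified in the commits
        TimeUnit.SECONDS,
        new ArrayBlockingQueue<>(500)); // queue size of 500, per the Mar 17 commit
  }
}
```

One property of `ThreadPoolExecutor` worth keeping in mind with this configuration: threads beyond the core size are only started once the bounded queue is full, so the maximum only comes into play after 500 tasks are already waiting.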