Conversation

Contributor

@chandlerc chandlerc commented Sep 25, 2025

This parallelizes the compilations and dramatically reduces the time to build runtimes.

As part of this, teach the driver infrastructure to have an option to control the use of threads, to build the relevant thread pool, and to thread it through the various APIs.

However, it requires our `ClangRunner` to become thread-safe and to invoke Clang in a way that is thread-safe. This is somewhat challenging as the code in `clang_main` is distinctly _not_ thread-safe.

To address this, the relevant logic of `clang_main`, especially the CC1 execution, is extracted into our runner and cleaned up to be much more appropriate in a multithreaded context. Much of this code should eventually be factored back into Clang, but that will be a follow-up patch to upstream.

Last but not least, this rearranges the `ClangRunner` API to make a bit more sense out of the different options for building runtimes, and to have a clean model for which things need to be passed in at which points.

Comment on lines +231 to +237
```cpp
llvm::SingleThreadExecutor single_thread({.ThreadsRequested = 1});
std::optional<llvm::DefaultThreadPool> threads;
driver_env_.thread_pool = &single_thread;
if (options.threads) {
  threads.emplace(llvm::optimal_concurrency());
  driver_env_.thread_pool = &*threads;
}
```
Contributor
We're creating this `SingleThreadExecutor` at multiple levels, it seems: both here and inside the `ClangRunner`. Could we consolidate to a single place, maybe by making `ClangRunner` expect to always receive a non-null executor?

Contributor Author

We could... I was just a bit torn about doing so, as it adds quite a bit of complexity to building and using the runner that is only needed if you're actually building runtimes.

Another alternative would be to have `ClangRunner` accept either a pre-built path or a thread pool to use for on-demand building, and remove the boolean option.

Contributor

Do you foresee wanting to use thread pools other than `SingleThreadExecutor` and `DefaultThreadPool`? Maybe we could tell it whether we want threads or not and have `ClangRunner` make what it needs?

```cpp
enum class BuildRuntimesOnDemand {
  UsePrebuiltOnly,
  BuildOnSingleThreaded,
  BuildOnWorkerThreads,
};
```

Taking either a path XOR a thread pool also sounds good, whichever you prefer.

Contributor Author

> Do you foresee wanting to use thread pools other than `SingleThreadExecutor` and `DefaultThreadPool`?

Some possibility. One thing I wonder is whether we'll want a (much) larger thread pool to fully absorb the latency, but I would like to avoid that given the overhead.

> Maybe we could tell it whether we want threads or not and have `ClangRunner` make what it needs?

When using threads, my expectation is that it'll be very desirable to use the existing thread pool to avoid paying the cost of spinning one up and forking all the threads.

It also allows more global management of the load.

This is why I somewhat like the driver either accepting or building a thread pool, and then making it available for any commands or subcommands to use.

> Taking either a path XOR a thread pool also sounds good, whichever you prefer.

I think what I'm liking is to take one of:

  • A thread pool, enabling on-demand building in that pool.
  • A pre-built path that will be used.
  • Nothing, disabling on-demand building; it's up to the caller to use the runner in a way compatible with that.

Contributor Author

Actually, I thought more about this and I think I had the fundamentally wrong structure for this API.

I've restructured everything so that we have a simple constructor that only accepts the necessary components. Then there are three variations on `Run`: using on-demand runtimes with a cache and thread pool, using pre-built runtimes, and using no runtimes. I've also updated comments and callers accordingly. I think this ends up clearer, but PTAL and let me know.

Contributor Author

@chandlerc chandlerc left a comment

Thanks, PTAL!


@chandlerc chandlerc requested a review from danakj September 25, 2025 23:34
Comment on lines 680 to 684
```cpp
llvm::SmallVector<llvm::NewArchiveMember, 0> unwrapped_objs;
unwrapped_objs.reserve(objs.size());
for (auto& obj : objs) {
  unwrapped_objs.push_back(*std::move(obj));
}
```
Contributor

If you enjoy some code golf...

```cpp
llvm::SmallVector<llvm::NewArchiveMember, 0> unwrapped_objs(
    llvm::map_range(objs, [](auto& obj) { return *std::move(obj); }));
```

Contributor Author

Yeah, I somewhat prefer the simplicity of the loop here... It's a close call though, happy to switch if it helps a lot.


Contributor Author

@chandlerc chandlerc left a comment

PTAL, think I found a better API structure.




Contributor

@danakj danakj left a comment

Yes, the new API looks like a really nice improvement, thanks. LGTM

@chandlerc chandlerc enabled auto-merge September 30, 2025 07:14
@chandlerc chandlerc added this pull request to the merge queue Sep 30, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Sep 30, 2025
@danakj danakj added this pull request to the merge queue Sep 30, 2025
Merged via the queue into carbon-language:trunk with commit 35fb000 Sep 30, 2025
8 checks passed