forked from yondonfu/comfystream
Multi-Process Parallelization with ProcessPoolExecutor #276
Open
BuffMcBigHuge wants to merge 55 commits into livepeer:main from BuffMcBigHuge:marco/multiprocess-clean
Conversation
…ced unnecessary base64 input frame operations, prep for multi-instance, cleanup.
…dded config for server management.
…ame size handling, commented out some logging.
Co-authored-by: John | Elite Encoder <[email protected]>
…the ui, cleanup of tensor code.
…ediate step, moved prompt execution strategy to `execution_start` event, moved buffer to self variable to avoid reinitialization.
…to improve frame buffer, modified comfy arg handling.
…ng on same machine.
…f app, pipeline and config files.
…cations to spawning instances, better handling of misconfigured workspace.
…g to task queue, fixed issues.
…g, attempt to fix tensor_rt directory retrieval in comfyui.
…oved extraneous executor type param, commented out some logging.
…oved some logging.
…ponse frame management.
Summary
This pull request refactors the core stream processing architecture of Comfystream to enable true parallelization using a pool of worker processes managed by comfy.distributed.process_pool_executor.ProcessPoolExecutor (HiddenSwitch). This transition from a single-process, multi-threaded model to a multi-process architecture allows for significant performance gains, greater stability, and the ability to run multiple, independent ComfyUI workflows concurrently.
Detailed Changes by Component
1. ComfyStreamClient (client.py)

The ComfyStreamClient has been fundamentally redesigned to manage the process pool.

- It uses a ProcessPoolExecutor to spawn and manage a configurable number of worker processes. This replaces the previous model, where the EmbeddedComfyClient likely used threads within a single process.
- Communication between processes flows through multiprocessing.Manager queues (image_inputs, image_outputs). This is essential for safely passing tensor data and control messages across process boundaries.
- A distribute_frames task acts as a manager, creating a worker_loop task for each worker in the pool.
- Each worker_loop is a persistent task that continuously requests work from the EmbeddedComfyClient, which in turn pulls frames from the shared input queue. This architecture allows workers to independently process frames from one or more workflows.
- The update_prompts logic is now much simpler: it updates a shared list of prompts, and the worker loops automatically pick up the changes on their next iteration without requiring a restart or complex locking.
- The cleanup method is significantly more robust. It follows a strict sequence: set the shutting_down flag to stop workers gracefully, shut down the EmbeddedComfyClient, then shut down the ProcessPoolExecutor, terminating any stubborn worker processes to prevent zombies.
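The client design described in this section can be sketched roughly as follows. This is a minimal stand-in, not the PR's actual code: it uses the standard library's ProcessPoolExecutor in place of comfy.distributed's executor, the frame "processing" is a placeholder multiplication, and run_pool is an invented helper for illustration.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

def worker_loop(image_inputs, image_outputs):
    """Persistent worker: pull frames from the shared input queue until a
    shutdown sentinel (None) arrives, mirroring the shutting_down sequence."""
    while True:
        frame = image_inputs.get()
        if frame is None:  # sentinel enqueued during cleanup
            break
        frame_id, data = frame
        image_outputs.put((frame_id, data * 2))  # placeholder for ComfyUI inference

def run_pool(frames, max_workers=2):
    # Manager queues are picklable proxies, so they can safely cross
    # process boundaries as arguments to the workers.
    manager = mp.Manager()
    image_inputs = manager.Queue()
    image_outputs = manager.Queue()
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        workers = [pool.submit(worker_loop, image_inputs, image_outputs)
                   for _ in range(max_workers)]
        for i, data in enumerate(frames):
            image_inputs.put((i, data))
        for _ in range(max_workers):  # one sentinel per worker
            image_inputs.put(None)
        for w in workers:
            w.result()
    results = {}
    while not image_outputs.empty():
        fid, out = image_outputs.get()
        results[fid] = out
    return results
```

Each worker blocks on the shared queue rather than being handed frames directly, which is what lets any idle worker pick up the next frame regardless of which workflow produced it.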
2. Pipeline (pipeline.py)

The Pipeline has been adapted to integrate with the new multi-process client and to improve real-time stream stability.

- The Pipeline now features an output_buffer (an asyncio.Queue). A background _collect_frames_simple task continuously polls the client for completed frames and places them in this buffer.
- get_processed_video_frame now pulls from this buffer instead of directly from the client, decoupling frame delivery from frame processing. If the buffer is empty (i.e., processing is lagging), it returns the original unprocessed frame to avoid stalling the video stream, maintaining a constant frame rate for the client.
- A frame_id is now assigned to each incoming video frame. This is critical for tracking frames as they are passed between processes, although the current "simple" collector does not enforce order.
- The Pipeline constructor now accepts a max_workers argument, which is passed to the ComfyStreamClient to configure the size of the process pool.
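The buffered delivery path with pass-through fallback can be sketched like this. Method names follow the PR's description, but the bodies are illustrative stand-ins (the real collector polls the multi-process client rather than taking a list).

```python
import asyncio

class Pipeline:
    """Sketch of the buffered frame-delivery path."""

    def __init__(self):
        self.output_buffer = asyncio.Queue()

    async def _collect_frames_simple(self, completed_frames):
        # Stand-in for the background task: drain completed frames
        # from the client into the output buffer.
        for frame in completed_frames:
            await self.output_buffer.put(frame)

    async def get_processed_video_frame(self, original_frame):
        try:
            # Non-blocking read keeps frame delivery decoupled
            # from frame processing.
            return self.output_buffer.get_nowait()
        except asyncio.QueueEmpty:
            # Processing is lagging: pass the source frame through
            # so the stream keeps a constant frame rate.
            return original_frame
```

The key design choice is `get_nowait`: the consumer never awaits the workers, so a slow workflow degrades output quality (unprocessed frames) instead of frame rate.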
3. Tensor Cache (tensor_cache.py)

The tensor_cache module has been repurposed to serve as the bridge between the EmbeddedComfyClient (running in a worker process) and the multiprocessing queues managed by the main process.

- Previously, tensor_cache used simple in-memory queues. The new implementation features an init_tensor_cache function, which is called by the ProcessPoolExecutor when each worker process is spawned.
- It installs queue wrappers (MultiProcessInputQueue, MultiProcessOutputQueue) that interface directly with the multiprocessing.Queue objects created in the main process. This allows the LoadTensor and SaveTensor custom nodes (which use tensor_cache) to function correctly within the multi-process environment without modification.
- Tensors are moved to the CPU (.cpu()) before being placed in an output queue, which is a requirement for sending tensor data across process boundaries.
4. Server Application (app.py)

The main server application has been enhanced for stability and configuration.

- A --workers command-line argument has been added to allow users to specify the number of worker processes to spawn.
- Handlers have been added for the SIGINT (Ctrl+C) and SIGTERM signals.
- The on_shutdown handler now correctly cleans up the pipeline (and its worker processes) before closing network connections.
- A force_cleanup_and_exit function is included to terminate the executor's processes if they fail to shut down gracefully.
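The server-side additions above can be sketched as follows. The --workers flag name comes from the PR; parse_args, install_signal_handlers, and the handler body are illustrative assumptions.

```python
import argparse
import signal

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--workers", type=int, default=1,
                        help="number of worker processes to spawn")
    return parser.parse_args(argv)

def install_signal_handlers(cleanup):
    # Route Ctrl+C and termination requests through a single cleanup
    # path so worker processes are torn down before the server exits.
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, lambda signum, frame: cleanup())
```

Routing both signals through one cleanup callback matches the described shutdown order: tear down the pipeline (and its pool) first, then close network connections.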
Note

The ProcessPoolExecutor has been shown to increase performance at the cost of latency. As worker count increases, latency also increases. This can be improved with changes to the frame buffer management. There is also potential for new CPU bottlenecking caused by the increased throughput, which may result in frame-timing oscillations.