Multi-Process Parallelization with ProcessPoolExecutor #276

BuffMcBigHuge · 2025-06-25T02:04:42Z

Summary

This pull request refactors the core stream processing architecture of ComfyStream to enable true parallelization using a pool of worker processes managed by comfy.distributed.process_pool_executor.ProcessPoolExecutor (HiddenSwitch). This transition from a single-process, multi-threaded model to a multi-process architecture allows for significant performance gains, greater stability, and the ability to run multiple independent ComfyUI workflows concurrently.

The key benefit of this update:

Increased Throughput: Multiple worker processes can execute ComfyUI workflows in parallel, dramatically increasing the number of frames that can be processed per second.

Detailed Changes by Component

1. ComfyStreamClient (`client.py`)

The ComfyStreamClient has been fundamentally redesigned to manage the process pool.

Process-Based Parallelism: The client now leverages ProcessPoolExecutor to spawn and manage a configurable number of worker processes. This replaces the previous model where the EmbeddedComfyClient likely used threads within a single process.
Inter-Process Communication (IPC): Communication between the main server process and the worker pool is now handled by multiprocessing.Manager queues (image_inputs, image_outputs). This is essential for safely passing tensor data and control messages across process boundaries.
Worker Management:
- A new distribute_frames task acts as a manager, creating a worker_loop task for each worker in the pool.
- The worker_loop is a persistent task that continuously requests work from the EmbeddedComfyClient, which in turn pulls frames from the shared input queue. This architecture allows workers to independently process frames from one or more workflows.
Simplified Prompt Updates: The update_prompts logic is now much simpler. It updates a shared list of prompts, and the worker loops automatically pick up the changes on their next iteration without requiring a restart or complex locking.
Robust Cleanup: The cleanup method is significantly more robust. It follows a strict sequence:
1. Sets a shutting_down flag to stop workers gracefully.
2. Cancels all worker and manager tasks.
3. Shuts down the EmbeddedComfyClient.
4. Explicitly shuts down the ProcessPoolExecutor, terminating any stubborn worker processes to prevent zombies.
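
To make the worker management and cleanup sequence concrete, here is a minimal sketch using plain asyncio. It is not the code from `client.py`: `WorkerManager`, `_run_prompt`, `start`, and the sleep intervals are hypothetical stand-ins, and the real client hands work to the EmbeddedComfyClient and the process pool instead of sleeping.

```python
import asyncio


class WorkerManager:
    """Hypothetical stand-in for the worker-management part of ComfyStreamClient."""

    def __init__(self, max_workers: int = 2):
        self.max_workers = max_workers
        self.prompts = []           # shared list; workers re-read it on every iteration
        self.shutting_down = False  # checked by workers so they can stop gracefully
        self.results = []
        self._worker_tasks = []
        self._manager_task = None

    async def _run_prompt(self, worker_id, prompt):
        # Placeholder for handing the prompt to the embedded client / process pool.
        await asyncio.sleep(0.01)
        return (worker_id, prompt)

    async def worker_loop(self, worker_id):
        # Persistent loop: keeps requesting work until shutdown is requested.
        while not self.shutting_down:
            if not self.prompts:
                await asyncio.sleep(0.01)
                continue
            # Use whatever prompt list is current at the start of this iteration.
            prompt = self.prompts[worker_id % len(self.prompts)]
            self.results.append(await self._run_prompt(worker_id, prompt))

    async def distribute_frames(self):
        # Manager task: creates one worker_loop task per worker in the pool.
        self._worker_tasks = [
            asyncio.create_task(self.worker_loop(i)) for i in range(self.max_workers)
        ]
        await asyncio.gather(*self._worker_tasks, return_exceptions=True)

    def start(self):
        self._manager_task = asyncio.create_task(self.distribute_frames())

    def update_prompts(self, prompts):
        # Mutate in place so running workers see the change without a restart.
        self.prompts[:] = prompts

    async def cleanup(self):
        self.shutting_down = True  # 1. ask workers to stop
        tasks = [t for t in (*self._worker_tasks, self._manager_task) if t is not None]
        for task in tasks:         # 2. cancel worker and manager tasks
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
        # 3. and 4. A real client would now shut down the embedded client and
        # then the process pool executor; both are omitted in this sketch.


async def demo():
    manager = WorkerManager(max_workers=2)
    manager.update_prompts(["prompt-a"])
    manager.start()
    await asyncio.sleep(0.05)
    manager.update_prompts(["prompt-b"])  # picked up on the workers' next iteration
    await asyncio.sleep(0.05)
    await manager.cleanup()
    return manager
```

Calling `asyncio.run(demo())` runs two worker loops, swaps the prompt list while they are running, and then walks through the first two cleanup steps.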

2. Pipeline (`pipeline.py`)

The Pipeline has been adapted to integrate with the new multi-process client and to improve real-time stream stability.

Decoupled Frame Processing:
- The Pipeline now features an output_buffer (asyncio.Queue). A background _collect_frames_simple task continuously polls the client for completed frames and places them in this buffer.
- get_processed_video_frame now pulls from this buffer instead of directly from the client. This decouples frame delivery from frame processing. If the buffer is empty (i.e., processing is lagging), it now returns the original unprocessed frame to avoid stalling the video stream, maintaining a constant frame rate for the client.
Frame Tracking: A unique frame_id is now assigned to each incoming video frame. This is critical for tracking frames as they are passed between processes, although the current "simple" collector does not enforce order.
Configuration: The Pipeline constructor now accepts a max_workers argument, which is passed to the ComfyStreamClient to configure the size of the process pool.
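
The decoupling described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in `pipeline.py`: `BufferedPipeline`, `put_video_frame`, `collect_frames`, and the `client_outputs` queue are hypothetical names, and frames are plain strings instead of video frames.

```python
import asyncio
import itertools


class BufferedPipeline:
    """Hypothetical, simplified stand-in for the buffering part of Pipeline."""

    def __init__(self, buffer_size: int = 8):
        self.output_buffer = asyncio.Queue(maxsize=buffer_size)
        self._frame_ids = itertools.count()

    def put_video_frame(self, frame):
        # Tag each incoming frame with a unique id so it can be tracked later.
        return next(self._frame_ids), frame

    async def collect_frames(self, client_outputs: asyncio.Queue):
        # Background task: move completed frames into the buffer as they arrive.
        # Like the "simple" collector, this does not reorder frames by id.
        while True:
            processed = await client_outputs.get()
            await self.output_buffer.put(processed)

    def get_processed_video_frame(self, original_frame):
        # Never block the stream: fall back to the unprocessed frame
        # when no processed frame is available yet.
        try:
            return self.output_buffer.get_nowait()
        except asyncio.QueueEmpty:
            return original_frame


async def demo():
    pipeline = BufferedPipeline()
    client_outputs = asyncio.Queue()
    collector = asyncio.create_task(pipeline.collect_frames(client_outputs))

    frame_id, frame = pipeline.put_video_frame("raw-0")
    first = pipeline.get_processed_video_frame(frame)   # buffer is still empty

    await client_outputs.put("processed-0")
    await asyncio.sleep(0.01)                           # let the collector run
    second = pipeline.get_processed_video_frame("raw-1")

    collector.cancel()
    await asyncio.gather(collector, return_exceptions=True)
    return frame_id, first, second
```

The design choice shown here is that frame delivery never waits on processing: the caller always gets a frame back immediately, at the price of occasionally showing unprocessed input.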

3. Tensor Cache (`tensor_cache.py`)

The tensor_cache module has been repurposed to serve as the bridge between the EmbeddedComfyClient (running in a worker process) and the multiprocessing queues managed by the main process.

Worker-Specific Initialization: The previous tensor_cache used simple in-memory queues. The new implementation features an init_tensor_cache function, which is called by ProcessPoolExecutor when each worker process is spawned.
Queue Wrapping: This function replaces the module's global queue objects with wrapper classes (MultiProcessInputQueue, MultiProcessOutputQueue) that interface directly with the multiprocessing.Queue objects created in the main process. This allows the LoadTensor and SaveTensor custom nodes (which use tensor_cache) to function correctly within the multi-process environment without modification.
CPU Tensor Transfer: Tensors are explicitly moved to the CPU (.cpu()) before being placed in an output queue, so that the tensor data can be serialized and sent across process boundaries through the queue.

4. Server Application (`app.py`)

The main server application has been enhanced for stability and configuration.

Worker Configuration: A --workers command-line argument has been added to allow users to specify the number of worker processes to spawn.
Graceful Shutdown:
- The application now listens for SIGINT (Ctrl+C) and SIGTERM signals.
- Upon receiving a signal, a graceful shutdown sequence is initiated. The on_shutdown handler now correctly cleans up the pipeline (and its worker processes) before closing network connections.
- A force_cleanup_and_exit function is included to terminate the executor's processes if they fail to shut down gracefully.
Improved Logging: Logging messages have been standardized with a prefix for clearer debugging.
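
The worker flag and the shutdown behaviour could look roughly like the sketch below. Only the `--workers` flag and the SIGINT/SIGTERM handling come from the description above; the default value, the timeout, and the function names `build_parser`, `install_signal_handlers`, `shutdown`, and `serve` are assumptions made for illustration. `loop.add_signal_handler` is available on Unix event loops only.

```python
import argparse
import asyncio
import signal


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical server entry point")
    # The default of 1 is an assumption; the PR does not state the default.
    parser.add_argument("--workers", type=int, default=1,
                        help="number of worker processes to spawn")
    return parser


def install_signal_handlers(loop, shutdown_event):
    # Translate SIGINT (Ctrl+C) and SIGTERM into a shutdown request.
    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, shutdown_event.set)


async def shutdown(cleanup, force_cleanup, timeout=5.0):
    # Try the graceful path first; fall back to the forceful one on timeout.
    try:
        await asyncio.wait_for(cleanup(), timeout=timeout)
    except asyncio.TimeoutError:
        force_cleanup()


async def serve(cleanup, force_cleanup):
    shutdown_event = asyncio.Event()
    install_signal_handlers(asyncio.get_running_loop(), shutdown_event)
    await shutdown_event.wait()   # block until a signal arrives
    await shutdown(cleanup, force_cleanup)
```

With this shape, something like `args = build_parser().parse_args()` would supply `args.workers` to the pipeline, and `serve` would clean up the pipeline before any network connections are closed.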

Note

The ProcessPoolExecutor has been shown to increase throughput at the cost of latency: as the worker count increases, latency also increases. This can be improved with changes to the frame buffer management. There is also potential for new CPU bottlenecks caused by the increased throughput, which may result in frame-timing oscillations.


BuffMcBigHuge and others added 30 commits March 18, 2025 16:08

Preliminary work for ComfyUI native API integration.

dcbdcfe

Cleanup of pre/post processing of frames.

b3a95ad

Added built-in nodes for base64 string and websocket image send, reduced uncessary base64 input frame operations, prep for multi-instance, cleanup.

50ca4a1

Preliminary work on multi-Comfy server inference.

d66dac4

Work on frame timing and management, added max_frame_wait argument, added config for server management.

d3b0b3c

Cleaned up prompt manipulation with custom nodes.

63014e3

Added frame tracking, add frame timing stability, added mismatched frame size handling, commented out some logging.

15b51a8

Removed requirement for workspace in app startup.

80636b6

Cleanup of logging, added log_level argument, testing of send tensor websocket node.

2cdb6fa

Setting a few logs to debug.

c4d2ea1

Update requirements.txt

0ee6222

Co-authored-by: John | Elite Encoder <[email protected]>

Added native nodes into root nodes.

61a03fa

Rebuilt get_available_nodes using native ComfyUI api for retrofit to the ui, cleanup of tensor code.

77e28ff

Modified base64 processing to use torchvision instead of numpy intermediate step, moved prompt execution strategy to `execution_start` event, moved buffer to self variable to avoid reinitalization.

13b511a

Merged upstream of main 0.0.5, small modification to logging.

ee88589

Built Comfy subprocess spawn client mode, built dynamic output pacer to improve frame buffer, modified comfy arg handling.

c9aff80

Merge branch 'main' into comfy-native-local.

0def790

Small fix, cleanup.

937384f

Merge branch 'main' into comfy-native-local

3b1cdd7

Added cuda-devices and workers-start-port params for multi-gpu spawning on same machine.

f7326c4

Merge branch 'main' into comfy-native-local

75bd88e

Fixed issue with cleanup not properly resetting the clients for subsequent runs.

0c1aa0e

Better error handling for Comfy instances via spawn, reorganization of app, pipeline and config files.

6c07c65

Fix to linux subprocess command.

93308b6

Added spawned comfy specific logging, modified client logging, modifications to spawning instances, better handling of misconfigured workspace.

98b78e8

Modification to logging handler for subprocesses.

5b3ac66

Merge branch 'main' into comfy-native-local

766277b

Added optional ROOT_DIR environment variable to help with build_trt script.

07a9c51

Code cleanup, modification to frame timing mechanism, added frame logging utility.

fb44f6d

Frame logging development, removal of obsolete pipeline code, logging tests.

9dc280b

BuffMcBigHuge added 25 commits May 7, 2025 18:15

Merge branch 'main' into marco/logging-updates

c7702fc

Added frame_log_file as argument to select logging file, moved logging to task queue, fixed issues.

262ebb6

Added frame file logging to embedded client.

1304fd1

Merge branch 'main' into marco/logging-updates

db2e191

Merge branch 'main' into marco/logging-updates

8aadc8d

First version of multi working.

2539741

Merge branch 'main' into marco/multiprocess

7ec44b3

Simplify video input, small fix.

ad44766

Restructure of distribution logic, added pipeline/tensor_cache logging, attempt to fix tensor_rt directory retrieval in comfyui.

cbeb89e

Modified queue size, re-worked prompting, testing queue sizes, logging.

9d0ee6e

Better terminal close handling, fixes to kwargs sent to client.

5bdc1df

Merge branch 'main' into marco/multiprocess

9c449fe

Fixed issue with pipeline reset on workflow change or UI refresh, removed extreanous executor type param, commented out some logging.

63cebb0

Fixes to logging, attempt at fixing root path issue for models.

a642b05

Attempts to fix tensorrt directory issue, logging and testing development.

cee5445

Merge branch 'main' into marco/multiprocess

b68f0ec

Refactored prompt updating from UI to worker processes, added and removed some logging.

7754fbb

Merge branch 'main' into marco/multiprocess.

621390a

Merge fixes.

e5d26ea

Revert of node retrieval testing.

b0516d2

Added node cache system to stop process pool interuption, improve response frame management.

d70982e

Rebuit frame processing management for smoother playback.

e22ceff

Merge branch 'main' into marco/multiprocess-clean

3847c07

Removal of extreanous files, cleanup, modified _multi as primary files.

2773c9a

Testing - small fix with merge.

02e8d08

eliteprox linked an issue Dec 8, 2025 that may be closed by this pull request

Ensure audio and video streams can be processed at the the same time #226

Open
