
Conversation


@BuffMcBigHuge BuffMcBigHuge commented Mar 25, 2025

Introduction:

One of the primary limitations of building workflows within ComfyStream is its reliance on the Hidden Switch fork of ComfyUI.

Many difficulties arise when particular node packs do not play well with the EmbeddedComfyClient; careful testing and modifications to existing nodes are usually required to enable full functionality. Beyond that, depending on the fork brings other issues, such as delays in or limitations on adopting newer ComfyUI features, including native performance optimizations.

The primary issue is the handling of multiple Comfy instances to brute-force frame generation.

Objective:

I set out to replace the EmbeddedComfyClient with direct communication with running ComfyUI instances, using the native ComfyUI API and WebSocket connection.

Method:

  • Sending Data: All data is sent via a RESTful POST /prompt request to the Comfy API. Custom nodes were added to support passing the input image to the prompt as a base64 string.
  • Receiving Data: Message events from the WebSocket connection are parsed, and image data emitted by the native send_image handler is captured and pushed out as WebRTC frames. This was inspired by comfyui-tooling-nodes, which uses a prefixed Blob format similar to how Comfy sends previews to the UI. Once the Blob is captured, the prompt for the next frame is queued (see the sketch below).
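
As a rough illustration of this flow (not the exact implementation in this PR), the sketch below queues a prompt via POST /prompt with a base64-encoded frame and waits for the binary image ComfyUI pushes back over its WebSocket. The node id "1", the "image" input key, and the file names are placeholders standing in for the custom input node and workflow used here.

```python
import asyncio
import base64
import json
import uuid

import aiohttp

COMFY_URL = "http://127.0.0.1:8188"  # matches the --port 8188 instance shown below
CLIENT_ID = str(uuid.uuid4())


async def run_frame(session: aiohttp.ClientSession, prompt: dict, frame_jpeg: bytes) -> bytes:
    """Queue one prompt and wait for the image ComfyUI pushes back over the WebSocket."""
    # Placeholder: node "1" stands in for the custom base64 image input node.
    prompt["1"]["inputs"]["image"] = base64.b64encode(frame_jpeg).decode("ascii")

    async with session.ws_connect(f"{COMFY_URL}/ws?clientId={CLIENT_ID}") as ws:
        async with session.post(f"{COMFY_URL}/prompt",
                                json={"prompt": prompt, "client_id": CLIENT_ID}) as resp:
            resp.raise_for_status()
        async for msg in ws:
            if msg.type == aiohttp.WSMsgType.BINARY:
                # ComfyUI prefixes binary frames with two 4-byte big-endian fields
                # (event type, image format) before the encoded image bytes.
                return msg.data[8:]
            # Text messages carry status/progress events and are ignored here.
        raise RuntimeError("WebSocket closed before an image was received")


async def main():
    async with aiohttp.ClientSession() as session:
        with open("workflow_api.json") as f:
            workflow = json.load(f)  # workflow exported in API format
        frame = open("frame.jpg", "rb").read()
        image_bytes = await run_frame(session, workflow, frame)
        open("out_frame.jpg", "wb").write(image_bytes)


asyncio.run(main())
```

In a streaming setup the WebSocket would stay open across frames rather than being reconnected per prompt.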

I will be investigating sending and receiving tensors instead of base64 string input and WebSocket Blob output, respectively. This should reduce CPU conversion overhead and allow for multiple datatypes.

Limitations:

This process is obviously not as efficient as the Hidden Switch method of communicating directly with the ComfyStream tensor_cache; however, it opens up new opportunities for parallelization, both through multiple inference instances on a single GPU and through multi-GPU scaling, an avenue I'm investigating as a performance increase.
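
As a rough picture of that parallelization (the endpoints and scheduling here are hypothetical, not the logic in this PR), frames can be fanned out round-robin across however many ComfyUI instances are configured:

```python
import itertools

# Hypothetical endpoints: e.g. two instances sharing GPU 0 plus one on GPU 1.
SERVERS = [
    "http://127.0.0.1:8188",
    "http://127.0.0.1:8189",
    "http://127.0.0.1:8190",
]
_cycle = itertools.cycle(SERVERS)


def pick_server() -> str:
    """Round-robin over the configured ComfyUI instances, one prompt per incoming frame."""
    return next(_cycle)
```

Each endpoint is then driven as in the sketch above, with returned frames collected into the frame buffer before WebRTC delivery.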

Note that this DRAFT is very early and the proof of concept has only just been demonstrated as functional. More work is to be done.

TODOs:

  • Handle the frame buffer better to minimize dropped frames and stabilize the framerate
  • Test a myriad of existing workflows used with ComfyStream
  • Reduce CPU usage
  • Investigate alternative lightweight strategies for transferring data to/from the Comfy API (e.g. tensors in JSON)
  • Test, optimize, and compare against the existing EmbeddedComfyClient method
  • Auto-start ComfyUI workspaces, test multi-inference on a single GPU, and build a frame buffer handler to reduce jitter and guarantee frame timing
  • Deployment planning and integration with ai-runner

Getting it Running:

  • Run the UI as normal
  • Run an instance of ComfyUI as follows:
python main.py --listen --cuda-device 0 --fast --enable-cors-header="*" --port 8188 --preview-method none
  • Start the app_api.py file instead of app.py:
conda activate comfystream
python server/app_api.py --config-file configs/comfy.toml --max-frame-wait 1000 --log_level INFO

Note: Your servers are defined in configs/comfy.toml. You can add as many servers as you like, on a single GPU or across multiple GPUs. Keep an eye on VRAM usage.
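
For illustration only, a multi-server configuration might look something like the following; the actual key names and structure are defined by this PR's configs/comfy.toml, so treat these as placeholders:

```toml
# Hypothetical layout — check configs/comfy.toml in this branch for the real schema.
[[servers]]
host = "127.0.0.1"
port = 8188
cuda_device = 0

[[servers]]
host = "127.0.0.1"
port = 8189
cuda_device = 0   # second instance sharing GPU 0

[[servers]]
host = "127.0.0.1"
port = 8190
cuda_device = 1
```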

Visual example of multi-Comfy instance processing:

Screen.Recording.2025-03-25.183747.mp4

@eliteprox
Collaborator

For reference, I had to qualify the --enable-cors-header value with quotes when testing on Linux:

python main.py --listen --cuda-device 0 --fast --enable-cors-header="*" --port 8188 --preview-method none

For the workers, --max-frame-skips was not a valid option, but I was able to run it in the devcontainer with this:

python server/app_api.py --workspace /workspace/ComfyUI --config-file configs/comfy.toml

@BuffMcBigHuge
Author


Fixed!

…ediate step, moved prompt execution strategy to `execution_start` event, moved buffer to self variable to avoid reinitialization.
Author

BuffMcBigHuge commented Apr 15, 2025

This work will now continue in ComfyUI native API integration with Spawn #130.


Development

Successfully merging this pull request may close these issues.

Tech design for multi-threaded ComfyUI Inference