forked from microsoft/promptflow
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[Promptflow Serve] Benchmarking Framework To Measure Throughput of Pr…
…omptflow Serve (microsoft#3486) # Description This PR introduces a throughput benchmarking framework for Promptflow Serve which introduces tests for the following scenarios. - WSGI + Flask + DAG + Sync Nodes - ASGI + FastApi + DAG + Async Nodes - ASGI + FastApi + Flex Flow + Async Nodes It uses a minimum example that has a flow as shown below. The scenario covers 3 parallelly executing nodes (i.e. simulating guardrails) and upon completion of those, one other node that calls an http endpoint (i.e. simulating LLM or custom API).  The tests are controlled by the configuration parameters in the `benchmark/promptflow-serve/test_runner/settings.env` file. The makefile located at `benchmark/promptflow-serve/makefile` # All Promptflow Contribution checklist: - [x] **The pull request does not introduce [breaking changes].** - [x] **CHANGELOG is updated for new features, bug fixes or other significant changes.** - [x] **I have read the [contribution guidelines](../CONTRIBUTING.md).** - [x] **Create an issue and link to the pull request to get dedicated review from promptflow team. Learn more: [suggested workflow](../CONTRIBUTING.md#suggested-workflow).** ## General Guidelines and Best Practices - [x] Title of the pull request is clear and informative. - [x] There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, [see this page](https://github.com/Azure/azure-powershell/blob/master/documentation/development-docs/cleaning-up-commits.md). ### Testing Guidelines - [ ] Pull request includes test coverage for the included changes. --------- Signed-off-by: Dasith Wijes <[email protected]>
- Loading branch information
Showing
50 changed files
with
1,803 additions
and
2 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,97 @@ | ||
# Introduction | ||
|
||
This directory contains scripts to test the throughput scalability of various [PromptFlow](https://microsoft.github.io/promptflow/) flows, using sync/async HTTP calls. It contains: | ||
- A mock API service ([FastAPI](https://fastapi.tiangolo.com/) + [uvicorn](https://www.uvicorn.org/)) and Docker file to run as a service; | ||
- Three different PromptFlow flows which include a node to query the mock API service: | ||
- A [FlexFlow](https://microsoft.github.io/promptflow/tutorials/flex-flow-quickstart.html)-based flow with an async call to the mock API service; | ||
- Two [static DAG](https://microsoft.github.io/promptflow/tutorials/quickstart.html) flows, each which call the mock API service, one using an async call, the other sync; | ||
- A set of bash and [Docker Compose](https://docs.docker.com/compose/) scripts to build and run each of the above services; | ||
- A script to run [Locust](https://locust.io/) jobs to measure the scalability of each of the PF flow services. | ||
|
||
# Contents | ||
|
||
``` | ||
├── README.md (this README file) | ||
├── makefile (Makefile with the commands to build and run tests) | ||
├── mock_api (a mock API service which simply waits before returning JSON) | ||
│ ├── Dockerfile | ||
│ ├── main.py | ||
│ └── requirements.txt | ||
├── pf_flows (various PromptFlow flows which call the mock API service using sync/async HTTP calls) | ||
│ ├── flex_async (async flexflow example) | ||
│ │ ├── flow.flex.yaml | ||
│ │ ├── flow.py | ||
│ │ └── requirements.txt | ||
│ ├── static_async (async static DAG example) | ||
│ │ ├── chat.py | ||
│ │ ├── flow.dag.yaml | ||
│ │ └── requirements.txt | ||
│ └── static_sync (sync static DAG example) | ||
│ ├── chat.py | ||
│ ├── flow.dag.yaml | ||
│ └── requirements.txt | ||
├── requirements.txt (pip requirements for developing the tests) | ||
└── test_runner (scripts to perform scalability tests against each of the PF flows) | ||
├── locust_results (this is where the locust results will be stored) | ||
├── build.sh (builds the docker images for each of the services above) | ||
├── docker-compose.yml (manages starting up all the docker-based services) | ||
├── mock_locustfile.py (locust test spec for testing the capacity of the mock API service) | ||
├── pf_locustfile.py (locust test spec for testing the capacity of the PF flow services) | ||
├── run_locust.sh (locust runner used in the tests) | ||
├── settings.env (env file with the configuration used in the tests) | ||
└── test.sh (orchestrates running the tests) | ||
``` | ||
|
||
# Preparing the environment | ||
|
||
## Prerequisites | ||
|
||
### Software | ||
|
||
Build the provided devcontainer and use it for running tests. | ||
|
||
### Hardware | ||
|
||
A host machine with at least 8 vCPU threads. | ||
|
||
## Building the services | ||
|
||
- `make install-requirements` | ||
- `make build` | ||
|
||
This script will visit each of the service directories (`mock_api`, `pf_flows/flex_async`, `pf_flows/static_async`, and `pf_flows/static_sync`) and create docker images for each. | ||
|
||
Once this is complete, you can verify the services were built with `docker image ls`, for example: | ||
``` | ||
REPOSITORY TAG IMAGE ID CREATED SIZE | ||
fastapi-wait-service latest 6bc9152b6b9b 32 minutes ago 184MB | ||
pf-flex-async-service latest d14cc15f45ad 33 minutes ago 1.58GB | ||
pf-static-sync-service latest 8b5ac2dac32c 34 minutes ago 1.58GB | ||
pf-static-async-service latest ff2968d3ef11 34 minutes ago 1.58GB | ||
``` | ||
|
||
To test each of the services, you can try: | ||
- Mock API service: `curl "http://localhost:50001/"` | ||
- Static DAG async PF service: `curl --request POST 'http://localhost:8081/score' --header 'Content-Type: application/json' --data '{"question": "Test question", "chat_history": []}'` | ||
- Static DAG sync PF service: `curl --request POST 'http://localhost:8082/score' --header 'Content-Type: application/json' --data '{"question": "Test question", "chat_history": []}'` | ||
- FlexFlow async PF service: `curl --request POST 'http://localhost:8083/score' --header 'Content-Type: application/json' --data '{"question": "Test question", "chat_history": []}'` | ||
|
||
## Running each of the throughput tests | ||
|
||
The mock API service simply waits every time a request is made, and returns JSON after the wait has ended. The wait time is configurable, but set to 1 second in the docker compose script. | ||
|
||
In order to test the throughput latency of PF flows which call this service, we first need to establish a baseline of throughput for this mock service. Once we have this, we would expect all PF flows to have the same or similar throughput latency as all they are programmed to do is call this service and return. | ||
|
||
The `benchmark/promptflow-serve/makefile` supports four tests: | ||
- `make test-mock`: Run the throughput tests on the mock API service to determine a baseline. | ||
- `make test-staticsync`: Run the throughput tests on the PF static sync DAG flow service. | ||
- `make test-staticasync`: Run the throughput tests on the PF static async DAG flow service. | ||
- `make test-flexasync`: Run the throughput tests on the PF flex flow async service. | ||
|
||
## Test parameters | ||
|
||
They can be controlled in the `benchmark/promptflow-serve/test_runner/settings.env` file. | ||
|
||
## Results | ||
|
||
The results are stored in the `/locust-results` folder. There are interactive HTML reports which present the results as graphs as well. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
install-requirements: | ||
pip install -r requirements.txt | ||
|
||
build: | ||
cd test_runner && ./build.sh | ||
|
||
stop-all-tests: | ||
cd test_runner && docker-compose down --remove-orphans | ||
|
||
test-mock: | ||
cd test_runner && ./test.sh mock | ||
|
||
test-staticsync: | ||
cd test_runner && ./test.sh staticsync | ||
|
||
test-staticasync: | ||
cd test_runner && ./test.sh staticasync | ||
|
||
test-flexasync: | ||
cd test_runner && ./test.sh flexasync |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
FROM python:3.10-slim | ||
|
||
WORKDIR /app | ||
COPY requirements.txt . | ||
RUN pip install --no-cache-dir -r requirements.txt | ||
COPY . . | ||
|
||
EXPOSE 50001 | ||
|
||
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "50001"] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
import asyncio | ||
import os | ||
import random | ||
|
||
from fastapi import FastAPI | ||
|
||
random.seed(42) | ||
app = FastAPI() | ||
|
||
|
||
@app.get("/") | ||
async def wait_and_return(): | ||
|
||
min_wait_time_sec = int(os.getenv("MIN_WAIT_TIME_SEC", 1)) | ||
max_wait_time_sec = int(os.getenv("MAX_WAIT_TIME_SEC", 5)) | ||
|
||
# generate a random number of seconds to sleep between min and max. | ||
random_float = random.uniform(min_wait_time_sec, max_wait_time_sec) | ||
await asyncio.sleep(random_float) | ||
|
||
# return a message to say just how long the service waited for | ||
return { | ||
"total_time_sec": random_float, | ||
"min_wait_time_sec": min_wait_time_sec, | ||
"max_wait_time_sec": max_wait_time_sec, | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
fastapi==0.111.0 | ||
uvicorn==0.30.1 |
5 changes: 5 additions & 0 deletions
5
benchmark/promptflow-serve/pf_flows/flex_async/flow.flex.yaml
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json | ||
entry: flow:ChatFlow | ||
environment: | ||
# image: mcr.microsoft.com/azureml/promptflow/promptflow-python | ||
python_requirements_txt: requirements.txt |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
import asyncio | ||
import os | ||
import time | ||
from pathlib import Path | ||
|
||
import aiohttp | ||
|
||
BASE_DIR = Path(__file__).absolute().parent | ||
|
||
|
||
class ChatFlow: | ||
def __init__(self): | ||
pass | ||
|
||
async def __call__(self, question: str, chat_history: list) -> str: # noqa: B006 | ||
|
||
node_instance1 = Node() | ||
node_instance2 = Node() | ||
node_instance3 = Node() | ||
|
||
# create a list of tasks | ||
tasks = [ | ||
self.call_node(node_instance1), | ||
self.call_node(node_instance2), | ||
self.call_node(node_instance3) | ||
] | ||
|
||
# simulate calling parallel nodes | ||
await asyncio.gather(*tasks) | ||
|
||
chat_history = chat_history or [] | ||
start_time = time.time() | ||
|
||
# make a call to the mock endpoint | ||
url = os.getenv("MOCK_API_ENDPOINT", None) | ||
if url is None: | ||
raise RuntimeError("Failed to read MOCK_API_ENDPOINT env var.") | ||
|
||
async with aiohttp.ClientSession() as session: | ||
async with session.get(url) as response: | ||
if response.status == 200: | ||
response_dict = await response.json() | ||
end_time = time.time() | ||
response_dict["pf_node_time_sec"] = end_time - start_time | ||
response_dict["type"] = "pf_flex_async" | ||
return response_dict | ||
else: | ||
raise RuntimeError(f"Failed call to {url}: {response.status}") | ||
|
||
async def call_node(self, node_instance: any): | ||
await node_instance() | ||
|
||
|
||
class Node: | ||
def __init__(self): | ||
pass | ||
|
||
async def __call__(self) -> str: # noqa: B006 | ||
await asyncio.sleep(0.25) | ||
return "completed" |
6 changes: 6 additions & 0 deletions
6
benchmark/promptflow-serve/pf_flows/flex_async/requirements.txt
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
promptflow==1.12.0 | ||
promptflow[azure]==1.12.0 | ||
promptflow-tools==1.4.0 | ||
fastapi==0.111.0 | ||
uvicorn==0.30.1 | ||
aiohttp==3.9.5 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
import os | ||
import time | ||
|
||
import aiohttp | ||
from promptflow.core import tool | ||
|
||
|
||
@tool | ||
async def my_python_tool(node1: str, node2: str, node3: str) -> str: | ||
|
||
start_time = time.time() | ||
|
||
# make a call to the mock endpoint | ||
url = os.getenv("MOCK_API_ENDPOINT", None) | ||
if url is None: | ||
raise RuntimeError("Failed to read MOCK_API_ENDPOINT env var.") | ||
|
||
async with aiohttp.ClientSession() as session: | ||
async with session.get(url) as response: | ||
if response.status == 200: | ||
response_dict = await response.json() | ||
end_time = time.time() | ||
response_dict["pf_node_time_sec"] = end_time - start_time | ||
response_dict["type"] = "pf_dag_async" | ||
return response_dict | ||
else: | ||
raise RuntimeError(f"Failed call to {url}: {response.status}") |
50 changes: 50 additions & 0 deletions
50
benchmark/promptflow-serve/pf_flows/static_async/flow.dag.yaml
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json | ||
inputs: | ||
chat_history: | ||
type: list | ||
default: [] | ||
question: | ||
type: string | ||
is_chat_input: true | ||
default: What is ChatGPT? | ||
outputs: | ||
answer: | ||
type: string | ||
reference: ${chat.output} | ||
is_chat_output: true | ||
nodes: | ||
- name: node1 | ||
type: python | ||
inputs: | ||
chat_history: ${inputs.chat_history} | ||
question: ${inputs.question} | ||
source: | ||
type: code | ||
path: node1.py | ||
- name: node2 | ||
type: python | ||
inputs: | ||
chat_history: ${inputs.chat_history} | ||
question: ${inputs.question} | ||
source: | ||
type: code | ||
path: node2.py | ||
- name: node3 | ||
type: python | ||
inputs: | ||
chat_history: ${inputs.chat_history} | ||
question: ${inputs.question} | ||
source: | ||
type: code | ||
path: node3.py | ||
- name: chat | ||
type: python | ||
inputs: | ||
node1: ${node1.output} | ||
node2: ${node2.output} | ||
node3: ${node3.output} | ||
source: | ||
type: code | ||
path: chat.py | ||
environment: | ||
python_requirements_txt: requirements.txt |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
import asyncio | ||
from promptflow.core import tool | ||
|
||
|
||
@tool | ||
async def my_python_tool(chat_history: list, question: str) -> str: | ||
|
||
# sleep for 250ms to simulate open ai call async | ||
await asyncio.sleep(0.25) | ||
return "completed" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
import asyncio | ||
from promptflow.core import tool | ||
|
||
|
||
@tool | ||
async def my_python_tool(chat_history: list, question: str) -> str: | ||
|
||
# sleep for 250ms to simulate open ai call async | ||
await asyncio.sleep(0.25) | ||
return "completed" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
import asyncio | ||
from promptflow.core import tool | ||
|
||
|
||
@tool | ||
async def my_python_tool(chat_history: list, question: str) -> str: | ||
|
||
# sleep for 250ms to simulate open ai call async | ||
await asyncio.sleep(0.25) | ||
return "completed" |
6 changes: 6 additions & 0 deletions
6
benchmark/promptflow-serve/pf_flows/static_async/requirements.txt
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
promptflow==1.12.0 | ||
promptflow[azure]==1.12.0 | ||
promptflow-tools==1.4.0 | ||
fastapi==0.111.0 | ||
uvicorn==0.30.1 | ||
aiohttp==3.9.5 |
Oops, something went wrong.