[Promptflow Serve] Benchmarking Framework To Measure Throughput of Promptflow Serve (microsoft#3486)

# Description

This PR introduces a throughput benchmarking framework for Promptflow
Serve, with tests for the following scenarios:

- WSGI + Flask + DAG + Sync Nodes
- ASGI + FastAPI + DAG + Async Nodes
- ASGI + FastAPI + Flex Flow + Async Nodes

Each test uses a minimal example flow, shown below. The scenario covers
three nodes executing in parallel (simulating guardrails) and, once those
complete, one further node that calls an HTTP endpoint (simulating an LLM
or a custom API).


![image](https://github.com/microsoft/promptflow/assets/16128850/1df317d2-c0d0-4f05-8f09-d88190de8601)
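
For the async variants, this shape boils down to the condensed sketch below (illustrative only; the real flow code is in the diff further down, and the endpoint URL shown here is an assumption, since the flows read it from the `MOCK_API_ENDPOINT` environment variable):

```
import asyncio

import aiohttp

MOCK_API_ENDPOINT = "http://localhost:50001/"  # assumed URL; the flows read this from an env var


async def guardrail_node() -> str:
    # each simulated guardrail node just sleeps for 250 ms
    await asyncio.sleep(0.25)
    return "completed"


async def flow() -> dict:
    # three guardrail-style nodes execute in parallel ...
    await asyncio.gather(guardrail_node(), guardrail_node(), guardrail_node())
    # ... then a single node calls the HTTP endpoint (simulating an LLM or custom API)
    async with aiohttp.ClientSession() as session:
        async with session.get(MOCK_API_ENDPOINT) as response:
            return await response.json()


if __name__ == "__main__":
    print(asyncio.run(flow()))
```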

The tests are controlled by the configuration parameters in the
`benchmark/promptflow-serve/test_runner/settings.env` file.

The makefile located at `benchmark/promptflow-serve/makefile` provides the targets to build the services and run each of the tests.

# All Promptflow Contribution checklist:
- [x] **The pull request does not introduce [breaking changes].**
- [x] **CHANGELOG is updated for new features, bug fixes or other
significant changes.**
- [x] **I have read the [contribution guidelines](../CONTRIBUTING.md).**
- [x] **Create an issue and link to the pull request to get dedicated
review from promptflow team. Learn more: [suggested
workflow](../CONTRIBUTING.md#suggested-workflow).**

## General Guidelines and Best Practices
- [x] Title of the pull request is clear and informative.
- [x] There are a small number of commits, each of which has an
informative message. This means that previously merged commits do not
appear in the history of the PR. For more information on cleaning up the
commits in your PR, [see this
page](https://github.com/Azure/azure-powershell/blob/master/documentation/development-docs/cleaning-up-commits.md).

### Testing Guidelines
- [ ] Pull request includes test coverage for the included changes.

---------

Signed-off-by: Dasith Wijes <[email protected]>
dasiths authored Jul 2, 2024
1 parent 4450ec4 commit 8fb8cbe
Showing 50 changed files with 1,803 additions and 2 deletions.
6 changes: 4 additions & 2 deletions .cspell.json
@@ -50,7 +50,8 @@
".github/actions/**",
".github/pipelines/**",
".github/CODEOWNERS",
"src/promptflow-evals/tests/**"
"src/promptflow-evals/tests/**",
"benchmark/promptflow-serve/result-archive/**"
],
"words": [
"aoai",
@@ -240,7 +241,8 @@
"Machinal",
"azureopenaimodelconfiguration",
"openaimodelconfiguration",
"usecwd"
"usecwd",
"locustio"
],
"flagWords": [
"Prompt Flow"
3 changes: 3 additions & 0 deletions .devcontainer/devcontainer.json
@@ -4,6 +4,9 @@
"context": ".",
"dockerFile": "Dockerfile",
"runArgs": ["-v", "/var/run/docker.sock:/var/run/docker.sock"],
"remoteEnv": {
"HOST_PROJECT_PATH": "${localWorkspaceFolder}"
},
"customizations": {
"codespaces": {
"openFiles": ["README.md", "examples/README.md"]
3 changes: 3 additions & 0 deletions .gitignore
@@ -197,3 +197,6 @@ src/promptflow-*/promptflow/__init__.py
# Eclipse project files
**/.project
**/.pydevproject

# benchmark results
benchmark/promptflow-serve/test_runner/locust-results/
97 changes: 97 additions & 0 deletions benchmark/promptflow-serve/README.md
@@ -0,0 +1,97 @@
# Introduction

This directory contains scripts to test the throughput scalability of various [PromptFlow](https://microsoft.github.io/promptflow/) flows, using sync/async HTTP calls. It contains:
- A mock API service ([FastAPI](https://fastapi.tiangolo.com/) + [uvicorn](https://www.uvicorn.org/)) and Docker file to run as a service;
- Three different PromptFlow flows which include a node to query the mock API service:
- A [FlexFlow](https://microsoft.github.io/promptflow/tutorials/flex-flow-quickstart.html)-based flow with an async call to the mock API service;
- Two [static DAG](https://microsoft.github.io/promptflow/tutorials/quickstart.html) flows, each of which calls the mock API service, one using an async call and the other a sync call;
- A set of bash and [Docker Compose](https://docs.docker.com/compose/) scripts to build and run each of the above services;
- A script to run [Locust](https://locust.io/) jobs to measure the scalability of each of the PF flow services.
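
The Locust test specs themselves (`mock_locustfile.py` and `pf_locustfile.py`, listed under `test_runner` below) are not reproduced in this excerpt. As an illustration only, a minimal Locust spec for driving the `/score` endpoint of one of the PF services could look like the sketch below; the class name, wait times, and host are assumptions, not the values used in this repository.

```
from locust import HttpUser, between, task


class PFScoreUser(HttpUser):
    # host of the static async PF service from the curl examples in this README;
    # in practice this is usually passed on the command line via --host
    host = "http://localhost:8081"

    # illustrative think time between requests from each simulated user
    wait_time = between(0.5, 2)

    @task
    def score(self):
        # same payload as the curl examples further down in this README
        self.client.post("/score", json={"question": "Test question", "chat_history": []})
```

A spec like this can then be run headless with, for example, `locust -f pf_locustfile.py --headless -u 50 -r 10 -t 2m`, which is presumably the kind of invocation `run_locust.sh` wraps.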

# Contents

```
├── README.md (this README file)
├── makefile (Makefile with the commands to build and run tests)
├── mock_api (a mock API service which simply waits before returning JSON)
│   ├── Dockerfile
│   ├── main.py
│   └── requirements.txt
├── pf_flows (various PromptFlow flows which call the mock API service using sync/async HTTP calls)
│   ├── flex_async (async flexflow example)
│   │   ├── flow.flex.yaml
│   │   ├── flow.py
│   │   └── requirements.txt
│   ├── static_async (async static DAG example)
│   │   ├── chat.py
│   │   ├── flow.dag.yaml
│   │   └── requirements.txt
│   └── static_sync (sync static DAG example)
│       ├── chat.py
│       ├── flow.dag.yaml
│       └── requirements.txt
├── requirements.txt (pip requirements for developing the tests)
└── test_runner (scripts to perform scalability tests against each of the PF flows)
    ├── locust_results (this is where the locust results will be stored)
    ├── build.sh (builds the docker images for each of the services above)
    ├── docker-compose.yml (manages starting up all the docker-based services)
    ├── mock_locustfile.py (locust test spec for testing the capacity of the mock API service)
    ├── pf_locustfile.py (locust test spec for testing the capacity of the PF flow services)
    ├── run_locust.sh (locust runner used in the tests)
    ├── settings.env (env file with the configuration used in the tests)
    └── test.sh (orchestrates running the tests)
```

# Preparing the environment

## Prerequisites

### Software

Build the provided devcontainer and use it for running tests.

### Hardware

A host machine with at least 8 vCPU threads.

## Building the services

- `make install-requirements`
- `make build`

The build step visits each of the service directories (`mock_api`, `pf_flows/flex_async`, `pf_flows/static_async`, and `pf_flows/static_sync`) and creates a docker image for each.

Once this is complete, you can verify the services were built with `docker image ls`, for example:
```
REPOSITORY                TAG       IMAGE ID       CREATED          SIZE
fastapi-wait-service      latest    6bc9152b6b9b   32 minutes ago   184MB
pf-flex-async-service     latest    d14cc15f45ad   33 minutes ago   1.58GB
pf-static-sync-service    latest    8b5ac2dac32c   34 minutes ago   1.58GB
pf-static-async-service   latest    ff2968d3ef11   34 minutes ago   1.58GB
```

To test each of the services, you can try:
- Mock API service: `curl "http://localhost:50001/"`
- Static DAG async PF service: `curl --request POST 'http://localhost:8081/score' --header 'Content-Type: application/json' --data '{"question": "Test question", "chat_history": []}'`
- Static DAG sync PF service: `curl --request POST 'http://localhost:8082/score' --header 'Content-Type: application/json' --data '{"question": "Test question", "chat_history": []}'`
- FlexFlow async PF service: `curl --request POST 'http://localhost:8083/score' --header 'Content-Type: application/json' --data '{"question": "Test question", "chat_history": []}'`
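
For reference, the mock API call returns a small JSON document reporting how long the service waited (see `mock_api/main.py` later in this diff). With the code defaults of a 1–5 second wait, a response looks roughly like:

```
{"total_time_sec": 3.27, "min_wait_time_sec": 1, "max_wait_time_sec": 5}
```

The PF services return the same payload with `pf_node_time_sec` and `type` fields added by the flow code.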

## Running each of the throughput tests

The mock API service simply waits every time a request is made and returns JSON once the wait has ended. The wait time is configurable, but it is set to 1 second in the docker compose script.

To measure the throughput of PF flows that call this service, we first establish a throughput baseline for the mock service itself. We would then expect every PF flow to achieve the same or similar throughput, since all each flow is programmed to do is call this service and return. (As a rough illustration: with the wait fixed at 1 second, an ideal non-blocking server handling N concurrent users should sustain on the order of N requests per second.)

The `benchmark/promptflow-serve/makefile` supports four tests:
- `make test-mock`: Run the throughput tests on the mock API service to determine a baseline.
- `make test-staticsync`: Run the throughput tests on the PF static sync DAG flow service.
- `make test-staticasync`: Run the throughput tests on the PF static async DAG flow service.
- `make test-flexasync`: Run the throughput tests on the PF flex flow async service.

## Test parameters

The test parameters can be controlled in the `benchmark/promptflow-serve/test_runner/settings.env` file.
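
`settings.env` itself is not reproduced in this excerpt, so the snippet below is only a hypothetical illustration of the kind of values such a file typically holds. `MIN_WAIT_TIME_SEC` and `MAX_WAIT_TIME_SEC` are the variables read by `mock_api/main.py`; the `LOCUST_*` names are standard Locust environment variables and are assumptions here, not necessarily the names used in this repository.

```
# hypothetical illustration only -- not the actual settings.env from this PR
MIN_WAIT_TIME_SEC=1    # mock API wait lower bound (read by mock_api/main.py)
MAX_WAIT_TIME_SEC=1    # mock API wait upper bound (read by mock_api/main.py)
LOCUST_USERS=50        # assumed name: number of concurrent simulated users
LOCUST_SPAWN_RATE=10   # assumed name: users spawned per second
LOCUST_RUN_TIME=2m     # assumed name: duration of each test run
```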

## Results

The results are stored in the `test_runner/locust-results` folder. Interactive HTML reports, which present the results as graphs, are also generated.
20 changes: 20 additions & 0 deletions benchmark/promptflow-serve/makefile
@@ -0,0 +1,20 @@
install-requirements:
	pip install -r requirements.txt

build:
	cd test_runner && ./build.sh

stop-all-tests:
	cd test_runner && docker-compose down --remove-orphans

test-mock:
	cd test_runner && ./test.sh mock

test-staticsync:
	cd test_runner && ./test.sh staticsync

test-staticasync:
	cd test_runner && ./test.sh staticasync

test-flexasync:
	cd test_runner && ./test.sh flexasync
10 changes: 10 additions & 0 deletions benchmark/promptflow-serve/mock_api/Dockerfile
@@ -0,0 +1,10 @@
FROM python:3.10-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

EXPOSE 50001

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "50001"]
26 changes: 26 additions & 0 deletions benchmark/promptflow-serve/mock_api/main.py
@@ -0,0 +1,26 @@
import asyncio
import os
import random

from fastapi import FastAPI

random.seed(42)
app = FastAPI()


@app.get("/")
async def wait_and_return():

    min_wait_time_sec = int(os.getenv("MIN_WAIT_TIME_SEC", 1))
    max_wait_time_sec = int(os.getenv("MAX_WAIT_TIME_SEC", 5))

    # generate a random number of seconds to sleep between min and max.
    random_float = random.uniform(min_wait_time_sec, max_wait_time_sec)
    await asyncio.sleep(random_float)

    # return a message to say just how long the service waited for
    return {
        "total_time_sec": random_float,
        "min_wait_time_sec": min_wait_time_sec,
        "max_wait_time_sec": max_wait_time_sec,
    }
2 changes: 2 additions & 0 deletions benchmark/promptflow-serve/mock_api/requirements.txt
@@ -0,0 +1,2 @@
fastapi==0.111.0
uvicorn==0.30.1
5 changes: 5 additions & 0 deletions benchmark/promptflow-serve/pf_flows/flex_async/flow.flex.yaml
@@ -0,0 +1,5 @@
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json
entry: flow:ChatFlow
environment:
  # image: mcr.microsoft.com/azureml/promptflow/promptflow-python
  python_requirements_txt: requirements.txt
60 changes: 60 additions & 0 deletions benchmark/promptflow-serve/pf_flows/flex_async/flow.py
@@ -0,0 +1,60 @@
import asyncio
import os
import time
from pathlib import Path

import aiohttp

BASE_DIR = Path(__file__).absolute().parent


class ChatFlow:
    def __init__(self):
        pass

    async def __call__(self, question: str, chat_history: list) -> str:  # noqa: B006

        node_instance1 = Node()
        node_instance2 = Node()
        node_instance3 = Node()

        # create a list of tasks
        tasks = [
            self.call_node(node_instance1),
            self.call_node(node_instance2),
            self.call_node(node_instance3)
        ]

        # simulate calling parallel nodes
        await asyncio.gather(*tasks)

        chat_history = chat_history or []
        start_time = time.time()

        # make a call to the mock endpoint
        url = os.getenv("MOCK_API_ENDPOINT", None)
        if url is None:
            raise RuntimeError("Failed to read MOCK_API_ENDPOINT env var.")

        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                if response.status == 200:
                    response_dict = await response.json()
                    end_time = time.time()
                    response_dict["pf_node_time_sec"] = end_time - start_time
                    response_dict["type"] = "pf_flex_async"
                    return response_dict
                else:
                    raise RuntimeError(f"Failed call to {url}: {response.status}")

    async def call_node(self, node_instance: any):
        await node_instance()


class Node:
    def __init__(self):
        pass

    async def __call__(self) -> str:  # noqa: B006
        await asyncio.sleep(0.25)
        return "completed"
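
Not part of this PR, but as a quick way to sanity-check the `ChatFlow` entry point above outside the serving stack, it can be awaited directly; the file name and endpoint URL in this sketch are assumptions, and the mock service must already be running:

```
# sanity_check.py -- hypothetical helper, not included in this commit
import asyncio
import os

from flow import ChatFlow  # the flex flow entry point defined above

# the flow reads the mock endpoint from this env var; the URL is an assumption
os.environ.setdefault("MOCK_API_ENDPOINT", "http://localhost:50001/")

result = asyncio.run(ChatFlow()(question="Test question", chat_history=[]))
print(result)  # includes total_time_sec, pf_node_time_sec and type

```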
6 changes: 6 additions & 0 deletions benchmark/promptflow-serve/pf_flows/flex_async/requirements.txt
@@ -0,0 +1,6 @@
promptflow==1.12.0
promptflow[azure]==1.12.0
promptflow-tools==1.4.0
fastapi==0.111.0
uvicorn==0.30.1
aiohttp==3.9.5
27 changes: 27 additions & 0 deletions benchmark/promptflow-serve/pf_flows/static_async/chat.py
@@ -0,0 +1,27 @@
import os
import time

import aiohttp
from promptflow.core import tool


@tool
async def my_python_tool(node1: str, node2: str, node3: str) -> str:

    start_time = time.time()

    # make a call to the mock endpoint
    url = os.getenv("MOCK_API_ENDPOINT", None)
    if url is None:
        raise RuntimeError("Failed to read MOCK_API_ENDPOINT env var.")

    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            if response.status == 200:
                response_dict = await response.json()
                end_time = time.time()
                response_dict["pf_node_time_sec"] = end_time - start_time
                response_dict["type"] = "pf_dag_async"
                return response_dict
            else:
                raise RuntimeError(f"Failed call to {url}: {response.status}")
50 changes: 50 additions & 0 deletions benchmark/promptflow-serve/pf_flows/static_async/flow.dag.yaml
@@ -0,0 +1,50 @@
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json
inputs:
  chat_history:
    type: list
    default: []
  question:
    type: string
    is_chat_input: true
    default: What is ChatGPT?
outputs:
  answer:
    type: string
    reference: ${chat.output}
    is_chat_output: true
nodes:
- name: node1
  type: python
  inputs:
    chat_history: ${inputs.chat_history}
    question: ${inputs.question}
  source:
    type: code
    path: node1.py
- name: node2
  type: python
  inputs:
    chat_history: ${inputs.chat_history}
    question: ${inputs.question}
  source:
    type: code
    path: node2.py
- name: node3
  type: python
  inputs:
    chat_history: ${inputs.chat_history}
    question: ${inputs.question}
  source:
    type: code
    path: node3.py
- name: chat
  type: python
  inputs:
    node1: ${node1.output}
    node2: ${node2.output}
    node3: ${node3.output}
  source:
    type: code
    path: chat.py
environment:
  python_requirements_txt: requirements.txt
10 changes: 10 additions & 0 deletions benchmark/promptflow-serve/pf_flows/static_async/node1.py
@@ -0,0 +1,10 @@
import asyncio
from promptflow.core import tool


@tool
async def my_python_tool(chat_history: list, question: str) -> str:

    # sleep for 250ms to simulate open ai call async
    await asyncio.sleep(0.25)
    return "completed"
10 changes: 10 additions & 0 deletions benchmark/promptflow-serve/pf_flows/static_async/node2.py
@@ -0,0 +1,10 @@
import asyncio
from promptflow.core import tool


@tool
async def my_python_tool(chat_history: list, question: str) -> str:

    # sleep for 250ms to simulate open ai call async
    await asyncio.sleep(0.25)
    return "completed"
10 changes: 10 additions & 0 deletions benchmark/promptflow-serve/pf_flows/static_async/node3.py
@@ -0,0 +1,10 @@
import asyncio
from promptflow.core import tool


@tool
async def my_python_tool(chat_history: list, question: str) -> str:

    # sleep for 250ms to simulate open ai call async
    await asyncio.sleep(0.25)
    return "completed"
6 changes: 6 additions & 0 deletions benchmark/promptflow-serve/pf_flows/static_async/requirements.txt
@@ -0,0 +1,6 @@
promptflow==1.12.0
promptflow[azure]==1.12.0
promptflow-tools==1.4.0
fastapi==0.111.0
uvicorn==0.30.1
aiohttp==3.9.5