[Promptflow Serve] Benchmarking Framework To Measure Throughput of Promptflow Serve (microsoft#3486)

# Description

This PR introduces a throughput benchmarking framework for Promptflow
Serve, with tests for the following scenarios:

- WSGI + Flask + DAG + Sync Nodes
- ASGI + FastAPI + DAG + Async Nodes
- ASGI + FastAPI + Flex Flow + Async Nodes

Each test uses a minimal example flow, shown below. The scenario covers
three nodes executing in parallel (simulating guardrails) and, once those
complete, one further node that calls an HTTP endpoint (simulating an LLM
or a custom API).


![image](https://github.com/microsoft/promptflow/assets/16128850/1df317d2-c0d0-4f05-8f09-d88190de8601)
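
For the async variants, this shape boils down to the condensed sketch below (illustrative only; the real flow code is in the diff further down, and the endpoint URL shown here is an assumption, since the flows read it from the `MOCK_API_ENDPOINT` environment variable):

```
import asyncio

import aiohttp

MOCK_API_ENDPOINT = "http://localhost:50001/"  # assumed URL; the flows read this from an env var


async def guardrail_node() -> str:
    # each simulated guardrail node just sleeps for 250 ms
    await asyncio.sleep(0.25)
    return "completed"


async def flow() -> dict:
    # three guardrail-style nodes execute in parallel ...
    await asyncio.gather(guardrail_node(), guardrail_node(), guardrail_node())
    # ... then a single node calls the HTTP endpoint (simulating an LLM or custom API)
    async with aiohttp.ClientSession() as session:
        async with session.get(MOCK_API_ENDPOINT) as response:
            return await response.json()


if __name__ == "__main__":
    print(asyncio.run(flow()))
```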

The tests are controlled by the configuration parameters in the
`benchmark/promptflow-serve/test_runner/settings.env` file.

The makefile located at `benchmark/promptflow-serve/makefile` provides the targets to build the services and run each of the tests.

# All Promptflow Contribution checklist:
- [x] **The pull request does not introduce [breaking changes].**
- [x] **CHANGELOG is updated for new features, bug fixes or other
significant changes.**
- [x] **I have read the [contribution guidelines](../CONTRIBUTING.md).**
- [x] **Create an issue and link to the pull request to get dedicated
review from promptflow team. Learn more: [suggested
workflow](../CONTRIBUTING.md#suggested-workflow).**

## General Guidelines and Best Practices
- [x] Title of the pull request is clear and informative.
- [x] There are a small number of commits, each of which has an
informative message. This means that previously merged commits do not
appear in the history of the PR. For more information on cleaning up the
commits in your PR, [see this
page](https://github.com/Azure/azure-powershell/blob/master/documentation/development-docs/cleaning-up-commits.md).

### Testing Guidelines
- [ ] Pull request includes test coverage for the included changes.

---------

Signed-off-by: Dasith Wijes <[email protected]>
dasiths authored Jul 2, 2024
1 parent 4450ec4 commit 8fb8cbe
Showing 50 changed files with 1,803 additions and 2 deletions.
6 changes: 4 additions & 2 deletions .cspell.json
@@ -50,7 +50,8 @@
".github/actions/**",
".github/pipelines/**",
".github/CODEOWNERS",
"src/promptflow-evals/tests/**"
"src/promptflow-evals/tests/**",
"benchmark/promptflow-serve/result-archive/**"
],
"words": [
"aoai",
@@ -240,7 +241,8 @@
"Machinal",
"azureopenaimodelconfiguration",
"openaimodelconfiguration",
"usecwd"
"usecwd",
"locustio"
],
"flagWords": [
"Prompt Flow"
3 changes: 3 additions & 0 deletions .devcontainer/devcontainer.json
@@ -4,6 +4,9 @@
"context": ".",
"dockerFile": "Dockerfile",
"runArgs": ["-v", "/var/run/docker.sock:/var/run/docker.sock"],
"remoteEnv": {
"HOST_PROJECT_PATH": "${localWorkspaceFolder}"
},
"customizations": {
"codespaces": {
"openFiles": ["README.md", "examples/README.md"]
3 changes: 3 additions & 0 deletions .gitignore
@@ -197,3 +197,6 @@ src/promptflow-*/promptflow/__init__.py
# Eclipse project files
**/.project
**/.pydevproject

# benchmark results
benchmark/promptflow-serve/test_runner/locust-results/
97 changes: 97 additions & 0 deletions benchmark/promptflow-serve/README.md
@@ -0,0 +1,97 @@
# Introduction

This directory contains scripts to test the throughput scalability of various [PromptFlow](https://microsoft.github.io/promptflow/) flows, using sync/async HTTP calls. It contains:
- A mock API service ([FastAPI](https://fastapi.tiangolo.com/) + [uvicorn](https://www.uvicorn.org/)) and Docker file to run as a service;
- Three different PromptFlow flows which include a node to query the mock API service:
- A [FlexFlow](https://microsoft.github.io/promptflow/tutorials/flex-flow-quickstart.html)-based flow with an async call to the mock API service;
- Two [static DAG](https://microsoft.github.io/promptflow/tutorials/quickstart.html) flows, each of which calls the mock API service, one using an async call and the other a sync call;
- A set of bash and [Docker Compose](https://docs.docker.com/compose/) scripts to build and run each of the above services;
- A script to run [Locust](https://locust.io/) jobs to measure the scalability of each of the PF flow services.
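
The Locust test specs themselves (`mock_locustfile.py` and `pf_locustfile.py`, listed under `test_runner` below) are not reproduced in this excerpt. As an illustration only, a minimal Locust spec for driving the `/score` endpoint of one of the PF services could look like the sketch below; the class name, wait times, and host are assumptions, not the values used in this repository.

```
from locust import HttpUser, between, task


class PFScoreUser(HttpUser):
    # host of the static async PF service from the curl examples in this README;
    # in practice this is usually passed on the command line via --host
    host = "http://localhost:8081"

    # illustrative think time between requests from each simulated user
    wait_time = between(0.5, 2)

    @task
    def score(self):
        # same payload as the curl examples further down in this README
        self.client.post("/score", json={"question": "Test question", "chat_history": []})
```

A spec like this can then be run headless with, for example, `locust -f pf_locustfile.py --headless -u 50 -r 10 -t 2m`, which is presumably the kind of invocation `run_locust.sh` wraps.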

# Contents

```
├── README.md (this README file)
├── makefile (Makefile with the commands to build and run tests)
├── mock_api (a mock API service which simply waits before returning JSON)
│   ├── Dockerfile
│   ├── main.py
│   └── requirements.txt
├── pf_flows (various PromptFlow flows which call the mock API service using sync/async HTTP calls)
│   ├── flex_async (async flexflow example)
│   │   ├── flow.flex.yaml
│   │   ├── flow.py
│   │   └── requirements.txt
│   ├── static_async (async static DAG example)
│   │   ├── chat.py
│   │   ├── flow.dag.yaml
│   │   └── requirements.txt
│   └── static_sync (sync static DAG example)
│       ├── chat.py
│       ├── flow.dag.yaml
│       └── requirements.txt
├── requirements.txt (pip requirements for developing the tests)
└── test_runner (scripts to perform scalability tests against each of the PF flows)
    ├── locust_results (this is where the locust results will be stored)
    ├── build.sh (builds the docker images for each of the services above)
    ├── docker-compose.yml (manages starting up all the docker-based services)
    ├── mock_locustfile.py (locust test spec for testing the capacity of the mock API service)
    ├── pf_locustfile.py (locust test spec for testing the capacity of the PF flow services)
    ├── run_locust.sh (locust runner used in the tests)
    ├── settings.env (env file with the configuration used in the tests)
    └── test.sh (orchestrates running the tests)
```

# Preparing the environment

## Prerequisites

### Software

Build the provided devcontainer and use it for running tests.

### Hardware

A host machine with at least 8 vCPU threads.

## Building the services

- `make install-requirements`
- `make build`

The build step visits each of the service directories (`mock_api`, `pf_flows/flex_async`, `pf_flows/static_async`, and `pf_flows/static_sync`) and creates a docker image for each.

Once this is complete, you can verify the services were built with `docker image ls`, for example:
```
REPOSITORY                TAG       IMAGE ID       CREATED          SIZE
fastapi-wait-service      latest    6bc9152b6b9b   32 minutes ago   184MB
pf-flex-async-service     latest    d14cc15f45ad   33 minutes ago   1.58GB
pf-static-sync-service    latest    8b5ac2dac32c   34 minutes ago   1.58GB
pf-static-async-service   latest    ff2968d3ef11   34 minutes ago   1.58GB
```

To test each of the services, you can try:
- Mock API service: `curl "http://localhost:50001/"`
- Static DAG async PF service: `curl --request POST 'http://localhost:8081/score' --header 'Content-Type: application/json' --data '{"question": "Test question", "chat_history": []}'`
- Static DAG sync PF service: `curl --request POST 'http://localhost:8082/score' --header 'Content-Type: application/json' --data '{"question": "Test question", "chat_history": []}'`
- FlexFlow async PF service: `curl --request POST 'http://localhost:8083/score' --header 'Content-Type: application/json' --data '{"question": "Test question", "chat_history": []}'`
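
For reference, the mock API call returns a small JSON document reporting how long the service waited (see `mock_api/main.py` later in this diff). With the code defaults of a 1–5 second wait, a response looks roughly like:

```
{"total_time_sec": 3.27, "min_wait_time_sec": 1, "max_wait_time_sec": 5}
```

The PF services return the same payload with `pf_node_time_sec` and `type` fields added by the flow code.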

## Running each of the throughput tests

The mock API service simply waits every time a request is made and returns JSON once the wait has ended. The wait time is configurable, but it is set to 1 second in the docker compose script.

To measure the throughput of PF flows that call this service, we first establish a throughput baseline for the mock service itself. We would then expect every PF flow to achieve the same or similar throughput, since all each flow is programmed to do is call this service and return. (As a rough illustration: with the wait fixed at 1 second, an ideal non-blocking server handling N concurrent users should sustain on the order of N requests per second.)

The `benchmark/promptflow-serve/makefile` supports four tests:
- `make test-mock`: Run the throughput tests on the mock API service to determine a baseline.
- `make test-staticsync`: Run the throughput tests on the PF static sync DAG flow service.
- `make test-staticasync`: Run the throughput tests on the PF static async DAG flow service.
- `make test-flexasync`: Run the throughput tests on the PF flex flow async service.

## Test parameters

The test parameters can be controlled in the `benchmark/promptflow-serve/test_runner/settings.env` file.
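
`settings.env` itself is not reproduced in this excerpt, so the snippet below is only a hypothetical illustration of the kind of values such a file typically holds. `MIN_WAIT_TIME_SEC` and `MAX_WAIT_TIME_SEC` are the variables read by `mock_api/main.py`; the `LOCUST_*` names are standard Locust environment variables and are assumptions here, not necessarily the names used in this repository.

```
# hypothetical illustration only -- not the actual settings.env from this PR
MIN_WAIT_TIME_SEC=1    # mock API wait lower bound (read by mock_api/main.py)
MAX_WAIT_TIME_SEC=1    # mock API wait upper bound (read by mock_api/main.py)
LOCUST_USERS=50        # assumed name: number of concurrent simulated users
LOCUST_SPAWN_RATE=10   # assumed name: users spawned per second
LOCUST_RUN_TIME=2m     # assumed name: duration of each test run
```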

## Results

The results are stored in the `test_runner/locust-results` folder. Interactive HTML reports, which present the results as graphs, are also generated.
20 changes: 20 additions & 0 deletions benchmark/promptflow-serve/makefile
@@ -0,0 +1,20 @@
install-requirements:
	pip install -r requirements.txt

build:
	cd test_runner && ./build.sh

stop-all-tests:
	cd test_runner && docker-compose down --remove-orphans

test-mock:
	cd test_runner && ./test.sh mock

test-staticsync:
	cd test_runner && ./test.sh staticsync

test-staticasync:
	cd test_runner && ./test.sh staticasync

test-flexasync:
	cd test_runner && ./test.sh flexasync
10 changes: 10 additions & 0 deletions benchmark/promptflow-serve/mock_api/Dockerfile
@@ -0,0 +1,10 @@
FROM python:3.10-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

EXPOSE 50001

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "50001"]
26 changes: 26 additions & 0 deletions benchmark/promptflow-serve/mock_api/main.py
@@ -0,0 +1,26 @@
import asyncio
import os
import random

from fastapi import FastAPI

random.seed(42)
app = FastAPI()


@app.get("/")
async def wait_and_return():

    min_wait_time_sec = int(os.getenv("MIN_WAIT_TIME_SEC", 1))
    max_wait_time_sec = int(os.getenv("MAX_WAIT_TIME_SEC", 5))

    # generate a random number of seconds to sleep between min and max.
    random_float = random.uniform(min_wait_time_sec, max_wait_time_sec)
    await asyncio.sleep(random_float)

    # return a message to say just how long the service waited for
    return {
        "total_time_sec": random_float,
        "min_wait_time_sec": min_wait_time_sec,
        "max_wait_time_sec": max_wait_time_sec,
    }
2 changes: 2 additions & 0 deletions benchmark/promptflow-serve/mock_api/requirements.txt
@@ -0,0 +1,2 @@
fastapi==0.111.0
uvicorn==0.30.1
5 changes: 5 additions & 0 deletions benchmark/promptflow-serve/pf_flows/flex_async/flow.flex.yaml
@@ -0,0 +1,5 @@
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json
entry: flow:ChatFlow
environment:
  # image: mcr.microsoft.com/azureml/promptflow/promptflow-python
  python_requirements_txt: requirements.txt
60 changes: 60 additions & 0 deletions benchmark/promptflow-serve/pf_flows/flex_async/flow.py
@@ -0,0 +1,60 @@
import asyncio
import os
import time
from pathlib import Path

import aiohttp

BASE_DIR = Path(__file__).absolute().parent


class ChatFlow:
    def __init__(self):
        pass

    async def __call__(self, question: str, chat_history: list) -> str:  # noqa: B006

        node_instance1 = Node()
        node_instance2 = Node()
        node_instance3 = Node()

        # create a list of tasks
        tasks = [
            self.call_node(node_instance1),
            self.call_node(node_instance2),
            self.call_node(node_instance3)
        ]

        # simulate calling parallel nodes
        await asyncio.gather(*tasks)

        chat_history = chat_history or []
        start_time = time.time()

        # make a call to the mock endpoint
        url = os.getenv("MOCK_API_ENDPOINT", None)
        if url is None:
            raise RuntimeError("Failed to read MOCK_API_ENDPOINT env var.")

        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                if response.status == 200:
                    response_dict = await response.json()
                    end_time = time.time()
                    response_dict["pf_node_time_sec"] = end_time - start_time
                    response_dict["type"] = "pf_flex_async"
                    return response_dict
                else:
                    raise RuntimeError(f"Failed call to {url}: {response.status}")

    async def call_node(self, node_instance: any):
        await node_instance()


class Node:
    def __init__(self):
        pass

    async def __call__(self) -> str:  # noqa: B006
        await asyncio.sleep(0.25)
        return "completed"
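
Not part of this PR, but as a quick way to sanity-check the `ChatFlow` entry point above outside the serving stack, it can be awaited directly; the file name and endpoint URL in this sketch are assumptions, and the mock service must already be running:

```
# sanity_check.py -- hypothetical helper, not included in this commit
import asyncio
import os

from flow import ChatFlow  # the flex flow entry point defined above

# the flow reads the mock endpoint from this env var; the URL is an assumption
os.environ.setdefault("MOCK_API_ENDPOINT", "http://localhost:50001/")

result = asyncio.run(ChatFlow()(question="Test question", chat_history=[]))
print(result)  # includes total_time_sec, pf_node_time_sec and type

```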
6 changes: 6 additions & 0 deletions benchmark/promptflow-serve/pf_flows/flex_async/requirements.txt
@@ -0,0 +1,6 @@
promptflow==1.12.0
promptflow[azure]==1.12.0
promptflow-tools==1.4.0
fastapi==0.111.0
uvicorn==0.30.1
aiohttp==3.9.5
27 changes: 27 additions & 0 deletions benchmark/promptflow-serve/pf_flows/static_async/chat.py
@@ -0,0 +1,27 @@
import os
import time

import aiohttp
from promptflow.core import tool


@tool
async def my_python_tool(node1: str, node2: str, node3: str) -> str:

    start_time = time.time()

    # make a call to the mock endpoint
    url = os.getenv("MOCK_API_ENDPOINT", None)
    if url is None:
        raise RuntimeError("Failed to read MOCK_API_ENDPOINT env var.")

    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            if response.status == 200:
                response_dict = await response.json()
                end_time = time.time()
                response_dict["pf_node_time_sec"] = end_time - start_time
                response_dict["type"] = "pf_dag_async"
                return response_dict
            else:
                raise RuntimeError(f"Failed call to {url}: {response.status}")
50 changes: 50 additions & 0 deletions benchmark/promptflow-serve/pf_flows/static_async/flow.dag.yaml
@@ -0,0 +1,50 @@
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json
inputs:
  chat_history:
    type: list
    default: []
  question:
    type: string
    is_chat_input: true
    default: What is ChatGPT?
outputs:
  answer:
    type: string
    reference: ${chat.output}
    is_chat_output: true
nodes:
- name: node1
  type: python
  inputs:
    chat_history: ${inputs.chat_history}
    question: ${inputs.question}
  source:
    type: code
    path: node1.py
- name: node2
  type: python
  inputs:
    chat_history: ${inputs.chat_history}
    question: ${inputs.question}
  source:
    type: code
    path: node2.py
- name: node3
  type: python
  inputs:
    chat_history: ${inputs.chat_history}
    question: ${inputs.question}
  source:
    type: code
    path: node3.py
- name: chat
  type: python
  inputs:
    node1: ${node1.output}
    node2: ${node2.output}
    node3: ${node3.output}
  source:
    type: code
    path: chat.py
environment:
  python_requirements_txt: requirements.txt
10 changes: 10 additions & 0 deletions benchmark/promptflow-serve/pf_flows/static_async/node1.py
@@ -0,0 +1,10 @@
import asyncio
from promptflow.core import tool


@tool
async def my_python_tool(chat_history: list, question: str) -> str:

    # sleep for 250ms to simulate open ai call async
    await asyncio.sleep(0.25)
    return "completed"
10 changes: 10 additions & 0 deletions benchmark/promptflow-serve/pf_flows/static_async/node2.py
@@ -0,0 +1,10 @@
import asyncio
from promptflow.core import tool


@tool
async def my_python_tool(chat_history: list, question: str) -> str:

    # sleep for 250ms to simulate open ai call async
    await asyncio.sleep(0.25)
    return "completed"
10 changes: 10 additions & 0 deletions benchmark/promptflow-serve/pf_flows/static_async/node3.py
@@ -0,0 +1,10 @@
import asyncio
from promptflow.core import tool


@tool
async def my_python_tool(chat_history: list, question: str) -> str:

    # sleep for 250ms to simulate open ai call async
    await asyncio.sleep(0.25)
    return "completed"
6 changes: 6 additions & 0 deletions benchmark/promptflow-serve/pf_flows/static_async/requirements.txt
@@ -0,0 +1,6 @@
promptflow==1.12.0
promptflow[azure]==1.12.0
promptflow-tools==1.4.0
fastapi==0.111.0
uvicorn==0.30.1
aiohttp==3.9.5