[prompty] Refine stream output of prompty (#2862)
# Description
- Response format: text
```
---
model:
  api: chat
  configuration:
    type: azure_openai
  parameters:
    stream: true
    response_format:
      type: text
inputs:
  question:
    type: string
---
system:
You are an AI assistant who helps people find information.
# omit some content
user:
{{question}}
```
Python code:
```
from promptflow.core import Flow

prompty_func = Flow.load(source=f"{PROMPTY_DIR}/prompty.prompty")
stream_result = prompty_func(question="what is the result of 1+1?")
response_content = []
for item in stream_result:
    response_content.append(item)
```
The return type is generator[str], e.g.: ['The', ' result', ' of', ' ', '1', '+', '1', ' is', ' ', '2', '.', ' It', "'s", ' a', ' simple', ' addition', '!', ' 😊']
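
Since the result is a plain generator of strings, the full answer can be rebuilt by joining the collected chunks; a minimal sketch continuing the snippet above:
```
# Joining the streamed tokens reconstructs the complete answer text.
full_answer = "".join(response_content)
# e.g. "The result of 1+1 is 2. It's a simple addition! 😊"
```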

- Response format: text with multiple choices (`n: 2`, `response: all`)
```
---
model:
  api: chat
  configuration:
    type: azure_openai
  parameters:
    stream: true
    response_format:
      type: text
    n: 2
  response: all
inputs:
  question:
    type: string
---
system:
You are an AI assistant who helps people find information.
# omit some content
user:
{{question}}
```
Python code:
```
from promptflow.core import Flow

prompty_func = Flow.load(source=f"{PROMPTY_DIR}/prompty.prompty")
stream_result = prompty_func(question="what is the result of 1+1?")
response_content = []
for chunk in stream_result:
    if len(chunk.choices) > 0 and chunk.choices[0].delta.content:
        response_content.append(chunk.choices[0].delta.content)
```
The return type is openai.Stream.
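
Because the raw stream is returned here, every choice can be recovered rather than just the first one. A minimal sketch that groups the deltas by choice index, assuming the standard OpenAI chunk shape (each chunk exposes `choices[i].index` and `choices[i].delta.content`):
```
from collections import defaultdict

# Collect the delta fragments of every choice, keyed by choice index.
contents = defaultdict(list)
for chunk in stream_result:
    for choice in chunk.choices:
        if choice.delta.content:
            contents[choice.index].append(choice.delta.content)

# One complete answer per requested choice (n: 2 -> indexes 0 and 1).
answers = {index: "".join(parts) for index, parts in contents.items()}
```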

- Response format: json_object
```
---
model:
  api: chat
  configuration:
    type: azure_openai
  parameters:
    stream: true
    response_format:
      type: json_object
inputs:
  question:
    type: string
---
system:
You are an AI assistant who helps people find information.
Your response must be in JSON format, like below:
{"name": customer_name, "answer": the answer content}
# omit some content
user:
{{question}}
```
Python code:
```
from promptflow.core import Flow

prompty_func = Flow.load(source=f"{PROMPTY_DIR}/prompty.prompty")
result = prompty_func(question="what is the result of 1+1?")
```
Returns a JSON dict, e.g. {"name": "John", "answer": 2}.
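
Note that even with `stream: true`, the json_object case returns a fully parsed dict: the streamed text fragments are joined before parsing, as the `format_llm_response` diff below shows. A self-contained sketch of that aggregation (the helper name and sample chunks are illustrative only):
```
import json

def aggregate_json_stream(text_chunks):
    # Join the streamed fragments into one string, then parse it as JSON.
    return json.loads("".join(text_chunks))

# Hypothetical token stream, for illustration:
chunks = ['{"name": ', '"John", ', '"answer": 2}']
print(aggregate_json_stream(chunks))  # {'name': 'John', 'answer': 2}
```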

- Response format: json_object with declared outputs
```
---
model:
  api: chat
  configuration:
    type: azure_openai
  parameters:
    stream: true
    response_format:
      type: json_object
inputs:
  question:
    type: string
outputs:
  answer:
    type: string
---
system:
You are an AI assistant who helps people find information.
Your response must be in JSON format, like below:
{"name": customer_name, "answer": the answer content}
# omit some content
user:
{{question}}
```
Python code:
```
from promptflow.core import Flow

prompty_func = Flow.load(source=f"{PROMPTY_DIR}/prompty.prompty")
result = prompty_func(question="what is the result of 1+1?")
```
Returns a JSON dict containing only the declared outputs, e.g. {"answer": 2}.
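
When `outputs` are declared, only the declared keys are extracted from the parsed dict, and a missing key raises an error (the tests below expect "Cannot find ... in response"). A conceptual sketch with a hypothetical helper name:
```
def select_outputs(result_dict, outputs):
    # Keep only the keys declared in the prompty `outputs` section.
    missing = [name for name in outputs if name not in result_dict]
    if missing:
        raise KeyError(f"Cannot find {missing} in response {list(result_dict)}")
    return {name: result_dict[name] for name in outputs}

print(select_outputs({"name": "John", "answer": 2}, {"answer": {"type": "string"}}))
# -> {'answer': 2}
```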

- Response format: json_object with `response: all`
```
---
model:
  api: chat
  configuration:
    type: azure_openai
  parameters:
    stream: true
    response_format:
      type: json_object
    n: 2
  response: all
inputs:
  question:
    type: string
---
system:
You are an AI assistant who helps people find information.
Your response must be in JSON format, like below:
{"name": customer_name, "answer": the answer content}
# omit some content
user:
{{question}}
```
Python code:
```
from promptflow.core import Flow

prompty_func = Flow.load(source=f"{PROMPTY_DIR}/prompty.prompty")
stream_result = prompty_func(question="what is the result of 1+1?")
response_content = []
for chunk in stream_result:
    if len(chunk.choices) > 0 and chunk.choices[0].delta.content:
        response_content.append(chunk.choices[0].delta.content)
```
The return type is openai.Stream; it can be consumed per choice in the same way as the text example above.



# All Promptflow Contribution checklist:
- [ ] **The pull request does not introduce [breaking changes].**
- [ ] **CHANGELOG is updated for new features, bug fixes or other
significant changes.**
- [ ] **I have read the [contribution guidelines](../CONTRIBUTING.md).**
- [ ] **Create an issue and link to the pull request to get dedicated
review from promptflow team. Learn more: [suggested
workflow](../CONTRIBUTING.md#suggested-workflow).**

## General Guidelines and Best Practices
- [ ] Title of the pull request is clear and informative.
- [ ] There are a small number of commits, each of which has an
informative message. This means that previously merged commits do not
appear in the history of the PR. For more information on cleaning up the
commits in your PR, [see this
page](https://github.com/Azure/azure-powershell/blob/master/documentation/development-docs/cleaning-up-commits.md).

### Testing Guidelines
- [ ] Pull request includes test coverage for the included changes.
lalala123123 authored Apr 18, 2024
1 parent d3a816f commit 5cf3aa1
Showing 5 changed files with 103 additions and 10 deletions.
36 changes: 34 additions & 2 deletions src/promptflow-core/promptflow/core/_prompty_utils.py
```
@@ -169,6 +169,21 @@ def format_llm_response(response, api, is_first_choice, response_format=None, st
     """
     Format LLM response
     If is_first_choice is false, it will directly return the LLM response.
+    If is_first_choice is true, it behaves as below:
+    response_format: type: text
+    - n: None/1/2
+      Return the first choice's content. Return type is string.
+    - stream: True
+      Return a generator of the first choice's content. Return type is generator[str].
+    response_format: type: json_object
+    - n: None/1/2
+      Return the json dict of the first choice. Return type is dict.
+    - stream: True
+      Return the json dict of the first choice. Return type is dict.
+    - outputs
+      Extract the corresponding outputs from the first choice's json dict. Return type is dict.
     :param response: LLM response.
     :type response:
     :param api: API type of the LLM.
@@ -188,7 +203,7 @@ def format_llm_response(response, api, is_first_choice, response_format=None, st
     def format_choice(item):
         # response_format is one of text or json_object.
         # https://platform.openai.com/docs/api-reference/chat/create#chat-create-response_format
-        if isinstance(response_format, dict) and response_format.get("type", None) == "json_object":
+        if is_json_format:
             result_dict = json.loads(item)
             if not outputs:
                 return result_dict
@@ -202,9 +217,26 @@ def format_choice(item):
             # Return text format response
             return item
 
-    if not is_first_choice or streaming:
+    def format_stream(llm_response):
+        cur_index = None
+        for chunk in llm_response:
+            if len(chunk.choices) > 0 and chunk.choices[0].delta.content:
+                if cur_index is None:
+                    cur_index = chunk.choices[0].index
+                if cur_index != chunk.choices[0].index:
+                    return
+                yield chunk.choices[0].delta.content
+
+    if not is_first_choice:
         return response
 
+    is_json_format = isinstance(response_format, dict) and response_format.get("type", None) == "json_object"
+    if streaming:
+        if not is_json_format:
+            return format_stream(llm_response=response)
+        else:
+            content = "".join([item for item in format_stream(llm_response=response)])
+            return format_choice(content)
     if api == "completion":
         result = format_choice(response.choices[0].text)
     else:
```
71 changes: 63 additions & 8 deletions src/promptflow-devkit/tests/sdk_cli_test/e2etests/test_prompty.py
```
@@ -1,10 +1,12 @@
 import asyncio
 import json
 import os
+import types
 from pathlib import Path
 
 import pytest
 from _constants import PROMPTFLOW_ROOT
+from openai import Stream
 from openai.types.chat import ChatCompletion
 
 from promptflow._sdk._pf_client import PFClient
@@ -13,6 +15,7 @@
 from promptflow.core._flow import AsyncPrompty, Prompty
 from promptflow.core._model_configuration import AzureOpenAIModelConfiguration
 from promptflow.core._prompty_utils import convert_model_configuration_to_connection
+from promptflow.recording.record_mode import is_live, is_record, is_replay
 
 TEST_ROOT = PROMPTFLOW_ROOT / "tests"
 DATA_DIR = TEST_ROOT / "test_configs/datas"
@@ -227,21 +230,73 @@ def test_prompty_format_output(self, pf: PFClient):
             prompty(question="what is the result of 1+1?")
         assert "Cannot find invalid_output in response ['name', 'answer']" in ex.value.message
 
-        # Test stream output
+        # Test return all choices
+        prompty = Prompty.load(
+            source=f"{PROMPTY_DIR}/prompty_example.prompty", model={"parameters": {"n": 2}, "response": "all"}
+        )
+        result = prompty(question="what is the result of 1+1?")
+        assert isinstance(result, ChatCompletion)
+
+    def test_prompty_with_stream(self, pf: PFClient):
+        if is_live():
+            stream_type = Stream
+        elif is_record() or is_replay():
+            stream_type = types.GeneratorType
+        # Test text format with stream=true
         prompty = Prompty.load(source=f"{PROMPTY_DIR}/prompty_example.prompty", model={"parameters": {"stream": True}})
         result = prompty(question="what is the result of 1+1?")
-        result_content = ""
+        assert isinstance(result, types.GeneratorType)
+        response_contents = []
         for item in result:
-            if len(item.choices) > 0 and item.choices[0].delta.content:
-                result_content += item.choices[0].delta.content
-        assert "2" in result_content
+            response_contents.append(item)
+        assert "2" in "".join(response_contents)
 
-        # Test return all choices
+        # Test text format with multi choices and response=first
         prompty = Prompty.load(
-            source=f"{PROMPTY_DIR}/prompty_example.prompty", model={"parameters": {"n": 2}, "response": "all"}
+            source=f"{PROMPTY_DIR}/prompty_example.prompty", model={"parameters": {"stream": True, "n": 2}}
         )
         result = prompty(question="what is the result of 1+1?")
-        assert isinstance(result, ChatCompletion)
+        assert isinstance(result, types.GeneratorType)
+        response_contents = []
+        for item in result:
+            response_contents.append(item)
+        assert "2" in "".join(response_contents)
+
+        # Test text format with multi choices
+        prompty = Prompty.load(
+            source=f"{PROMPTY_DIR}/prompty_example.prompty",
+            model={"parameters": {"stream": True, "n": 2}, "response": "all"},
+        )
+        result = prompty(question="what is the result of 1+1?")
+
+        assert isinstance(result, stream_type)
+
+        # Test text format with stream=true, response=all
+        prompty = Prompty.load(
+            source=f"{PROMPTY_DIR}/prompty_example.prompty", model={"parameters": {"stream": True}, "response": "all"}
+        )
+        result = prompty(question="what is the result of 1+1?")
+        assert isinstance(result, stream_type)
+
+        # Test json format with stream=true
+        prompty = Prompty.load(
+            source=f"{PROMPTY_DIR}/prompty_example_with_json_format.prompty",
+            model={"parameters": {"n": 2, "stream": True}},
+        )
+        result = prompty(question="what is the result of 1+1?")
+        assert isinstance(result, dict)
+        assert result["answer"] == 2
+
+        # Test json format with outputs
+        prompty = Prompty.load(
+            source=f"{PROMPTY_DIR}/prompty_example_with_json_format.prompty",
+            model={"parameters": {"stream": True}},
+            outputs={"answer": {"type": "number"}},
+        )
+        result = prompty(question="what is the result of 1+1?")
+        assert isinstance(result, dict)
+        assert list(result.keys()) == ["answer"]
+        assert result["answer"] == 2
 
     @pytest.mark.skip(reason="Double check this test in python 3.9")
     def test_prompty_trace(self, pf: PFClient):
```
```
@@ -102,3 +102,6 @@
 'c45029aaf963d638d7f184c5ecd9905f24b29f1a', (496640, 40930)
 'c26639a858156ff282cd2bcb4ce4db43167ec213', (537600, 1774)
 'd50861d6d33d3389d11be401ddb7528d6fdbe996', (539648, 2148)
+'3d5ce8929b569af5be85f2d6cf29494eca7318d9', (542208, 25728)
+'6dd5f4a090198cd640009db53e2403da31ba126a', (568320, 18625)
+'57a991472dd300efc84b638768fe2f87e7acb04c', (587264, 9897)
```
Binary file modified src/promptflow-recording/recordings/local/node_cache.shelve.dat
Binary file not shown.
```
@@ -102,3 +102,6 @@
 'c45029aaf963d638d7f184c5ecd9905f24b29f1a', (496640, 40930)
 'c26639a858156ff282cd2bcb4ce4db43167ec213', (537600, 1774)
 'd50861d6d33d3389d11be401ddb7528d6fdbe996', (539648, 2148)
+'3d5ce8929b569af5be85f2d6cf29494eca7318d9', (542208, 25728)
+'6dd5f4a090198cd640009db53e2403da31ba126a', (568320, 18625)
+'57a991472dd300efc84b638768fe2f87e7acb04c', (587264, 9897)
```
