| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "id": "8b45a82a-a0dd-489b-bba8-c34f8e5afb96", |
| 6 | + "metadata": {}, |
| 7 | + "source": [ |
| 8 | + "# How to download feedback and examples from a test project\n", |
| 9 | + "\n", |
| 10 | + "When testing with LangSmith, all the traces, examples, and evaluation feedback are saved so you have a full audit trail of what happened.\n", |
| 11 | + "This way you can see the aggregate metrics of the test run and compare results on an example-by-example basis. You can also download the run and evaluation result information\n", |
| 12 | + "to use in external reporting software.\n", |
| 13 | + "\n", |
| 14 | + "In this walkthrough, we will show how to export the feedback and examples from a LangSmith test project. The main steps are:\n", |
| 15 | + "\n", |
| 16 | + "1. Create a dataset\n", |
| 17 | + "2. Run testing\n", |
| 18 | + "3. Export feedback and examples\n", |
| 19 | + "\n", |
| 20 | + "## Setup\n", |
| 21 | + "\n", |
| 22 | + "Install `langchain`, `langsmith`, and any other dependencies for your chain. This walkthrough also uses pandas to load the retrieved data into a DataFrame." |
| 23 | + ] |
| 24 | + }, |
| 25 | + { |
| 26 | + "cell_type": "code", |
| 27 | + "execution_count": null, |
| 28 | + "id": "214520b3-801a-43eb-9eb5-017c1c6a9107", |
| 29 | + "metadata": { |
| 30 | + "tags": [] |
| 31 | + }, |
| 32 | + "outputs": [], |
| 33 | + "source": [ |
| 34 | + "# %pip install -U langsmith langchain anthropic pandas --quiet" |
| 35 | + ] |
| 36 | + }, |
| 37 | + { |
| 38 | + "cell_type": "code", |
| 39 | + "execution_count": 1, |
| 40 | + "id": "e820af26-12aa-4f2c-9ffb-345e68dfc638", |
| 41 | + "metadata": { |
| 42 | + "tags": [] |
| 43 | + }, |
| 44 | + "outputs": [], |
| 45 | + "source": [ |
| 46 | + "import uuid\n", |
| 47 | + "import os\n", |
| 48 | + "\n", |
| 49 | + "unique_id = uuid.uuid4().hex[0:8]\n", |
| 50 | + "os.environ[\"LANGCHAIN_API_KEY\"] = \"\"  # Set this to your LangSmith API key\n", |
| 51 | + "os.environ[\"LANGCHAIN_ENDPOINT\"] = \"\"  # Set this to your LangSmith API endpoint\n", |
| 52 | + "os.environ[\"LANGCHAIN_PROJECT\"] = f\"Retrieve feedback and examples {unique_id}\"" |
| 53 | + ] |
| 54 | + }, |
| 55 | + { |
| 56 | + "cell_type": "markdown", |
| 57 | + "id": "33f813bc-14b0-4015-aed6-d4365f4b9047", |
| 58 | + "metadata": {}, |
| 59 | + "source": [ |
| 60 | + "## 1. Create a dataset\n", |
| 61 | + "\n", |
| 62 | + "We will create a simple key-value dataset in which each example pairs a poem topic with a constraint letter that the model should not use." |
| 63 | + ] |
| 64 | + }, |
| 65 | + { |
| 66 | + "cell_type": "code", |
| 67 | + "execution_count": 2, |
| 68 | + "id": "6594fa55-8278-453e-b923-fb5eb18b9458", |
| 69 | + "metadata": { |
| 70 | + "tags": [] |
| 71 | + }, |
| 72 | + "outputs": [], |
| 73 | + "source": [ |
| 74 | + "from langsmith import Client\n", |
| 75 | + "\n", |
| 76 | + "client = Client()\n", |
| 77 | + "\n", |
| 78 | + "examples = [\n", |
| 79 | + "    (\"roses\", \"o\"),\n", |
| 80 | + "    (\"vikings\", \"v\"),\n", |
| 81 | + "    (\"planet earth\", \"e\"),\n", |
| 82 | + "    (\"Sirens of Titan\", \"t\"),\n", |
| 83 | + "]\n", |
| 84 | + "\n", |
| 85 | + "dataset_name = f\"Download Feedback and Examples {unique_id}\"\n", |
| 86 | + "dataset = client.create_dataset(dataset_name)\n", |
| 87 | + "\n", |
| 88 | + "for prompt, constraint in examples:\n", |
| 89 | + "    client.create_example({\"input\": prompt, \"constraint\": constraint}, dataset_id=dataset.id, outputs={\"constraint\": constraint})" |
| 90 | + ] |
| 91 | + }, |
| 92 | + { |
| 93 | + "cell_type": "markdown", |
| 94 | + "id": "787a8a19-986c-4334-8ef6-08e884b45bcd", |
| 95 | + "metadata": {}, |
| 96 | + "source": [ |
| 97 | + "## 2. Run testing\n", |
| 98 | + "\n", |
| 99 | + "We will use a simple custom evaluator that checks whether the prediction contains the constraint letter. A score of `True` means the letter is absent from the prediction." |
| 100 | + ] |
| 101 | + }, |
| 102 | + { |
| 103 | + "cell_type": "code", |
| 104 | + "execution_count": 3, |
| 105 | + "id": "df839dab-f0dc-435a-982d-2bcc501257ce", |
| 106 | + "metadata": { |
| 107 | + "tags": [] |
| 108 | + }, |
| 109 | + "outputs": [], |
| 110 | + "source": [ |
| 111 | + "from typing import Any\n", |
| 112 | + "from langchain.evaluation import StringEvaluator\n", |
| 113 | + "\n", |
| 114 | + "class ConstraintEvaluator(StringEvaluator):\n", |
| 115 | + "\n", |
| 116 | + "    @property\n", |
| 117 | + "    def requires_reference(self) -> bool:\n", |
| 118 | + "        return True\n", |
| 119 | + "\n", |
| 120 | + "    def _evaluate_strings(self, prediction: str, reference: str, **kwargs: Any) -> dict:\n", |
| 121 | + "        # The reference is the letter that must not appear in the prediction (case-sensitive check).\n", |
| 122 | + "        return {\n", |
| 123 | + "            \"score\": reference not in prediction,\n", |
| 124 | + "            \"reasoning\": f\"prediction {'does not contain' if reference not in prediction else 'contains'} the letter {reference}\",\n", |
| 125 | + "        }" |
| 126 | + ] |
| 127 | + }, |
| 128 | + { |
| 129 | + "cell_type": "code", |
| 130 | + "execution_count": 4, |
| 131 | + "id": "3dc523be-b157-44a6-a820-f2bb2d2757ba", |
| 132 | + "metadata": { |
| 133 | + "tags": [] |
| 134 | + }, |
| 135 | + "outputs": [ |
| 136 | + { |
| 137 | + "name": "stdout", |
| 138 | + "output_type": "stream", |
| 139 | + "text": [ |
| 140 | + "View the evaluation results for project 'test-kind-prose-61' at:\n", |
| 141 | + "https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/projects/p/9890dcfa-92af-42d0-9df2-aa3dccd13788\n", |
| 142 | + "[------------------------------------------------->] 4/4" |
| 143 | + ] |
| 144 | + } |
| 145 | + ], |
| 146 | + "source": [ |
| 147 | + "from langchain import chat_models, prompts, schema\n", |
| 148 | + "from langchain.smith import RunEvalConfig\n", |
| 149 | + "\n", |
| 150 | + "chain = (\n", |
| 151 | + "    prompts.PromptTemplate.from_template(\"Write a poem about {input} without using the letter {constraint}. Respond directly with the poem with no explanation.\")\n", |
| 152 | + "    | chat_models.ChatAnthropic()\n", |
| 153 | + "    | schema.output_parser.StrOutputParser()\n", |
| 154 | + ")\n", |
| 155 | + "\n", |
| 156 | + "eval_config = RunEvalConfig(\n", |
| 157 | + "    custom_evaluators=[ConstraintEvaluator()],\n", |
| 158 | + "    input_key=\"input\",\n", |
| 159 | + ")\n", |
| 160 | + "\n", |
| 161 | + "test_results = client.run_on_dataset(\n", |
| 162 | + "    dataset_name=dataset_name,\n", |
| 163 | + "    llm_or_chain_factory=chain,\n", |
| 164 | + "    evaluation=eval_config,\n", |
| 165 | + ")" |
| 166 | + ] |
| 167 | + }, |
| 168 | + { |
| 169 | + "cell_type": "markdown", |
| 170 | + "id": "75f3c533-8ee6-466b-9082-372710287873", |
| 171 | + "metadata": {}, |
| 172 | + "source": [ |
| 173 | + "## 3. Export feedback and examples" |
| 174 | + ] |
| 175 | + }, |
| 176 | + { |
| 177 | + "cell_type": "markdown", |
| 178 | + "id": "8462317f-bfe6-4d0f-b418-a353d51c59da", |
| 179 | + "metadata": { |
| 180 | + "tags": [] |
| 181 | + }, |
| 182 | + "source": [ |
| 183 | + "To use the results directly, call `to_dataframe()` on `test_results` to access them in tabular format." |
| 184 | + ] |
| 185 | + }, |
| 186 | + { |
| 187 | + "cell_type": "code", |
| 188 | + "execution_count": 5, |
| 189 | + "id": "da614c30-63ab-4c7f-a366-583bb89c01b7", |
| 190 | + "metadata": { |
| 191 | + "tags": [] |
| 192 | + }, |
| 193 | + "outputs": [ |
| 194 | + { |
| 195 | + "data": { |
| 196 | + "text/html": [ |
| 197 | + "<div>\n", |
| 198 | + "<style scoped>\n", |
| 199 | + " .dataframe tbody tr th:only-of-type {\n", |
| 200 | + " vertical-align: middle;\n", |
| 201 | + " }\n", |
| 202 | + "\n", |
| 203 | + " .dataframe tbody tr th {\n", |
| 204 | + " vertical-align: top;\n", |
| 205 | + " }\n", |
| 206 | + "\n", |
| 207 | + " .dataframe thead th {\n", |
| 208 | + " text-align: right;\n", |
| 209 | + " }\n", |
| 210 | + "</style>\n", |
| 211 | + "<table border=\"1\" class=\"dataframe\">\n", |
| 212 | + " <thead>\n", |
| 213 | + " <tr style=\"text-align: right;\">\n", |
| 214 | + " <th></th>\n", |
| 215 | + " <th>ConstraintEvaluator</th>\n", |
| 216 | + " <th>input</th>\n", |
| 217 | + " <th>output</th>\n", |
| 218 | + " <th>reference</th>\n", |
| 219 | + " </tr>\n", |
| 220 | + " </thead>\n", |
| 221 | + " <tbody>\n", |
| 222 | + " <tr>\n", |
| 223 | + " <th>3e9a2c05-f7f5-4309-b755-abeb76719f26</th>\n", |
| 224 | + " <td>False</td>\n", |
| 225 | + " <td>{'input': 'Sirens of Titan', 'constraint': 't'}</td>\n", |
| 226 | + " <td>Here is a poem about Sirens of Titan without ...</td>\n", |
| 227 | + " <td>{'constraint': 't'}</td>\n", |
| 228 | + " </tr>\n", |
| 229 | + " <tr>\n", |
| 230 | + " <th>9fddeee7-9080-46c8-b280-405b4f3d18cb</th>\n", |
| 231 | + " <td>False</td>\n", |
| 232 | + " <td>{'input': 'planet earth', 'constraint': 'e'}</td>\n", |
| 233 | + " <td>Our orb of life, a vision grand,\\nWith oceans...</td>\n", |
| 234 | + " <td>{'constraint': 'e'}</td>\n", |
| 235 | + " </tr>\n", |
| 236 | + " <tr>\n", |
| 237 | + " <th>d74ea995-6ae6-488a-8da8-21695f8949fe</th>\n", |
| 238 | + " <td>False</td>\n", |
| 239 | + " <td>{'input': 'vikings', 'constraint': 'v'}</td>\n", |
| 240 | + " <td>Here is a poem about Vikings without using th...</td>\n", |
| 241 | + " <td>{'constraint': 'v'}</td>\n", |
| 242 | + " </tr>\n", |
| 243 | + " <tr>\n", |
| 244 | + " <th>e059d4b8-3079-40c7-a6b7-16347d4fc568</th>\n", |
| 245 | + " <td>False</td>\n", |
| 246 | + " <td>{'input': 'roses', 'constraint': 'o'}</td>\n", |
| 247 | + " <td>Here is a poem about roses without using the ...</td>\n", |
| 248 | + " <td>{'constraint': 'o'}</td>\n", |
| 249 | + " </tr>\n", |
| 250 | + " </tbody>\n", |
| 251 | + "</table>\n", |
| 252 | + "</div>" |
| 253 | + ], |
| 254 | + "text/plain": [ |
| 255 | + " ConstraintEvaluator \\\n", |
| 256 | + "3e9a2c05-f7f5-4309-b755-abeb76719f26 False \n", |
| 257 | + "9fddeee7-9080-46c8-b280-405b4f3d18cb False \n", |
| 258 | + "d74ea995-6ae6-488a-8da8-21695f8949fe False \n", |
| 259 | + "e059d4b8-3079-40c7-a6b7-16347d4fc568 False \n", |
| 260 | + "\n", |
| 261 | + " input \\\n", |
| 262 | + "3e9a2c05-f7f5-4309-b755-abeb76719f26 {'input': 'Sirens of Titan', 'constraint': 't'} \n", |
| 263 | + "9fddeee7-9080-46c8-b280-405b4f3d18cb {'input': 'planet earth', 'constraint': 'e'} \n", |
| 264 | + "d74ea995-6ae6-488a-8da8-21695f8949fe {'input': 'vikings', 'constraint': 'v'} \n", |
| 265 | + "e059d4b8-3079-40c7-a6b7-16347d4fc568 {'input': 'roses', 'constraint': 'o'} \n", |
| 266 | + "\n", |
| 267 | + " output \\\n", |
| 268 | + "3e9a2c05-f7f5-4309-b755-abeb76719f26 Here is a poem about Sirens of Titan without ... \n", |
| 269 | + "9fddeee7-9080-46c8-b280-405b4f3d18cb Our orb of life, a vision grand,\\nWith oceans... \n", |
| 270 | + "d74ea995-6ae6-488a-8da8-21695f8949fe Here is a poem about Vikings without using th... \n", |
| 271 | + "e059d4b8-3079-40c7-a6b7-16347d4fc568 Here is a poem about roses without using the ... \n", |
| 272 | + "\n", |
| 273 | + " reference \n", |
| 274 | + "3e9a2c05-f7f5-4309-b755-abeb76719f26 {'constraint': 't'} \n", |
| 275 | + "9fddeee7-9080-46c8-b280-405b4f3d18cb {'constraint': 'e'} \n", |
| 276 | + "d74ea995-6ae6-488a-8da8-21695f8949fe {'constraint': 'v'} \n", |
| 277 | + "e059d4b8-3079-40c7-a6b7-16347d4fc568 {'constraint': 'o'} " |
| 278 | + ] |
| 279 | + }, |
| 280 | + "execution_count": 5, |
| 281 | + "metadata": {}, |
| 282 | + "output_type": "execute_result" |
| 283 | + } |
| 284 | + ], |
| 285 | + "source": [ |
| 286 | + "test_results.to_dataframe()" |
| 287 | + ] |
| 288 | + }, |
| 289 | + { |
| 290 | + "cell_type": "markdown", |
| 291 | + "id": "fbb7ff41-d1a2-4c92-8c63-803a0cc34b9c", |
| 292 | + "metadata": {}, |
| 293 | + "source": [ |
| 294 | + "To fetch the feedback and examples for a previous test project, use the SDK:" |
| 295 | + ] |
| 296 | + }, |
| 297 | + { |
| 298 | + "cell_type": "code", |
| 299 | + "execution_count": null, |
| 300 | + "id": "33a1c081-7cd6-408f-949d-f6897f4baf3c", |
| 301 | + "metadata": { |
| 302 | + "tags": [] |
| 303 | + }, |
| 304 | + "outputs": [], |
| 305 | + "source": [ |
| 306 | + "# This can be the name of any previous test project\n", |
| 307 | + "test_project = test_results[\"project_name\"]" |
| 308 | + ] |
| 309 | + }, |
| 310 | + { |
| 311 | + "cell_type": "code", |
| 312 | + "execution_count": null, |
| 313 | + "id": "b4426821-bae0-4702-86e9-c5d7bbaceb20", |
| 314 | + "metadata": { |
| 315 | + "tags": [] |
| 316 | + }, |
| 317 | + "outputs": [], |
| 318 | + "source": [ |
| 319 | + "import pandas as pd\n", |
| 320 | + "\n", |
| 321 | + "runs = client.list_runs(project_name=test_project, execution_order=1)\n", |
| 322 | + "\n", |
| 323 | + "df = pd.DataFrame(\n", |
| 324 | + "    [\n", |
| 325 | + "        {\n", |
| 326 | + "            \"example_id\": r.reference_example_id,\n", |
| 327 | + "            **r.inputs,\n", |
| 328 | + "            **(r.outputs or {}),\n", |
| 329 | + "            **{k: v for f in client.list_feedback(run_ids=[r.id]) for k, v in [(f\"{f.key}.score\", f.score), (f\"{f.key}.comment\", f.comment)]},\n", |
| 330 | + "            \"reference\": client.read_example(r.reference_example_id).outputs,\n", |
| 331 | + "        }\n", |
| 332 | + "        for r in runs\n", |
| 333 | + "    ]\n", |
| 334 | + ")\n", |
| 335 | + "df" |
| 336 | + ] |
| 337 | + }, |
| 338 | + { |
| 339 | + "cell_type": "markdown", |
| 340 | + "id": "eba511e2-c9ec-4268-b655-5163fa882086", |
| 341 | + "metadata": {}, |
| 342 | + "source": [ |
| 343 | + "## Conclusion\n", |
| 344 | + "\n", |
| 345 | + "In this example, we showed how to download feedback and examples from a test project. You can either use the result object returned by the run directly or use the SDK to fetch the results and feedback.\n", |
| 346 | + "Use this data for further analysis or to programmatically add result information to your existing reports." |
| 347 | + ] |
| 348 | + }, |
| 349 | + { |
| 350 | + "cell_type": "code", |
| 351 | + "execution_count": null, |
| 352 | + "id": "0f637c05-80e7-43be-b44e-2a337139a183", |
| 353 | + "metadata": {}, |
| 354 | + "outputs": [], |
| 355 | + "source": [] |
| 356 | + } |
| 357 | + ], |
| 358 | + "metadata": { |
| 359 | + "kernelspec": { |
| 360 | + "display_name": "Python 3 (ipykernel)", |
| 361 | + "language": "python", |
| 362 | + "name": "python3" |
| 363 | + }, |
| 364 | + "language_info": { |
| 365 | + "codemirror_mode": { |
| 366 | + "name": "ipython", |
| 367 | + "version": 3 |
| 368 | + }, |
| 369 | + "file_extension": ".py", |
| 370 | + "mimetype": "text/x-python", |
| 371 | + "name": "python", |
| 372 | + "nbconvert_exporter": "python", |
| 373 | + "pygments_lexer": "ipython3", |
| 374 | + "version": "3.11.2" |
| 375 | + } |
| 376 | + }, |
| 377 | + "nbformat": 4, |
| 378 | + "nbformat_minor": 5 |
| 379 | +} |
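
The SDK export cell in section 3 packs the per-run flattening into a single dictionary comprehension, which can be hard to read. As a reading aid, here is a minimal sketch of the same row-building logic written as a plain function. It assumes stand-in `Run` and `Feedback` dataclasses (hypothetical, defined here only so the sketch runs without a LangSmith account) that mirror the attributes the notebook reads: `reference_example_id`, `inputs`, `outputs`, `key`, `score`, and `comment`. The helper name `flatten_run` is also an illustration and is not part of the SDK.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional


@dataclass
class Feedback:
    """Stand-in for a feedback record (hypothetical, not the SDK class)."""
    key: str
    score: Optional[Any] = None
    comment: Optional[str] = None


@dataclass
class Run:
    """Stand-in for a traced run (hypothetical, not the SDK class)."""
    reference_example_id: str
    inputs: Dict[str, Any]
    outputs: Optional[Dict[str, Any]] = None


def flatten_run(
    run: Run,
    feedback: Iterable[Feedback],
    reference: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Build one flat row: example id, inputs, outputs, feedback columns, reference."""
    row: Dict[str, Any] = {"example_id": run.reference_example_id}
    row.update(run.inputs)
    row.update(run.outputs or {})  # outputs may be None, so fall back to an empty dict
    for f in feedback:
        # Each feedback key contributes two columns: "<key>.score" and "<key>.comment"
        row[f"{f.key}.score"] = f.score
        row[f"{f.key}.comment"] = f.comment
    row["reference"] = reference
    return row


# Illustrative data only; these values are made up for the sketch.
run = Run("example-1", {"input": "roses", "constraint": "o"}, {"output": "A poem"})
rows: List[Dict[str, Any]] = [
    flatten_run(
        run,
        [Feedback("ConstraintEvaluator", score=False, comment="example comment")],
        {"constraint": "o"},
    )
]
print(rows[0])
```

With the real client, the same function could presumably be fed the results of `client.list_feedback(run_ids=[r.id])` and `client.read_example(r.reference_example_id).outputs` for each run, and the resulting list passed to `pd.DataFrame`, as the notebook cell does inline.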