Commit 198b883

gary-huang and Yun-Kim authored
chore(llmobs): make expected output optional (#14208)
- fix case where updating a record with expected_output to no expected_output (null)
- update now only updates the specified fields and checks that it is a valid update with valid fields
- extend timeouts for experiments calls to overcome read timeouts

Operations:

Pulling a dataset with [empty "expected_column"](https://app.datadoghq.com/llm/datasets/3926711e-10a7-4f4e-b29f-6de4ed1faca8?page=1) failed before this change.

before: ![image](https://github.com/user-attachments/assets/c7d0f4d1-02ac-44d5-b86d-0804ca086f9f)

after: ![image](https://github.com/user-attachments/assets/6a188c8a-bd34-4aec-8d27-f57ea3e7377f)

Creating a dataset:

before: ![image](https://github.com/user-attachments/assets/232df808-bbc0-494a-adc7-6fb6ee37022c)

after: ![image](https://github.com/user-attachments/assets/3e13461a-ecad-4d52-b9f5-1435e197f961)

append, update, and delete would result in errors on push:

![image](https://github.com/user-attachments/assets/902e5640-82bd-4475-8659-76558c02a273)

All those cases succeed after this change.

A [large dataset pull](https://app.datadoghq.com/llm/datasets/3926711e-10a7-4f4e-b29f-6de4ed1faca8?page=1) has been tested to work with the extended timeout. We will bump the timeout as needed, and possibly make it configurable in the future.

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

---------

Co-authored-by: Yun Kim <[email protected]>
1 parent 280be76 commit 198b883

90 files changed: +1984 -519 lines changed

ddtrace/llmobs/_experiment.py

Lines changed: 5 additions & 0 deletions
@@ -141,6 +141,11 @@ def push(self) -> None:
         self._updated_record_ids = []
 
     def update(self, index: int, record: DatasetRecordRaw) -> None:
+        if all(k not in record for k in ("input_data", "expected_output", "metadata")):
+            raise ValueError(
+                "invalid update, record should contain at least one of "
+                "input_data, expected_output, or metadata to update"
+            )
         record_id = self._records[index]["record_id"]
         self._updated_record_ids.append(record_id)
         self._records[index] = {**record, "record_id": record_id}
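The guard added to `update` can be exercised in isolation. The following is a minimal sketch, not the library's code: `TinyDataset` is a hypothetical stand-in for `Dataset` that copies only the lines visible in the hunk above, and plain dicts stand in for `DatasetRecordRaw`.

```python
from typing import Any, Dict, List

# Field names taken from the hunk above.
UPDATABLE_FIELDS = ("input_data", "expected_output", "metadata")


class TinyDataset:
    """Hypothetical stand-in for Dataset, for illustration only."""

    def __init__(self, records: List[Dict[str, Any]]) -> None:
        self._records = records
        self._updated_record_ids: List[str] = []

    def update(self, index: int, record: Dict[str, Any]) -> None:
        # Reject updates that name none of the updatable fields.
        if all(k not in record for k in UPDATABLE_FIELDS):
            raise ValueError(
                "invalid update, record should contain at least one of "
                "input_data, expected_output, or metadata to update"
            )
        record_id = self._records[index]["record_id"]
        self._updated_record_ids.append(record_id)
        # As in the hunk, the local record is replaced by the partial record.
        self._records[index] = {**record, "record_id": record_id}


ds = TinyDataset([{"record_id": "r1", "input_data": "A", "expected_output": "B"}])

# Valid: the key is present even though its value is None.
ds.update(0, {"expected_output": None})

# Invalid: none of the three updatable keys is present.
try:
    ds.update(0, {"unknown_field": 1})
except ValueError as exc:
    print(exc)
```

Note that the check is on key presence, not on value, so passing `None` for a field still counts as a valid update.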

ddtrace/llmobs/_llmobs.py

Lines changed: 9 additions & 3 deletions
@@ -611,7 +611,9 @@ def pull_dataset(cls, name: str) -> Dataset:
         return ds
 
     @classmethod
-    def create_dataset(cls, name: str, description: str, records: List[DatasetRecord] = []) -> Dataset:
+    def create_dataset(cls, name: str, description: str, records: Optional[List[DatasetRecord]] = None) -> Dataset:
+        if records is None:
+            records = []
         ds = cls._instance._dne_client.dataset_create(name, description)
         for r in records:
             ds.append(r)
@@ -625,11 +627,15 @@ def create_dataset_from_csv(
         csv_path: str,
         dataset_name: str,
         input_data_columns: List[str],
-        expected_output_columns: List[str],
-        metadata_columns: List[str] = [],
+        expected_output_columns: Optional[List[str]] = None,
+        metadata_columns: Optional[List[str]] = None,
         csv_delimiter: str = ",",
         description="",
     ) -> Dataset:
+        if expected_output_columns is None:
+            expected_output_columns = []
+        if metadata_columns is None:
+            metadata_columns = []
         ds = cls._instance._dne_client.dataset_create(dataset_name, description)
 
         # Store the original field size limit to restore it later
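The switch from `= []` to `Optional[...] = None` follows a common Python idiom. The sketch below shows why a list default is risky in general; `shared_default` and `fresh_default` are hypothetical helpers, not part of ddtrace. The replaced signatures only iterated over their defaults as far as this diff shows, so the old code may never have hit this problem in practice.

```python
from typing import List, Optional


def shared_default(item: str, bucket: List[str] = []) -> List[str]:
    # The default list is created once, when the function is defined,
    # and is reused by every call that omits `bucket`.
    bucket.append(item)
    return bucket


def fresh_default(item: str, bucket: Optional[List[str]] = None) -> List[str]:
    # A None sentinel lets each call build its own list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = shared_default("a")
second = shared_default("b")
print(second)  # ['a', 'b']: state leaked from the first call

print(fresh_default("a"), fresh_default("b"))  # ['a'] ['b']
```

For `expected_output_columns` the change also has a functional effect: the parameter previously had no default, so callers were forced to pass it.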

ddtrace/llmobs/_writer.py

Lines changed: 30 additions & 19 deletions
@@ -297,6 +297,7 @@ class LLMObsExperimentsClient(BaseLLMObsWriter):
     EVP_SUBDOMAIN_HEADER_VALUE = EXP_SUBDOMAIN_NAME
     AGENTLESS_BASE_URL = AGENTLESS_EXP_BASE_URL
     ENDPOINT = ""
+    TIMEOUT = 5.0
 
     def request(self, method: str, path: str, body: JSONType = None) -> Response:
         headers = {
@@ -308,7 +309,7 @@ def request(self, method: str, path: str, body: JSONType = None) -> Response:
             headers[EVP_SUBDOMAIN_HEADER_NAME] = self.EVP_SUBDOMAIN_HEADER_VALUE
 
         encoded_body = json.dumps(body).encode("utf-8") if body else b""
-        conn = get_connection(self._intake)
+        conn = get_connection(url=self._intake, timeout=self.TIMEOUT)
         try:
             url = self._intake + self._endpoint + path
             logger.debug("requesting %s", url)
@@ -353,30 +354,40 @@ def dataset_create(self, name: str, description: str) -> Dataset:
         curr_version = response_data["data"]["attributes"]["current_version"]
         return Dataset(name, dataset_id, [], description, curr_version, _dne_client=self)
 
+    @staticmethod
+    def _get_record_json(record: Union[DatasetRecord, DatasetRecordRaw], is_update: bool) -> JSONType:
+        # for now, if a user wants to "erase" the value of expected_output, they are expected to
+        # set expected_output to None, and we serialize that as empty string to indicate this to BE
+        expected_output: JSONType = None
+        if "expected_output" in record:
+            expected_output = "" if record["expected_output"] is None else record["expected_output"]
+
+        # for now, if a user wants to "erase" the value of metadata, they are expected to
+        # set metadata to None, and we serialize that as an empty map to indicate this to BE
+        metadata: JSONType = None
+        if "metadata" in record:
+            metadata = {} if record["metadata"] is None else record["metadata"]
+
+        rj: JSONType = {
+            "input": cast(Dict[str, JSONType], record.get("input_data")),
+            "expected_output": expected_output,
+            "metadata": metadata,
+        }
+
+        if is_update:
+            rj["id"] = record["record_id"]  # type: ignore
+
+        return rj
+
     def dataset_batch_update(
         self,
         dataset_id: str,
         insert_records: List[DatasetRecordRaw],
         update_records: List[DatasetRecord],
         delete_record_ids: List[str],
     ) -> Tuple[int, List[str]]:
-        irs: JSONType = [
-            {
-                "input": cast(Dict[str, JSONType], r["input_data"]),
-                "expected_output": r["expected_output"],
-                "metadata": r.get("metadata", {}),
-            }
-            for r in insert_records
-        ]
-        urs: JSONType = [
-            {
-                "input": cast(Dict[str, JSONType], r["input_data"]),
-                "expected_output": r["expected_output"],
-                "metadata": r.get("metadata", {}),
-                "id": r["record_id"],
-            }
-            for r in update_records
-        ]
+        irs: JSONType = [self._get_record_json(r, False) for r in insert_records]
+        urs: JSONType = [self._get_record_json(r, True) for r in update_records]
         path = f"/api/unstable/llm-obs/v1/datasets/{dataset_id}/batch_update"
         body: JSONType = {
             "data": {
@@ -428,7 +439,7 @@ def dataset_get_with_records(self, name: str) -> Dataset:
                     {
                         "record_id": record["id"],
                         "input_data": attrs["input"],
-                        "expected_output": attrs["expected_output"],
+                        "expected_output": attrs.get("expected_output"),
                         "metadata": attrs.get("metadata", {}),
                     }
                 )
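The serialization rule in `_get_record_json` separates a field that is absent from the record (sent as `null`, which the comments imply leaves it untouched) from a field explicitly set to `None` (sent as `""` or `{}` to ask the backend to erase it). The sketch below mirrors that logic with plain dicts. `record_to_json` is a hypothetical name, and the reading of how the backend treats `null` is an assumption drawn from the code comments, not confirmed behavior.

```python
import json
from typing import Any, Dict


def record_to_json(record: Dict[str, Any], is_update: bool) -> Dict[str, Any]:
    # Absent key -> None (serialized as null); explicit None -> "" as the erase marker.
    expected_output = None
    if "expected_output" in record:
        expected_output = "" if record["expected_output"] is None else record["expected_output"]

    # Absent key -> None (serialized as null); explicit None -> {} as the erase marker.
    metadata = None
    if "metadata" in record:
        metadata = {} if record["metadata"] is None else record["metadata"]

    payload: Dict[str, Any] = {
        "input": record.get("input_data"),
        "expected_output": expected_output,
        "metadata": metadata,
    }
    if is_update:
        payload["id"] = record["record_id"]
    return payload


# Update that only touches input_data: the other two fields go out as null.
untouched = record_to_json({"record_id": "r1", "input_data": "A"}, is_update=True)
print(json.dumps(untouched))

# Update that explicitly erases expected_output and metadata.
erased = record_to_json(
    {"record_id": "r1", "expected_output": None, "metadata": None}, is_update=True
)
print(json.dumps(erased))
```

The first payload has the same shape as the `update_records` entry in the recorded request further down, where only `input` carries a value.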
@@ -0,0 +1,47 @@
+interactions:
+- request:
+    body: '{"data": {"type": "datasets", "id": "2c079a59-06d7-4592-adcc-8f4124c0ea3b",
+      "attributes": {"insert_records": [], "update_records": [{"input": "A", "expected_output":
+      null, "metadata": null, "id": "371eb8d2-f5f3-44e6-a8bb-64169a656a7b"}], "delete_records":
+      []}}}'
+    headers:
+      Accept:
+      - '*/*'
+      ? !!python/object/apply:multidict._multidict.istr
+      - Accept-Encoding
+      : - identity
+      Connection:
+      - keep-alive
+      Content-Length:
+      - '261'
+      ? !!python/object/apply:multidict._multidict.istr
+      - Content-Type
+      : - application/json
+      User-Agent:
+      - python-requests/2.32.3
+    method: POST
+    uri: https://api.datadoghq.com/api/unstable/llm-obs/v1/datasets/2c079a59-06d7-4592-adcc-8f4124c0ea3b/batch_update
+  response:
+    body:
+      string: '{"data":[{"id":"5d36809c-3741-45a7-8f32-67cf24d29e85","type":"datasets","attributes":{"author":{"id":"de473b30-eb9f-11e9-a77a-c7405862b8bd"},"created_at":"2025-08-06T20:51:41.035546097Z","dataset_id":"2c079a59-06d7-4592-adcc-8f4124c0ea3b","expected_output":{"answer":"Paris"},"input":"A","metadata":{"difficulty":"easy"},"updated_at":"2025-08-06T20:51:41.035546171Z","version":2}}]}'
+    headers:
+      content-length:
+      - '382'
+      content-security-policy:
+      - frame-ancestors 'self'; report-uri https://logs.browser-intake-datadoghq.com/api/v2/logs?dd-api-key=pube4f163c23bbf91c16b8f57f56af9fc58&dd-evp-origin=content-security-policy&ddsource=csp-report&ddtags=site%3Adatadoghq.com
+      content-type:
+      - application/vnd.api+json
+      date:
+      - Wed, 06 Aug 2025 20:51:41 GMT
+      strict-transport-security:
+      - max-age=31536000; includeSubDomains; preload
+      vary:
+      - Accept-Encoding
+      x-content-type-options:
+      - nosniff
+      x-frame-options:
+      - SAMEORIGIN
+    status:
+      code: 200
+      message: OK
+version: 1
@@ -0,0 +1,48 @@
+interactions:
+- request:
+    body: '{"data": {"type": "datasets", "id": "2c079a59-06d7-4592-adcc-8f4124c0ea3b",
+      "attributes": {"insert_records": [{"input": {"prompt": "What is the capital
+      of France?"}, "expected_output": {"answer": "Paris"}, "metadata": {"difficulty":
+      "easy"}}], "update_records": [], "delete_records": []}}}'
+    headers:
+      Accept:
+      - '*/*'
+      ? !!python/object/apply:multidict._multidict.istr
+      - Accept-Encoding
+      : - identity
+      Connection:
+      - keep-alive
+      Content-Length:
+      - '289'
+      ? !!python/object/apply:multidict._multidict.istr
+      - Content-Type
+      : - application/json
+      User-Agent:
+      - python-requests/2.32.3
+    method: POST
+    uri: https://api.datadoghq.com/api/unstable/llm-obs/v1/datasets/2c079a59-06d7-4592-adcc-8f4124c0ea3b/batch_update
+  response:
+    body:
+      string: '{"data":[{"id":"371eb8d2-f5f3-44e6-a8bb-64169a656a7b","type":"datasets","attributes":{"author":{"id":"de473b30-eb9f-11e9-a77a-c7405862b8bd"},"created_at":"2025-08-06T20:51:38.394638367Z","dataset_id":"2c079a59-06d7-4592-adcc-8f4124c0ea3b","expected_output":{"answer":"Paris"},"input":{"prompt":"What
+        is the capital of France?"},"metadata":{"difficulty":"easy"},"updated_at":"2025-08-06T20:51:38.394638367Z","version":1}}]}'
+    headers:
+      content-length:
+      - '422'
+      content-security-policy:
+      - frame-ancestors 'self'; report-uri https://logs.browser-intake-datadoghq.com/api/v2/logs?dd-api-key=pube4f163c23bbf91c16b8f57f56af9fc58&dd-evp-origin=content-security-policy&ddsource=csp-report&ddtags=site%3Adatadoghq.com
+      content-type:
+      - application/vnd.api+json
+      date:
+      - Wed, 06 Aug 2025 20:51:38 GMT
+      strict-transport-security:
+      - max-age=31536000; includeSubDomains; preload
+      vary:
+      - Accept-Encoding
+      x-content-type-options:
+      - nosniff
+      x-frame-options:
+      - SAMEORIGIN
+    status:
+      code: 200
+      message: OK
+version: 1
@@ -0,0 +1,45 @@
+interactions:
+- request:
+    body: null
+    headers:
+      Accept:
+      - '*/*'
+      ? !!python/object/apply:multidict._multidict.istr
+      - Accept-Encoding
+      : - identity
+      Connection:
+      - keep-alive
+      ? !!python/object/apply:multidict._multidict.istr
+      - Content-Length
+      : - '0'
+      ? !!python/object/apply:multidict._multidict.istr
+      - Content-Type
+      : - application/json
+      User-Agent:
+      - python-requests/2.32.3
+    method: GET
+    uri: https://api.datadoghq.com/api/unstable/llm-obs/v1/datasets/2c079a59-06d7-4592-adcc-8f4124c0ea3b/records
+  response:
+    body:
+      string: '{"data":[{"id":"5d36809c-3741-45a7-8f32-67cf24d29e85","type":"datasets","attributes":{"author":{"id":"de473b30-eb9f-11e9-a77a-c7405862b8bd"},"created_at":"2025-08-06T20:51:41.035546Z","dataset_id":"2c079a59-06d7-4592-adcc-8f4124c0ea3b","expected_output":{"answer":"Paris"},"input":"A","metadata":{"difficulty":"easy"},"updated_at":"2025-08-06T20:51:41.035546Z"}}],"meta":{"after":""}}'
+    headers:
+      content-length:
+      - '384'
+      content-security-policy:
+      - frame-ancestors 'self'; report-uri https://logs.browser-intake-datadoghq.com/api/v2/logs?dd-api-key=pube4f163c23bbf91c16b8f57f56af9fc58&dd-evp-origin=content-security-policy&ddsource=csp-report&ddtags=site%3Adatadoghq.com
+      content-type:
+      - application/vnd.api+json
+      date:
+      - Wed, 06 Aug 2025 20:51:51 GMT
+      strict-transport-security:
+      - max-age=31536000; includeSubDomains; preload
+      vary:
+      - Accept-Encoding
+      x-content-type-options:
+      - nosniff
+      x-frame-options:
+      - SAMEORIGIN
+    status:
+      code: 200
+      message: OK
+version: 1
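A records response like the one recorded above is what `dataset_get_with_records` walks over. The sketch below shows how the `attrs.get("expected_output")` change tolerates a record that has no expected output. `parse_records` is a hypothetical helper and the sample body is invented for illustration; only the `data` / `id` / `attributes` layout is taken from the cassette.

```python
import json
from typing import Any, Dict, List

# Invented sample body: the first record has no expected_output or metadata.
SAMPLE_BODY = json.dumps(
    {
        "data": [
            {"id": "rec-1", "attributes": {"input": "A"}},
            {
                "id": "rec-2",
                "attributes": {
                    "input": "B",
                    "expected_output": {"answer": "Paris"},
                    "metadata": {"difficulty": "easy"},
                },
            },
        ],
        "meta": {"after": ""},
    }
)


def parse_records(body: str) -> List[Dict[str, Any]]:
    records = []
    for record in json.loads(body)["data"]:
        attrs = record["attributes"]
        records.append(
            {
                "record_id": record["id"],
                "input_data": attrs["input"],
                # .get() yields None instead of raising KeyError when the key is missing.
                "expected_output": attrs.get("expected_output"),
                "metadata": attrs.get("metadata", {}),
            }
        )
    return records


parsed = parse_records(SAMPLE_BODY)
print(parsed[0]["expected_output"])  # None
print(parsed[1]["expected_output"])  # {'answer': 'Paris'}
```

With the previous `attrs["expected_output"]` lookup, the first record in this sample would raise `KeyError`.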
Lines changed: 10 additions & 10 deletions
@@ -1,39 +1,39 @@
 interactions:
 - request:
-    body: '{"data": {"type": "datasets", "id": "6b9d0a65-a8a0-47ef-9deb-6fdbeef395f8",
+    body: '{"data": {"type": "datasets", "id": "2facf581-08b9-40c8-8377-28f895290c88",
       "attributes": {"insert_records": [{"input": {"prompt": "What is the capital
-      of France?"}, "expected_output": {"answer": "Paris"}, "metadata": {}}], "update_records":
+      of France?"}, "expected_output": {"answer": "Paris"}, "metadata": null}], "update_records":
       [], "delete_records": []}}}'
     headers:
       Accept:
       - '*/*'
-      ? !!python/object/new:multidict._multidict.istr
+      ? !!python/object/apply:multidict._multidict.istr
       - Accept-Encoding
       : - identity
       Connection:
       - keep-alive
       Content-Length:
-      - '269'
-      ? !!python/object/new:multidict._multidict.istr
+      - '271'
+      ? !!python/object/apply:multidict._multidict.istr
       - Content-Type
       : - application/json
       User-Agent:
       - python-requests/2.32.3
     method: POST
-    uri: https://api.datadoghq.com/api/unstable/llm-obs/v1/datasets/6b9d0a65-a8a0-47ef-9deb-6fdbeef395f8/batch_update
+    uri: https://api.datadoghq.com/api/unstable/llm-obs/v1/datasets/2facf581-08b9-40c8-8377-28f895290c88/batch_update
   response:
     body:
-      string: '{"data":[{"id":"d2d91cb9-c26a-45ec-84f1-68e6e2abc415","type":"datasets","attributes":{"author":{"id":"0dd9d379-c1a3-11ed-b4e0-566658a732f8"},"created_at":"2025-07-16T20:06:44.561248129Z","dataset_id":"6b9d0a65-a8a0-47ef-9deb-6fdbeef395f8","expected_output":{"answer":"Paris"},"input":{"prompt":"What
-        is the capital of France?"},"metadata":{},"updated_at":"2025-07-16T20:06:44.561248129Z","version":1}}]}'
+      string: '{"data":[{"id":"6eea94f8-d0d9-439b-813c-d6601c683828","type":"datasets","attributes":{"author":{"id":"de473b30-eb9f-11e9-a77a-c7405862b8bd"},"created_at":"2025-08-06T20:51:29.335441534Z","dataset_id":"2facf581-08b9-40c8-8377-28f895290c88","expected_output":{"answer":"Paris"},"input":{"prompt":"What
+        is the capital of France?"},"updated_at":"2025-08-06T20:51:29.335441534Z","version":1}}]}'
     headers:
       content-length:
-      - '403'
+      - '389'
       content-security-policy:
       - frame-ancestors 'self'; report-uri https://logs.browser-intake-datadoghq.com/api/v2/logs?dd-api-key=pube4f163c23bbf91c16b8f57f56af9fc58&dd-evp-origin=content-security-policy&ddsource=csp-report&ddtags=site%3Adatadoghq.com
       content-type:
       - application/vnd.api+json
       date:
-      - Wed, 16 Jul 2025 20:06:44 GMT
+      - Wed, 06 Aug 2025 20:51:29 GMT
       strict-transport-security:
       - max-age=31536000; includeSubDomains; preload
       vary:
Lines changed: 6 additions & 6 deletions
@@ -1,25 +1,25 @@
 interactions:
 - request:
-    body: '{"data": {"type": "datasets", "id": "ce11ec07-4c99-49c8-a471-acddc38e8946",
+    body: '{"data": {"type": "datasets", "id": "3249d56a-701e-45e1-a317-790c8e0eff66",
       "attributes": {"insert_records": [], "update_records": [], "delete_records":
-      ["1dd96d4f-e2ba-4e20-b36f-9fd03131d655"]}}}'
+      ["4b02d88b-1458-4b88-8322-3ec086113dd0"]}}}'
     headers:
       Accept:
       - '*/*'
-      ? !!python/object/new:multidict._multidict.istr
+      ? !!python/object/apply:multidict._multidict.istr
       - Accept-Encoding
       : - identity
       Connection:
       - keep-alive
       Content-Length:
       - '196'
-      ? !!python/object/new:multidict._multidict.istr
+      ? !!python/object/apply:multidict._multidict.istr
       - Content-Type
       : - application/json
       User-Agent:
       - python-requests/2.32.3
     method: POST
-    uri: https://api.datadoghq.com/api/unstable/llm-obs/v1/datasets/ce11ec07-4c99-49c8-a471-acddc38e8946/batch_update
+    uri: https://api.datadoghq.com/api/unstable/llm-obs/v1/datasets/3249d56a-701e-45e1-a317-790c8e0eff66/batch_update
   response:
     body:
       string: '{"data":[]}'
@@ -31,7 +31,7 @@ interactions:
       content-type:
       - application/vnd.api+json
       date:
-      - Thu, 17 Jul 2025 05:00:20 GMT
+      - Wed, 06 Aug 2025 20:52:13 GMT
       strict-transport-security:
       - max-age=31536000; includeSubDomains; preload
       vary:
