Currently, promptimize only evaluates results against criteria pre-defined by developers. We could potentially leverage the existing tool to compare responses across different LLMs. For example, we could use GPT-4 as a benchmark to evaluate whether the responses from a custom model are similar (a rough sketch of what this could look like follows the task list below).
Add an optional parameter for defining a "target" LLM.
Add a function to compare the similarity of the results.
Allow users to still utilise manual test cases to compare the two LLMs being evaluated.
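A minimal sketch of how this could work, assuming nothing about the current promptimize API: the class and function names below (`BenchmarkedPromptCase`, `similarity`) are hypothetical, and the two model callables are stubbed so the example is self-contained.

```python
# Hypothetical sketch of the proposed feature: run each prompt against both the
# model under test and a benchmark "target" model (e.g. GPT-4), then score how
# closely the two responses agree. Not part of the current promptimize API.
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Rough lexical similarity between two responses, in [0.0, 1.0]."""
    return SequenceMatcher(None, a, b).ratio()


class BenchmarkedPromptCase:
    """A prompt case evaluated by comparing a candidate model to a target model."""

    def __init__(self, prompt, run_candidate, run_target, threshold=0.8):
        self.prompt = prompt
        self.run_candidate = run_candidate  # callable: prompt -> response (custom model)
        self.run_target = run_target        # callable: prompt -> response (benchmark LLM)
        self.threshold = threshold

    def evaluate(self) -> dict:
        candidate = self.run_candidate(self.prompt)
        target = self.run_target(self.prompt)
        score = similarity(candidate, target)
        return {
            "prompt": self.prompt,
            "candidate": candidate,
            "target": target,
            "similarity": score,
            "passed": score >= self.threshold,
        }


if __name__ == "__main__":
    # Manual test case: both callables are stubbed; in practice they would call
    # the custom model and the benchmark LLM respectively.
    case = BenchmarkedPromptCase(
        prompt="Summarize: promptimize is a prompt engineering toolkit.",
        run_candidate=lambda p: "promptimize is a toolkit for prompt engineering.",
        run_target=lambda p: "promptimize is a prompt engineering toolkit.",
    )
    print(case.evaluate())
```

A lexical ratio is only a placeholder here; in practice the similarity function could be swapped for an embedding-based comparison or an LLM-as-judge check, while keeping the same pass/fail threshold interface.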