Currently, promptimize only evaluates results against criteria pre-defined by developers. We could potentially leverage the existing tool to compare responses across different LLMs. For example, we could use GPT-4 as a benchmark to evaluate whether the responses from a custom model are similar (a rough sketch of what this could look like follows the task list below).
Add an optional parameter for defining a "target" LLM.
Add a function to compare the similarity of the results.
Allow users to still utilise manual test cases to compare the two LLMs being evaluated.
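A minimal sketch of how this could work, assuming nothing about the current promptimize API: the class and function names below (`BenchmarkedPromptCase`, `similarity`) are hypothetical, and the two model callables are stubbed so the example is self-contained.

```python
# Hypothetical sketch of the proposed feature: run each prompt against both the
# model under test and a benchmark "target" model (e.g. GPT-4), then score how
# closely the two responses agree. Not part of the current promptimize API.
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Rough lexical similarity between two responses, in [0.0, 1.0]."""
    return SequenceMatcher(None, a, b).ratio()


class BenchmarkedPromptCase:
    """A prompt case evaluated by comparing a candidate model to a target model."""

    def __init__(self, prompt, run_candidate, run_target, threshold=0.8):
        self.prompt = prompt
        self.run_candidate = run_candidate  # callable: prompt -> response (custom model)
        self.run_target = run_target        # callable: prompt -> response (benchmark LLM)
        self.threshold = threshold

    def evaluate(self) -> dict:
        candidate = self.run_candidate(self.prompt)
        target = self.run_target(self.prompt)
        score = similarity(candidate, target)
        return {
            "prompt": self.prompt,
            "candidate": candidate,
            "target": target,
            "similarity": score,
            "passed": score >= self.threshold,
        }


if __name__ == "__main__":
    # Manual test case: both callables are stubbed; in practice they would call
    # the custom model and the benchmark LLM respectively.
    case = BenchmarkedPromptCase(
        prompt="Summarize: promptimize is a prompt engineering toolkit.",
        run_candidate=lambda p: "promptimize is a toolkit for prompt engineering.",
        run_target=lambda p: "promptimize is a prompt engineering toolkit.",
    )
    print(case.evaluate())
```

A lexical ratio is only a placeholder here; in practice the similarity function could be swapped for an embedding-based comparison or an LLM-as-judge check, while keeping the same pass/fail threshold interface.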