
VPC Chat completions API #74


Merged: 7 commits from vpc-api into main on Jun 20, 2025

Conversation

@huiwengoh (Collaborator) commented Jun 11, 2025

Note: this is a very early draft PR, mostly skeleton structure.

Sample usage for score():

import openai
from cleanlab_tlm.utils.chat_completions import TLMChatCompletion

openai_kwargs = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What is 1+1?"}],
}
openai_completion = openai.chat.completions.create(**openai_kwargs)

openai_completion
>> ChatCompletion(id='chatcmpl-Bh5HglTUAg5pxDy0299VjXTzx14mA', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='1 + 1 equals 2.', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1749608116, model='gpt-4o-mini-2024-07-18', object='chat.completion', service_tier='default', system_fingerprint='fp_34a54ae93c', usage=CompletionUsage(completion_tokens=8, prompt_tokens=14, total_tokens=22, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)))

tlm = TLMChatCompletion()
score_response = tlm.score(
    completion=openai_completion,
    **openai_kwargs
)

score_response
>> {'trustworthiness_score': 0.9819999106445161}

@huiwengoh marked this pull request as draft June 11, 2025 02:18
@huiwengoh requested a review from jwmueller June 11, 2025 02:32
Comment on lines 55 to 67
res = requests.post(
    f"{BASE_URL}/score",
    json={
        "tlm_options": self._tlm_options,
        "completion": completion.model_dump(),
        **openai_kwargs,
    },
    timeout=self._timeout,
)

res_json = res.json()

return {"trustworthiness_score": res_json["tlm_metadata"]["score"]}
Member:

couldn't our public implementation of score() be something like this instead?

response_text = completion.get_string_response()
messages = get_messages_list(openai_kwargs)
tlm_prompt = form_prompt_string(messages)
return tlm.get_trustworthiness_score(tlm_prompt, response_text)

or is there a specific reason to use a special backend API?

Member:

I guess with this ^ we'd ideally want to try and extract logprobs from the response if they are available.
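
(An editorial sketch, not code from this PR: one way to pull per-token logprobs out of an OpenAI ChatCompletion using the OpenAI Python SDK's response fields. How the extracted values would then be fed into TLM scoring is not specified here.)

# Sketch: logprobs are only populated when the request was made with logprobs=True.
def extract_logprobs(completion):
    choice = completion.choices[0]
    if choice.logprobs is None or choice.logprobs.content is None:
        return None
    return [token.logprob for token in choice.logprobs.content]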

Collaborator (Author):

A backend API would allow users to do scoring in VPC - otherwise tlm.get_trustworthiness_score() would be calling our SaaS TLM, right?
Also, this would allow better scoring for things like tool calls / structured outputs / other formats in the future, I think, if we retain the full completions object.
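
(An editorial sketch, not code from this PR, illustrating the point above: serializing the whole ChatCompletion via model_dump(), as the quoted score() code does, keeps structured fields that flattening to a single response string would lose. The key names follow the Chat Completions response shape.)

payload = openai_completion.model_dump()
message = payload["choices"][0]["message"]
content = message.get("content")        # plain-text answer, if any
tool_calls = message.get("tool_calls")  # preserved when the model called a tool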

Member:

Sounds good to me. If it helps you code better/faster, I think it's fine to have two different modules: one for SaaS, one for VPC.

@jwmueller (Member) commented Jun 11, 2025

I vote our tutorial for this looks something like this:

Easiest way to use TLM if you're using the Chat Completions API

1. Here's how to score the response from every LLM call you've made:
from cleanlab_tlm.utils.chat_completions import TLMChatCompletion
from openai import OpenAI

client = OpenAI()
tlm = TLMChatCompletion(options={'log':['explanation']})  # See Advanced Tutorial for optional configurations for better/faster results

# Your existing code:

openai_kwargs = dict(
    model="gpt-4.1",
    messages=[
        {"role": "developer", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

response = client.chat.completions.create(**openai_kwargs)

# Code to score:

score = tlm.score(
    response=response,
    **openai_kwargs
)
2. Here's a convenient wrapper you can use to have one line of code that generates the response and scores its trustworthiness:
class TrustworthyOpenAIClient:
    # has attributes: openai_client = user's own OpenAI client (not Cleanlab's), tlm = TLMChatCompletion
    def __init__(self, openai_client, tlm):
        self.openai_client = openai_client
        self.tlm = tlm

    def create(self, **openai_kwargs):
        response = self.openai_client.chat.completions.create(**openai_kwargs)
        score = self.tlm.score(response=response, **openai_kwargs)
        response.tlm_metadata = score  # ChatCompletion is not a dict, so attach the score as an attribute
        return response


trustworthy_openai = TrustworthyOpenAIClient(client, tlm)
response = trustworthy_openai.create(**openai_kwargs)
print(response.choices[0].message.content)
print(response.tlm_metadata["trustworthiness_score"])
3. Finally, Cleanlab already provides such a wrapper that generates the response for you using Cleanlab's infrastructure (using the same LLM model that you specify) and simultaneously scores its trustworthiness:

UPDATE: IGNORE THIS. We will use the OpenAI client package in this section and just point its base URL at the Cleanlab backend instead.

response = tlm.create(
    model="gpt-4o-mini",
    messages=[{"content": "What is 1+1?","role": "user"}],
)

print(response.text)
print(response['tlm_metadata'].score)
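
(An editorial sketch of the updated plan described in the note above, assuming a hypothetical Cleanlab-hosted endpoint and credential; the real base URL and auth scheme are not defined in this PR:)

from openai import OpenAI

# Point the standard OpenAI client at the Cleanlab backend instead of api.openai.com.
# Both values below are placeholders.
client = OpenAI(
    base_url="https://<cleanlab-backend>/v1",
    api_key="<CLEANLAB_API_KEY>",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is 1+1?"}],
)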

Notes from me

Actually I think the optimal way to offer workflow 2 above is a bit different, but my Python skills / free time are lacking. I think the optimal way is:

The user decorates their call to client.chat.completions.create() with a decorator that then appends the trust score to the response. The functionality would be equivalent to what is shown for workflow 2, but the developer experience would be cleaner (zero changes to existing code, just add a decorator). The challenge is that the decorator has to be associated with an already-instantiated TLMChatCompletion object, i.e. something like:


# Before all your code:

import functools

def trust_score(fn):
    @functools.wraps(fn)
    def wrapper(**kwargs):
        response = fn(**kwargs)
        score = tlm.score(response=response, **kwargs)  # need to somehow get the TLMChatCompletion object into this decorator
        response.tlm_metadata = score
        return response
    return wrapper


from openai import OpenAI
client = OpenAI()

client.chat.completions.create = trust_score(client.chat.completions.create)

# After it's been monkey patched, you don't have to change any of your existing code / infra to also get trust scores:

response = client.chat.completions.create(**openai_kwargs)

print(response.choices[0].message.content)
print(response.tlm_metadata["trustworthiness_score"])


res_json = res.json()

return {"trustworthiness_score": res_json["tlm_metadata"]["trustworthiness_score"]}
Member:

wait why is this hardcoded like this? It should return everything that TLM.get_trustworthiness_score() returns.

Maybe I am confused about whether this PR is for VPC-TLM or SaaS-TLM.
If this PR is for VPC-TLM only, then all the code should be in a separate VPC module, so SaaS users are not confused by it.

I'd prefer to only review the SaaS-TLM version, and you can try to make the VPC-TLM API closely match that.

@huiwengoh changed the title from "[WIP] Chat completions API" to "[WIP] VPC Chat completions API" on Jun 11, 2025
@@ -0,0 +1,67 @@
import os
Member:

TODO: Confirm this VPC version matches our SaaS API as closely as we are easily able to. The analogous SaaS API is defined here:

#75

@jas2600 (Collaborator) commented Jun 13, 2025

For the decorator idea, a decorator factory should work:

# Before all your code:
import functools
import os  # needed below to read OPENAI_API_KEY from the environment

def trust_score(tlm_instance):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            response = fn(*args, **kwargs)
            score = tlm_instance.score(response=response, **kwargs)
            response.tlm_metadata = score
            return response
        return wrapper
    return decorator

from openai import OpenAI
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
client.chat.completions.create = trust_score(tlm)(client.chat.completions.create)

# After it's been monkey patched, you don't have to change any of your existing code / infra to also get trust scores:
...
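
(An editorial usage sketch mirroring workflow 2 above; assumes openai_kwargs and a TLMChatCompletion instance named tlm are already defined:)

# After patching, existing call sites return responses with trust scores attached:
response = client.chat.completions.create(**openai_kwargs)

print(response.choices[0].message.content)
print(response.tlm_metadata["trustworthiness_score"])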

@huiwengoh requested a review from jwmueller June 17, 2025 23:12
@jwmueller (Member) left a comment:

All that matters is that the tutorial runs well; let's not focus on reviewing this.

@huiwengoh marked this pull request as ready for review June 19, 2025 21:19
@huiwengoh changed the title from "[WIP] VPC Chat completions API" to "VPC Chat completions API" on Jun 20, 2025
@huiwengoh merged commit 4dead70 into main Jun 20, 2025
3 checks passed
@huiwengoh deleted the vpc-api branch June 20, 2025 20:56