Poor/inconsistent results from Phi-3-mini-128k #2055

Closed
2 of 4 tasks
amihalik opened this issue Jun 11, 2024 · 1 comment · Fixed by #2060

Comments

@amihalik
Contributor

amihalik commented Jun 11, 2024

Phi-3-mini-128k runs, but the results I'm seeing from TGI are not very good and are inconsistent with other inference methods.

For each result below, I supply the same pairwise comparison prompt:

<|user|>
Consider this information about Johnny: 

Johnny likes apples.  Johnny hates peanuts.

Now consider these options:

Option A: Pecan Pie
Option B: Peach Pie

Consider the information and brainstorm on items Johnny might like.  Be thorough and consider all possibilities.

Next, use the brainstorm to pick which option Johnny would prefer.

End your response with the option you choose (Either **Option A** or **Option B**).<|end|>
<|assistant|>
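The prompt above follows the Phi-3 chat layout: the user turn is wrapped in `<|user|>` ... `<|end|>`, followed by `<|assistant|>` to cue the reply. A minimal sketch of building that string by hand is below; the helper name `build_phi3_prompt` is hypothetical. In Transformers, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` renders the model's own template instead, and its output may differ in small details (for example a trailing newline after `<|assistant|>`).

```python
def build_phi3_prompt(user_message: str) -> str:
    """Wrap one user turn in Phi-3 chat markers (hypothetical helper).

    Mirrors the layout of the prompt in this issue: a single <|user|> turn
    closed by <|end|>, then <|assistant|> with no trailing newline.
    """
    return f"<|user|>\n{user_message}<|end|>\n<|assistant|>"


# Usage with a shortened version of the Johnny message.
prompt = build_phi3_prompt("Johnny likes apples.  Johnny hates peanuts.")
print(prompt)
```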

Running on "Other methods"

I get great results from NIM (https://build.nvidia.com/microsoft/phi-3-mini):

Given the information, we know Johnny likes apples and hates peanuts. Let's brainstorm on items Johnny might like based on this information:

Apple-based desserts: Since Johnny likes apples, he might enjoy desserts that contain apples. This could include apple pies, apple crumble, apple tarts, apple crisp, apple cake, etc.

Non-peanut based desserts: Since Johnny hates peanuts, he would likely not enjoy any desserts that contain peanuts. This eliminates any desserts that have peanuts as an ingredient, such as peanut butter pie, peanut brittle, peanut cookies, etc.

Now, let's consider the two options:

Option A: Pecan Pie - Pecan pie is a dessert that contains pecans, which are a type of nut. Although Johnny doesn't have a specific dislike for all nuts, he does hate peanuts. It's possible that he might not like pecans as well, but we don't have enough information to be certain.

Option B: Peach Pie - Peach pie is a dessert that contains peaches, which are a type of fruit. Since Johnny likes apples, it's reasonable to assume that he might also enjoy other fruits, such as peaches.

Based on the information provided and the brainstorming, Option B: Peach Pie seems to be the more likely choice for Johnny, as it aligns more closely with his preference for fruit (apples) and doesn't contain any peanuts.

I get great results from Transformers (running Python in the TGI:2.0.4 container):

Given Johnny's preferences, we know he likes apples and hates peanuts. Let's brainstorm on items Johnny might like based on this information:

1. Apple-based desserts or dishes: Since Johnny likes apples, he might enjoy any dish or dessert that features apples as a primary ingredient. This could include apple pies, apple crisps, apple tarts, apple cakes, apple sauce, and more.

2. Non-peanut-based desserts or dishes: Johnny hates peanuts, so he would likely not enjoy any dish or dessert that contains peanuts or peanut-based ingredients. This eliminates any desserts or dishes that include peanuts, such as peanut butter cookies, peanut butter pies, or peanut brittle.

Now, let's consider the two options provided:

Option A: Pecan Pie
Pecan pie is a dessert that contains pecans, which are a type of nut. Although Johnny doesn't have a stated preference for or against nuts in general, we know he hates peanuts. Since pecans are different from peanuts, it's possible that Johnny might not have an issue with pecan pie. However, without more information about Johnny's feelings towards other nuts, we can't definitively say he would prefer this option.

Option B: Peach Pie
Peach pie is a dessert that features peaches as the primary fruit ingredient. Given that Johnny likes apples, it's reasonable to assume that he might also enjoy other fruits, such as peaches. Since there are no peanuts or peanut-based ingredients in peach pie, this option aligns well with Johnny's preferences.

Based on the information provided and the brainstorming process, **Option B: Peach Pie** seems to be the more likely choice for Johnny, as it aligns with his preference for fruits (apples and potentially peaches) and his dislike for peanuts.

Running on TGI 2.0.4

I get okay-ish results when I run TGI:2.0.4 with the default parameters:

Given Johnny's preferences, we know he likes apples and dislikes peanuts. Neither pecan nor peach pies directly relate to apples, but we can infer preferences based on the given information.

- **Option A: Pecan Pie** - Pecans are not directly related to apples, but pecans are a type of nut, and Johnny hates peanuts, which are also nuts. This might make him less likely to prefer a pecan pie due to his aversion to nuts.

- **Option B: Peach Pie** - Peaches are fruits, and there's no direct relation to apples in the information provided. However, since there's no negative association mentioned, it's a safer guess that Johnny might not have an issue with peaches.

Given the information, **Option B: Peach Pie** seems to be the safer choice, as it doesn't directly conflict with any of Johnny's stated preferences.

I get poor (but fluent) results when I increase the token limits to 5k (input) and 7k (total), using the same prompt:

Given the information provided, Johnny likes apples and dislikes peanuts. Neither pecan nor peach is directly related to apples or peanuts. However, if we consider the options as food items, a peach pie could be more closely related to apples as both are fruits. Therefore, Johnny might have a slight preference for the peach pie. But this is a very weak preference and not a strong preference. The information given does not strongly indicate a clear preference for either option. However, if we must choose one, Option B: Peach Pie could be a slightly more likely choice.
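The 5k/7k run corresponds to the second launch command in the reproduction steps (`--max-batch-prefill-tokens=5000 --max-total-tokens=7000 --max-input-tokens=5000`). As I understand the TGI launcher, these flags must be mutually consistent: `--max-input-tokens` strictly below `--max-total-tokens`, and `--max-batch-prefill-tokens` at least `--max-input-tokens`. The sketch below encodes those two rules as assumptions (the function name is hypothetical; confirm the rules against `--help` for the version in use) to sanity-check the values used here.

```python
def check_launch_limits(max_input_tokens: int, max_total_tokens: int,
                        max_batch_prefill_tokens: int) -> list[str]:
    """Return problems with a TGI launch configuration (empty list if none).

    Assumed launcher rules:
      * --max-input-tokens must be strictly less than --max-total-tokens
      * --max-batch-prefill-tokens must be at least --max-input-tokens
    """
    problems = []
    if max_input_tokens >= max_total_tokens:
        problems.append("--max-input-tokens must be < --max-total-tokens")
    if max_batch_prefill_tokens < max_input_tokens:
        problems.append("--max-batch-prefill-tokens must be >= --max-input-tokens")
    return problems


# Values from the 5k/7k launch command in this issue.
print(check_launch_limits(max_input_tokens=5000, max_total_tokens=7000,
                          max_batch_prefill_tokens=5000))  # → []
```

With these values, a request at the full 5000-token input still leaves room for the `max_new_tokens: 1000` used in the curl call (5000 + 1000 ≤ 7000).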

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Note: I'm running these commands on an AWS g6.2xlarge instance

Reproducing the "Transformers" results

  1. Run Python in a TGI container:
docker run -it --rm --name tgi --gpus all --shm-size 2g  \
    --entrypoint python \
    ghcr.io/huggingface/text-generation-inference:2.0.4 
  2. Run this Python code:
prompt = """<|user|>
Consider this information about Johnny: 

Johnny likes apples.  Johnny hates peanuts.

Now consider these options:

Option A: Pecan Pie
Option B: Peach Pie

Consider the information and brainstorm on items Johnny might like.  Be thorough and consider all possibilities.

Next, use the brainstorm to pick which option Johnny would prefer.

End your response with the option you choose (Either **Option A** or **Option B**).<|end|>
<|assistant|>"""

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3-mini-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    device_map="cuda",
    torch_dtype="auto", 
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "do_sample": False,
}

output = pipe(prompt, **generation_args)
print(output[0]['generated_text'])

Reproducing the "TGI" results

  1. Run TGI, either with the default parameters:
docker run -d --rm --name tgi -p 8080:80 --gpus all --shm-size 2g \
    ghcr.io/huggingface/text-generation-inference:2.0.4 \
    --model-id microsoft/Phi-3-mini-128k-instruct 

or with larger token limits:

docker run -d --rm --name tgi -p 8080:80 --gpus all --shm-size 2g \
    ghcr.io/huggingface/text-generation-inference:2.0.4 \
    --model-id microsoft/Phi-3-mini-128k-instruct \
    --max-batch-prefill-tokens=5000 --max-total-tokens=7000 --max-input-tokens=5000
  2. Query the endpoint and print the results:
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{
        "inputs": "Consider this information about Johnny:\n\nJohnny likes apples. Johnny hates peanuts.\n\nNow consider these options:\n\nOption A: Pecan Pie\nOption B: Peach Pie\n\nConsider the information and brainstorm on items Johnny might like. Be thorough and consider all possibilities.\n\nNext, use the brainstorm to pick which option Johnny would prefer.\n\nEnd your response with the option you choose (Either **Option A** or **Option B**).",
        "parameters": {
            "max_new_tokens": 1000,
            "do_sample": false
        }
    }' \
    -H 'Content-Type: application/json' | jq -r '.generated_text'
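The curl payload above sends the question text without the `<|user|>`, `<|end|>` and `<|assistant|>` markers that wrap the prompt in the Transformers run, so the two backends do not receive identical inputs. A minimal standard-library sketch of the same `/generate` request is below; it reuses the curl call's greedy parameters and leaves it to the caller to pass the full templated prompt. The URL and function names are assumptions for illustration. TGI also exposes an OpenAI-style `/v1/chat/completions` route that applies the model's chat template on the server, which is another way to keep inputs aligned.

```python
import json
import urllib.request

# Assumption: the TGI container from step 1 is listening on host port 8080.
TGI_URL = "http://127.0.0.1:8080/generate"


def build_generate_payload(prompt: str, max_new_tokens: int = 1000) -> dict:
    """JSON body for TGI's /generate route, greedy decoding like the curl call."""
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "do_sample": False},
    }


def generate(prompt: str, url: str = TGI_URL, max_new_tokens: int = 1000) -> str:
    """POST the prompt to TGI and return the generated_text field."""
    body = json.dumps(build_generate_payload(prompt, max_new_tokens)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["generated_text"]


# Usage (needs a running server); pass the same templated string used in the
# Transformers run, including the <|user|> / <|end|> / <|assistant|> markers:
#   print(generate(prompt))
```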

Expected behavior

I expect TGI to produce results consistent with other inference servers and Transformers.

@amihalik
Contributor Author

amihalik commented Jun 11, 2024

Note: The "Johnny" pairwise prompt is a stand-in for the actual prompts I'm using. The results for my actual prompts are even more disappointing. I simply get "Option A" or "Option B" from TGI.
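Because the pairwise prompt asks the model to end with **Option A** or **Option B**, and TGI sometimes returns only that label, one way to compare backends across many prompts is to extract the final choice from each response. The sketch below takes the last "Option A"/"Option B" mention as the answer; the function name is hypothetical, and the heuristic fails if a model names the rejected option after stating its choice.

```python
import re
from typing import Optional

# "Option A" or "Option B", with or without surrounding ** markdown.
OPTION_PATTERN = re.compile(r"Option\s+([AB])\b")


def extract_choice(response: str) -> Optional[str]:
    """Return "A" or "B" for the last option mentioned, or None if there is none."""
    matches = OPTION_PATTERN.findall(response)
    return matches[-1] if matches else None


# Shortened response endings from this issue, plus an illustrative bare label.
samples = {
    "Transformers": "**Option B: Peach Pie** seems to be the more likely choice for Johnny",
    "TGI, 5k/7k": "if we must choose one, Option B: Peach Pie could be a slightly more likely choice.",
    "bare label (illustrative)": "Option A",
}
for backend, text in samples.items():
    print(backend, extract_choice(text))
```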
