Android JNI llama cache temperature in class #10287
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10287
Note: links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
❌ 1 New Failure, 1 Unrelated Failure (as of commit e8b0a67 with merge base 96c10bb):
- NEW FAILURE: the following job has failed.
- BROKEN TRUNK: the following job failed but was also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid this failure.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull Request Overview
This PR introduces caching for the temperature parameter in the Android JNI layer for llama models. The changes include adding a new member variable to store temperature, initializing that variable in the constructor, and updating the generation configuration to use the cached value.
- Added a new member variable (temperature_) to cache the temperature.
- Assigned the temperature parameter to temperature_ in the constructor.
- Updated the generation configuration to use temperature_ (a sketch of these changes follows the list).
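A minimal sketch of the three changes above, with illustrative names only; the real change lives in extension/android/jni/jni_layer_llama.cpp, and the `GenerationConfig` and class shape here are assumed stand-ins rather than the exact ExecuTorch sources:

```cpp
#include <string>

namespace llm {
// Hypothetical stand-in for the runner's generation config.
struct GenerationConfig {
  float temperature = 0.8f;
};
}  // namespace llm

class ExecuTorchLlmJni {
 public:
  // The temperature passed to the constructor is now cached in a member.
  explicit ExecuTorchLlmJni(float temperature) : temperature_(temperature) {}

  void generate(const std::string& prompt) {
    llm::GenerationConfig config;
    config.temperature = temperature_;  // generation config reads the cached value
    // runner_->generate(prompt, config, ...);  (elided)
    (void)prompt;
  }

 private:
  float temperature_;  // new member variable caching the ctor argument
};
```

Caching the value in the constructor keeps the existing Java-facing API unchanged while letting generate() build its config from the stored temperature.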
Comments suppressed due to low confidence (1)

extension/android/jni/jni_layer_llama.cpp:186

  tokenizer_path->toStdString().c_str());

Please verify that removing the temperature parameter from the MTKLlamaRunner constructor is intentional. If the temperature was previously required by the runner, additional changes in MTKLlamaRunner or its usage may be needed.
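For context, a sketch of the call-site shape this comment refers to; the constructor signature here is an assumption for illustration, and MTKLlamaRunner's real interface may differ:

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for the MediaTek runner; the two-argument ctor below
// is the post-change shape, with the temperature parameter removed.
struct MTKLlamaRunner {
  MTKLlamaRunner(const char* model_path, const char* tokenizer_path) {
    (void)model_path;
    (void)tokenizer_path;
  }
};

std::unique_ptr<MTKLlamaRunner> make_runner(const std::string& model_path,
                                            const std::string& tokenizer_path) {
  // Previously a third `temperature` argument was forwarded here; after this
  // PR it would instead be cached in the JNI class and applied at generate()
  // time, which is what the reviewer asks to confirm.
  return std::make_unique<MTKLlamaRunner>(model_path.c_str(),
                                          tokenizer_path.c_str());
}
```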
@kirklandsign has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
So far, for LLMs we can use the new config, with the temperature cached from the ctor. For llava, we keep the old workflow.
We will update the API to move temperature from the ctor to generate() next; a sketch of that shape follows.
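A sketch of how that follow-up might look, purely an assumption about the planned API (the names are illustrative, not a committed interface):

```cpp
#include <string>

// Assumed shape of the follow-up: temperature supplied per generate() call.
struct GenerationConfig {
  float temperature = 0.8f;  // per-call sampling temperature
};

class ExecuTorchLlmJni {
 public:
  ExecuTorchLlmJni() = default;  // no temperature parameter in the ctor anymore

  void generate(const std::string& prompt, const GenerationConfig& config) {
    // config.temperature would be consulted on every call, replacing the
    // cached member this PR introduces.
    (void)prompt;
    (void)config;
  }
};
```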
Test: instrumentation test