Commit 9226f60

Add experimentation results
0 parents  commit 9226f60

22 files changed

+710
-0
lines changed

.aws-sam/build.toml

+12
@@ -0,0 +1,12 @@
1+
# This file is auto generated by SAM CLI build command
2+
3+
[function_build_definitions.f12170d0-6ad3-4216-8049-89f68ec73eaa]
4+
packagetype = "Image"
5+
functions = ["QwenFunction"]
6+
7+
[function_build_definitions.f12170d0-6ad3-4216-8049-89f68ec73eaa.metadata]
8+
Dockerfile = "Dockerfile"
9+
DockerContext = "/Users/limkuangtar/Code/qwen-in-a-lambda/qwen_function"
10+
DockerTag = "python3.11-v1"
11+
12+
[layer_build_definitions]

.gitignore

+246
@@ -0,0 +1,246 @@
1+
2+
# Created by https://www.gitignore.io/api/osx,linux,python,windows,pycharm,visualstudiocode
3+
4+
### Linux ###
5+
*~
6+
7+
# temporary files which can be created if a process still has a handle open of a deleted file
8+
.fuse_hidden*
9+
10+
# KDE directory preferences
11+
.directory
12+
13+
# Linux trash folder which might appear on any partition or disk
14+
.Trash-*
15+
16+
# .nfs files are created when an open file is removed but is still being accessed
17+
.nfs*
18+
19+
### OSX ###
20+
*.DS_Store
21+
.AppleDouble
22+
.LSOverride
23+
24+
# Icon must end with two \r
25+
Icon
26+
27+
# Thumbnails
28+
._*
29+
30+
# Files that might appear in the root of a volume
31+
.DocumentRevisions-V100
32+
.fseventsd
33+
.Spotlight-V100
34+
.TemporaryItems
35+
.Trashes
36+
.VolumeIcon.icns
37+
.com.apple.timemachine.donotpresent
38+
39+
# Directories potentially created on remote AFP share
40+
.AppleDB
41+
.AppleDesktop
42+
Network Trash Folder
43+
Temporary Items
44+
.apdisk
45+
46+
### PyCharm ###
47+
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and Webstorm
48+
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
49+
50+
# User-specific stuff:
51+
.idea/**/workspace.xml
52+
.idea/**/tasks.xml
53+
.idea/dictionaries
54+
55+
# Sensitive or high-churn files:
56+
.idea/**/dataSources/
57+
.idea/**/dataSources.ids
58+
.idea/**/dataSources.xml
59+
.idea/**/dataSources.local.xml
60+
.idea/**/sqlDataSources.xml
61+
.idea/**/dynamic.xml
62+
.idea/**/uiDesigner.xml
63+
64+
# Gradle:
65+
.idea/**/gradle.xml
66+
.idea/**/libraries
67+
68+
# CMake
69+
cmake-build-debug/
70+
71+
# Mongo Explorer plugin:
72+
.idea/**/mongoSettings.xml
73+
74+
## File-based project format:
75+
*.iws
76+
77+
## Plugin-specific files:
78+
79+
# IntelliJ
80+
/out/
81+
82+
# mpeltonen/sbt-idea plugin
83+
.idea_modules/
84+
85+
# JIRA plugin
86+
atlassian-ide-plugin.xml
87+
88+
# Cursive Clojure plugin
89+
.idea/replstate.xml
90+
91+
# Ruby plugin and RubyMine
92+
/.rakeTasks
93+
94+
# Crashlytics plugin (for Android Studio and IntelliJ)
95+
com_crashlytics_export_strings.xml
96+
crashlytics.properties
97+
crashlytics-build.properties
98+
fabric.properties
99+
100+
### PyCharm Patch ###
101+
# Comment Reason: https://github.com/joeblau/gitignore.io/issues/186#issuecomment-215987721
102+
103+
# *.iml
104+
# modules.xml
105+
# .idea/misc.xml
106+
# *.ipr
107+
108+
# Sonarlint plugin
109+
.idea/sonarlint
110+
111+
### Python ###
112+
# Byte-compiled / optimized / DLL files
113+
__pycache__/
114+
*.py[cod]
115+
*$py.class
116+
117+
# C extensions
118+
*.so
119+
120+
# Distribution / packaging
121+
.Python
122+
build/
123+
develop-eggs/
124+
dist/
125+
downloads/
126+
eggs/
127+
.eggs/
128+
lib/
129+
lib64/
130+
parts/
131+
sdist/
132+
var/
133+
wheels/
134+
*.egg-info/
135+
.installed.cfg
136+
*.egg
137+
138+
# PyInstaller
139+
# Usually these files are written by a python script from a template
140+
# before PyInstaller builds the exe, so as to inject date/other infos into it.
141+
*.manifest
142+
*.spec
143+
144+
# Installer logs
145+
pip-log.txt
146+
pip-delete-this-directory.txt
147+
148+
# Unit test / coverage reports
149+
htmlcov/
150+
.tox/
151+
.coverage
152+
.coverage.*
153+
.cache
154+
.pytest_cache/
155+
nosetests.xml
156+
coverage.xml
157+
*.cover
158+
.hypothesis/
159+
160+
# Translations
161+
*.mo
162+
*.pot
163+
164+
# Flask stuff:
165+
instance/
166+
.webassets-cache
167+
168+
# Scrapy stuff:
169+
.scrapy
170+
171+
# Sphinx documentation
172+
docs/_build/
173+
174+
# PyBuilder
175+
target/
176+
177+
# Jupyter Notebook
178+
.ipynb_checkpoints
179+
180+
# pyenv
181+
.python-version
182+
183+
# celery beat schedule file
184+
celerybeat-schedule.*
185+
186+
# SageMath parsed files
187+
*.sage.py
188+
189+
# Environments
190+
.env
191+
.venv
192+
env/
193+
venv/
194+
ENV/
195+
env.bak/
196+
venv.bak/
197+
198+
# Spyder project settings
199+
.spyderproject
200+
.spyproject
201+
202+
# Rope project settings
203+
.ropeproject
204+
205+
# mkdocs documentation
206+
/site
207+
208+
# mypy
209+
.mypy_cache/
210+
211+
### VisualStudioCode ###
212+
.vscode/*
213+
!.vscode/settings.json
214+
!.vscode/tasks.json
215+
!.vscode/launch.json
216+
!.vscode/extensions.json
217+
.history
218+
219+
### Windows ###
220+
# Windows thumbnail cache files
221+
Thumbs.db
222+
ehthumbs.db
223+
ehthumbs_vista.db
224+
225+
# Folder config file
226+
Desktop.ini
227+
228+
# Recycle Bin used on file shares
229+
$RECYCLE.BIN/
230+
231+
# Windows Installer files
232+
*.cab
233+
*.msi
234+
*.msm
235+
*.msp
236+
237+
# Windows shortcuts
238+
*.lnk
239+
240+
# Build folder
241+
242+
*/build/*
243+
244+
# End of https://www.gitignore.io/api/osx,linux,python,windows,pycharm,visualstudiocode
245+
246+
*.gguf

README.md

+91
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,91 @@
1+
# Qwen in a Lambda
2+
3+
Updated on 11/09/2024
4+
5+
(Marking the date because of how fast LLM APIs in Python move and may introduce breaking changes by the time anyone else reads this!)
6+
7+
## Intro:
8+
9+
- This is a small research project on how to package Qwen GGUF model files into AWS Lambda using Docker and the SAM CLI
10+
11+
- Adapted from https://makit.net/blog/llm-in-a-lambda-function/
12+
- As of September '24, some required OS packages are missing from the above guide and, consequently, from its Dockerfile, possibly because llama-cpp-python does not bundle them (?)
13+
- Who knows if there's anything new and breaking that will appear in the future :shrugs:
14+
15+
## Motivation:
16+
17+
- I wanted to find out whether I could reduce my AWS spending by relying on Lambda alone rather than Lambda + Bedrock, since using both services would cost more in the long run.
18+
19+
- The idea was to fit a small language model, which is relatively light on resources, and hopefully get sub-second to second latency on a 128-256 MB memory configuration
20+
21+
- I also wanted to use GGUF models so I could try different quantization levels and find the best trade-off between performance and the file size loaded into memory
22+
- My experimentation led me to Qwen2 1.5b Q5_K_M, as it had the best "performance" and "latency" locally for taking a prompt and returning a JSON structure using llama-cpp
23+
24+
## Prerequisites:
25+
26+
- Docker
27+
- AWS SAM CLI
28+
- AWS CLI
29+
- Python 3.11
30+
- ECR permissions
31+
- Lambda permissions
32+
- Download `qwen2-1_5b-instruct-q5_k_m.gguf` into `qwen_function/function/`
33+
- Or download any other .gguf model you'd like and change the model path in `app.py` / `LOCAL_PATH`
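
For orientation, here is a minimal sketch of how a handler might load the GGUF file once and reuse it across warm invocations. It is not the repository's actual `app.py`: the `LOCAL_PATH` value, the `get_llm` helper, the `handler` signature with its optional `llm` argument, and the response shape are all assumptions made for illustration. Only `Llama(model_path=...)` and calling the model with `max_tokens` come from llama-cpp-python.

```python
import json

# Assumption: path to the model file inside the image; adjust to your layout.
LOCAL_PATH = "./qwen2-1_5b-instruct-q5_k_m.gguf"

_llm = None  # cached across warm invocations of the same container


def get_llm():
    """Load the model on first use so that only cold starts pay the cost."""
    global _llm
    if _llm is None:
        # Imported lazily so the module can be imported without llama-cpp-python.
        from llama_cpp import Llama
        _llm = Llama(model_path=LOCAL_PATH)
    return _llm


def handler(event, context, llm=None):
    """Hypothetical Lambda handler: reads {"prompt": ...} from the request body."""
    body = json.loads(event.get("body") or "{}")
    prompt = body.get("prompt", "")
    model = llm if llm is not None else get_llm()
    result = model(prompt, max_tokens=128)
    text = result["choices"][0]["text"]
    return {"statusCode": 200, "body": json.dumps({"response": text})}
```

Keeping the model in a module-level variable is one common way to separate cold-start cost (loading the file) from warm-start cost (inference only).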
34+
35+
## Setup Guide:
36+
37+
- Install pip packages under `qwen_function/function/requirements.txt` (preferably in a venv/conda env)
38+
- Run `sam build` / `sam validate`
39+
- Run `sam local start-api` to test locally
40+
- Run `curl --header "Content-Type: application/json" \
41+
--request POST \
42+
--data '{"prompt":"hello"}' \
43+
http://localhost:3000/generate` to prompt the LLM
44+
- Or use your preferred API clients
45+
- Run `sam deploy --guided` to deploy to AWS
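
If you would rather call the local endpoint from Python than from `curl`, the request above can be sketched with the standard library. The helper names `build_generate_request` and `send` are hypothetical; the URL, header, and JSON body mirror the `curl` command.

```python
import json
import urllib.request


def build_generate_request(prompt, base_url="http://localhost:3000"):
    """Build the same POST request as the curl command above."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With `sam local start-api` running, `print(send(build_generate_request("hello")))` should print whatever the function returns; the generous timeout is there because a cold start can take a while.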
46+
47+
## Metrics
48+
49+
- Localhost - Macbook M3 Pro 32 GB
50+
51+
![alt text](/images/image.png)
52+
53+
- AWS
54+
55+
- Initial config - 128 MB, 30s timeout
56+
- Lambda timed out! Cold start was timing out the lambda
57+
- Adjusted config #1 - 512 MB, 30s timeout
58+
59+
- Lambda timed out! Cold start was timing out the lambda
60+
61+
- Adjusted config #2 - 512 MB, 30s timeout
62+
- Lambda timed out! Cold start was timing out the lambda
63+
64+
![alt text](/images/image-1.png)
65+
66+
- Adjusted config #3 - 3008 MB, 30s timeout - cold start
67+
68+
![alt text](/images/image-2.png)
69+
70+
- Adjusted config #3 - 3008 MB, 30s timeout - warm start
71+
72+
![alt text](/images/image-3.png)
73+
74+
## Observation
75+
76+
- Referring back to the pricing structure of Lambda,
77+
78+
- [Pricing](<https://docs.aws.amazon.com/lambda/latest/operatorguide/computing-power.html#:~:text=Since%20the%20Lambda%20service%20charges,and%20duration%20(in%20seconds)>)
79+
- 1536 MB / 1.465 s / $0.024638 over 1000 Lambda invocations
80+
- Qwen2 1.5b had me cranking the memory up to 3008 MB just to avoid timing out, and responses still took 4-11 seconds!
81+
- Claude 3 Haiku / $0.00025 per 1000 input tokens / $0.00125 per 1000 output tokens / Asia Pacific (Tokyo)
82+
83+
- It may be cheaper to just use a hosted LLM in the cloud (AWS Bedrock, etc.), as the pricing for Lambda with Qwen does not look competitive compared to Claude 3 Haiku
84+
85+
- Local results depend on your machine specs!! They may heavily skew your expectations versus reality
86+
87+
- Depending on your use case, the latency per Lambda invocation may also result in a poor user experience
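
To make the comparison above easier to redo with your own numbers, here is a rough cost sketch. The Lambda per-GB-second and per-request prices are assumptions (commonly quoted x86 on-demand rates; check the current pricing page for your region), the Haiku prices are the ones quoted above, and `lambda_cost` / `haiku_cost` are hypothetical helper names. Free tier, ECR storage, and data transfer are ignored.

```python
# Assumed Lambda prices; verify against current AWS pricing for your region.
LAMBDA_PRICE_PER_GB_SECOND = 0.0000166667
LAMBDA_PRICE_PER_REQUEST = 0.20 / 1_000_000

# Claude 3 Haiku prices as quoted in the list above (per 1000 tokens).
HAIKU_PRICE_PER_1K_INPUT = 0.00025
HAIKU_PRICE_PER_1K_OUTPUT = 0.00125


def lambda_cost(memory_mb, duration_s, invocations):
    """Estimated Lambda bill: GB-seconds consumed plus the per-request fee."""
    gb_seconds = (memory_mb / 1024) * duration_s * invocations
    return (gb_seconds * LAMBDA_PRICE_PER_GB_SECOND
            + invocations * LAMBDA_PRICE_PER_REQUEST)


def haiku_cost(input_tokens, output_tokens, invocations):
    """Estimated Bedrock bill for a fixed token count per request."""
    per_call = ((input_tokens / 1000) * HAIKU_PRICE_PER_1K_INPUT
                + (output_tokens / 1000) * HAIKU_PRICE_PER_1K_OUTPUT)
    return per_call * invocations


# 3008 MB at the observed 4-11 s latency, over 1000 invocations.
for duration in (4, 11):
    print(f"Lambda, {duration}s: ${lambda_cost(3008, duration, 1000):.4f}")

# Hypothetical token counts per request; replace with your own workload.
print(f"Haiku, 100 in / 100 out: ${haiku_cost(100, 100, 1000):.4f}")
```

Which option comes out cheaper depends heavily on how many tokens each request uses, so treat the output as a starting point rather than a verdict.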
88+
89+
### Conclusion
90+
91+
All in all, I think this was a fun little experiment, even though Qwen2 1.5b didn't quite meet the budget and latency requirements of my side project. Thanks to @makit again for the guide!

__init__.py

Whitespace-only changes.
