BuddyLim
diff --git a/.aws-sam/build.toml b/.aws-sam/build.toml
diff --git a/.gitignore b/.gitignore
diff --git a/README.md b/README.md
diff --git a/__init__.py b/__init__.py
@@ -0,0 +1,12 @@
+# This file is auto generated by SAM CLI build command
+
+[function_build_definitions.f12170d0-6ad3-4216-8049-89f68ec73eaa]
+packagetype = "Image"
+functions = ["QwenFunction"]
+
+[function_build_definitions.f12170d0-6ad3-4216-8049-89f68ec73eaa.metadata]
+Dockerfile = "Dockerfile"
+DockerContext = "/Users/limkuangtar/Code/qwen-in-a-lambda/qwen_function"
+DockerTag = "python3.11-v1"
+
+[layer_build_definitions]
@@ -0,0 +1,246 @@
+
+# Created by https://www.gitignore.io/api/osx,linux,python,windows,pycharm,visualstudiocode
+
+### Linux ###
+*~
+
+# temporary files which can be created if a process still has a handle open of a deleted file
+.fuse_hidden*
+
+# KDE directory preferences
+.directory
+
+# Linux trash folder which might appear on any partition or disk
+.Trash-*
+
+# .nfs files are created when an open file is removed but is still being accessed
+.nfs*
+
+### OSX ###
+*.DS_Store
+.AppleDouble
+.LSOverride
+
+# Icon must end with two \r
+Icon
+
+# Thumbnails
+._*
+
+# Files that might appear in the root of a volume
+.DocumentRevisions-V100
+.fseventsd
+.Spotlight-V100
+.TemporaryItems
+.Trashes
+.VolumeIcon.icns
+.com.apple.timemachine.donotpresent
+
+# Directories potentially created on remote AFP share
+.AppleDB
+.AppleDesktop
+Network Trash Folder
+Temporary Items
+.apdisk
+
+### PyCharm ###
+# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and Webstorm
+# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
+
+# User-specific stuff:
+.idea/**/workspace.xml
+.idea/**/tasks.xml
+.idea/dictionaries
+
+# Sensitive or high-churn files:
+.idea/**/dataSources/
+.idea/**/dataSources.ids
+.idea/**/dataSources.xml
+.idea/**/dataSources.local.xml
+.idea/**/sqlDataSources.xml
+.idea/**/dynamic.xml
+.idea/**/uiDesigner.xml
+
+# Gradle:
+.idea/**/gradle.xml
+.idea/**/libraries
+
+# CMake
+cmake-build-debug/
+
+# Mongo Explorer plugin:
+.idea/**/mongoSettings.xml
+
+## File-based project format:
+*.iws
+
+## Plugin-specific files:
+
+# IntelliJ
+/out/
+
+# mpeltonen/sbt-idea plugin
+.idea_modules/
+
+# JIRA plugin
+atlassian-ide-plugin.xml
+
+# Cursive Clojure plugin
+.idea/replstate.xml
+
+# Ruby plugin and RubyMine
+/.rakeTasks
+
+# Crashlytics plugin (for Android Studio and IntelliJ)
+com_crashlytics_export_strings.xml
+crashlytics.properties
+crashlytics-build.properties
+fabric.properties
+
+### PyCharm Patch ###
+# Comment Reason: https://github.com/joeblau/gitignore.io/issues/186#issuecomment-215987721
+
+# *.iml
+# modules.xml
+# .idea/misc.xml
+# *.ipr
+
+# Sonarlint plugin
+.idea/sonarlint
+
+### Python ###
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+.pytest_cache/
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+
+# Translations
+*.mo
+*.pot
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# pyenv
+.python-version
+
+# celery beat schedule file
+celerybeat-schedule.*
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+
+### VisualStudioCode ###
+.vscode/*
+!.vscode/settings.json
+!.vscode/tasks.json
+!.vscode/launch.json
+!.vscode/extensions.json
+.history
+
+### Windows ###
+# Windows thumbnail cache files
+Thumbs.db
+ehthumbs.db
+ehthumbs_vista.db
+
+# Folder config file
+Desktop.ini
+
+# Recycle Bin used on file shares
+$RECYCLE.BIN/
+
+# Windows Installer files
+*.cab
+*.msi
+*.msm
+*.msp
+
+# Windows shortcuts
+*.lnk
+
+# Build folder
+
+*/build/*
+
+# End of https://www.gitignore.io/api/osx,linux,python,windows,pycharm,visualstudiocode
+
+*.gguf
@@ -0,0 +1,91 @@
+# Qwen in a Lambda
+
+Updated at 11/09/2024
+
+(Marking the date because of how fast LLM APIs in Python move and may introduce breaking changes by the time anyone else reads this!)
+
+## Intro:
+
+- This is a small research project on how to package Qwen GGUF model files into AWS Lambda using Docker and the SAM CLI
+
+- Adapted from https://makit.net/blog/llm-in-a-lambda-function/
+  - As of September '24, the guide above and its Dockerfile omit some required OS packages, possibly because llama-cpp-python does not bundle them (?)
+  - Who knows if there's anything new and breaking that will appear in the future :shrugs:
+
+## Motivation:
+
+- I wanted to find out whether I could reduce my AWS spending by relying on Lambda alone rather than Lambda + Bedrock, since running both services would cost more in the long run.
+
+- The idea was to fit a small language model, which would be relatively less resource intensive, and hopefully get sub-second to roughly one-second latency on a 128-256 MB memory configuration
+
+- I also wanted to use GGUF models so I could try different levels of quantization and find the best trade-off between performance and the file size loaded into memory
+  - My experimentation led me to use Qwen2 1.5b Q5_K_M, as it had the best "performance" and "latency" locally when receiving a prompt and emitting a JSON structure using llama-cpp
+
+## Prerequisites:
+
+- Docker
+- AWS SAM CLI
+- AWS CLI
+- Python 3.11
+- ECR permissions
+- Lambda permissions
+- Download `qwen2-1_5b-instruct-q5_k_m.gguf` into `qwen_function/function/`
+  - Or download any other .gguf model you'd like and change the model path in `app.py` (`LOCAL_PATH`)
+
+## Setup Guide:
+
+- Install pip packages under `qwen_function/function/requirements.txt` (preferably in a venv/conda env)
+- Run `sam build` / `sam validate`
+- Run `sam local start-api` to test locally
+- Run `curl --header "Content-Type: application/json" --request POST --data '{"prompt":"hello"}' http://localhost:3000/generate` to prompt the LLM
+  - Or use your preferred API clients
+- Run `sam deploy --guided` to deploy to AWS
+
+## Metrics
+
+- Localhost - Macbook M3 Pro 32 GB
+
+![alt text](/images/image.png)
+
+- AWS
+
+  - Initial config - 128 MB, 30s timeout
+    - Lambda timed out! Cold start was timing out the lambda
+  - Adjusted config #1 - 512 MB, 30s timeout
+    - Lambda timed out! Cold start was timing out the lambda
+  - Adjusted config #2 - 512 MB, 30s timeout
+    - Lambda timed out! Cold start was timing out the lambda
+
+![alt text](/images/image-1.png)
+
+- Adjusted config #3 - 3008 MB, 30s timeout - cold start
+
+![alt text](/images/image-2.png)
+
+- Adjusted config #3 - 3008 MB, 30s timeout - warm start
+
+![alt text](/images/image-3.png)
+
+## Observation
+
+- Referring back to the pricing structure of Lambda,
+
+  - [Pricing](<https://docs.aws.amazon.com/lambda/latest/operatorguide/computing-power.html#:~:text=Since%20the%20Lambda%20service%20charges,and%20duration%20(in%20seconds)>)
+  - 1024 MB / 1.465 s / $0.024638 per 1,000 Lambda invocations
+    - Qwen2 1.5b had me cranking the memory up to 3008 MB just to avoid timing out, and responses still took 4-11 seconds!
+  - Claude 3 Haiku / $0.00025 per 1,000 input tokens / $0.00125 per 1,000 output tokens / Asia Pacific (Tokyo)
+
+- It may be cheaper to just use a hosted LLM (AWS Bedrock, etc.) in the cloud, as the pricing for Lambda with Qwen does not look competitive with Claude 3 Haiku
+
+- Local results depend on your machine specs and may heavily skew your expectations versus reality
+
+- Depending on your use case, the latency per Lambda invocation may also lead to a poor user experience
+
+### Conclusion
+
+All in all, I think this was a fun little experiment, even though Qwen 1.5b didn't quite meet the budget and latency requirements for my side project. Thanks to @makit again for the guide!
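A rough sketch for redoing the pricing comparison from the README's Observation section with your own numbers. It is not part of the PR. The Lambda rates are assumed x86 on-demand list prices and may differ by region or architecture; the Claude 3 Haiku rates are the ones quoted in the README; the function names and the 200-token figures are hypothetical placeholders. It ignores the free tier, API Gateway, and ECR storage.

```python
"""Rough cost comparison: self-hosted model on Lambda vs. a per-token hosted model."""

# ASSUMPTION: x86 on-demand list prices; check the pricing page for your region.
LAMBDA_PRICE_PER_GB_SECOND = 0.0000166667
LAMBDA_PRICE_PER_REQUEST = 0.20 / 1_000_000

# Claude 3 Haiku prices as quoted in the README, per 1,000 tokens.
HAIKU_PRICE_PER_1K_INPUT = 0.00025
HAIKU_PRICE_PER_1K_OUTPUT = 0.00125


def lambda_cost(memory_mb: int, duration_s: float, invocations: int) -> float:
    """Compute charge (GB-seconds) plus the per-request charge, in USD."""
    gb_seconds = (memory_mb / 1024) * duration_s * invocations
    return gb_seconds * LAMBDA_PRICE_PER_GB_SECOND + invocations * LAMBDA_PRICE_PER_REQUEST


def haiku_cost(input_tokens: int, output_tokens: int, invocations: int) -> float:
    """Per-token charge for a hosted model, in USD."""
    per_call = (
        (input_tokens / 1000) * HAIKU_PRICE_PER_1K_INPUT
        + (output_tokens / 1000) * HAIKU_PRICE_PER_1K_OUTPUT
    )
    return per_call * invocations


if __name__ == "__main__":
    # 3008 MB and the 4-11 s latency range come from the README's metrics.
    for latency_s in (4.0, 11.0):
        cost = lambda_cost(3008, latency_s, 1000)
        print(f"Lambda 3008 MB @ {latency_s:4.1f} s: ${cost:.4f} per 1,000 calls")

    # HYPOTHETICAL token counts; replace with what your prompts actually use.
    cost = haiku_cost(input_tokens=200, output_tokens=200, invocations=1000)
    print(f"Haiku 200 in / 200 out:     ${cost:.4f} per 1,000 calls")
```

The comparison is sensitive to prompt and response length, so measure your real token counts and warm-start latencies before drawing conclusions from it.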