Description
Air-gapped and offline local GPU usage of LLM models: Meta Llama 3.2, Claude Sonnet 3.5, Gemma 2 27b, Qwen2, Mistral-Large-Instruct, DeepSeek R1 / V3.
Compared to OpenAI o1 pro.
- DeepSeek R1: DeepSeek R1 14b on NVIDIA 48G RTX-A6000, Apple M2 Ultra 60-core 64G, or Apple M4 Max 40-core 48G, compared to OpenAI o1 pro (machine-learning#37); see also deepseek-r1 on Ollama on dual RTX-4090, dual RTX-A4500, RTX-A6000, RTX-A4000, RTX-A3500, M4 Max 40-core (#95)
- Qwen2: Qwen2 72b-instruct on NVIDIA 48G RTX-A6000 or Apple M4 Max 40-core 48G, compared to OpenAI o1 pro (#96)
- Llama 3.3: Meta Llama 3.3 70b on NVIDIA 48G RTX-A6000 or Apple M4 Max 40-core 48G, compared to OpenAI o1 pro (#97)
- Mistral-Large-Instruct: Mistral-large 123b-instruct-2411-q2_K (45G) on NVIDIA 48G RTX-A6000 or Apple M4 Max 40-core 48G, compared to OpenAI o1 pro (#98)
- Claude Sonnet 3.5: Claude Sonnet 3.5 (#99)
- Gemma 2: Google Gemma 2 27b on CUDA and Metal (#31)
- GPU/CPU performance from the bottom up: https://github.com/ObrienlabsDev/performance
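The model/GPU pairings above are driven mostly by quantized weight size versus available VRAM (e.g. the 45G q2_K Mistral-Large build against a 48G RTX-A6000). A minimal sketch of that back-of-envelope calculation; the bits-per-weight figures are assumed approximations for common GGUF K-quants, not values from this repo:

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight-only footprint in GB; KV cache and activations add several more GB."""
    return params_billion * bits_per_weight / 8

# Assumed approximate bits-per-weight for GGUF quantizations
Q2_K, Q4_K_M = 2.9, 4.8

# Mistral-Large 123b at q2_K: roughly the 45G listed above, so it fits in 48G
print(f"123b q2_K:   {estimate_vram_gb(123, Q2_K):.1f} GB")
# Llama 3.3 70b at q4_K_M also lands under a 48G card
print(f"70b q4_K_M: {estimate_vram_gb(70, Q4_K_M):.1f} GB")
```

This is only a lower bound on memory; headroom for the KV cache grows with context length, which is why the 70b and 123b builds sit close to the 48G limit.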