Lazy load for weights, KV cache, optimized inference, !!! 8Gb VRAM !!! #158

nalexand · 2025-08-30T02:54:39Z

Added support for 6-8 GB video cards, so the model can run locally even on a laptop (it is slow, but it works :) ).
Optimized the torch version to run on Windows (Python 3.12.8, torch 2.8.0+cu128).

python -m gpt_oss.generate --backend torch gpt-oss-20b/original/ -p "Hi" -l 10

python -m gpt_oss.chat --backend=torch gpt-oss-20b/original/
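
For readers unfamiliar with the idea, here is a minimal sketch of lazy weight loading with a bounded resident set. It is not the code from this PR: the `LazyWeightCache` class, the `fake_loader` function, and the `max_resident` parameter are hypothetical names chosen for illustration, and plain Python lists stand in for tensors. The assumption is that each layer's weights are read only when that layer is first needed, and the least recently used layer is dropped once the budget is exceeded.

```python
from collections import OrderedDict
from typing import Callable, Hashable, List


class LazyWeightCache:
    """Load weight blocks on first use and keep only the most recent ones."""

    def __init__(self, loader: Callable[[Hashable], object], max_resident: int):
        if max_resident < 1:
            raise ValueError("max_resident must be at least 1")
        self._loader = loader              # reads one block from storage
        self._max_resident = max_resident  # how many blocks may stay in memory
        self._resident = OrderedDict()     # ordered oldest -> newest use
        self.loads = 0                     # number of loader calls so far

    def get(self, key: Hashable) -> object:
        # Cache hit: mark the block as most recently used and return it.
        if key in self._resident:
            self._resident.move_to_end(key)
            return self._resident[key]
        # Cache miss: load the block now, then evict the oldest if over budget.
        value = self._loader(key)
        self.loads += 1
        self._resident[key] = value
        if len(self._resident) > self._max_resident:
            self._resident.popitem(last=False)
        return value

    def resident_keys(self) -> List[Hashable]:
        return list(self._resident)


def fake_loader(layer_index: int) -> List[float]:
    # Hypothetical stand-in for reading one layer's weights from disk.
    return [float(layer_index)] * 4


cache = LazyWeightCache(fake_loader, max_resident=2)
for layer in [0, 1, 2, 0]:
    cache.get(layer)
print(cache.resident_keys())  # [2, 0]
print(cache.loads)            # 4
```

With a budget of two blocks, revisiting layer 0 after layers 1 and 2 forces a reload, which illustrates the speed-for-memory trade-off mentioned above: a smaller budget lowers peak memory but raises the number of loads.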

nalexand and others added 12 commits August 30, 2025 04:50

Lazy load for weights, KV cache, optimized inference, !!! 8Gb VRAM !!!

05de4a1

Merge branch 'openai:main' into main

c35a07f

Add video

d5d58b9

Merge branch 'main' of https://github.com/nalexand/gpt-oss

6c58806

Update README.md

96bbffb

Update README.md

811aa43

Remove unused code

7169750

Merge branch 'main' of https://github.com/nalexand/gpt-oss

132adb4

Add support !!! 6 Gb VRAM !!! for gpt-oss-20b

1cea070

Update README.md

3962a3f

Add options for control memory usage

54436c3

Merge branch 'main' of https://github.com/nalexand/gpt-oss

f864397
