Open
53 commits
2bfc361
fix input_embeddings prefill bug in generate_step
Blaizzy Nov 12, 2025
5cf134c
format
Blaizzy Nov 12, 2025
2e09bd2
Merge branch 'ml-explore:main' into main
Blaizzy Dec 2, 2025
d59907c
Merge branch 'ml-explore:main' into main
Blaizzy Mar 4, 2026
e06ca01
Merge branch 'ml-explore:main' into main
Blaizzy Apr 2, 2026
5c965a5
Merge branch 'ml-explore:main' into main
Blaizzy Apr 24, 2026
18c12cb
add DS4
Blaizzy Apr 24, 2026
28d991f
Fix DeepSeek V4 Flash generation math
Blaizzy Apr 24, 2026
be714fd
Generalize safetensors dtype fallback
Blaizzy Apr 24, 2026
a833f26
Remove deprecated DeepSeek V4 tests for sanitization and model loading
Blaizzy Apr 24, 2026
28a83ea
Refactor DeepSeek V4 tests to consolidate and clarify test cases
Blaizzy Apr 24, 2026
bc1c285
format
Blaizzy Apr 24, 2026
f4f7b4d
keep experts quantized
Blaizzy Apr 24, 2026
e3221c4
fix fp32 promotion (3 tps -> 11 tps)
Blaizzy Apr 24, 2026
16b268a
Fix DeepSeek V4 batched routing
Blaizzy Apr 24, 2026
201267d
compile _limited_swiglu, hc_split_sinkhorn and make robust predicate
Blaizzy Apr 24, 2026
f26e519
Enhance quant_predicate to support new projection modes and add @mx.c…
Blaizzy Apr 24, 2026
61e006e
keep experts in mxfp4
Blaizzy Apr 24, 2026
f2b4e9d
Refactor quantized linear layer
Blaizzy Apr 24, 2026
c0d9222
Implement optimized Sinkhorn operations in DeepSeek V4 and add corres…
Blaizzy Apr 24, 2026
ef8c95d
Optimize matrix multiplication in HyperConnection by replacing einsum…
Blaizzy Apr 24, 2026
f7ff216
HC sinkhorn normalization
Blaizzy Apr 24, 2026
c6a7828
fix(deepseek_v4): numerical stability and compressed attention mask p…
eauchs Apr 24, 2026
f8ebcaf
Merge pull request #14 from eauchs/fix-blaizzy-pr1192
Blaizzy Apr 24, 2026
2a73e76
Refactor DeepSeek V4 scoring and compile rope full
Blaizzy Apr 25, 2026
c2c9801
Add DeepseekV4SwitchGLU and BatchRotatingKVCache; refactor position h…
Blaizzy Apr 25, 2026
8ac37c0
Refactor DeepseekV4Cache to improve state management and add length h…
Blaizzy Apr 25, 2026
369fd37
format
Blaizzy Apr 25, 2026
166bba6
Revert "keep experts in mxfp4"
Blaizzy Apr 25, 2026
cbb0b72
Revert quant predicate (lower precision experts)
Blaizzy Apr 25, 2026
e3313d4
Remove quantization_config from ModelArgs in deepseek_v4.py
Blaizzy Apr 25, 2026
16de966
Update deepseek_v4.py
Blaizzy Apr 25, 2026
01bcd7d
Fix RoPE init
Blaizzy Apr 25, 2026
8bd10c6
Refactor RoPE initialization in V4Attention to streamline handling of…
Blaizzy Apr 25, 2026
014a10b
Improve DeepseekV4Cache and V4Attention with improved trimming logic …
Blaizzy Apr 25, 2026
5d1a67a
Add scores parameter and update projection logic
Blaizzy Apr 25, 2026
26f49f5
Optimize DeepSeek V4 performance
0xClandestine Apr 26, 2026
2ba4215
Refactor dtype handling in _ensure_cached method and improve pooled t…
Blaizzy Apr 26, 2026
16f7205
format
Blaizzy Apr 26, 2026
193aa77
Fix matrix multiplication in _hc_expand_op to use original comb tenso…
Blaizzy Apr 26, 2026
5a4aaa4
Refactor output projection
Blaizzy Apr 26, 2026
2591b51
Refactor kv tensor reshaping in V4Attention for improved dimensionali…
Blaizzy Apr 26, 2026
22da01a
Remove redundant cache type check in V4Attention to streamline proces…
Blaizzy Apr 26, 2026
8e8571a
Fix DeepSeek V4 sparse pooled prefill memory
Blaizzy Apr 26, 2026
ddeffe3
Remove sparse pooled attention test block
Blaizzy Apr 26, 2026
acf650c
Fix DeepSeek V4 HyperConnection expand orientation
Blaizzy Apr 27, 2026
41ba612
fix: replace broadcast multiply with matmul in sparse pooled attention
0xClandestine Apr 27, 2026
dd6b92f
Merge pull request #17 from 0xClandestine/fix/ds4-ram-usage
Blaizzy Apr 27, 2026
f83460f
Enable loading the original checkpoint
angeloskath Apr 28, 2026
81a8c57
Simplify GLU and gate remove intermediate castings
angeloskath Apr 28, 2026
4951496
Fix RoPE to use the kernel by scaling freqs
angeloskath Apr 28, 2026
3cf5282
Start simplifying and speeding up the attention
angeloskath Apr 29, 2026
83d7e74
Refactor compressor and compressed non-sparse attention
angeloskath Apr 29, 2026