Runners #546

Open
wants to merge 236 commits into
base: main
Changes from 88 commits
Commits
7dc0a74
trigger test
AlexCheema Dec 1, 2024
f339f74
trigger test
AlexCheema Dec 1, 2024
71db641
trigger test
AlexCheema Dec 2, 2024
f0bb515
trigger test
AlexCheema Dec 2, 2024
f94c906
trigger test
AlexCheema Dec 4, 2024
0d0338f
migrate from circleci to github actions
AlexCheema Dec 6, 2024
972aea4
macos 15
AlexCheema Dec 6, 2024
902e0d3
github env vars
AlexCheema Dec 6, 2024
676125b
job
AlexCheema Dec 6, 2024
8d433e6
run tinygrad and discovery integratrion tests on linux
AlexCheema Dec 6, 2024
3c0297c
more robust discovery log check
AlexCheema Dec 6, 2024
58bcf5b
check discovery on integration tests too
AlexCheema Dec 6, 2024
6b54188
cond
AlexCheema Dec 6, 2024
32cd1f1
give this a goh
AlexCheema Dec 6, 2024
9dc76ef
tooonygrad
AlexCheema Dec 6, 2024
976e5f2
disable mlx test for now..plan to run this on a self-hosted runner
AlexCheema Dec 6, 2024
deb80d2
clang for tinygrad
AlexCheema Dec 6, 2024
8302fd0
test runner
Dec 6, 2024
cb3d89e
test runner
Dec 6, 2024
7d223a0
matrix
Dec 6, 2024
90fd5c1
matrix
Dec 6, 2024
d154d37
add exo run
Dec 6, 2024
bdf417f
tweak
Dec 6, 2024
6b61fc6
tweak python install
Dec 6, 2024
1af28cb
fix
Dec 6, 2024
ce2ccdd
fix2
Dec 6, 2024
f9c2361
fix3
Dec 6, 2024
d16280d
debug
Dec 6, 2024
0739dc9
fix
Dec 6, 2024
3662ec4
fix
Dec 6, 2024
1dcc731
fix
Dec 6, 2024
ccc5415
try
Dec 6, 2024
64954aa
fixed
Dec 6, 2024
c3dfac6
debug
Dec 6, 2024
f7e0348
activate
Dec 6, 2024
19a7d5a
fix
Dec 6, 2024
cb3c147
fix
Dec 6, 2024
4cac1bb
quotes
Dec 6, 2024
faf0aae
jq
Dec 6, 2024
16b126d
fix
Dec 6, 2024
f087c0a
fix
Dec 6, 2024
9fc3358
path
Dec 6, 2024
acdee16
debug
Dec 6, 2024
4dd617a
shorter
Dec 6, 2024
6c08b32
nodebug
Dec 6, 2024
7b77ef0
flush
Dec 6, 2024
6dae3a4
conf
Dec 7, 2024
320892d
maxtok
Dec 7, 2024
7857103
aws
Dec 7, 2024
732ba91
new_conf
Dec 8, 2024
38bd003
fix
Dec 8, 2024
c138de0
job_name
Dec 8, 2024
c3c80c6
name
Dec 8, 2024
fe80749
fix
Dec 8, 2024
be8cbc0
trigger test
AlexCheema Dec 8, 2024
fb44eb0
simplify bench
AlexCheema Dec 8, 2024
755dd47
jobname
Dec 8, 2024
87865f0
list exo processes before test, warmup req in bench
AlexCheema Dec 8, 2024
fb8d870
t
AlexCheema Dec 8, 2024
c8f9372
model matrix
Dec 8, 2024
6bb7c11
enable debug
AlexCheema Dec 8, 2024
3687ba1
bench logs
AlexCheema Dec 8, 2024
3ccbdf1
add DEBUG_DISCOVERY
AlexCheema Dec 8, 2024
8e57f33
trigger test
AlexCheema Dec 8, 2024
b4f8649
bootstrap
Dec 8, 2024
903a5aa
fix
Dec 8, 2024
1716f63
test
Dec 8, 2024
b0977f9
t
AlexCheema Dec 8, 2024
cbac4d6
git version
AlexCheema Dec 8, 2024
fd05bca
lfs
AlexCheema Dec 8, 2024
f584e86
get rid of lfs stuff
AlexCheema Dec 8, 2024
b216819
remove
Dec 8, 2024
571b26c
allowed interface types
AlexCheema Dec 8, 2024
bd9d118
sleep before bench
AlexCheema Dec 8, 2024
b4e885b
test range
AlexCheema Dec 8, 2024
314a5d9
test 1
AlexCheema Dec 8, 2024
f6c2c37
test 2
AlexCheema Dec 8, 2024
e78a52d
test 3
AlexCheema Dec 8, 2024
cc74b1f
test 4
AlexCheema Dec 8, 2024
b69cb49
test 5
AlexCheema Dec 8, 2024
d93b8e8
test 6
AlexCheema Dec 8, 2024
af6048e
test 7
AlexCheema Dec 8, 2024
9ba8bbd
test 8
AlexCheema Dec 8, 2024
3cf28f8
test 9
AlexCheema Dec 8, 2024
38eaecf
test 10
AlexCheema Dec 8, 2024
e78ef75
test 11
AlexCheema Dec 8, 2024
d714e40
test 12
AlexCheema Dec 8, 2024
286db87
test 13
AlexCheema Dec 8, 2024
a4b221d
test 14
AlexCheema Dec 8, 2024
3108434
test 15
AlexCheema Dec 8, 2024
8c7c156
test 16
AlexCheema Dec 8, 2024
4d6af6e
test 17
AlexCheema Dec 8, 2024
29d9df0
test 18
AlexCheema Dec 8, 2024
53edb85
test 19
AlexCheema Dec 8, 2024
8a5d212
test 20
AlexCheema Dec 8, 2024
5a4d128
trigger test
AlexCheema Dec 9, 2024
1e869a0
trigger test
AlexCheema Dec 10, 2024
8269b4b
t
AlexCheema Dec 11, 2024
16d9839
test {i}
AlexCheema Dec 11, 2024
4f4ac0f
test 21
AlexCheema Dec 11, 2024
6030b39
test 22
AlexCheema Dec 11, 2024
23dd5de
test 23
AlexCheema Dec 11, 2024
5d3be3c
test 24
AlexCheema Dec 11, 2024
fc26ad4
test 25
AlexCheema Dec 11, 2024
070b163
test 26
AlexCheema Dec 11, 2024
949055d
test 27
AlexCheema Dec 11, 2024
04bc163
test 28
AlexCheema Dec 11, 2024
0e32a62
test 29
AlexCheema Dec 11, 2024
18e7919
test 30
AlexCheema Dec 11, 2024
23158a4
add branch name to results
AlexCheema Dec 11, 2024
a84cba4
Merge remote-tracking branch 'origin/main' into runners
AlexCheema Dec 11, 2024
afe71c0
check gpu usage
AlexCheema Dec 11, 2024
cb40eb2
more robust configure_mlx.sh
AlexCheema Dec 11, 2024
ba96413
bootstrap script tweaks
AlexCheema Dec 11, 2024
e2d3a90
runner-token typo
AlexCheema Dec 11, 2024
c938efb
t
AlexCheema Dec 11, 2024
f7122d4
add system_status check to bench
AlexCheema Dec 11, 2024
cff03fc
perf diag
AlexCheema Dec 11, 2024
bbb5846
Test on m4
AlexCheema Dec 11, 2024
6169996
test
AlexCheema Dec 11, 2024
b7bab80
test2
AlexCheema Dec 11, 2024
41902f7
tweaks
AlexCheema Dec 11, 2024
e501eea
tweak install
AlexCheema Dec 11, 2024
668766f
t
AlexCheema Dec 11, 2024
3b1ea19
use .venv exo
AlexCheema Dec 11, 2024
7b2282d
run without debug flag
AlexCheema Dec 11, 2024
e680e8a
fix name
AlexCheema Dec 11, 2024
3789758
t
AlexCheema Dec 11, 2024
9848a45
TT
AlexCheema Dec 11, 2024
7b99cb4
t
AlexCheema Dec 11, 2024
a4bb4bb
update bootstrap
AlexCheema Dec 11, 2024
9dd33d3
t
AlexCheema Dec 11, 2024
8d9e3b8
t
AlexCheema Dec 11, 2024
1dbe11c
t
AlexCheema Dec 11, 2024
6bb3893
tt
AlexCheema Dec 11, 2024
0904cda
ttt
AlexCheema Dec 11, 2024
cacf50c
tttt
AlexCheema Dec 11, 2024
739b7d1
tttttt
AlexCheema Dec 11, 2024
7c0c5ef
ttttttt
AlexCheema Dec 11, 2024
63da9fc
a
AlexCheema Dec 11, 2024
d6c2146
t
AlexCheema Dec 11, 2024
9a11e27
ttt
AlexCheema Dec 11, 2024
97ffb83
t
AlexCheema Dec 11, 2024
d95f40b
a
AlexCheema Dec 11, 2024
cdae702
t
AlexCheema Dec 11, 2024
a932afc
oi
AlexCheema Dec 11, 2024
b1142d4
t
AlexCheema Dec 11, 2024
6acfb81
t
AlexCheema Dec 11, 2024
5dee5e5
t
AlexCheema Dec 11, 2024
26351e7
t
AlexCheema Dec 11, 2024
e698ef6
t
AlexCheema Dec 11, 2024
61c0963
t
AlexCheema Dec 11, 2024
dd3fd27
t
AlexCheema Dec 11, 2024
5a1a0f5
t
AlexCheema Dec 11, 2024
6cf2af3
t
AlexCheema Dec 11, 2024
9067741
t
AlexCheema Dec 11, 2024
d0b7f1b
t
AlexCheema Dec 11, 2024
741c318
test
AlexCheema Dec 11, 2024
6249bee
tes
AlexCheema Dec 11, 2024
225dcba
t
AlexCheema Dec 11, 2024
92edfa5
t
AlexCheema Dec 11, 2024
83470a9
t
AlexCheema Dec 11, 2024
83892d5
t
AlexCheema Dec 11, 2024
20e3065
les goh
AlexCheema Dec 11, 2024
e63c224
testtt
AlexCheema Dec 11, 2024
3f6ef1c
single node test 1
AlexCheema Dec 11, 2024
fe506a5
single node test 2
AlexCheema Dec 11, 2024
fb7a0de
single node test 3
AlexCheema Dec 11, 2024
6f097c9
single node test 4
AlexCheema Dec 11, 2024
f89b85b
single node test 5
AlexCheema Dec 11, 2024
8b47a9d
single node test 6
AlexCheema Dec 11, 2024
b23c3fd
single node test 7
AlexCheema Dec 11, 2024
32ff3ef
single node test 8
AlexCheema Dec 11, 2024
9f1393d
single node test 9
AlexCheema Dec 11, 2024
c5c27a3
single node test 10
AlexCheema Dec 11, 2024
6c322ac
single node test 11
AlexCheema Dec 11, 2024
3fda05a
single node test 12
AlexCheema Dec 11, 2024
f22bc99
single node test 13
AlexCheema Dec 11, 2024
0bd44c0
single node test 14
AlexCheema Dec 11, 2024
c65d1d9
single node test 15
AlexCheema Dec 11, 2024
8408c84
single node test 16
AlexCheema Dec 11, 2024
76196b8
single node test 17
AlexCheema Dec 11, 2024
92e2b74
single node test 18
AlexCheema Dec 11, 2024
279354a
single node test 19
AlexCheema Dec 11, 2024
bba0aa0
single node test 20
AlexCheema Dec 11, 2024
8cb7327
re-enable m4 cluster run
AlexCheema Dec 12, 2024
1194db6
m3
AlexCheema Dec 12, 2024
8c6d37d
m4 cluster test
AlexCheema Dec 12, 2024
f9f7612
better bench system info
AlexCheema Dec 12, 2024
eeecdcb
try a different taskpolicy
AlexCheema Dec 12, 2024
2abe57b
grasping at straws
AlexCheema Dec 12, 2024
dbb7ad3
run with three m4 pro
AlexCheema Dec 12, 2024
9472ab0
t
AlexCheema Dec 12, 2024
b6f2385
run llama-3.1-8b on 3 m4 pro cluster
AlexCheema Dec 12, 2024
2ff4638
Merge remote-tracking branch 'origin/main' into runners
AlexCheema Dec 12, 2024
e5d54c7
add llama-3.3-70b to 3 M4 Pro cluster
AlexCheema Dec 12, 2024
0c6ab35
increase timeout of http request in bench.py up to 10 mins
AlexCheema Dec 14, 2024
a930921
set max-generate-tokens to 250
AlexCheema Dec 14, 2024
25b4af7
Merge branch 'main' into runners
Dec 14, 2024
f55a53a
one token at a time
AlexCheema Dec 14, 2024
cb4615c
fix SendNewToken
AlexCheema Dec 14, 2024
06c2e23
rip out stats bloat
AlexCheema Dec 14, 2024
08912d1
Only collect topology if peers changed
blindcrone Dec 15, 2024
9397464
add commit to results
AlexCheema Dec 15, 2024
64365d6
one two and three m4 pro clusters
AlexCheema Dec 15, 2024
c9ded9b
optimise networking, remove bloat
AlexCheema Dec 16, 2024
804ad47
upgrade mlx
AlexCheema Dec 16, 2024
063964a
remove redundant sample_logits, put back opaque status for process_pr…
AlexCheema Dec 16, 2024
c0534b6
Merge commit: trigger test
AlexCheema Dec 16, 2024
bfa06ee
Merge commit: trigger test
AlexCheema Dec 16, 2024
bf1aafd
Merge commit: trigger test
AlexCheema Dec 16, 2024
41eaaec
Merge commit: trigger test
AlexCheema Dec 16, 2024
b49c4ca
Merge commit: trigger test
AlexCheema Dec 16, 2024
427d071
Merge commit: trigger test
AlexCheema Dec 16, 2024
34ecbbe
Merge commit: trigger test
AlexCheema Dec 16, 2024
bd0febe
Merge commit: trigger test
AlexCheema Dec 16, 2024
99a70f1
Merge commit: trigger test
AlexCheema Dec 16, 2024
8d94b8a
trigger test
AlexCheema Dec 16, 2024
35d90d9
Merge remote-tracking branch 'origin/main' into runners
AlexCheema Dec 16, 2024
b17faa8
dont broadcast every single process_tensor
AlexCheema Dec 16, 2024
036224f
add topology to tinychat ui
AlexCheema Dec 16, 2024
1b14be6
make device_capabilities async running on a thread pool
AlexCheema Dec 16, 2024
e2474c3
fail if we never get the desired node count
AlexCheema Dec 16, 2024
58f0a0f
optimise grpc parameters
AlexCheema Dec 17, 2024
0a07223
switch to uvloop (faster asyncio event loop) and optimise grpc settings
AlexCheema Dec 17, 2024
3a58576
make sure this is actually doing something
AlexCheema Dec 17, 2024
1f108a0
remove test sleep
AlexCheema Dec 17, 2024
198308b
more robust udp broadcast
AlexCheema Dec 17, 2024
7ac4004
change it back to collecting topology periodically even if peers dont…
AlexCheema Dec 17, 2024
2f0b543
add peer connection info to tinychat
AlexCheema Dec 17, 2024
023ddc2
support different network interface tests
AlexCheema Dec 17, 2024
db010d5
distributed tracing
AlexCheema Dec 18, 2024
165a9e1
more granular tracing
AlexCheema Dec 18, 2024
b02c0a5
new approach to mlx async operations and make tokenizer operations as…
AlexCheema Dec 18, 2024
0278de7
noop tracing
AlexCheema Dec 18, 2024
c609f39
disable the m3 max 128GB test
AlexCheema Dec 19, 2024
346 changes: 0 additions & 346 deletions .circleci/config.yml

This file was deleted.
