How can I compile an LLM model? #1959
Replies: 1 comment
-
I have another question regarding the performance of `compile`. I ran the tutorial code below on my M4 Max:

```python
import mlx.core as mx
import mlx.nn as nn
import time


def timeit(fun, x, name=""):
    # Warm up: trigger compilation and kernel caching before timing
    for _ in range(10):
        mx.eval(fun(x))
    tic = time.perf_counter()
    for _ in range(100):
        mx.eval(fun(x))
    toc = time.perf_counter()
    tpi = 1e3 * (toc - tic) / 100
    print(f"{name} Time per iteration {tpi:.3f} (ms)")


x = mx.random.uniform(shape=(32, 1000, 4096))
timeit(nn.gelu, x, "No Compile")
timeit(mx.compile(nn.gelu), x, "Compiled")
```

According to the tutorial, a 5x speedup is achieved on an M1 Max. However, my results on the M4 Max show different performance characteristics:

```
No Compile Time per iteration 8.267 (ms)
Compiled Time per iteration 2.869 (ms)
```

I'm curious about what exactly influences the performance difference between the compiled and non-compiled versions. Could someone shed some light on this?
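As I understand it, the gap comes mostly from kernel fusion: GELU is built from several elementwise primitives, and executed eagerly each one launches its own kernel and makes a full read/write pass over the 32x1000x4096 tensor, so the op is memory-bandwidth bound; `mx.compile` fuses those primitives into a single kernel, so the speedup depends on how many passes get eliminated and on the device's memory bandwidth, which differs between an M1 Max and an M4 Max. As a rough illustration of the op count involved, here is the tanh approximation of GELU written out in NumPy (a sketch for intuition only — I believe MLX's `nn.gelu` uses the exact erf form, and NumPy does not actually fuse anything):

```python
import numpy as np


def gelu_tanh(x):
    # Tanh approximation of GELU. Eagerly, each arithmetic op below is a
    # separate elementwise pass over the array (cube, scale, add, tanh,
    # add, multiply, scale) -- roughly seven memory-bound kernels that a
    # compiler can fuse into one.
    c = np.sqrt(2.0 / np.pi)
    return 0.5 * x * (1.0 + np.tanh(c * (x + 0.044715 * x**3)))


y = gelu_tanh(np.linspace(-3.0, 3.0, 7))
```

Because each unfused pass is limited by memory bandwidth rather than compute, fusing them means the tensor is read and written once instead of many times, which is consistent with the roughly 3x gain you measured.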
-
Hi everyone,
I'm new to MLX and would like to know if MLX supports compiling a model and performing inference, similar to how it's done in PyTorch. If so, could someone please guide me on how to achieve this?
Thanks in advance for your help!