Commit d4e4f99

committed
draft backup: 68664e664cd9ccc16a846534
1 parent 916bd61 commit d4e4f99

File tree

1 file changed

+134
-0
lines changed

draft-68664e664cd9ccc16a846534.md

Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
1+
---
2+
title: "Rayon, performance without knowing"
3+
slug: rayon-performance-without-knowing
4+
5+
---
6+
7+
---
8+
9+
## Why Performance Is Hard
10+
11+
I wanted to speed things up in Rust and, let's be honest, *threads are one of the best tools to improve performance in Rust*. But the Tokio crate can be quite unintuitive and difficult to use, with all those `await`s and feature flags...
12+
13+
Here's where [rayon](https://docs.rs/rayon/latest/rayon/index.html) helps: it lets us parallelize tasks without having to think about threads. It's simple to use, fast, lightweight, and just works.
14+
15+
---
16+
17+
### A simple definition
18+
19+
Rayon is a library that helps you run code in parallel, making it easy to turn slow, step-by-step computations into faster ones that use multiple CPU cores.
20+
21+
It's a small and easy-to-use tool that lets you add parallelism. It makes sure your code runs safely without data races, and it only uses parallelism when it makes sense, depending on the amount of work at runtime.
22+
23+
For example, we can simply turn this line:
24+
25+
```rust
let results: Vec<_> = data.iter().map(|x| x.do_something()).collect();
```
28+
29+
into:
30+
31+
```rust
use rayon::prelude::*;

let results: Vec<_> = data.par_iter().map(|x| x.do_something()).collect();
```
36+
37+
Importing the [prelude](https://docs.rs/rayon/latest/rayon/prelude/index.html) brings the parallel iterator traits into scope, and it's the easiest way to get started with parallelism using Rayon in Rust.
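
To see what that one-word change saves you, here is a rough sketch of a hand-rolled equivalent that uses only the standard library. It is not how Rayon works internally (Rayon schedules work on a work-stealing thread pool). The helper name `par_map_by_hand` and the fixed one-chunk-per-thread split are assumptions made purely for illustration.

```rust
use std::thread;

/// Hypothetical helper: applies `f` to every element of `data` using up to
/// `n_threads` scoped threads, one contiguous chunk per thread.
fn par_map_by_hand<T, R, F>(data: &[T], n_threads: usize, f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    // Ceiling division so every element lands in some chunk; the chunk length
    // is kept at 1 or more because `chunks(0)` would panic.
    let n = n_threads.max(1);
    let chunk_len = ((data.len() + n - 1) / n).max(1);
    let f = &f;

    thread::scope(|scope| {
        // Spawn one worker per chunk; each returns its own Vec of results.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().map(f).collect::<Vec<R>>()))
            .collect();

        // Join in spawn order so the output order matches the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

A call such as `par_map_by_hand(&data, 8, |x| x * x)` returns the same values, in the same order, as `data.iter().map(|x| x * x).collect()`. All the chunking and joining above is the part that `par_iter()` hides.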
38+
39+
---
40+
41+
## Let's break down performance
42+
43+
### Without rayon
44+
45+
I ran the following code, which iterates from 1 to 999,999 (the range `1..1_000_000` excludes its upper bound), computes the cube (`x.pow(3)`) and the square (`x.pow(2)`) of each number, takes the remainder of each result modulo `97,531`, then sums the two remainders. I ran it using `cargo run` without any optimization:

`Finished dev profile [unoptimized + debuginfo] target(s) in 0.86s`

`Running target\debug\ry.exe`

`[2, 12, 36, 80, 150, 252, 392, 576, 810, 1100]`
46+
47+
These are the CPU specs:

* CPU Name: Intel(R) microarchitecture code named Alderlake-S

* Frequency: 2.5 GHz

* Logical CPU Count: 12
48+
49+
```rust
fn main() {
    // 1..1_000_000 is half-open: the values 1 through 999_999.
    let data: Vec<u64> = (1..1_000_000).collect();
    // Sequential map: (x^3 mod 97_531) + (x^2 mod 97_531) for each x.
    let results: Vec<u64> = data.iter()
        .map(|x| x.pow(3) % 97_531 + x.pow(2) % 97_531)
        .collect();
    println!("{:?}", &results[..10]);
}
```
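
Cargo's `Finished ... in` line reports build time, so for a quick cross-check of the run time itself, the work can be wrapped in a timer from the standard library. This is a minimal sketch and not how the measurements in this post were taken. `compute` and `timed` are hypothetical helper names, and the numbers will vary with the machine and the build profile.

```rust
use std::time::{Duration, Instant};

/// Same arithmetic as the example above: x^3 mod 97_531 plus x^2 mod 97_531.
/// `compute` is a hypothetical helper name introduced for this sketch.
fn compute(data: &[u64]) -> Vec<u64> {
    data.iter()
        .map(|x| x.pow(3) % 97_531 + x.pow(2) % 97_531)
        .collect()
}

/// Runs `f` once and returns its result together with the elapsed wall-clock time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}
```

From `main`, something like `let (results, elapsed) = timed(|| compute(&data));` followed by `println!("{elapsed:?}");` prints the time spent in the computation alone, excluding compilation.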
58+
59+
I measured performance with the Intel VTune Profiler. Without Rayon, the program needs 0.041s and uses a single thread:
60+
61+
![](https://cdn.hashnode.com/res/hashnode/image/upload/v1751554621303/a63bf8c8-dea1-4e6b-bc4f-fe829b14ec83.png align="left")
62+
63+
The function that takes the most time is `main`, because that's where we iterate, calculate, and collect the results:
64+
65+
![](https://cdn.hashnode.com/res/hashnode/image/upload/v1751554626742/902f9d41-7766-4d93-8ef9-b5d4f2547d39.png align="left")
66+
67+
---
68+
69+
### With rayon
70+
71+
The computation is the same as before, but this time we use Rayon:
72+
73+
```rust
use rayon::prelude::*;

fn main() {
    let data: Vec<u64> = (1..1_000_000).collect();
    // Same computation as before; only `iter()` became `par_iter()`.
    let results: Vec<u64> = data.par_iter()
        .map(|x| x.pow(3) % 97_531 + x.pow(2) % 97_531)
        .collect();
    println!("{:?}", &results[..10]);
}
```
84+
85+
This requires adding `rayon = "1.10.0"` to the dependencies in your `Cargo.toml`.
86+
87+
I compiled without optimizations:
88+
89+
`Finished dev profile [unoptimized + debuginfo] target(s) in 0.02s`
90+
91+
`Running target\debug\ry.exe`
92+
93+
`[2, 12, 36, 80, 150, 252, 392, 576, 810, 1100]`
94+
95+
Note that the 0.02s here, like the 0.86s earlier, is the time Cargo took to build, not the time the program took to run, so comparing the two says nothing about Rayon. For the actual run time, let's look at the profiler:
96+
97+
![](https://cdn.hashnode.com/res/hashnode/image/upload/v1751554640610/e7a8827b-275c-4f3d-bdbb-8ae50d041b68.png align="left")
98+
99+
* First, we can see it uses 8 threads instead of just one
100+
101+
* It took 0.029s instead of 0.041s, roughly 1.4x faster
102+
103+
* VTune now rates CPU utilization as idle throughout, instead of poor as before
104+
105+
106+
As before, all the effective time is spent in a single function, the last one called:
107+
108+
![](https://cdn.hashnode.com/res/hashnode/image/upload/v1751554746676/05d593a4-0be2-4b63-8515-89f6274be6e5.png align="left")
109+
110+
---
111+
112+
## When (and When Not) to Use Rayon
113+
114+
The ideal use cases are *CPU-bound work, large datasets, pure functions, sorting, etc.*
115+
116+
For *small workloads, shared mutable state, or I/O-heavy tasks*, Rayon is a poor fit. For I/O-heavy work it's better to use the [Tokio](https://docs.rs/tokio/latest/tokio/) runtime, if you really need it. The [Tokio modules](https://docs.rs/tokio/latest/tokio/#modules) cover `fs`, `time`, `process` (command execution), `net`, and a lot more on a multithreaded runtime, but that's another topic I'll write about...
117+
118+
---
119+
120+
### Other stuff Rayon does
121+
122+
Beyond `.par_iter()` and `.map()`, Rayon also includes `.filter()`, `.reduce()`, and `.for_each()`, plus `join()` for running two closures potentially in parallel and `par_sort()` for parallel sorting.
123+
124+
---
125+
126+
## To sum up
127+
128+
`Rayon` isn't always the best choice. Still, it's a **smart and safe way** to add parallelism. It helps you **scale workloads with minimal code changes**, making it a solid choice for performance-critical applications.
129+
130+
💡 Got another crate in mind?
131+
132+
## ☕ Was this helpful?
133+
134+
Treat me to a coffee on Ko-fi [https://ko-fi.com/riccardoadami](https://ko-fi.com/riccardoadami)
