-
Notifications
You must be signed in to change notification settings - Fork 38
/
Copy pathLatentParallelismNotes.txt
83 lines (56 loc) · 2.23 KB
/
LatentParallelismNotes.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
Present trajectory of a program.
Identify potential latent parallelism.
Assume we can exploit it, present updated trajectory.
Amdahl's law.
Issues in latent parallelism:
Dependencies
# Clean, assuming f is "pure"
for i in range(N):
a[i] = f(i)
# Dependency, but fixable (likewise for min, max, etc.)
s = 0
for i in range(N):
s += f(i)
# FP order of operation issues.
# Needs careful review
for i in range(N):
a[i] = f(i, a)
Implicit ordering of output
Random numbers present two challenges:
- Different: Don't want the same sequence in each concurrent evaluation of a Monte Carlo code
- Same: Reproducibility, ideally for both sequential and parallel runs. Much easier to debug if
we expect the same answer from run to run.
- Need to understand the default behavior and control mechanisms for your RNG.
Sequential code example.
Setting up multiple entities:
Appropriate context
Depends on the nature of the entity (threads, child processes, independent processes: DS)
Independent processes offer the most scalability, but how to establish context?
E.g., create something that knows our test functions?
Division of labor
Who does what?
Aside: Coordination frameworks (MPI, KVS, memcached, ...) Consciously
avoiding this issue at the moment. What if we wanted the output
ordered correctly?
Basic owner computes example.
- Review the functions
- What's sort of iterative structures are present?
- Where to put own()?
Comments on performance:
seq 275s
Need to compare sequential code with (rank 0, ranks 1) case.
1 289 (37)
Runs to demonstrate speed up, as well as diminishing returns.
2 146 (18, 19)
4 75 (9, 9, 9, 10)
8 38 (4, 4, 4, 5, 5, 5, 5, 5)
Load balancing (Amdahl's law revisited). Model is
MaxLoad*Time(0/1)/37. Other sources of imbalance.
Obs Ideal Model
2 => 146 137 148
4 => 75 69 78
8 => 38 34 39
Second run: 274, 149, 75, 38 (Seq +/- 5s)
Moral: For practical parallelism, benchmarking is a necessary
sanity check, but don't obsess.
Controlling coordination costs (where else could "own" be placed?)