Skip to content
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Commit 33d5b68

Browse files
committedJul 2, 2021
mnt: re-arranged codes and added many new versions
Benchmarks should not rely on convergence, rather a fixed number of iterations is more appropriate. The problem of inconsistent iterations between the C and Fortran versions was also fixed because of an indexing error when calculating the cell coordinates (C is 0-based, F is 1-based). So they now run the same number of iterations, regardless of M. Note, this issue still persists in 2 and 3, but not in 4, I can't seem to figure out why. Moved timings about to make them more comparable. All different benchmarks write out the same format output to more easily compare them. The later versions are more realistic since they use allocate/malloc which is more appropriate. Heavily updated the README.md file to explain what is going on.
1 parent 795e462 commit 33d5b68

17 files changed

+916
-294
lines changed
 

‎poisson2d/README.md

+36-6
Original file line numberDiff line numberDiff line change
@@ -3,14 +3,48 @@
33
This solver uses a simple Jacobi iteration to solve a Poisson equation. For details, see, for example,
44
"Computational Physics" by Mark Newman, Chap 9. Compiler commands were:
55

6-
```gfortran sourcefile.f95 -Ofast -o program```
6+
```gfortran sourcefile.f95 -O3 -o program```
77

8-
```gcc sourcefile.c -Ofast -o program```
8+
```gcc sourcefile.c -O3 -o program```
99

1010
For Cython module (import it to run):
1111

1212
```python setup.py build_ext --inplace```
1313

14+
15+
There are various implementations using various strategies.
16+
All codes are named with a 1-1 correspondance, i.e. `c_i_*` has the same strategy as `fortran_i_*` files.
17+
The suffixed `_*` is only used if there are multiple alternatives that utilizes the same strategy between
18+
the different languages. For example the `c_4_1d` and `c_4_2d` are both using the same strategy for
19+
solving the Poisson equation, while the former uses a 1D array for the data while the latter uses a 2D
20+
array with some slight overhead of another pointer level.
21+
22+
The two files `c_common.c` and `fortran_common.f90` are baseline programs used for printing
23+
out data and for parsing the command line arguments.
24+
25+
All executables accept 2 arguments:
26+
27+
# N = 100
28+
# N_ITER = 2000
29+
./c_1 100 2000
30+
100 2000 5.658391137993336e+04 1.448613750000000e-03
31+
32+
which will use a 2D array of size `100x100` and run for 2000 iterations.
33+
The output line consists of 4 values, 1) N, 2) N_ITER, 3) max-difference, 4) time per iteration.
34+
35+
Some thoughts on the code at https://runningcrocodile.fi/articles/pythonperformance.html. Also, there was
36+
discussion about this problem at Discourse: https://fortran-lang.discourse.group/t/performance-c-vs-fortran/1461
37+
with important contributions from numerous users (Mason, Beliavsky, septc, implicitall, nncarlson, han190 and pmk).
38+
39+
The entire code base has been re-written to allow command-line arguments allowing to sample different
40+
matrix sizes with the same executable (user zerothi).
41+
42+
To run a preset benchmark suite, simply execute `bash run.sh` which will run the currently
43+
implemented benchmarks with dimensions up to 600.
44+
45+
46+
## Old timings
47+
1448
The grid is a 2-dimensional array of size MxM. The timings for different values of M are, in seconds:
1549

1650
| Language | M=100 | M=200 | M=300 |
@@ -25,7 +59,3 @@ The grid is a 2-dimensional array of size MxM. The timings for different values
2559
* For all of these results, the amount of iterations performed by the respective codes was approximately
2660
the same, with the exception of the 100x100 grid C codes, which did nearly a double amount of iterations
2761
compared to the rest, for some reason. The timings are on AMD Ryzen 5 3600 @3.6GHz, using WSL2 on Windows 10.
28-
29-
Some thoughts on the code at https://runningcrocodile.fi/articles/pythonperformance.html . Also, there was
30-
discussion about this problem at Discourse: https://fortran-lang.discourse.group/t/performance-c-vs-fortran/1461
31-
with important contributions from numerous users (Mason, Beliavsky, septc, implicitall, nncarlson, han190 and pmk)

‎poisson2d/c_1.c

+110
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,110 @@
1+
#include <stdio.h>
2+
#include <stdlib.h>
3+
#include <string.h>
4+
#include <math.h>
5+
#include <time.h>
6+
#include "c_common.h"
7+
8+
// COMMENT
9+
// I know, this is extremely bad practice.
10+
// But to accommodate argument sizes for the arguments
11+
// I felt this was ok.
12+
// It makes it much more versatile when testing
13+
// END COMMENT
14+
void swap(int M, double (*a)[M], double (*b)[M])
15+
{
16+
double swp;
17+
for (int i = 0; i < M; i++) {
18+
for (int j = 0; j < M; j++) {
19+
swp = a[i][j];
20+
a[i][j] = b[i][j];
21+
b[i][j] = swp;
22+
}
23+
}
24+
}
25+
26+
// See comment above
27+
double mmax(int M, double (*phi)[M], double (*phip)[M]) {
28+
double max = 0.0;
29+
double diff = 0.0;
30+
for (int i = 0; i < M; i++) {
31+
for (int j = 0; j < M; j++) {
32+
diff = fabs(phi[i][j]-phip[i][j]);
33+
if (diff > max)
34+
max = diff;
35+
}
36+
}
37+
return max;
38+
}
39+
40+
double rho(double x, double y) {
41+
double s1 = 0.6;
42+
double e1 = 0.8;
43+
double s2 = 0.2;
44+
double e2 = 0.4;
45+
46+
if (x > s1 && x < e1 && y > s1 && y < e1) {
47+
return 1.0;
48+
} else if (x > s2 && x < e2 && y > s2 && y < e2 ) {
49+
return -1.0;
50+
} else {
51+
return 0.0;
52+
}
53+
}
54+
55+
clock_t run(int M, int N_ITER) {
56+
57+
// COMMENT:
58+
// I wouldn't even imagine any novice in C would
59+
// do something like this. It really makes no sense :)
60+
// END OF COMMENT
61+
double (*phi)[M];
62+
double (*phip)[M];
63+
double (*rhoa)[M];
64+
phi = malloc(sizeof(double[M][M]));
65+
phip = malloc(sizeof(double[M][M]));
66+
rhoa = malloc(sizeof(double[M][M]));
67+
68+
clock_t tic = clock();
69+
70+
// initialize matrices
71+
memset(phip, 0, sizeof(phip[0][0])*M*M);
72+
memset(phi, 0, sizeof(phi[0][0])*M*M);
73+
74+
double delta = 1.0;
75+
double a2 = pow(SPACE,2.0);
76+
77+
int iter = 0;
78+
while (iter < N_ITER ) {
79+
iter += 1;
80+
81+
for (int i=1; i < M-1; i++) {
82+
for (int j=1; j < M-1; j++) {
83+
phip[i][j] = (phi[i+1][j] + phi[i-1][j] +
84+
phi[i][j+1] + phi[i][j-1])/4.0 +
85+
a2/(4.0 * EPSILON0)*rho(i*SPACE,j*SPACE);
86+
}
87+
}
88+
delta = mmax(M, phi, phip);
89+
swap(M, phi, phip);
90+
91+
}
92+
93+
clock_t toc = clock();
94+
95+
free(phi);
96+
free(phip);
97+
free(rhoa);
98+
99+
write_timing(M, iter, delta, toc - tic);
100+
}
101+
102+
103+
int main(int argc, char *argv[]) {
104+
105+
int N, N_ITER;
106+
107+
// Retrieve arguments
108+
parse_args(argc, argv, &N, &N_ITER);
109+
run(N, N_ITER);
110+
}

‎poisson2d/c_2.c

+98
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,98 @@
1+
#include <stdio.h>
2+
#include <stdlib.h>
3+
#include <string.h>
4+
#include <math.h>
5+
#include <time.h>
6+
#include "c_common.h"
7+
8+
// COMMENT
9+
// I know, this is extremely bad practice.
10+
// But to accommodate argument sizes for the arguments
11+
// I felt this was ok.
12+
// It makes it much more versatile when testing
13+
// END COMMENT
14+
void swap(int M, double a[M][M], double b[M][M]) {
15+
double swp;
16+
for (int i = 0; i < M; i++) {
17+
for (int j = 0; j < M; j++) {
18+
swp = a[i][j];
19+
a[i][j] = b[i][j];
20+
b[i][j] = swp;
21+
}
22+
}
23+
}
24+
25+
// See comment above
26+
double mmax(int M, double a[M][M], double b[M][M]) {
27+
double max = 0.0;
28+
double diff = 0.0;
29+
for (int i = 0; i < M; i++) {
30+
for (int j = 0; j < M; j++) {
31+
diff = fabs(a[i][j]-b[i][j]);
32+
if (diff > max)
33+
max = diff;
34+
}
35+
}
36+
return max;
37+
}
38+
39+
double rho(double x, double y) {
40+
double s1 = 0.6;
41+
double e1 = 0.8;
42+
double s2 = 0.2;
43+
double e2 = 0.4;
44+
45+
if (x > s1 && x < e1 && y > s1 && y < e1) {
46+
return 1.0;
47+
} else if (x > s2 && x < e2 && y > s2 && y < e2 ) {
48+
return -1.0;
49+
} else {
50+
return 0.0;
51+
}
52+
}
53+
54+
void run(int M, int N_ITER) {
55+
56+
double phi[M][M];
57+
double phip[M][M];
58+
59+
clock_t tic = clock();
60+
int iter = 0;
61+
62+
// initialize matrices
63+
memset(&phip[0][0], 0, sizeof(double)*M*M);
64+
memset(&phi[0][0], 0, sizeof(double)*M*M);
65+
66+
double delta;
67+
double a2 = pow(SPACE,2.0);
68+
69+
while (iter < N_ITER ) {
70+
iter += 1;
71+
72+
for (int i=1; i < M-1; i++) {
73+
for (int j=1; j < M-1; j++) {
74+
phip[i][j] = (phi[i+1][j] + phi[i-1][j] +
75+
phi[i][j+1] + phi[i][j-1])/4.0 +
76+
a2/(4.0 * EPSILON0)*rho(i*SPACE,j*SPACE);
77+
}
78+
}
79+
delta = mmax(M, phi, phip);
80+
swap(M, phi, phip);
81+
82+
}
83+
84+
clock_t toc = clock();
85+
86+
write_timing(M, iter, delta, toc - tic);
87+
}
88+
89+
90+
int main(int argc, char *argv[]) {
91+
92+
int N, N_ITER;
93+
94+
// Retrieve arguments
95+
parse_args(argc, argv, &N, &N_ITER);
96+
97+
run(N, N_ITER);
98+
}

‎poisson2d/c_3.c

+104
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,104 @@
1+
#include <stdio.h>
2+
#include <stdlib.h>
3+
#include <string.h>
4+
#include <math.h>
5+
#include <time.h>
6+
#include "c_common.h"
7+
8+
// COMMENT
9+
// I know, this is extremely bad practice.
10+
// But to accommodate argument sizes for the arguments
11+
// I felt this was ok.
12+
// It makes it much more versatile when testing
13+
// END COMMENT
14+
void swap(int M, double a[M][M], double b[M][M]) {
15+
double swp;
16+
for (int i = 0; i < M; i++) {
17+
for (int j = 0; j < M; j++) {
18+
swp = a[i][j];
19+
a[i][j] = b[i][j];
20+
b[i][j] = swp;
21+
}
22+
}
23+
}
24+
25+
// See comment above
26+
double mmax(int M, double a[M][M], double b[M][M]) {
27+
double max = 0.0;
28+
double diff = 0.0;
29+
for (int i = 0; i < M; i++) {
30+
for (int j = 0; j < M; j++) {
31+
diff = fabs(a[i][j]-b[i][j]);
32+
if (diff > max)
33+
max = diff;
34+
}
35+
}
36+
return max;
37+
}
38+
39+
double rho(double x, double y) {
40+
double s1 = 0.6;
41+
double e1 = 0.8;
42+
double s2 = 0.2;
43+
double e2 = 0.4;
44+
45+
if (x > s1 && x < e1 && y > s1 && y < e1) {
46+
return 1.0;
47+
} else if (x > s2 && x < e2 && y > s2 && y < e2 ) {
48+
return -1.0;
49+
} else {
50+
return 0.0;
51+
}
52+
}
53+
54+
void run(int M, int N_ITER) {
55+
56+
double phi[M][M];
57+
double phip[M][M];
58+
double rhoa[M][M];
59+
60+
clock_t tic = clock();
61+
int iter = 0;
62+
63+
// initialize matrices
64+
memset(&phip[0][0], 0, sizeof(double)*M*M);
65+
memset(&phi[0][0], 0, sizeof(double)*M*M);
66+
for (int i=1; i<M-1; i++) {
67+
for (int j=1; j<M-1; j++) {
68+
rhoa[i][j] = rho(i*SPACE,j*SPACE);
69+
}
70+
}
71+
72+
double delta;
73+
double a2 = pow(SPACE,2.0);
74+
75+
while (iter < N_ITER ) {
76+
iter += 1;
77+
78+
for (int i=1; i < M-1; i++) {
79+
for (int j=1; j < M-1; j++) {
80+
phip[i][j] = (phi[i+1][j] + phi[i-1][j] +
81+
phi[i][j+1] + phi[i][j-1])/4.0 +
82+
a2/(4.0 * EPSILON0)*rhoa[i][j];
83+
}
84+
}
85+
delta = mmax(M, phi, phip);
86+
swap(M, phi, phip);
87+
88+
}
89+
90+
clock_t toc = clock();
91+
92+
write_timing(M, iter, delta, toc - tic);
93+
}
94+
95+
96+
int main(int argc, char *argv[]) {
97+
98+
int N, N_ITER;
99+
100+
// Retrieve arguments
101+
parse_args(argc, argv, &N, &N_ITER);
102+
103+
run(N, N_ITER);
104+
}

0 commit comments

Comments
 (0)
Please sign in to comment.