Commit 4f9407a

committed
Added OpenMP scenarios; corrected non SIMD C code
1 parent 943b746 commit 4f9407a

File tree

4 files changed

+190
-31
lines changed

.gitignore

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -4,5 +4,5 @@
44
# Ignore specific directory
55
/_Inline/
66

7-
# Ignore specific file
8-
addArrayofIntegers_C.pl
7+
# Ignore specific files
8+

README.md

Lines changed: 29 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -4,10 +4,12 @@ This is probably one of the things that should never be allowed to exist, but wh
44

55
## x86-64 examples
66

7-
### addIntegers.pl
7+
### Adding Two Integers
8+
##### Script: addIntegers.pl
89
Simple integer addition in Perl - this is the Hello World version of this git repo
910

10-
### addArrayofIntegers.pl
11+
### The sum of an array of Integers
12+
##### Scripts: addArrayofIntegers.pl & addArrayofIntegers\_C.pl
1113
Explore multiple equivalent ways to add *large* arrays of short integers (-100 to 100 in this implementation) in Perl:
1214
* ASM\_blank : tests the speed of calling ASM from Perl (no computations are done)
1315
* ASM : passes the integers as bytes and then uses conversion operations and scalar floating point addition
@@ -54,19 +56,11 @@ And here are the timings!
5456
|PDL\_wo\_alloc | 9.2e-04| 9.0e-04| 3.9e-05|
5557

5658
Let's say we wanted to do this toy experiment in pure C (using Inline::C of course!)
59+
This code receives the integers as a packed "string" of doubles and computes the sum in C.
5760
```C
58-
double * array = NULL;
59-
void double_alloc(size_t num_elements) {
60-
array = malloc(num_elements * sizeof(double));
61-
for (int i = 0; i < num_elements; i++) {
62-
array[i] = rand() % 200 - 100;
63-
}
64-
}
65-
void double_free() {
66-
free(array);
67-
}
68-
double sum_array_C(char *array, size_t length) {
61+
double sum_array_C(char *array_in, size_t length) {
6962
double sum = 0.0;
63+
double * array = (double *) array_in;
7064
for (size_t i = 0; i < length; i++) {
7165
sum += array[i];
7266
}
@@ -78,8 +72,23 @@ Here are the timing results:
7872
7973
| | mean | median | stddev |
8074
|------------------------------|--------|--------|--------|
81-
|C\_doubles\_w\_alloc |1.3e-02 |1.3e-02 | 2.8e-04|
82-
|C\_doubles\_wo\_alloc |1.3e-03 |1.3e-03 | 5.1e-05|
75+
|C\_doubles\_w\_alloc |4.1e-03 |4.1e-03 | 2.3e-04|
76+
|C\_doubles\_wo\_alloc |9.0e-04 |8.7e-04 | 4.6e-05|
77+
78+
79+
What if we used SIMD directives and parallel loop constructs in OpenMP? This was done in
80+
the file addArrayofIntegers\_C.pl. All three combinations were tested, i.e. SIMD directives
81+
alone (the C equivalent of the AVX code), OpenMP parallel loop threads and SIMD+OpenMP.
82+
Here are the timings!
83+
84+
| | mean | median | stddev |
85+
|------------------------------|--------|--------|--------|
86+
|C\_OMP\_w\_alloc |4.0e-03 | 3.7e-03| 1.4e-03|
87+
|C\_OMP\_wo\_alloc |3.1e-04 | 2.3e-04| 9.5e-04|
88+
|C\_SIMD\_OMP\_w\_alloc |4.0e-03 | 3.8e-03| 8.6e-04|
89+
|C\_SIMD\_OMP\_wo\_alloc |3.1e-04 | 2.5e-04| 8.5e-04|
90+
|C\_SIMD\_w\_alloc |4.1e-03 | 4.0e-03| 2.4e-04|
91+
|C\_SIMD\_wo\_alloc |5.0e-04 | 5.0e-04| 8.9e-05|
8392
8493
#### Discussion of the addArrayofIntegers.pl example
8594
* For calculations such as this, the price that must be paid is all in memory currency: it
@@ -88,7 +97,11 @@ time dominates the numeric calculation time.
8897
* Look how insanely effective sum in List::Util is : even though it has to walk the Perl
8998
array whose elements (the *doubles*, not the AV*) are not stored in a contiguous area in memory,
9099
it is no more than 3x slower than the equivalent C code C\_doubles\_wo\_alloc.
91-
* Look how optimized PDL is compared to the C code for both memory scenarios.
100+
* Look how optimized PDL is compared to the C code in the scenario without memory allocation.
101+
* Manual SIMD coded in assembly is 40% faster than the equivalent SIMD code in OpenMP (but it is
102+
much more painful to write).
103+
* The threaded OpenMP version achieved equivalent performance to the single-threaded AVX assembly
104+
programs, with no obvious improvement from combining SIMD+parallel loop for pragmas in OpenMP.
92105
* For the example considered here, it thus makes ZERO sense to offload a calculation as simple as a
93106
summation because List::Util is already within 15% of the assembly solution (at a later iteration
94107
we will also test AVX2 and AVX512 packed addition to see if we can improve the results).
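
The observation that memory traffic dominates can be checked with a back-of-the-envelope calculation. The sketch below is an illustration only: the helper names `effective_bandwidth_gbs` and `speedup` are hypothetical and not part of the repository, and the element count of 1,000,003 is taken from addArrayofIntegers\_C.pl, so it applies to the OpenMP/SIMD timings only.

```c
#include <assert.h>
#include <stddef.h>

/* Effective throughput in GB/s (1 GB = 1e9 bytes) for one pass over
 * n_elements doubles that took `seconds` of wall-clock time.
 * Hypothetical helper, not part of the benchmark scripts. */
double effective_bandwidth_gbs(size_t n_elements, double seconds) {
    assert(seconds > 0.0);
    double bytes = (double)n_elements * (double)sizeof(double);
    return bytes / seconds / 1e9;
}

/* How many times faster the `fast` timing is than the `slow` one. */
double speedup(double slow_seconds, double fast_seconds) {
    assert(fast_seconds > 0.0);
    return slow_seconds / fast_seconds;
}
```

Plugging in the reported means, `effective_bandwidth_gbs(1000003, 3.1e-04)` gives about 25.8 GB/s for C\_OMP\_wo\_alloc and `effective_bandwidth_gbs(1000003, 5.0e-04)` gives about 16.0 GB/s for C\_SIMD\_wo\_alloc, a ratio of roughly 1.6x. An 8 MB array may partly fit in cache on some machines, so these figures are effective throughput rather than a measurement of DRAM bandwidth.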

addArrayofIntegers.pl

Lines changed: 5 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,7 @@
3636

3737
my $ndarray = pdl( \@array );
3838

39+
3940
$benchmark->add_instance(
4041
'ASM_wo_alloc' => sub {
4142
sum_array( $array_byte_ASM, scalar @array );
@@ -76,10 +77,9 @@
7677
);
7778
$benchmark->add_instance(
7879
'C_doubles_w_alloc' => sub {
79-
double_alloc( scalar @array );
80+
my $array_double_ASM = pack "d*", @array;
8081
sum_array_C( $array_double_ASM, scalar @array );
8182
},
82-
double_free()
8383
);
8484
$benchmark->add_instance( 'C_doubles_wo_alloc' =>
8585
sub { sum_array_C( $array_double_ASM, scalar @array ) }, );
@@ -136,18 +136,10 @@
136136

137137
__DATA__
138138
__C__
139-
double * array = NULL;
140-
void double_alloc(size_t num_elements) {
141-
array = malloc(num_elements * sizeof(double));
142-
for (int i = 0; i < num_elements; i++) {
143-
array[i] = rand() % 200 - 100;
144-
}
145-
}
146-
void double_free() {
147-
free(array);
148-
}
149-
double sum_array_C(char *array, size_t length) {
139+
140+
double sum_array_C(char *array_in, size_t length) {
150141
double sum = 0.0;
142+
double * array = (double *) array_in;
151143
for (size_t i = 0; i < length; i++) {
152144
sum += array[i];
153145
}
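
The corrected `sum_array_C` reinterprets the bytes produced by `pack "d*"` as an array of doubles through a pointer cast. That works when the buffer happens to be suitably aligned for `double`, which is an assumption about how Perl allocates string buffers. A more defensive variant, sketched here under the hypothetical name `sum_packed_doubles`, copies each value out with `memcpy` so alignment does not matter:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sum `length` doubles stored back to back in a raw byte buffer, such as
 * the string that Perl's pack "d*" produces. memcpy is used instead of a
 * pointer cast so the buffer need not be aligned for double.
 * Hypothetical name; this is not the function used in the commit. */
double sum_packed_doubles(const char *bytes, size_t length) {
    assert(bytes != NULL || length == 0);
    double sum = 0.0;
    for (size_t i = 0; i < length; i++) {
        double value;
        memcpy(&value, bytes + i * sizeof(double), sizeof(double));
        sum += value;
    }
    return sum;
}
```

Compilers typically turn the fixed-size `memcpy` into a plain load, so this form is unlikely to cost much, although that has not been measured against the timings in this repository.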

addArrayofIntegers_C.pl

Lines changed: 154 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,154 @@
1+
#!/home/chrisarg/perl5/perlbrew/perls/current/bin/perl
2+
use v5.36;
3+
4+
use List::Util qw(sum);
5+
use Benchmark::CSV;
6+
use PDL::Lite;
7+
use PDL::NiceSlice;
8+
use PDL::IO::Misc;
9+
use PDL::IO::CSV ':all';
10+
use PDL::Stats::Basic;
11+
use Time::HiRes qw(time);
12+
use OpenMP::Environment;
13+
use Inline (
14+
C => 'DATA',
15+
ccflagsex => q{-fopenmp},
16+
lddlflags => join( q{ }, $Config::Config{lddlflags}, q{-fopenmp} ),
17+
myextlib => ''
18+
);
19+
20+
my $openmp_env = OpenMP::Environment->new;
21+
$openmp_env->omp_num_threads(16);
22+
23+
my $benchmark = Benchmark::CSV->new(
24+
output => './addArrayofIntegers.csv',
25+
sample_size => 1,
26+
);
27+
28+
my $num_elements = 1_000_003;
29+
30+
## Create an array of $num_elements random integers in the range -100 to 99
31+
my @array = map { int( rand(200) ) - 100 } 1 .. $num_elements;
32+
my $array_double_ASM = pack "d*", @array;
33+
34+
35+
say "Starting benchmark";
36+
$benchmark->add_instance(
37+
'C_SIMD_wo_alloc' => sub {
38+
sum_array_SIMD_C( $array_double_ASM, scalar @array );
39+
},
40+
);
41+
$benchmark->add_instance(
42+
'C_SIMD_w_alloc' => sub {
43+
my $array_double_ASM = pack "d*", @array;
44+
sum_array_SIMD_C( $array_double_ASM, scalar @array );
45+
},
46+
);
47+
$benchmark->add_instance(
48+
'C_SIMD_OMP_wo_alloc' => sub {
49+
sum_array_SIMD_OMP_C( $array_double_ASM, scalar @array );
50+
},
51+
);
52+
$benchmark->add_instance(
53+
'C_SIMD_OMP_w_alloc' => sub {
54+
my $array_double_ASM = pack "d*", @array;
55+
sum_array_SIMD_OMP_C( $array_double_ASM, scalar @array );
56+
},
57+
);
58+
59+
$benchmark->add_instance(
60+
'C_OMP_wo_alloc' => sub {
61+
sum_array_OMP_C( $array_double_ASM, scalar @array );
62+
},
63+
);
64+
$benchmark->add_instance(
65+
'C_OMP_w_alloc' => sub {
66+
my $array_double_ASM = pack "d*", @array;
67+
sum_array_OMP_C( $array_double_ASM, scalar @array );
68+
},
69+
);
70+
71+
$benchmark->run_iterations(1000);
72+
73+
# Load the CSV file
74+
75+
my @data = rcsv1D( 'addArrayofIntegers.csv', { text2bad => 1, header => 1 } );
76+
77+
my %summary_stats = ();
78+
79+
foreach my $col ( 0 .. $#data ) {
80+
my $pdl = pdl( $data[$col] );
81+
my $mean = $pdl->average;
82+
my $stddev = $pdl->stdv_unbiased;
83+
my $median = $pdl->median;
84+
$summary_stats{ $data[$col]->hdr->{col_name} } =
85+
{ mean => $mean, stddev => $stddev, median => $median };
86+
}
87+
88+
# Get the column names from the first row
89+
my @column_names = sort keys %{ $summary_stats{ ( keys %summary_stats )[0] } };
90+
91+
# Define the width for each column
92+
my $width_name = 24;
93+
my $width_col = 10;
94+
95+
# Print the column names
96+
printf "%-${width_name}s", '';
97+
printf "%${width_col}s", $_ for @column_names;
98+
print "\n";
99+
100+
# Print each row
101+
foreach my $row_name ( sort keys %summary_stats ) {
102+
printf "%-${width_name}s", $row_name;
103+
printf "%${width_col}.1e", $summary_stats{$row_name}{$_} for @column_names;
104+
print "\n";
105+
}
106+
107+
unlink 'addArrayofIntegers.csv';
108+
## load the CSV file and print a summary of the results using PDL
109+
110+
__DATA__
111+
__C__
112+
#include <omp.h>
113+
114+
115+
116+
117+
void _ENV_set_num_threads() {
118+
char *num;
119+
num = getenv("OMP_NUM_THREADS");
120+
if (num != NULL) omp_set_num_threads(atoi(num)); /* leave the default if the variable is unset */
121+
}
122+
123+
124+
double sum_array_SIMD_C(char *array_in, size_t length) {
125+
double sum = 0.0;
126+
double * array = (double *) array_in;
127+
#pragma omp simd reduction(+:sum)
128+
for (size_t i = 0; i < length; i++) {
129+
sum += array[i];
130+
}
131+
return sum;
132+
}
133+
134+
double sum_array_SIMD_OMP_C(char *array_in, size_t length) {
135+
double sum = 0.0;
136+
double * array = (double *) array_in;
137+
_ENV_set_num_threads();
138+
#pragma omp parallel for simd reduction(+:sum) schedule(static,8)
139+
for (size_t i = 0; i < length; i++) {
140+
sum += array[i];
141+
}
142+
return sum;
143+
}
144+
145+
double sum_array_OMP_C(char *array_in, size_t length) {
146+
double sum = 0.0;
147+
double * array = (double *) array_in;
148+
_ENV_set_num_threads();
149+
#pragma omp parallel for reduction(+:sum) schedule(static,8)
150+
for (size_t i = 0; i < length; i++) {
151+
sum += array[i];
152+
}
153+
return sum;
154+
}
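
For readers who want to try the reduction outside Inline::C, the following is a minimal standalone sketch of the same pattern. The names `parse_thread_count` and `sum_array_omp` are hypothetical, the 1..1024 bound on the thread count is an arbitrary choice for illustration, and the `_OPENMP` guards are there so the file also compiles without `-fopenmp` (in which case the pragma is ignored and the loop runs serially).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Parse a thread count from a string such as getenv("OMP_NUM_THREADS").
 * Returns `fallback` when the string is NULL, does not start with a
 * number, or lies outside 1..1024 (an arbitrary illustrative bound). */
int parse_thread_count(const char *text, int fallback) {
    if (text == NULL) return fallback;
    char *end = NULL;
    long n = strtol(text, &end, 10);
    if (end == text || n < 1 || n > 1024) return fallback;
    return (int)n;
}

/* Sum an array with an OpenMP reduction: each thread accumulates a
 * private partial sum, and the partial sums are combined at the end.
 * The order of additions is unspecified, so results can differ in the
 * last bits for general floating-point data. */
double sum_array_omp(const double *array, size_t length) {
    assert(array != NULL || length == 0);
    double sum = 0.0;
#ifdef _OPENMP
    /* Re-read the environment on every call, as the commit's code does. */
    omp_set_num_threads(parse_thread_count(getenv("OMP_NUM_THREADS"),
                                           omp_get_max_threads()));
#endif
    #pragma omp parallel for simd reduction(+:sum) schedule(static)
    for (size_t i = 0; i < length; i++) {
        sum += array[i];
    }
    return sum;
}
```

Because the benchmark data are small integers stored as doubles, every partial sum is exactly representable and the parallel result matches the serial one regardless of how the work is split.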
