Lower orderings #2

jeehoonkang · 2018-01-08T22:19:53Z

The correctness of this PR is explained in the RFC PR crossbeam-rs/rfcs#26 ("Prove the correctness of a work-stealing deque").

Amanieu · 2018-01-13T07:22:50Z

src/lib.rs

+            // argument."
+            if self.inner
+                .top
+                .compare_exchange(t, t.wrapping_add(1), Ordering::AcqRel, Ordering::Acquire)


You should change the orderings to compare_exchange(t, t.wrapping_add(1), Ordering::Release, Ordering::Relaxed) and add a fence(Ordering::Acquire) in the error path.

An acquire fence is generally slower than an acquire load (I know this is the case for ARM64 and PPC at least), however this is in the error path which is only hit in the rare case of a race.
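A minimal sketch of the suggested pattern, for illustration only. The `Inner` struct and the `try_advance_top` helper are hypothetical stand-ins, not the crate's actual types; only the orderings and the placement of the fence follow the suggestion above.

```rust
use std::sync::atomic::{fence, AtomicIsize, Ordering};

/// Hypothetical stand-in for the deque's shared state (assumption, not the real type).
struct Inner {
    top: AtomicIsize,
}

/// Tries to advance `top` from `t` to `t + 1`.
/// Returns `true` on success and `false` if another thread won the race.
fn try_advance_top(inner: &Inner, t: isize) -> bool {
    match inner.top.compare_exchange(
        t,
        t.wrapping_add(1),
        Ordering::Release, // success path: release only
        Ordering::Relaxed, // failure path: the failed load itself is relaxed
    ) {
        Ok(_) => true,
        Err(_) => {
            // Rare race path: the acquire fence after the relaxed load
            // provides the acquire ordering the failure case needs.
            fence(Ordering::Acquire);
            false
        }
    }
}
```

The cost of the fence is paid only when the compare-exchange fails, so the common path stays as cheap as the weaker orderings allow.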

jeehoonkang · 2018-01-13

I agree. I changed it. Thanks!

jeehoonkang · 2018-01-15T02:35:06Z

The rayon-demo benchmark results look dubious (cb: master; cb-optimized: this PR):

 name                                                                cb ns/iter             cb-optimized ns/iter   diff ns/iter   diff %  speedup 
 factorial::factorial_iterator                                       17,945,405             17,988,045                   42,640    0.24%   x 1.00 
 factorial::factorial_join                                           2,340,823              2,346,570                     5,747    0.25%   x 1.00 
 factorial::factorial_par_iter                                       2,271,869              2,267,057                    -4,812   -0.21%   x 1.00 
 factorial::factorial_recursion                                      3,151,208              3,140,834                   -10,374   -0.33%   x 1.00 
 fibonacci::fibonacci_iterative                                      19                     19                                0    0.00%   x 1.00 
 fibonacci::fibonacci_join_1_2                                       36,673,826             36,752,869                   79,043    0.22%   x 1.00 
 fibonacci::fibonacci_join_2_1                                       37,114,104             35,161,295               -1,952,809   -5.26%   x 1.06 
 fibonacci::fibonacci_recursive                                      11,120,355             11,116,372                   -3,983   -0.04%   x 1.00 
 fibonacci::fibonacci_split_iterative                                38,017                 38,002                          -15   -0.04%   x 1.00 
 fibonacci::fibonacci_split_recursive                                4,315,200              4,434,379                   119,179    2.76%   x 0.97 
 find::size1::parallel_find_common                                   7,302                  7,811                           509    6.97%   x 0.93 
 find::size1::parallel_find_first                                    4,768                  4,844                            76    1.59%   x 0.98 
 find::size1::parallel_find_last                                     2,565,199              2,484,967                   -80,232   -3.13%   x 1.03 
 find::size1::parallel_find_middle                                   1,909,305              1,864,793                   -44,512   -2.33%   x 1.02 
 find::size1::parallel_find_missing                                  3,046,770              2,968,846                   -77,924   -2.56%   x 1.03 
 find::size1::serial_find_common                                     3,484                  3,472                           -12   -0.34%   x 1.00 
 find::size1::serial_find_first                                      2                      1                                -1  -50.00%   x 2.00 
 find::size1::serial_find_last                                       4,285,578              4,245,449                   -40,129   -0.94%   x 1.01 
 find::size1::serial_find_middle                                     2,987,499              2,877,979                  -109,520   -3.67%   x 1.04 
 find::size1::serial_find_missing                                    4,418,545              4,347,340                   -71,205   -1.61%   x 1.02 
 join_microbench::increment_all                                      38,524                 39,183                          659    1.71%   x 0.98 
 join_microbench::increment_all_atomized                             1,975,323              1,938,406                   -36,917   -1.87%   x 1.02 
 join_microbench::increment_all_max                                  70,312                 68,860                       -1,452   -2.07%   x 1.02 
 join_microbench::increment_all_min                                  23,657                 23,466                         -191   -0.81%   x 1.01 
 join_microbench::increment_all_serialized                           32,172                 32,437                          265    0.82%   x 0.99 
 join_microbench::join_recursively                                   711,427                745,727                      34,300    4.82%   x 0.95 
 life::bench::generations                                            115,008,243            115,074,354                  66,111    0.06%   x 1.00 
 life::bench::parallel_generations                                   39,896,367             40,205,562                  309,195    0.77%   x 0.99 
 map_collect::i_mod_10_to_i::with_collect                            7,059,415              8,730,868                 1,671,453   23.68%   x 0.81 
 map_collect::i_mod_10_to_i::with_fold                               2,738,414              2,713,461                   -24,953   -0.91%   x 1.01 
 map_collect::i_mod_10_to_i::with_fold_vec                           3,055,922              3,379,094                   323,172   10.58%   x 0.90 
 map_collect::i_mod_10_to_i::with_linked_list_collect                15,548,849             16,623,060                1,074,211    6.91%   x 0.94 
 map_collect::i_mod_10_to_i::with_linked_list_collect_vec            8,093,351              7,914,570                  -178,781   -2.21%   x 1.02 
 map_collect::i_mod_10_to_i::with_linked_list_collect_vec_sized      7,917,735              7,716,213                  -201,522   -2.55%   x 1.03 
 map_collect::i_mod_10_to_i::with_linked_list_map_reduce_vec_sized   9,052,060              7,852,947                -1,199,113  -13.25%   x 1.15 
 map_collect::i_mod_10_to_i::with_mutex                              59,582,012             57,906,667               -1,675,345   -2.81%   x 1.03 
 map_collect::i_mod_10_to_i::with_mutex_vec                          7,755,960              7,571,576                  -184,384   -2.38%   x 1.02 
 map_collect::i_mod_10_to_i::with_vec_vec_sized                      7,792,711              9,713,066                 1,920,355   24.64%   x 0.80 
 map_collect::i_to_i::with_collect                                   33,335,645             33,124,952                 -210,693   -0.63%   x 1.01 
 map_collect::i_to_i::with_fold                                      73,229,779             73,501,195                  271,416    0.37%   x 1.00 
 map_collect::i_to_i::with_fold_vec                                  74,417,007             73,866,890                 -550,117   -0.74%   x 1.01 
 map_collect::i_to_i::with_linked_list_collect                       42,367,147             42,325,424                  -41,723   -0.10%   x 1.00 
 map_collect::i_to_i::with_linked_list_collect_vec                   41,915,355             41,256,942                 -658,413   -1.57%   x 1.02 
 map_collect::i_to_i::with_linked_list_collect_vec_sized             33,025,713             32,902,782                 -122,931   -0.37%   x 1.00 
 map_collect::i_to_i::with_linked_list_map_reduce_vec_sized          32,698,047             32,903,004                  204,957    0.63%   x 0.99 
 map_collect::i_to_i::with_mutex                                     114,234,253            113,159,786              -1,074,467   -0.94%   x 1.01 
 map_collect::i_to_i::with_mutex_vec                                 51,530,834             50,442,443               -1,088,391   -2.11%   x 1.02 
 map_collect::i_to_i::with_vec_vec_sized                             33,278,858             32,895,046                 -383,812   -1.15%   x 1.01 
 matmul::bench::bench_matmul_strassen                                4,614,061              4,673,119                    59,058    1.28%   x 0.99 
 mergesort::bench::merge_sort_par_bench                              10,509,431             10,505,307                   -4,124   -0.04%   x 1.00 
 mergesort::bench::merge_sort_seq_bench                              35,252,069             35,138,792                 -113,277   -0.32%   x 1.00 
 nbody::bench::nbody_par                                             8,662,390              8,686,335                    23,945    0.28%   x 1.00 
 nbody::bench::nbody_parreduce                                       24,753,679             24,524,192                 -229,487   -0.93%   x 1.01 
 nbody::bench::nbody_seq                                             25,579,418             25,368,741                 -210,677   -0.82%   x 1.01 
 pythagoras::euclid_faux_serial                                      34,450,535             35,424,606                  974,071    2.83%   x 0.97 
 pythagoras::euclid_parallel_full                                    50,464,949             50,542,319                   77,370    0.15%   x 1.00 
 pythagoras::euclid_parallel_one                                     9,924,467              10,646,585                  722,118    7.28%   x 0.93 
 pythagoras::euclid_parallel_outer                                   10,471,053             10,693,941                  222,888    2.13%   x 0.98 
 pythagoras::euclid_parallel_weightless                              10,670,435             10,175,918                 -494,517   -4.63%   x 1.05 
 pythagoras::euclid_serial                                           29,963,971             28,264,858               -1,699,113   -5.67%   x 1.06 
 quicksort::bench::quick_sort_par_bench                              14,334,490             14,006,571                 -327,919   -2.29%   x 1.02 
 quicksort::bench::quick_sort_seq_bench                              39,452,025             39,257,532                 -194,493   -0.49%   x 1.00 
 quicksort::bench::quick_sort_splitter                               14,540,512             14,540,958                      446    0.00%   x 1.00 
 sieve::bench::sieve_chunks                                          9,013,538              9,013,771                       233    0.00%   x 1.00 
 sieve::bench::sieve_parallel                                        3,972,418              3,936,042                   -36,376   -0.92%   x 1.01 
 sieve::bench::sieve_serial                                          14,995,405             15,370,132                  374,727    2.50%   x 0.98 
 sort::demo_merge_sort_ascending                                     176,726 (2263 MB/s)    189,258 (2113 MB/s)          12,532    7.09%   x 0.93 
 sort::demo_merge_sort_big                                           8,422,095 (759 MB/s)   8,360,897 (765 MB/s)        -61,198   -0.73%   x 1.01 
 sort::demo_merge_sort_descending                                    188,280 (2124 MB/s)    191,846 (2085 MB/s)           3,566    1.89%   x 0.98 
 sort::demo_merge_sort_mostly_ascending                              405,201 (987 MB/s)     400,984 (997 MB/s)           -4,217   -1.04%   x 1.01 
 sort::demo_merge_sort_mostly_descending                             423,101 (945 MB/s)     407,053 (982 MB/s)          -16,048   -3.79%   x 1.04 
 sort::demo_merge_sort_random                                        1,289,835 (310 MB/s)   1,258,557 (317 MB/s)        -31,278   -2.42%   x 1.02 
 sort::demo_merge_sort_strings                                       4,345,145 (184 MB/s)   4,293,042 (186 MB/s)        -52,103   -1.20%   x 1.01 
 sort::demo_quick_sort_big                                           7,034,864 (909 MB/s)   7,265,116 (880 MB/s)        230,252    3.27%   x 0.97 
 sort::demo_quick_sort_mostly_ascending                              15,507,888 (25 MB/s)   14,369,374 (27 MB/s)     -1,138,514   -7.34%   x 1.08 
 sort::demo_quick_sort_mostly_descending                             13,116,593 (30 MB/s)   13,358,592 (29 MB/s)        241,999    1.84%   x 0.98 
 sort::demo_quick_sort_random                                        1,292,698 (309 MB/s)   1,321,172 (302 MB/s)         28,474    2.20%   x 0.98 
 sort::demo_quick_sort_strings                                       5,508,654 (145 MB/s)   5,496,890 (145 MB/s)        -11,764   -0.21%   x 1.00 
 sort::par_sort_ascending                                            53,900 (7421 MB/s)     58,042 (6891 MB/s)            4,142    7.68%   x 0.93 
 sort::par_sort_big                                                  8,561,706 (747 MB/s)   7,727,520 (828 MB/s)       -834,186   -9.74%   x 1.11 
 sort::par_sort_descending                                           99,631 (4014 MB/s)     90,953 (4397 MB/s)           -8,678   -8.71%   x 1.10 
 sort::par_sort_expensive                                            48,209,678 (8 MB/s)    48,207,970 (8 MB/s)          -1,708   -0.00%   x 1.00 
 sort::par_sort_mostly_ascending                                     431,893 (926 MB/s)     428,563 (933 MB/s)           -3,330   -0.77%   x 1.01 
 sort::par_sort_mostly_descending                                    473,919 (844 MB/s)     467,101 (856 MB/s)           -6,818   -1.44%   x 1.01 
 sort::par_sort_random                                               973,573 (410 MB/s)     941,840 (424 MB/s)          -31,733   -3.26%   x 1.03 
 sort::par_sort_strings                                              3,362,282 (237 MB/s)   3,365,980 (237 MB/s)          3,698    0.11%   x 1.00 
 sort::par_sort_unstable_ascending                                   41,634 (9607 MB/s)     41,455 (9649 MB/s)             -179   -0.43%   x 1.00 
 sort::par_sort_unstable_big                                         5,593,860 (1144 MB/s)  6,229,215 (1027 MB/s)       635,355   11.36%   x 0.90 
 sort::par_sort_unstable_descending                                  60,589 (6601 MB/s)     61,245 (6531 MB/s)              656    1.08%   x 0.99 
 sort::par_sort_unstable_expensive                                   61,572,820 (6 MB/s)    58,630,374 (6 MB/s)      -2,942,446   -4.78%   x 1.05 
 sort::par_sort_unstable_mostly_ascending                            250,945 (1593 MB/s)    249,960 (1600 MB/s)            -985   -0.39%   x 1.00 
 sort::par_sort_unstable_mostly_descending                           262,182 (1525 MB/s)    262,447 (1524 MB/s)             265    0.10%   x 1.00 
 sort::par_sort_unstable_random                                      527,763 (757 MB/s)     536,725 (745 MB/s)            8,962    1.70%   x 0.98 
 sort::par_sort_unstable_strings                                     4,758,238 (168 MB/s)   4,831,737 (165 MB/s)         73,499    1.54%   x 0.98 
 str_split::parallel_space_char                                      1,174,103              1,157,277                   -16,826   -1.43%   x 1.01 
 str_split::parallel_space_fn                                        914,230                942,340                      28,110    3.07%   x 0.97 
 str_split::serial_space_char                                        2,783,193              2,721,194                   -61,999   -2.23%   x 1.02 
 str_split::serial_space_fn                                          1,972,265              1,905,675                   -66,590   -3.38%   x 1.03 
 str_split::serial_space_str                                         2,655,604              2,636,902                   -18,702   -0.70%   x 1.01 
 tsp::bench::dj10                                                    11,654,227             11,664,333                   10,106    0.09%   x 1.00 
 vec_collect::vec_i::with_collect                                    3,218,124              3,624,974                   406,850   12.64%   x 0.89 
 vec_collect::vec_i::with_collect_into                               3,585,083              3,540,121                   -44,962   -1.25%   x 1.01 
 vec_collect::vec_i::with_collect_into_reused                        1,905,820              1,956,071                    50,251    2.64%   x 0.97 
 vec_collect::vec_i::with_fold                                       37,789,729             37,839,956                   50,227    0.13%   x 1.00 
 vec_collect::vec_i::with_linked_list_collect_vec                    28,705,893             28,431,906                 -273,987   -0.95%   x 1.01 
 vec_collect::vec_i::with_linked_list_collect_vec_sized              24,432,471             24,289,628                 -142,843   -0.58%   x 1.01 
 vec_collect::vec_i::with_linked_list_map_reduce_vec_sized           18,796,286             19,068,593                  272,307    1.45%   x 0.99 
 vec_collect::vec_i::with_vec_vec_sized                              19,021,448             19,080,272                   58,824    0.31%   x 1.00 
 vec_collect::vec_i_filtered::with_collect                           22,184,963             22,447,459                  262,496    1.18%   x 0.99 
 vec_collect::vec_i_filtered::with_fold                              40,534,102             40,793,224                  259,122    0.64%   x 0.99 
 vec_collect::vec_i_filtered::with_linked_list_collect_vec           31,042,944             30,971,337                  -71,607   -0.23%   x 1.00 
 vec_collect::vec_i_filtered::with_linked_list_collect_vec_sized     26,308,873             26,475,456                  166,583    0.63%   x 0.99 
 vec_collect::vec_i_filtered::with_linked_list_map_reduce_vec_sized  22,552,786             22,751,778                  198,992    0.88%   x 0.99 
 vec_collect::vec_i_filtered::with_vec_vec_sized                     22,354,758             22,399,421                   44,663    0.20%   x 1.00
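
For reading the table: the last three columns appear to be derived from the two timing columns as sketched below. This is inferred from the numbers themselves, not from the tool's source, and the `compare` helper is a hypothetical name.

```rust
/// Returns (diff in ns, diff in percent, speedup factor) for one table row.
/// `old_ns` is the `cb` column, `new_ns` the `cb-optimized` column.
fn compare(old_ns: u64, new_ns: u64) -> (i64, f64, f64) {
    // Negative diff means the optimized build was faster.
    let diff = new_ns as i64 - old_ns as i64;
    // Percentage change relative to the baseline.
    let pct = diff as f64 / old_ns as f64 * 100.0;
    // Speedup above 1.0 means the optimized build was faster.
    let speedup = old_ns as f64 / new_ns as f64;
    (diff, pct, speedup)
}
```

For example, `compare(37_114_104, 35_161_295)` reproduces the `fibonacci::fibonacci_join_2_1` row: a diff of -1,952,809 ns, about -5.26%, and a speedup of about 1.06.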

jeehoonkang · 2018-01-18T05:52:21Z

Oh, it's obvious why the performance doesn't improve much: the different orderings don't matter in the compiled x86 code. In particular, CAS(SeqCst, Relaxed) = CAS(Release, Relaxed) when compiled down to x86. The only possible difference is replacing a release-swap with a release-store (https://github.com/crossbeam-rs/crossbeam-deque/pull/2/files#diff-b4aea3e418ccdb71239b96952d9cddb6R222), but that happens rarely.

To assess the performance improvement, maybe we need to go to POWER or ARM.
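
A small sketch for checking this claim locally. The two function names are hypothetical; they differ only in the success ordering. On x86-64 both are expected to lower to the same `lock cmpxchg` instruction, which can be inspected with, for example, `rustc -O --emit asm`. On weaker architectures such as ARM or POWER the generated code can differ.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// CAS with the strongest success ordering.
pub fn cas_seqcst(a: &AtomicUsize, old: usize, new: usize) -> bool {
    a.compare_exchange(old, new, Ordering::SeqCst, Ordering::Relaxed)
        .is_ok()
}

/// CAS with the weaker success ordering proposed in this PR.
pub fn cas_release(a: &AtomicUsize, old: usize, new: usize) -> bool {
    a.compare_exchange(old, new, Ordering::Release, Ordering::Relaxed)
        .is_ok()
}
```

Both functions behave identically in a single-threaded test; the difference is only in which reorderings the compiler and hardware are permitted to perform.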

ghost · 2018-11-05T01:05:57Z

What should we do with this PR? Resolve conflicts and resubmit in the main repo? Close?

Lower orderings

baf8955

jeehoonkang mentioned this pull request Jan 8, 2018

Prove the correctness of a work-stealing deque crossbeam-rs/rfcs#26

Open

Amanieu reviewed Jan 13, 2018

Issue acquire fence in pop()'s race case

65f3a99

ghost force-pushed the master branch 3 times, most recently from 44d755b to df1ac52 Compare June 28, 2018 21:21

jeehoonkang closed this by deleting the head repository May 20, 2023
