These benchmarks are meaningless: between computers with different core counts, clock speeds, instruction sets, and compilers, the absolute numbers don't really mean much. So here are mine.
OSX Yosemite
2.66 GHz Intel Core 2 Duo, 4GB RAM (It's from 2009, don't make fun of me)
Limit: 2^25 = 33554432
--------------------------------------------------------------------------------
Basic Algorithm 2wheel(odds only) 23wheel 235wheel
Ruby 16.79s 4.829s
Python 12.56s 3.88s 2.248s
Java 1.13s 0.563s 0.370s
Go 1.13s 0.551s 0.414s
D 1.11s 0.537s 0.478s
C 1.30s 0.579s 0.411s
C(-O3) 1.09s 0.524s 0.346s
Rust 1.2.0 1.12s 0.532s 0.341s
--------------------------------------------------------------------------------
Linux Mint 17.1
Intel Core i7-2760QM 2.4GHZ, 8GB RAM
--------------------------------------------------------------------------------
Basic Algorithm 2wheel(odds only) 23wheel 235wheel
Java 0.76s 0.167s 0.105s
Go 0.52s 0.180s 0.115s
D 0.44s 0.164s 0.102s
C(-O3) 0.42s 0.163s 0.093s
--------------------------------------------------------------------------------
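For reference, the "Basic Algorithm" column is just a straightforward Sieve of Eratosthenes with no wheel tricks; roughly something like this (a Go sketch with my own names, not the exact code that was benchmarked):

package main

import "fmt"

// basicSieve is a plain Sieve of Eratosthenes: one bool per number, cross off
// the multiples of every prime up to sqrt(limit), then count what is left.
func basicSieve(limit int) int {
	composite := make([]bool, limit+1)
	for p := 2; p*p <= limit; p++ {
		if composite[p] {
			continue
		}
		for m := p * p; m <= limit; m += p {
			composite[m] = true
		}
	}
	count := 0
	for n := 2; n <= limit; n++ {
		if !composite[n] {
			count++
		}
	}
	return count
}

func main() {
	limit := 1 << 25 // 33554432, the limit used in the tables above
	fmt.Println(basicSieve(limit), "primes up to", limit)
}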
So for version 2.0 of this test, the takeaway is that it pays to compile your language. Really it is somewhat unfair, as there is no way that Ruby or Python could compete with a compiler that can optimize away some of the loop structures the wheel classes need.
Apparently Ruby is bad at this kind of number crunching, since its 23-wheel can't even beat Python's odds-only wheel; the wheel's extra bookkeeping probably costs more in the interpreter than it saves. It is also a bit weird: it is almost a 23-wheel, but it doesn't let the outermost loop skip multiples of 3. Strange in my book. It was also my first implementation, so it holds a special place in my heart.
Python's performance is quite a bit better than Ruby's. It gives about a 33% increase in throughput, and its odds-only version beats Ruby's 23-wheel. Going from the basic algorithm to the 235 wheel is over a 300% speedup (322.5% or so). The 235 wheel is also much more complicated to code, as it requires a lot of special attention to very subtle tricks.
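To give a flavor of what the wheels buy you, the odds-only (2-wheel) version never even stores the even numbers; here is a rough Go sketch of the idea (the 23 and 235 wheels push this further with bigger skip patterns, which is exactly where the subtle bookkeeping comes in):

package main

import "fmt"

// oddSieve is an odds-only (2-wheel) sieve: index i stands for the odd number
// 2*i+1, so even numbers never appear in the array at all.
func oddSieve(limit int) int {
	if limit < 2 {
		return 0
	}
	half := (limit - 1) / 2 // indices 1..half cover the odd numbers 3..limit
	composite := make([]bool, half+1)
	// p = 2*i+1; stop once p*p > limit, i.e. once index(p*p) = 2*i*(i+1) > half
	for i := 1; 2*i*(i+1) <= half; i++ {
		if composite[i] {
			continue
		}
		p := 2*i + 1
		// cross off p*p, p*p+2p, p*p+4p, ... which is a stride of p in index space
		for j := 2 * i * (i + 1); j <= half; j += p {
			composite[j] = true
		}
	}
	count := 1 // the prime 2, which the wheel skips entirely
	for i := 1; i <= half; i++ {
		if !composite[i] {
			count++
		}
	}
	return count
}

func main() {
	limit := 1 << 25
	fmt.Println(oddSieve(limit), "primes up to", limit)
}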
So far we have only talked about interpreted languages. Now we move on to the languages that were built for this sort of problem, compiled and optimized. We have our algorithms in hand, so we should try one of the more forgiving compiled languages. I chose Go.
Coding the entire thing in Go went better than I thought, although there was some eye-crossing when I was converting between float64, int64, uint64, you get the picture. Go is strongly typed, so these conversions are necessary for the compiler to know what is going on, while type inference lets it work out what a lot of the variable types are on its own. That said, there were plenty of problems converting the two complex algorithms into Go. However, the results were spectacular! We see a 12x performance increase at the basic level, closer to 3x for the odds-only implementation, and around a 5.5x boost on the 235 wheel. Whoa.
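For example, this is the kind of conversion dance I mean, just for computing the bound you sieve up to (a tiny sketch with my own variable names, not the benchmarked code):

package main

import (
	"fmt"
	"math"
)

func main() {
	var limit uint64 = 1 << 25

	// math.Sqrt only takes and returns float64, and the loop bound needs to be
	// an integer again, so every step is an explicit conversion.
	sqrtLimit := uint64(math.Sqrt(float64(limit)))

	fmt.Println("cross off multiples of primes up to", sqrtLimit, "for a limit of", limit)
}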
Still talking about Go, there are some things that just can't be beat. The type inference is a real time-saver (as we will see when we get to C). I could probably make this even faster by declaring the arrays as static variables (which I will get around to), and a couple of other things.
C. This is the one I wanted to use as the gold standard for the others to be measured against. I was incredibly impressed that Go could keep up with (and beat) C in some cases. I also learned a valuable lesson about implicit vs. explicit volatile declaration of variables. The counters that track how many primes we've found, and the largest one, started out implicitly non-volatile. That was very bad when I tried the -O3 optimization, because suddenly I was reading garbage into my counter values, breaking the project. By declaring them volatile I took a performance hit (duh), but I was getting the right numbers.
For now, hands down, the fastest language is C. But again, I was very impressed with how Go held up against it. I might try pushing the compiled languages to higher limits and checking the speeds, but that could wind up breaking a lot of things as we get closer to the 32-bit barrier.
In version 2 of the test, I fixed a bug that led to slight performance improvements across the board. I also added a full Rust set. The new king is Rust, which runs slightly faster than C, which I found interesting. The difference is minuscule, so we would have to move up toward the 32-bit barrier to see whether it separates them any better.
Here are the results for the compiled languages at the gigabyte limit, on both machines (same order as above).
Limit: 2^30 = 1073741824 (1GB)
--------------------------------------------------------------------------------
Basic Algorithm 2wheel(odds only) 23wheel 235wheel
Java 61.9s 21.7s 15.5s
Go 52.9s 21.9s 17.2s
D 48.9s 21.3s 19.6s
C(-O3) 48.2s 21.2s 15.2s
--------------------------------------------------------------------------------
Java 29.6s 6.53s 4.87s
Go 20.3s 6.94s 5.52s
D 17.2s 6.47s 4.98s
C(-O3) 16.6s 6.36s 4.77s
--------------------------------------------------------------------------------
So they still track together. I think this is about as far as the exercise can be taken; at this point we're just adding more languages, and I am not going to work out a 2357 wheel for anything. About the only other thing I had wanted to add was Rust, and that made it into the 2^25 table above once 1.2.0 was out. C++ should track almost exactly with C. It really is cool to see that Go works that well.
Going back, I used the C implementation as the basis for a fast Java implementation. I was very impressed with Java's results, as it was basically as fast as the fully compiled languages. I also added D because I could; its results tracked with the others, since the implementation doesn't change.
Fun though. I'm looking forward to moving to a different machine and seeing the results
there.