Espruino Benchmarking #7710
-
Hi! Thanks for such detailed benchmarks! We do have a few benchmarks in https://github.com/espruino/Espruino/tree/master/benchmark and it might be interesting to include your more specifically tailored ones in there? For a while I did run benchmarks on real hardware for every commit, but it became a bit too difficult to manage. It would be nice to have something similar again though (I have planned for a while to have a hardware testbench that would automatically test and benchmark each device, but it's been hard to get time for that).
Did you test the code when it was minified? That can have a big impact - see https://www.espruino.com/Performance
I've posted on the issue you created, espruino/Espruino#2615 - but this is just because you didn't build with Did you find build instructions somewhere that you followed? Because I had tried to include it in https://github.com/espruino/Espruino/blob/master/README_Building.md
I think mainly because you built non-release for the S3 so it had all the assert checks in it... Also the C3 has a totally different RISC-V processor if you tried that?
Part of it is architecture - the ESP32 has program code in external SPI flash so despite the high clock speed it has to drag the code it executes out of that.
Bangle.js is slower than Pixl.js because you'd have written the benchmark code to the external SPI flash, and so it was having to execute from there. If you tag a function as
But mainly the ESP32 port is done by a few very motivated individuals, but I don't make any money out of it (apart from the odd Patreon supporter) and Espressif themselves provide zero support so I don't spend much time on it. As a result, while every so often I spend a reasonable amount of time running benchmarks on the boards I sell and making optimisations, that doesn't happen for ESP32. It's a shame - but it just doesn't make sense for me to spend my time, unpaid, optimising ESP32 so it can then steal market share from the devices that I sell :) I did contact Espressif a few times to see if they could support some work on it at all but they're just not interested.
It's likely that it wouldn't take that long to figure out what functions are called most often and then tag them with a declaration that would tell the ESP32 they need to stay in RAM, and it'd help significantly with speed though.
I don't know - what do you want to do? :)
-
Hi Gordon, thanks for your useful insights and all you do. I'll take a look at the benchmark capability you linked to. I completely understand your funding and support model. That's why despite using the ESP32 primarily (lots of RAM being particularly important to my use case), I'm a Patreon subscriber and have a few official boards to play with too. I really should have thought to minify the code. I minify all code in other circumstances. Failing to do so here is an undesirable deviation from how I use Espruino. I wasn't looking for absolute peak performance numbers, more a means of fair comparison. I didn't want to use I do have a C3 but I haven't tried it yet.
-
Maybe not quite on the subject, but hopefully relevant:
-
I have had a need to test performance of different boards and Espruino versions.
Files
espruino_benchmarks.xlsx
bench.js.txt (Remove .txt from the filename; it seems GitHub doesn't allow sharing *.js files in discussions.)
Method
If anyone spots any flaws in my method, please point them out to me.
For all boards except the ESP32-S3, I used the stock firmware images available at https://www.espruino.com/Download
For the ESP32-S3, I compiled the image myself. I did not attempt to create old versions so only one set of tests was conducted for the S3.
For every board, the attached bench.js was uploaded to the board's Storage. The results were always collected with:
If the board executed the code during upload, those numbers were ignored. The output looks like:
The first test (benchmark 1) runs () => {}. Its result is subtracted from every subsequent result except the final GC Test, which removes the overhead of running the benchmark itself from the results.
Subsequent benchmarks are in the following groups: basic math functions, logic, flow control, string operations, bitwise operations, casting, memory assignment and garbage collection. Control cannot be tested without the use of logic, so the logic results are subtracted from the initial control values to produce a control-only result.
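The baseline subtraction described above can be sketched roughly as follows. This is not the code from bench.js: the names timeIt, netCost and now are hypothetical, and the clock is passed in as a parameter only so that the sketch is easy to check.

```javascript
// Hypothetical harness; timeIt, netCost and now are illustrative names,
// not taken from bench.js.

// Run fn `iterations` times and return the elapsed time reported by now().
function timeIt(fn, iterations, now) {
  var start = now();
  for (var i = 0; i < iterations; i++) fn();
  return now() - start;
}

// Time attributable to the operation itself:
// raw time minus the time taken by an empty function (the baseline).
function netCost(fn, iterations, now) {
  var baseline = timeIt(function () {}, iterations, now);
  var raw = timeIt(fn, iterations, now);
  return raw - baseline;
}

// Usage with a millisecond wall clock.
var x = 0;
var ms = netCost(function () { x = x + 1; }, 10000, function () { return Date.now(); });
console.log("addition, net ms:", ms);
```

The same subtraction applies to the control group: time the logic-only test and the control-plus-logic test, then subtract the first from the second.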
The final test creates a lot of variables and then collapses the scope they're in. It is a long-running test and, unlike the other tests, is run only once. The raw result is used for comparison. Although my intention with this test is to assess the performance of garbage collection, there is a lot going on here. It could equally be seen as a test of the call stack, and there are control, logic, math and string concatenation all in there too. So it really is an all-rounder.
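To give a feel for the shape of such a test, here is a minimal sketch that allocates many values inside a function scope and then lets that scope go. It is an illustration under my own assumptions, not the GC Test from bench.js; fillScope and gcStress are hypothetical names.

```javascript
// Illustrative only; this is not the GC Test from bench.js.

// Allocate `count` strings inside a local scope, then return.
// Once the function returns, `held` and its contents are unreachable.
function fillScope(count) {
  var held = [];
  for (var i = 0; i < count; i++) {
    held.push("item" + i); // string concatenation creates a fresh value
  }
  return held.length;
}

// Repeat the allocate-and-discard cycle and time the whole run once.
function gcStress(rounds, count) {
  var start = Date.now();
  var total = 0;
  for (var r = 0; r < rounds; r++) total += fillScope(count);
  return { allocated: total, ms: Date.now() - start };
}

var result = gcStress(10, 200);
console.log(result.allocated, "allocations in", result.ms, "ms");
```

As in the description above, a loop like this also exercises function calls, flow control and string concatenation, so the timing is not a pure measure of memory reclamation.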
Results
Please find attached an Excel spreadsheet.
The ESP32-S3 failed to complete the GC Test, complaining of a failed assert at line 835 of JsVar.c. I will create an issue after I have finished this write up. (EDIT: See espruino/Espruino#2615)
The ESP32 and ESP32-S3 results were so much slower than those of the other boards that, for most tests, it becomes difficult to see the details of the official boards' results when displayed on the same charts. They are almost literally incomparable. To make sure there is always a chart that shows any relevant detail, I have, where necessary, produced charts comparing all boards, charts comparing official boards only and charts comparing Espressif boards only.
Math
The results for the official boards are quite consistent across Espruino versions.
The claimed superiority of the S3 is definitely not visible here.
The ESP32 results vary quite a lot across Espruino versions.
Logic
Logic operations are unsurprisingly faster than Math operations.
The S3 is again slower than its older sibling.
There is again significant variation between Espruino versions on the ESP32, while the official boards are quite consistent.
Flow Control
Flow control operations are the fastest operations on any board and with any Espruino version.
Anyone using the common optimisation technique of using math operations instead of flow control to avoid cache invalidation may well be slowing down their code. I think that's an optimisation for compiled code rather than interpreted code, so fits with my expectations but I didn't specifically test this.
Previously observed trends continue.
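The branch-versus-arithmetic point above was not tested in the original benchmarks. A minimal sketch of how it could be checked is shown here, using absolute value of a 32-bit integer as the example. The function names are hypothetical and nothing here comes from bench.js.

```javascript
// Two equivalent ways to take the absolute value of a 32-bit integer.

// Flow control version: one conditional.
function absBranch(n) {
  return n < 0 ? -n : n;
}

// "Branchless" version: bitwise and arithmetic operations only.
// mask is -1 for negative n and 0 otherwise.
function absMath(n) {
  var mask = n >> 31;
  return (n ^ mask) - mask;
}

// Time `iterations` calls of fn; the sum is kept so the work is not unused.
function timeLoop(fn, iterations) {
  var start = Date.now();
  var sink = 0;
  for (var i = 0; i < iterations; i++) sink += fn(i - 500);
  return { ms: Date.now() - start, sink: sink };
}

console.log("branch:", timeLoop(absBranch, 1000).ms, "ms");
console.log("math:  ", timeLoop(absMath, 1000).ms, "ms");
```

On an interpreter, the arithmetic version executes more operations per call, so it would be reasonable to expect it to be slower, but that is a guess until measured on each board.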
String Operations
String operations are, as I'd expect, the slowest operations across all boards.
Previously observed trends continue. The ESP32 has its worst performance relative to other boards on this test.
Bitwise
Previously observed trends continue.
Casting
Previously observed trends continue.
Memory Assignment
Previously observed trends continue.
Garbage Collection
This is a long-running test that takes up the majority of the testing time. Interestingly, the ESP32 doesn't fare as badly in this test as in the others. While it is still the slowest, it isn't as bad proportionately. Maybe having so much more memory than the other boards allows it to do relatively few GC operations (even if, when it does them, it will find more freeable memory).
Overall Performance
And finally, the overall results. These results in particular should be treated with a certain amount of caution. This chart simply shows the time it took to complete the entire set of tests. Different applications will use different functionality to differing degrees, so these numbers will never be truly representative. This chart makes no attempt to weight the individual results in proportion to any kind of real-world usage. Also note that on all boards the majority of the total time is taken up by the GC Test.
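For anyone who wants a figure closer to their own workload, one option is a weighted average of the per-group times. The sketch below assumes you supply your own weights. The group names, times and weights in the usage example are placeholders, not values from the spreadsheet.

```javascript
// Weighted average of per-group benchmark times.
// times:   { groupName: timeInMs }
// weights: { groupName: relativeImportance }
// Only groups listed in `weights` contribute to the score.
function weightedScore(times, weights) {
  var sum = 0;
  var totalWeight = 0;
  for (var group in weights) {
    sum += times[group] * weights[group];
    totalWeight += weights[group];
  }
  return sum / totalWeight;
}

// Placeholder numbers for illustration only, not measured results.
var exampleTimes = { math: 10, string: 30 };
var exampleWeights = { math: 3, string: 1 }; // a math-heavy application
console.log("weighted ms:", weightedScore(exampleTimes, exampleWeights));
```

Leaving the GC Test out of the weights, or giving it a small weight, stops it from dominating the total in the way it does in the unweighted chart.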
The 100 MHz STM32-based Espruino WiFi blows everything else away.
The 64 MHz nRF52-based Pixl.js and Bangle 1 put in a respectable performance not far behind, with the Pixl.js being the faster of the two.
The 240 MHz ESP32 was surprisingly sluggish. I actually wrote a custom extension so that I could check the clock speed. I don't know whether the flash storage on the ESP32 is more of a bottleneck or whether the Xtensa processor just manages fewer instructions per cycle than its competitors, but ESP32 performance was frankly terrible in comparison to the others in all tests, and the ESP32-S3 is even worse.
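To compare boards clock for clock, a net time can be converted into an approximate number of CPU cycles per operation. The sketch below uses the nominal clock speeds quoted above. The 12 ms timing is a placeholder, not a measured result, and cyclesPerOp is a hypothetical helper name.

```javascript
// Approximate CPU cycles spent per benchmark iteration.
// netMs:      net time for all iterations, in milliseconds
// clockMHz:   nominal CPU clock in MHz (cycles per microsecond)
// iterations: number of times the operation was run
function cyclesPerOp(netMs, clockMHz, iterations) {
  var micros = netMs * 1000;
  return (micros * clockMHz) / iterations;
}

// Placeholder timing, applied to each nominal clock speed for illustration.
var exampleMs = 12; // not a measured value
var clocks = { "Espruino WiFi": 100, "Pixl.js": 64, "ESP32": 240 };
for (var board in clocks) {
  console.log(board, cyclesPerOp(exampleMs, clocks[board], 1000), "cycles/op");
}
```

A figure like this includes time spent waiting on flash or memory, so a high cycle count does not on its own say whether the core or the storage is the bottleneck.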
Summary
The STM32-based Espruino WiFi is by far the fastest board in this test, and the nRF52-based boards are comparable clock cycle for clock cycle.
The Espressif boards, despite having the highest clock speeds, are the slowest in all tasks. The S3 is surprisingly slow. But I am fully aware they have a lot more memory and I/O capability, which will be more important than CPU performance in a lot of use cases.
Espruino performs fairly consistently between versions on official boards but quite variably on the ESP32.
Overall, flow control operations are faster than
logic operations, which are faster than
bitwise operations, which are faster than
mathematical operations, which are faster than
casting operations.
Questions Raised
Why is performance so variable between Espruino releases for the ESP32?
Why is the ESP32 so slow compared to the others? Why is the ESP32-S3 so slow compared even to the ESP32?
Can I do what I want on an STM32 based board?!