SPI performance on mono SH1106 128x64 screen - bulk SPI transfer #970
Replies: 28 comments
-
Posted at 2018-02-26 by @gfwilliams Hi - are you sure it's the transfer speed, or is it actually updating the graphics itself that takes the time? You could make sure you're on version 1v96 or later as I did some stuff to improve graphics speed on that. SPI transfer speed itself is still pretty reasonable. The bug you point to was really for ESP8266 because the API they provided for SPI was slow for single bytes - on STM32 especially the SPI is reasonably fast - it'll push out 4Mbps if I recall, so is unlikely to be the issue with your display. You can increase the SPI clock rate really easily when you set SPI up, so that's a really easy boost: http://www.espruino.com/Reference#l_SPI_setup Default is 100k, but you could push that up to 1M pretty easily or the display may take 4M. This is the driver for the display: http://www.espruino.com/modules/SH1106.js It does send data in chunks, but IMO it could be faster. You could try adding something like this after initialising the graphics and see if that makes a difference?
-
Posted at 2018-02-26 by jugglingcats Hi Gordon, thanks for the reply. I'll give those things a try and report back. Perhaps the ESP8266 issue is still present for ESP32, because I ran up the same code on my ESP32 dev board and it was noticeably slower than the Pico, but I haven't looked into whether this is a simple SPI clock frequency problem...
-
Posted at 2018-02-26 by @gfwilliams Ahh, it could well be an ESP32 SPI issue in your case - the folks working on it are doing a great job (especially as they're just doing it for fun), but I think the focus is still on getting things supported and bug-free, and not on outright speed.
-
Posted at 2018-02-26 by jugglingcats Am focused on the Pico, trying to see what I can achieve with this little OLED screen. I updated the firmware - no noticeable difference. I already had the baud option set very high. I forked and updated the driver and made your suggested code fix and... it's definitely better! Maybe 30% faster. Can it go faster, or is this the most we can expect? I assume the display is keeping up, otherwise the signal would be corrupted? Is there a simple way to tell if graphics is the bottleneck? Thanks!
-
Posted at 2018-02-27 by Wilberforce
The ESP32 ESP-IDF provides a function to write x bytes, but the Espruino code calls it in a loop, a single byte at a time, so this would have to be rewritten to support multi-byte sends. I'm sure it would be faster then!
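Why a byte-at-a-time loop hurts can be shown with a simple cost model: each call into the SPI layer carries a fixed software overhead on top of the bit time, so 1024 one-byte calls pay that overhead 1024 times. This is a sketch only; `sendTimeMs` and the 20 µs per-call overhead are hypothetical illustration values, not measured ESP32 numbers.

```javascript
// Idealized cost model: wire time for the bits, plus a fixed software
// overhead for every transaction handed to the SPI driver.
function sendTimeMs(bytes, baud, bytesPerTransaction, overheadUsPerTransaction) {
  var wireMs = (bytes * 8 / baud) * 1000;
  var transactions = Math.ceil(bytes / bytesPerTransaction);
  return wireMs + transactions * overheadUsPerTransaction / 1000;
}

var OVERHEAD_US = 20; // hypothetical per-call cost, for illustration only
var perByte = sendTimeMs(1024, 4000000, 1, OVERHEAD_US);    // 1024 transactions
var bulk    = sendTimeMs(1024, 4000000, 1024, OVERHEAD_US); // 1 transaction
console.log("per-byte: " + perByte.toFixed(2) + " ms, bulk: " + bulk.toFixed(2) + " ms");
```

With these made-up numbers the per-call overhead, not the clock rate, dominates the per-byte case, which is consistent with the large inter-byte gaps reported on the logic analyser later in the thread.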
-
Posted at 2018-02-27 by jugglingcats I had a very very brief look today with a logic analyser and there do appear to be gaps in the traffic, but I need to look a bit closer and provide some proper evidence back.
-
Posted at 2018-03-01 by jugglingcats Spent some time this evening and there's no real issue with the Pico SPI performance. I can get screen repaint (at least the SPI portion) down to 20ms at 800k baud. Limit seems to be around 10ms. Attached is a quick write-up. Would be good to understand if/why the Graphics class is holding things up. I will look at reproducing this with the ESP32!
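As a rough cross-check of the ~10ms limit: a mono 128x64 frame is 8192 bits (1024 bytes), so the wire time alone is bits divided by baud, ignoring command bytes and any gaps between bytes. A minimal sketch; `frameTransferMs` is a hypothetical helper name.

```javascript
// Time on the wire for one full mono frame, ignoring command bytes
// and any gaps between bytes: 1 bit per pixel, 8 pixels per byte.
function frameTransferMs(width, height, baud) {
  var bits = width * height;          // 128 * 64 = 8192 bits = 1024 bytes
  return (bits / baud) * 1000;        // seconds -> milliseconds
}

[100000, 800000, 1000000, 4000000].forEach(function (baud) {
  console.log(baud + " baud: " + frameTransferMs(128, 64, baud).toFixed(2) + " ms");
});
```

At 800k baud this comes out at about 10.24ms, in line with the measured limit, which suggests that roughly half of the 20ms SPI portion is spent on something other than raw bit time (command bytes and gaps between transfers, presumably).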
-
Posted at 2018-03-02 by @gfwilliams Nice - thanks! So you're finding that at 4Mbps it's not able to saturate the link? I think to top that we'd probably have to look at DMA - but 20ms update isn't bad, especially since the LCDs usually blur when you update them much above 20fps. You might be able to squeeze a bit more out of the 'flip' routine by unrolling the loop and doing some things like that though - I imagine there are still gaps from the JS execution speed.
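For reference, here is roughly what a page-at-a-time flip looks like for this kind of controller. Each of the 8 pages gets a short command burst (page address plus column address), followed by 128 data bytes. That means 16 D/C switches per frame, and each switch is a point where JS execution can leave a gap. This is a sketch, not the actual SH1106 module code. `RecordingBus` is a hypothetical stand-in for the SPI and D/C pins. The 2-column offset is an assumption (128 visible columns in a 132-column RAM). The buffer is assumed to use a vertical-byte layout. The command bytes are the standard page-addressing ones: 0xB0..0xB7 for the page, and 0x00..0x0F / 0x10..0x1F for the column nibbles.

```javascript
// Hypothetical stand-in for SPI + D/C pin: records each burst instead of sending it.
function RecordingBus() { this.log = []; }
RecordingBus.prototype.command = function (bytes) { this.log.push({ dc: 0, bytes: bytes.length }); };
RecordingBus.prototype.data = function (bytes) { this.log.push({ dc: 1, bytes: bytes.length }); };

var WIDTH = 128, PAGES = 8;
var COL_OFFSET = 2; // assumption: 128 visible columns centred in 132-column RAM

// Assumed "vertical byte" layout: each byte is 8 vertical pixels,
// and page p occupies bytes [p*WIDTH, (p+1)*WIDTH).
function flipPaged(bus, buffer) {
  for (var page = 0; page < PAGES; page++) {
    bus.command([0xB0 | page,                 // set page address
                 COL_OFFSET & 0x0F,           // lower column nibble
                 0x10 | (COL_OFFSET >> 4)]);  // higher column nibble
    bus.data(buffer.subarray(page * WIDTH, (page + 1) * WIDTH));
  }
}

var bus = new RecordingBus();
flipPaged(bus, new Uint8Array(WIDTH * PAGES));
console.log(bus.log.length + " bursts per frame");
```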
-
Posted at 2018-03-02 by jugglingcats 20ms is definitely ok provided the Javascript isn't adding too much. I'll do some more playing around. At 4Mbps it seemed to me there were two issues:
-
Posted at 2018-03-02 by jugglingcats I played around with the ESP32 for interest. One really obvious thing is that the baud setting doesn't seem to have any effect. It seems pinned at 100kHz. The other is that there is a big gap between each byte even at this low frequency, which I guess relates to @wilberforce's comment above. See pic.
-
Posted at 2018-03-03 by Wilberforce The bitrate should be getting set: https://github.com/espruino/Espruino/blob/master/targets/esp32/jshardwareSpi.c#L120 https://esp-idf.readthedocs.io/en/v2.0/api/peripherals/spi_master.html
-
Posted at 2018-03-03 by jugglingcats Yes I looked at that code too so not sure what is wrong. I might try writing a low level esp-idf app and check I can get higher speeds out of it.
-
Posted at 2018-03-03 by Wilberforce I'm wondering if once the speed is set, it stays the same. Perhaps try setting to 400000 after a fresh boot?
-
Posted at 2018-03-05 by @gfwilliams It'd be interesting to try software SPI on ESP32. It'd almost certainly beat 100kHz and the time between bytes will probably be shorter too.
-
Posted at 2018-03-05 by jugglingcats Hi @wilberforce, I tried your suggestion. I set it to 1MHz after a clean boot and the clock frequency is now correct, but there are HUGE gaps between bytes. See pics -- the first shows 1MHz for the clock pulses and the second shows some bytes sent with the big gaps between them.
-
Posted at 2018-03-05 by jugglingcats Hi @gordon, tried your suggestion too and can get it up to around 700kHz on the clock. The bytes sent for each scan line are looking good, but there are big gaps in between while the control bytes are sent... see pic (single complete page redraw, all 0xFF). The good news is that a total screen redraw is around 40ms, so a frame rate of around 25fps. Not so bad.
-
Posted at 2018-03-05 by jugglingcats @wilberforce perhaps this gives a clue about the HW SPI performance issues on ESP32...?
-
Posted at 2018-03-05 by jugglingcats I hacked in spi_device_queue_trans instead of spi_device_transmit. I got some unpredictable results at slower speeds, but at 3MHz and 4MHz it was stable and improved speed by about 2x. There are still big gaps though. Attached are pics of two captures. Both are with a 4MHz clock requested. The slower one is the current code and the faster one is using spi_device_queue_trans. Like I said, I hacked it in, so I don't think you could use this to send and receive data -- only send. The faster capture shows roughly the same xfer time (40ms) as software SPI, so it should be possible to improve on it. I think ultimately the Espruino calling code should be changed to support multi-byte transfer. Faster updates should then be possible, leaving more time between updates for Espruino/game code. For some scenarios the fact that spi_device_queue_trans is non-blocking might be useful - not sure it is for regular apps though, as you generally want to wait for flip() to complete. Anyway, hope this is useful / of interest!
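One way to see why a queued (non-blocking) transaction helps: with a blocking call, the CPU work to prepare chunk i+1 cannot start until chunk i has finished on the wire, whereas with one transaction queued the two overlap. This is an idealized two-stage pipeline model with hypothetical helper names, equal-sized chunks and no driver overhead.

```javascript
// n chunks, each needing prepMs of CPU work and wireMs on the SPI bus.
function blockingMs(n, prepMs, wireMs) {
  return n * (prepMs + wireMs);            // prepare, send, wait, repeat
}

// With the next chunk prepared while the current one is sent, the steady
// state is limited by whichever stage is slower.
function pipelinedMs(n, prepMs, wireMs) {
  if (n === 0) return 0;
  return prepMs + (n - 1) * Math.max(prepMs, wireMs) + wireMs;
}

console.log(blockingMs(8, 1, 1), pipelinedMs(8, 1, 1)); // 16 9
```

When preparation and wire time are similar, the best case approaches 2x, in the same ballpark as the improvement measured above. The real driver has other overheads, so this is only indicative.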
-
Posted at 2018-03-06 by @gfwilliams Thanks - yes! Multi-byte transfer isn't as easy as just refactoring to use the call, as memory isn't guaranteed to be in one flat area. If you have to allocate a flat area of memory and then copy data into it, it'll actually take longer on most platforms - it's just that the time will all be concentrated before the transmission starts rather than in between bytes. Realistically we'd be best off going straight for a function that sent SPI via DMA. We could then boost the standard If graphics drivers could do that then the next chunk of data could be prepared while the current one was sending.
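The trade-off described here shows up in a sketch of the "copy into a flat area first" approach. The gather loop does all the copying up front, so the transfer afterwards can be a single call, but the total time is not reduced. This is plain JavaScript. `flattenChunks` is a hypothetical helper, and the chunks stand in for non-contiguous storage blocks.

```javascript
// Gather possibly non-contiguous chunks into one contiguous buffer.
// All the copying happens here, before any transfer could start.
function flattenChunks(chunks) {
  var total = 0;
  chunks.forEach(function (c) { total += c.length; });
  var flat = new Uint8Array(total);
  var offset = 0;
  chunks.forEach(function (c) {
    flat.set(c, offset);   // memcpy-equivalent for typed arrays
    offset += c.length;
  });
  return flat;
}

var flat = flattenChunks([new Uint8Array([1, 2, 3]), new Uint8Array([4]), new Uint8Array([5, 6])]);
console.log(flat.length); // 6
```

When the source is already contiguous, this step and its extra buffer can be skipped entirely.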
-
Posted at 2018-03-06 by jugglingcats Hi @gordon, a DMA option sounds good longer term. I read a little about it in the ESP32 docs. You're recommended to allocate the memory for the SPI transfers with Where would I find equivalent info for the STM32 / Espruino? I'm keen to do a little hacking and would prefer to start with the Pico. It occurs to me it would be dangerous to return from the flip method while the transfer is still happening, because then the app might start modifying the memory that's still being written. Thanks
-
Posted at 2018-03-07 by Wilberforce For the esp32 case it looks like this flag should be set: https://esp-idf.readthedocs.io/en/v2.0/api/peripherals/spi_master.html
-
Posted at 2018-03-07 by @gfwilliams In the case of flip it should be reasonably easy - make it run in the background, but block if another In terms of STM32 support, I'd look at the Then there's nRF52 (and also STM32LL) as well - I believe there are some drivers that handle DMA'd SPI. Last time I looked there was a bug that meant that 1 byte sends failed, but in the newer SDKs I'm using now that's probably fixed.
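Below is a simulated sketch of one possible reading of the background-flip idea. Flip returns immediately, and a later flip blocks only if the previous transfer is still running. That interpretation is an assumption. The frame is also copied into a private transmit buffer first. This addresses the concern about the app drawing into memory that is still being sent, at the cost of a second 1KB buffer. `FakeDma` and `BackgroundFlipper` are hypothetical, and each `tick()` stands for one byte leaving the SPI peripheral.

```javascript
// Hypothetical DMA engine model: start() returns immediately, tick() sends one byte.
function FakeDma() { this.remaining = 0; this.sent = 0; }
FakeDma.prototype.start = function (buf) { this.remaining = buf.length; };
FakeDma.prototype.busy = function () { return this.remaining > 0; };
FakeDma.prototype.tick = function () {
  if (this.remaining > 0) { this.remaining--; this.sent++; }
};

function BackgroundFlipper(dma, size) {
  this.dma = dma;
  this.txBuf = new Uint8Array(size); // the only memory the DMA reads from
  this.waitedTicks = 0;
}
BackgroundFlipper.prototype.flip = function (frame) {
  // Block only if the previous frame is still on the wire.
  while (this.dma.busy()) { this.dma.tick(); this.waitedTicks++; }
  this.txBuf.set(frame);       // snapshot, so the app may keep drawing into `frame`
  this.dma.start(this.txBuf);  // returns immediately
};

var dma = new FakeDma();
var flipper = new BackgroundFlipper(dma, 1024);
var frame = new Uint8Array(1024);
flipper.flip(frame);              // returns at once; 1024 bytes in flight
frame[0] = 0xFF;                  // safe: the transfer reads txBuf, not frame
var inFlightFirstByte = flipper.txBuf[0];
flipper.flip(frame);              // waits for the first transfer, then starts the second
```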
-
Posted at 2018-03-09 by jugglingcats I've experimented a little with the ESP32 using a custom Espruino class. I picked the ESP32 mainly because the DMA support is so much simpler. You just alloc some memory with the right flags and it handles the rest. I did look at doing this on STM32 but it's quite a bit more involved. The main challenge was understanding the Cut a long story short, I managed to send 1024 bytes in just over 1ms... see pic. This is running the clock at 20MHz. I'm not sending any of the command data at the moment, so that will add a bit more delay -- it will double the number of transactions. Some of the remaining overhead might be due to the mutex locks in ESP-IDF mentioned in the linked post above. The only way to remove this completely would be to bypass this layer and go direct. I notice that the SD1780 display (and possibly SH1106 -- haven't checked) supports both horizontal and vertical addressing where the pages will automatically wrap, which should mean no control data between data transactions -- and possibly being able to send all the data in a single transaction, which would be super-quick. Will try that next. Not quite sure why I'm so obsessed with maximum speed but it's a fun journey... I have no idea how the existing layers in Espruino could be refactored to take advantage of bulk send/DMA. It's clear to me that on ESP32 at least the graphics buffer should be allocated in co-operation with the SPI sending code (ie. the display driver). I got a bit lost following the code under Thanks
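To make the addressing-mode idea concrete: in page addressing each page needs its own command burst, so a frame is 8 command bursts plus 8 data bursts. In horizontal addressing the controller wraps column and page automatically, so a frame becomes one short command burst plus one 1024-byte data burst. The command values below are the SSD1306 ones (0x20 memory addressing mode, 0x21 column range, 0x22 page range) and are used here as an assumption. Whether the SH1106 offers the same mode needs checking in its datasheet. `RecordingBus` is a hypothetical stand-in that records bursts instead of driving pins.

```javascript
// Hypothetical stand-in for SPI + D/C pin: records each burst instead of sending it.
function RecordingBus() { this.log = []; }
RecordingBus.prototype.command = function (bytes) { this.log.push({ dc: 0, bytes: bytes.length }); };
RecordingBus.prototype.data = function (bytes) { this.log.push({ dc: 1, bytes: bytes.length }); };

var WIDTH = 128, PAGES = 8;

// Once at init (SSD1306 command values, assumed): select horizontal addressing.
function initHorizontal(bus) {
  bus.command([0x20, 0x00]);
}

// Per frame: reset the address window, then stream the whole buffer in one go.
function flipHorizontal(bus, buffer) {
  bus.command([0x21, 0, WIDTH - 1,   // column start/end
               0x22, 0, PAGES - 1]); // page start/end
  bus.data(buffer);                  // controller wraps columns and pages itself
}

var bus = new RecordingBus();
initHorizontal(bus);
bus.log = [];                        // count only per-frame traffic
flipHorizontal(bus, new Uint8Array(WIDTH * PAGES));
console.log(bus.log.length + " bursts per frame");
```

Going from 16 bursts per frame to 2 is what would make a single DMA transaction per frame realistic.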
-
Posted at 2018-03-10 by @allObjects Yes, in Espruino general memory management is done with chained blocks, to avoid some of the GC challenges that would otherwise arise and to support interpreting directly from source code as Espruino does. More recently, though, @gordon has introduced the option of allocating memory as a contiguous string of bytes. I'm not familiar with what would be needed to make that usable by DMA for the bulk sending you are looking for.
-
Posted at 2018-03-10 by jugglingcats Thanks @allObjects, your post gave me a clue to look at how the graphics array is allocated and it is in a flat data area, so I now use a simple I now have a little JavaScript app drawing and moving 16 rects and total execution time is around 35ms, ie. 28fps. Even this was interesting as I found things like Breadboard pic for interest...
-
Posted at 2018-03-10 by @gfwilliams On most platforms you could just use the arraybuffer from graphics (it'll be allocated as a flat string of data if possible). There's just code in there to deal with the case where it can't be and has to be fragmented. To be honest if you could write the SPI data out in one big buffer then you could change the JS driver, and with software SPI you'd still be looking at a pretty fast refresh speed. Faster than the display can handle anyway.
-
Posted at 2018-03-10 by Wilberforce In the ESP32 case you mentioned allocating memory in a certain way to use for DMA. Is there any downside to this? The arraybuffer class on the ESP32 could allocate this type of contiguous block, and then you would not need the memcpy at all, as the source would already be the correct type of memory.
-
Posted at 2018-03-11 by jugglingcats Hi @wilberforce I don't know to be honest, but I would assume it's not ok to allocate all memory blocks as DMA capable, ie.
-
Posted at 2018-02-26 by jugglingcats
Hi, sorry if this has been covered before. I have an SH1106 OLED display attached to Espruino and it's doing buffered graphics just fine, but the performance is not stellar.
Haven't timed it precisely but looks to take around 50ms to update the screen which gives 20fps.
I know a little bit about SPI and can see some discussion in issues and elsewhere about the fact that Espruino sends a byte at a time, but could be bulk sending data to avoid the handshake overhead. I also noted the comments at espruino/Espruino#695 that memory is a constraint.
Is it the case that the Espruino can't spare 1k of data for the SPI bulk transfer (on a mono 64x128 screen)? Or would it not help if the transfer was chunked into pieces, even if they were only eg. 128 bytes?
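For the sizes in the question: a mono 128x64 frame is 128 × 64 / 8 = 1024 bytes, and splitting it into 128-byte pieces gives 8 chunks, one per 8-pixel-high page. A typed-array `subarray` returns each chunk as a view rather than a copy, so the chunking itself needs no extra memory. This is a plain JavaScript sketch; `chunkViews` is a hypothetical helper.

```javascript
// Split a frame buffer into fixed-size views (no copying).
function chunkViews(buffer, size) {
  var chunks = [];
  for (var off = 0; off < buffer.length; off += size) {
    chunks.push(buffer.subarray(off, Math.min(off + size, buffer.length)));
  }
  return chunks;
}

var frameBytes = 128 * 64 / 8;               // 1024
var frame = new Uint8Array(frameBytes);
var chunks = chunkViews(frame, 128);
chunks[1][0] = 0xAA;                          // writes through to frame[128]
console.log(chunks.length, frame[128]);
```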
Thanks for a great project.
Alfie.