Outdoor refresh rate

Hans · June 3, 2021, 5:58pm

First of all, thanks for all the previous work, I’m now able to use outdoor 64x32 mod8scan panels!

I’m using shield V4 with a Teensy 3.6; the maximum is 3 panels (64x96). The panels are part of a score display, and from time to time spectators want to take a picture or short video. To the human eye everything looks good, but outdoors a camera’s typical shutter speed is too fast and only some of the lines appear.

Only with a very slow shutter speed am I able to capture a complete image.

I tried different settings with setRefreshRate; 100 seems to be the maximum for my setup, but it makes no visible difference. Are there other options I can try? Many thanks.

sutaburosu · June 3, 2021, 7:17pm

From the looks of your first photo, you’d need at least a 10x increase in refresh rate to see a full image on the matrix at the same shutter speed.

There is a trade-off between refresh rate, brightness and colour depth. I’m not sure about Teensy 3.6, but on T4 you can set the colour depth in 3-bit increments. You should be able to increase the refresh rate significantly by dropping to 12-bit colour. See kRefreshDepth in the example sketches.

I’m not sure the T3.6 has the speed to work with crazy fast shutter speeds. As a reference point, on a 128x64 matrix my T4 gets 280Hz with 36-bit refresh, and 900Hz@12-bit.

When experimenting with this, it can be useful to regularly print matrix.getRefreshRate() so you can check that SmartMatrix isn’t throttling back the refresh rate.

Hans · June 3, 2021, 7:52pm

Thanks for the reply. I haven’t been able to test with a T4, but I will go for the V5 with the T4.

sutaburosu · June 3, 2021, 7:59pm

I don’t want to get your hopes up too high. I don’t have a camera capable of testing this. Is it right that in “sports mode”, under bright light, the shutter could be open for only 1/8000th second? Even 900Hz refresh would end up looking similar to your first photo on that time scale.

Hans · June 3, 2021, 8:06pm

I was already at the limits of the T3.6, so swapping must improve something.

Do you know if the RPi gives better results?

sutaburosu · June 3, 2021, 8:11pm

I’ve never used it, but I’ve heard good things about Henner Zeller’s rpi-rgb-led-matrix. The readme mentions ~100Hz refresh with 96 chained panels, so that may mean closer to 3kHz refresh for only 3 panels. I don’t know.

Hans · June 3, 2021, 8:33pm

With the panels I purchased a Hanson RPi-MFC, but my knowledge of the RPi is minimal. Right now my score display receives data over Xbee, so I would probably need to redesign the whole setup. Thanks for the link.

Louis · June 4, 2021, 9:20am

I agree with @sutaburosu’s advice: I suggest trying to lower the color depth first before switching platforms. On the Teensy 3 you can also set color depth in multiples of 3. Try lowering the depth and increasing the refresh rate until it’s able to sustain the refresh rate you like, and see if the number of colors available at that depth is enough.

Hans · June 4, 2021, 10:34am

I’m very happy with how everything is combined right now, so switching to a different platform is my last option.

I checked the refresh rate; the maximum I can get is 120. Lowering kRefreshDepth further than 27 didn’t make a difference.

Changing COLOR_DEPTH from 24 to 12 gave a compile error. I think it is caused by kPanelType = SM_PANELTYPE_HUB75_32ROW_64COL_MOD8SCAN. Edit: I think I didn’t understand this correctly; COLOR_DEPTH stays at 24 and the adjustment goes through kRefreshDepth.

Can you give a hint what to try next?

Edit 2: With some more testing I was able to reach 180 with kRefreshDepth 9. That seems to be the maximum, and unfortunately it is not good enough. I hope the T4 gives better results.

dthacher · June 4, 2021, 11:58am

What is the max frequency you get with SmartMatrix on the different controllers? I want to say the max for the Pi is somewhere around 12-18MHz. I think I saw 12-14 on Pi 2, but I do not recall exactly.

You trade pixels, bit depth and FPS. However there could be a power supply issue depending on certain factors. For example, a 32x16 panel with 4 BCM bits of depth could get around 2929 FPS assuming 12MHz. The PSU could see 0.9-1.8A at 46.8kHz per HUB75. This is kind of aggressive. With 8 bits BCM the FPS drops to 183 and the PSU frequency drops to 5.9kHz, which is more reasonable. PWM (aka naïve-PWM) makes this even more stable. Longer chains increase the load but lower the frequency until multiplexing breaks down, which would be around 800Hz for this.

Does this sound reasonable to you?

Hans · June 4, 2021, 12:06pm

I hope this question is for Louis, it’s a bit above my knowledge.

dthacher · June 4, 2021, 12:13pm

It is. I am not sure of the exact answer myself. However I suspect there is a worst case condition if you make a square wave using BCM bits. Generally speaking it never matters. However it can cause issues. I am curious if Louis knows or has experimented with this.

My thinking is that FPS times multiplex times BCM bits divided by 2 represents the worst-case frequency for the PSU. Standard PWM is better by a factor of BCM bits divided by two, however it is computationally very hard. For most use cases this never matters for BCM. This avoids the need for expensive hardware and other stuff.

I also have reasons to suspect another trade off exists in multiplex and bits of depth in terms of current/brightness division for LEDs. Regardless of CIE1931 or perceptive brightness. This would not hold for outdoor panels of course which have more than higher density panels using more multiplexing.

Edit: Outdoor panels generally force longer chains which improves the PSU frequency. For the same number of pixels you increase the columns rather than the multiplex. The lower multiplex improves current/brightness per pixel. This does increase the load but lowers the frequency on the PSU.

In a more realistic usage the frequency will not be this extreme. However it can be. Bulk and/or decoupling caps can be used to average the power draw into a larger lower frequency load for PSU. This is again outside my wheelhouse but I know some consideration is required.

marcmerlin · June 21, 2021, 6:32pm

In my testing, you need 400-500Hz refresh rate for pictures to look ok with most cameras.
My own big array is only 100Hz because it has 100k pixels, and I have to use a special camera with special settings to get video without refresh bars.
See

vs

Same refresh rate, but second one uses an RX100M7 with manual settings I set to use a slower frame rate on the camera.

Cell phones in daylight are tough because they might take a shot in 0.001s because they have enough light. Honestly, outside of running with an FPGA at 1000Hz, there may not be an answer there. Thankfully my stuff is mostly used at night, where cameras automatically use longer exposures and things work out on their own.

dthacher · June 24, 2021, 2:45am

It may not need to be clocked that high. However one may be able to get with 33MHz. Max quality would be about 10 bit PWM for 1kHz on 1:32 multiplex. (Rough guess) Less quality will enable high refresh at lower clock. Possible without FPGA if careful.

This is problematic on the Pi. May work on Teensy till memory runs out. FPGA has different tricks it can use. For CC LED drivers it’s pointless outside increasing bandwidth. Which comes with other problems.

In theory, PWM or MM LED driver could get what you are looking for. I am not about to repeat myself. There is a lot that can be done to get the refresh rates higher. Certain platforms are more capable than others.

dthacher · June 24, 2021, 3:40am

I agree with this answer for CC based LED drivers. Lowering the chain length is another option. The only thing an FPGA can do is increase parallel chains for larger displays. However this is costly and has low/medium quality. You could try using multiple controllers.

FPGA can avoid BCM, which may have benefits to PSU.

marcmerlin · June 24, 2021, 4:47am

That sounds great, please post your code to show us.

dthacher · September 25, 2021, 7:47am

I should clarify. An MCU could in theory use S-PWM (used in many MM drivers) to improve refresh rate. However this only works if you have low latency memory and DMA. The number of interrupts could become a massive problem.

Which could be resolved with the following:

Low HUB75 speed
Fast processor
Long serial chain

In most cases this is too problematic due to stability concerns. This makes the most sense against a certain chain length and quality need. Very niche implementation. RPI likely will never be able to do this without significant effort.

Note BCM is technically possible with this but only helps with certain things. It lowers the amount of work and size of bit planes. However the number of interrupts is very high. If you waste refresh, the number of interrupts can be kept low, which helps with stability and frees time for other things.

You still have potential issues with BCM at LED driver layer with this approach. There is also an increase in overhead with this approach from dead time required between row switches potentially. (Aka ghosting prevention.)

It would be cool if someone added support for PWM/MM drivers, but I have found using a receiver card directly is much more cost effective and offers better performance/quality. There are cases where this is not the case and I stand by my assertion that these are possible on the Pi and MCU. However they only make sense in specific cases.

Note refresh is actually higher for S-PWM.

dthacher · October 22, 2021, 6:47pm

I am trying to explain this.

When at 36 bits there is roughly 12 bits per color of gamma however the lower 4 bits are lost. So you only shift out the upper 8 bits of gamma leaving you with 256 linear steps. When you are at 12 bits there is roughly 4 bits per color of gamma (the upper 4 of 12 bits), however the lower 2 bits are lost. So you only shift out the upper 2 bits of gamma leaving you with 4 linear steps.

Therefore it would be: (This would explain why you use PWM bits in multiples of 3)
(bits / 3) - roundup(bits / 9)

36 bits - means 8 real bits
12 bits - means 2 real bits
9 bits - means 1 real bit

Assuming you care about the most significant bits and use BCM this means your gamma error is:
36 bits - 15/4095 (Refresh 280Hz)
12 bits - 1023/4095 (Refresh 900Hz)
9 bits - 2047/4095 (Refresh 1446Hz ???)

Now when I do the math for the serial clock I find that you must be ignoring the lower 4 and 10 bits respectively. I am guessing there is some kind of S-PWM like trick used here.

The max theoretical for 15.73MHz serial would be around 3.8kHz using around 11 bits per color for 1 frame per second. (Aka 99.95 gamma accuracy.) Using 30 frames per second would drop this down to 7 bits per color. (Aka 99.2 gamma accuracy.)

Dropping the refresh rate on non-PWM drivers would improve gamma accuracy. For example 1.9kHz using 8 bits per color using 30 frames per second would be possible. (Aka 99.8 gamma accuracy.) For 1 frame per second you could get 12 bits per color. (Aka 99.98 gamma accuracy…worth it? This is why the PWM based LED drivers only really get used on high density panels for bigger displays seen up close.)

Looking at your numbers you support 75 percent gamma accuracy for 12 bits at 1 frame per second. For 30 frames per second you would get less than one bit per color or no gamma accuracy? It does not even work?

However for 36 bits you have 99.6 percent gamma accuracy at 1 frame per second. For 30 frames per second you would get around 3 bits per color or around 87.5 percent gamma accuracy?

Note the accuracy numbers are very high due to rebasing the ranges. The min bits contribute little, and since they may not be sent they contribute nothing to the range. Therefore they cannot be included. Gamma expresses nonlinear mapping onto linear mapping. BCM is a linear mapping trick for expressing PWM steps, which is also linear division.

Note this is different from what the Pi library does. It does something different for different reasons. Time works differently in that code base. So rebasing the ranges works a little differently, and this decreases brightness/gamma accuracy. It does not support S-PWM so refresh rates are tightly coupled to gamma accuracy.

Note color depth and gamma accuracy are not the same. Using fewer gamma bits lowers color depth. It is not linear and depends on gamma correction mapping. Note there is a limit to the LEDs’ color depth expression, if I am not mistaken.

Am I in the ballpark at least? LOL

sutaburosu · October 22, 2021, 7:58pm

I’ll be honest; none of your calculations make any sense to me.

Let’s start with the default Teensy4 clock rate of 18.462 MHz. The shield has 6 data lines (R1, G1, B1, R2, G2, B2), so it emits 2 RGB pixels per clock cycle.

For my 128x64 matrix that gives: 18,462,000 Hz / 8,192 pixels * 2 pixels-per-clock ~= 4,507 Hz for a 1-bit-per-pixel display. That’s not counting the time to latch and enable the output. The author of the Teensy4 FlexIO output reports 3,200Hz refresh for this 128x64 RGB111 configuration, so my calculation is at least in the same ball park.

Dividing that by 36 bit planes gives 125Hz by my calculation, whereas the author reports 340Hz and I have observed 280Hz working reliably. I’m not sure where/how this discrepancy arises. It may be those benchmarks on PJRC are for pre-release code, and what I’m actually running is significantly different. Or perhaps I’m misreading something due to recently having finished a bottle of wine…

dthacher · October 22, 2021, 9:19pm

I suspect it does one or more of the following:

Changes clock
Drops bits
Debases bits (time)
S-PWM

That is really strange. 240Hz at 48 bit could mean 268.435456MHz which is larger than the max of 240MHz. He said he tried to push it and around RGB888 there is data loss. He also does not appear to need 3 bit multiples.

What is really strange is that, if I am right, he is using 4.194304MHz. Meaning his clock speed actually goes down with more bits. This is because I am assuming he is using 8+2 S-PWM and dropping 6 bits. This is strange because he could have gotten a higher refresh rate. There is no way he is doing 48 bits on 64x128 CC panels even with S-PWM.

Strange though I thought T4 supported up to 96x96 with 36bits up to 240Hz. Lot of random numbers.

He uses a timer, dividing by the data rate of flexIO is likely to lead to problems. Does this play nice with other code?

This is confusing:

github.com

easone/ledMatrixDemo_t4/blob/13afc29c575d9d708343ac5fafd6b1fb0820d390/ledMatrixDemo_t4.ino#L22


      
#define SMARTMATRIX_OPTIONS_C_SHAPE_STACKING        (1 << 0)
#define SMARTMATRIX_OPTIONS_BOTTOM_TO_TOP_STACKING  (1 << 1)

const int matrixWidth = 128; // width of the overall display
const int matrixHeight = 64; // height of the overall display
const int matrixPanelHeight = 32; // height of the individual panels making up the display
const unsigned char optionFlags = SMARTMATRIX_OPTIONS_NONE;

uint8_t panelBrightness = 64; // range 0-255
const uint8_t latchesPerRow = 12; // controls the color depth per pixel; value from 1 to 16; 8 is 24bit truecolor
uint16_t refreshRate = 200; // frames per second. With 12bit color depth, works up to 580 FPS at 600 MHz (720 FPS at 816 MHz)
const uint8_t dmaBufferNumRows = 4; // number of rows of pixel data in rowDataBuffer; minimum 2; increasing helps with stability

#define LATCH_TIMER_PULSE_WIDTH_NS  80  // 20 is minimum working value, don't exceed 160 to avoid interference between latch and data transfer
#define LATCH_TO_CLK_DELAY_NS       400  // max delay from rising edge of latch pulse to first pixel clock
#define PANEL_PIXELDATA_TRANSFER_MAXIMUM_NS  43  // time to transfer 1 pixel of data at FlexIO clock rate with FLEXIO_CLOCK_DIVIDER=20

#define LATCH_TIMER_PRESCALE  0
#define TIMER_FREQUENCY     (F_BUS_ACTUAL>>LATCH_TIMER_PRESCALE)
#define NS_TO_TICKS(X)      (uint32_t)(TIMER_FREQUENCY * ((X) / 1000000000.0) + 0.5)
#define LATCH_TIMER_PULSE_WIDTH_TICKS   NS_TO_TICKS(LATCH_TIMER_PULSE_WIDTH_NS)

I suspect this is a timer approach and not using S-PWM. Looks hard coded to 24MHz flexIO. Honestly I am not even sure if this is the actual refresh rate against that quality or not. With a timer it is possible to get a really high refresh because you just shave off bits, but you have no color depth. It could be a CPU stress test, not an IO stress test.

I did not read fully into it. I found the fillRowBuffer, formatRowData and rowAddressBuffer notion interesting. Lot of CPU called from interrupt. Lot of memory movement, which I am guessing is to save memory or to exploit the M7 cache.

Thank you for the links.