Most cost-effective/best way to drive a large array of panels (256x256+)?

I am planning to build a largish LED panel using the well-known HUB75 panels from China.
From what I have read, the limit of even an ESP32 or Teensy 4.0 board currently seems to be around 128x64 to 128x128 pixels.

I was wondering what the best way would be to drive a larger panel; my target size is 512x256 or, ideally, 512x512 pixels.

I have seen a solution using a Raspberry Pi to drive a 256x256 panel at 140 Hz, so I guess 512x512 (four times the pixels) should be doable at 30-ish FPS?
For the cost of a single Raspberry Pi, though, you could get almost 10 ESP32s. Has anyone found a way to use one ESP32 per smaller panel and chain them together somehow, so that a large number of pixels can be driven that way?
I know it has been done with multiple Teensys driving WS2811 strips (OctoWS2811 LED Library, Driving Hundreds to Thousands of WS2811 LEDs with Teensy 3.0), but I have not seen a similar solution for ESP32s and HUB75 panels yet.
Such a solution with separate ESP32s would seem very scalable: you could even make different aspect ratios with multiple smaller panels, and easily add or replace one of them on the fly.

Would the Raspberry Pi be the best way to go for such a project, or does such a solution exist with ESP32s/Teensys?

I’m working on a project that allows an ESP32 to receive data from an outside source via SPI (using APA102-formatted LED data) and refresh a panel of up to 128x64. It can receive SPI data at a clock of up to 32 MHz. Theoretically, you could use several of these in parallel to drive several large panels.
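For reference, this is roughly what APA102-formatted data looks like on the wire. A minimal Python sketch, assuming the commonly documented APA102 framing (a 4-byte all-zero start frame, one 4-byte frame per LED made of a 0b111 header plus 5-bit global brightness followed by blue, green and red, then an end frame of extra clock bytes). `encode_apa102` is a hypothetical helper, not part of the project described above, and end-frame length conventions vary between implementations.

```python
def encode_apa102(pixels, brightness=31):
    """Encode a list of (r, g, b) tuples into one APA102-style SPI frame.

    Assumed framing (common convention, not taken from the project above):
    - start frame: 4 zero bytes
    - each LED: 0b111xxxxx brightness byte, then blue, green, red
    - end frame: ceil(n / 16) bytes of 0xFF (at least n/2 extra clock edges)
    """
    if not 0 <= brightness <= 31:
        raise ValueError("brightness is a 5-bit value (0-31)")
    out = bytearray(4)  # start frame
    for r, g, b in pixels:
        out += bytes((0xE0 | brightness, b, g, r))
    out += b"\xff" * ((len(pixels) + 15) // 16)  # end frame
    return bytes(out)


frame = encode_apa102([(255, 0, 0), (0, 255, 0)])
print(frame.hex())  # 00000000ff0000ffff00ff00ff
```

The 4 bytes per LED are where the "4 bytes per pixel" figure in the reply below comes from; the start and end frames are the small overhead it ignores.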

Let’s go with your 512x512 pixel example. APA102-formatted data requires 4 bytes per pixel (plus some overhead we can ignore), so a frame is 512 x 512 x 4 = 1,048,576 bytes (1024 KiB). A 32 MHz SPI link moves at most 4 MB/s, so you could only update the panels about 3.8 times per second (actually a little lower with overhead).
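The same arithmetic as a small Python helper, assuming one bit per SPI clock per channel and ignoring framing overhead as the estimate above does. `max_fps` is a hypothetical name; its `channels` parameter models splitting the stream over several parallel links.

```python
def max_fps(width, height, spi_hz, channels=1, bytes_per_pixel=4):
    """Upper bound on full-frame updates per second over SPI.

    Assumes each channel moves 1 bit per SPI clock, all channels run in
    parallel, and start/end-frame overhead is ignored.
    """
    frame_bytes = width * height * bytes_per_pixel  # 1,048,576 for 512x512
    link_bytes_per_s = spi_hz / 8 * channels        # 4 MB/s per 32 MHz link
    return link_bytes_per_s / frame_bytes


print(f"{max_fps(512, 512, 32e6):.2f}")              # 3.81
print(f"{max_fps(512, 512, 32e6, channels=8):.1f}")  # 30.5
```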

You could split the data into separate streams: let’s say you divide the SPI data into 8 parallel channels, kind of like OctoWS2811 does. That would give you about 8 x 3.8 ≈ 30 FPS.
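Dividing the data could be as simple as cutting the frame into horizontal stripes, one per SPI link. A minimal sketch, assuming a row-major frame buffer and ignoring each channel's own start/end frames; `split_frame` is hypothetical, and this simple stripe split is not OctoWS2811's own bit-level output scheme.

```python
def split_frame(frame, width, height, channels, bytes_per_pixel=4):
    """Split a row-major frame buffer into `channels` horizontal stripes.

    Each stripe would be streamed on its own SPI link, so every link only
    carries about 1/channels of the data. Leftover rows go to the first
    stripes so the load stays as even as possible.
    """
    if len(frame) != width * height * bytes_per_pixel:
        raise ValueError("frame size does not match width x height")
    row_bytes = width * bytes_per_pixel
    base, extra = divmod(height, channels)
    stripes, row = [], 0
    for ch in range(channels):
        rows = base + (1 if ch < extra else 0)
        stripes.append(frame[row * row_bytes:(row + rows) * row_bytes])
        row += rows
    return stripes


frame = bytes(512 * 512 * 4)
stripes = split_frame(frame, 512, 512, 8)
print([len(s) for s in stripes])  # eight stripes of 131072 bytes (64 rows each)
```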

512x512 is a huge size, though. Each ESP32 can only refresh a 128x64 panel, so you’d need 512x512 / (128x64) = 32 ESP32s just to refresh the panels, plus something else to provide the data. I’m not even sure an ESP32 WROVER module with extra RAM could generate data that fast. You may need a Raspberry Pi or something similar just to generate the data.
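The count of 32 comes from tiling: 512/128 = 4 panels across and 512/64 = 8 down. With one ESP32 per panel, as the question suggests, whatever generates the data also has to know which controller owns each pixel. A minimal sketch, assuming a hypothetical row-by-row numbering of controllers; `controllers_needed` and `locate` are illustrative names only.

```python
TILE_W, TILE_H = 128, 64  # largest panel one ESP32 refreshes, per the text


def controllers_needed(total_w, total_h):
    """Number of 128x64 tiles (one ESP32 each) needed to cover the display."""
    across = (total_w + TILE_W - 1) // TILE_W
    down = (total_h + TILE_H - 1) // TILE_H
    return across * down


def locate(x, y, total_w=512, total_h=512):
    """Map a global pixel to (controller index, local x, local y).

    Hypothetical layout: controllers are numbered row by row, left to right,
    and the display size is an exact multiple of the tile size.
    """
    if not (0 <= x < total_w and 0 <= y < total_h):
        raise ValueError("pixel is outside the display")
    tiles_across = total_w // TILE_W
    index = (y // TILE_H) * tiles_across + (x // TILE_W)
    return index, x % TILE_W, y % TILE_H


print(controllers_needed(512, 512))  # 32
print(locate(130, 70))               # (5, 2, 6)
```

Changing the aspect ratio only changes `total_w`/`total_h`, which is the scalability the question is after; the hard part remains feeding all controllers fast enough.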

I’d look at this project, which can drive large panel setups (maybe 512x512, I’m not sure): hzeller/rpi-rgb-led-matrix on GitHub, “Controlling up to three chains of 64x64, 32x32, 16x32 or similar RGB LED displays using Raspberry Pi GPIO”.

Funny you’d mention that, I was just making this post:

512x512 could be done on rpi-rgb-led-matrix but the refresh rate would be unacceptable.
Realistically, you can do the 384x192 I just did, or step up to 384x384 (half the refresh rate) and then 512x384 (the refresh rate will really suffer), and that is with 3 buses running in parallel.
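A rough way to see why the refresh rate falls so quickly: if refresh time is dominated by shifting pixels out, the refresh rate scales with how many pixels each parallel chain has to carry. The first-order model below (a hypothetical `relative_refresh` helper) ignores PWM color depth, panel scan rate and CPU limits, so treat it as a trend, not a prediction.

```python
def relative_refresh(base_w, base_h, new_w, new_h, base_chains=3, new_chains=3):
    """First-order estimate of (new refresh rate) / (old refresh rate).

    Assumes refresh time is proportional to the pixels each parallel chain
    must shift out; ignores colour depth, scan rate and CPU limits.
    """
    base_per_chain = base_w * base_h / base_chains
    new_per_chain = new_w * new_h / new_chains
    return base_per_chain / new_per_chain


for w, h in [(384, 192), (384, 384), (512, 384)]:
    print(f"{w}x{h}: {relative_refresh(384, 192, w, h):.3f} x the 384x192 rate")
# 1.000, 0.500, 0.375
```

Under the same model, `relative_refresh(512, 384, 512, 384, base_chains=3, new_chains=6)` returns 2.0: doubling the number of parallel chains would double the refresh rate for the same display.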

I have created a pull request adding support for 6 parallel chains on the Raspberry Pi Compute Module, based on Henner’s library:

Whoo, you’re getting me excited now! I was starting to hit the limits of 3 parallel chains (dropping to 100 Hz after compromising on color depth).

Do you have any info on how the hardware side works? How are you getting 6 chains’ worth of data lines out of the existing pins? (I’m not familiar with the compute module).
It’s true that the rPi3/rPi4 are fast enough to send data to shift registers to control more than one thing per pin, just like what Yves does for FastLED running up to 5 chains off each ESP32 pin.
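The general idea of driving several outputs from one pin is to clock bits into a serial-in, parallel-out shift register and then latch them all at once. A minimal Python model, assuming a 74HC595-style register where each clock pushes the new bit into the first stage; the helper names are hypothetical, and this is not the scheme used by Yves' FastLED driver or by rpi-rgb-led-matrix.

```python
def serialize_for_sipo(outputs):
    """Interleave several logical outputs onto one data pin.

    outputs[i][t] is the bit output i should show at step t. For every step,
    bits are shifted out highest-numbered output first, so after k clocks
    output 0's bit sits in the first stage (QA) and output k-1's bit in
    stage k-1; a latch pulse then presents all k bits at once.
    Returns one list of k serial bits per step.
    """
    k = len(outputs)
    steps = len(outputs[0])
    if any(len(o) != steps for o in outputs):
        raise ValueError("every output needs the same number of steps")
    return [[outputs[i][t] for i in reversed(range(k))] for t in range(steps)]


def simulate_sipo(serial_bits, width):
    """Model a shift register: each clock pushes a bit into stage 0 (QA)."""
    stages = [0] * width
    for bit in serial_bits:
        stages = [bit] + stages[:-1]
    return stages  # stages[i] drives output i after the latch


outputs = [[1, 0], [0, 0], [1, 1]]  # 3 outputs, 2 time steps
for t, bits in enumerate(serialize_for_sipo(outputs)):
    print(t, simulate_sipo(bits, 3))
```

The cost is speed: the data pin has to clock k bits for every output step, which is why this only pays off on hardware fast enough to spare the bandwidth.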

Either way, I would love more info, please share.
Also, for people like me who aren’t good at designing their own PCBs: if you contact electrodragon, who make the active-3 board (info@electrodragon.com), I’m sure they’d be happy to produce an active-6 board to your specs.

My last array, 384x256, was definitely getting slow, so being able to have twice as many channels would definitely help.


(from Marc's Blog: "arduino - RGB Panels, from 192x80, to 384x192, to 384x256 and maybe not much beyond")

It is better to use 2 Raspberry Pis, because the Compute Module development board costs at least $100.

The Compute Module only makes sense if you need to produce a large number of custom PCBs.

As for performance, the refresh rate decreases slightly when we use GPIOs above 31 because of the uint64 operations, but the difference is very small. Otherwise, the refresh rate stays almost the same for every parallel chain you add. According to initial tests, it pushes data out to all GPIOs without any problem, but I have to test it more. You can even check the performance on your own Raspberry Pi right now. The Compute Module 3+ has the same processor as the Raspberry Pi 3B+, but it is clocked at a maximum of 1.2 GHz, so the refresh rate is a bit lower. But when you can get 6 parallel chains, it is all worth it.
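To see why extra chains are nearly free while GPIO numbers above 31 cost a little: each clock step can write the color bits of every chain in a single GPIO operation, and on the BCM283x the GPIO set/clear registers are 32 bits wide (GPSET0 covers GPIO 0-31, GPSET1 covers GPIO 32 and up). The Python sketch below illustrates only the bit packing; the pin map and helper names are hypothetical and this is not code from the library or the pull request.

```python
# Hypothetical pin assignment, NOT the mapping used in the pull request:
# each chain needs six colour lines (R1, G1, B1, R2, G2, B2).
PIN_MAP = [tuple(range(4 + 6 * c, 10 + 6 * c)) for c in range(4)]  # GPIO 4-27
PIN_MAP += [tuple(range(32, 38)), tuple(range(38, 44))]             # GPIO 32-43


def pack_gpio_word(chain_bits):
    """Build one 64-bit 'set' mask from the six colour bits of every chain.

    chain_bits[c] is a 6-tuple of 0/1 for chain c. All chains go out in the
    same GPIO write, which is why adding chains barely changes refresh time.
    """
    mask = 0
    for pins, bits in zip(PIN_MAP, chain_bits):
        for pin, bit in zip(pins, bits):
            if bit:
                mask |= 1 << pin
    return mask


def split_registers(mask):
    """Split the mask into the GPIO 0-31 and GPIO 32+ register values."""
    return mask & 0xFFFFFFFF, mask >> 32


bits = [(1, 0, 0, 0, 0, 0)] + [(0,) * 6] * 4 + [(1, 0, 0, 0, 0, 0)]
low, high = split_registers(pack_gpio_word(bits))  # chain 0 R1 and chain 5 R1
```

With only chains 0-3 in use the high half is always zero, so a single 32-bit write suffices; once any chain sits on GPIO 32 or above, every step needs the wider value and a second register write, which matches the small slowdown described above.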

Ok, so basically you’re saying you still get 3 chains per rPi and that you get 6 chains by using 2 rPis, correct?

Yes.

It’s more economical to get 6 chains from 2 regular Pis than 6 chains from one Compute Module.

I am in the process of developing a custom PCB for the compute module. Maybe I’ll put it out for sale. Let’s see. It will take some time.