Most cost-effective/best way to drive a large array of panels (256x256+)?

Funny you’d mention that, I was just making this post:

512x512 could be done on rpi-rgb-led-matrix, but the refresh rate would be unacceptable.
Realistically, you can do the 384x192 I just did, step up to 384x384 (half the refresh rate), or go to 512x384 (where the refresh rate will really suffer). All of these assume 3 buses running in parallel.
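
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes refresh rate scales inversely with the number of pixels each bus has to clock out, which matches the "384x384 is half the refresh rate of 384x192" figure above but is a simplification, not a measurement. The function names are made up for illustration and are not part of rpi-rgb-led-matrix.

```python
# Rough model (an assumption, not a measurement): with a fixed number of
# parallel buses, refresh rate is inversely proportional to the number of
# pixels each bus has to drive.
PARALLEL_BUSES = 3        # from the post: 3 buses running in parallel
BASELINE = (384, 192)     # the display size already built


def pixels_per_bus(width, height, buses=PARALLEL_BUSES):
    """Pixels each bus must clock out for one full frame."""
    return width * height / buses


def relative_refresh(width, height, baseline=BASELINE):
    """Refresh rate relative to the baseline display (1.0 = same rate)."""
    return pixels_per_bus(*baseline) / pixels_per_bus(width, height)


if __name__ == "__main__":
    for w, h in [(384, 192), (384, 384), (512, 384), (512, 512)]:
        ratio = relative_refresh(w, h)
        print(f"{w}x{h}: about {ratio:.2f}x the baseline refresh rate")
```

Under this model 512x512 lands well below a third of the 384x192 refresh rate, which is why I would not bother with it. Real numbers also depend on the panel scan rate, PWM bit depth, and which Pi you use, so treat this as an estimate only.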