Can't get enough DMA memory on ESP32: assertion "matrixUpdateFrames[1] != NULL"

On ESP32, I can run FastLED with WS2812 (48 pixels) and an RGBPanel in direct driving mode with SmartMatrix (96x64).
This works on a very small example.

Then, when I added neopixel support in my much bigger demo code, it crashes with an assert.

Starting SmartMatrix Mallocs
Heap Memory Available: 285616 bytes total, 113792 bytes largest free block: 
8-bit Accessible Memory Available: 199468 bytes total, 113792 bytes largest free block: 
32-bit Memory Available: 285616 bytes total, 113792 bytes largest free block: 
DMA Memory Available: 199468 bytes total, 113792 bytes largest free block: 
SmartMatrix Layers Allocated from Heap:
Heap Memory Available: 247236 bytes total, 95344 bytes largest free block: 
Starting SmartMatrix DMA Mallocs
assertion "matrixUpdateFrames[1] != NULL" failed: file "/home/merlin/Arduino/libraries/SmartMatrix_me/src/SmartMatrixMultiplexedRefreshEsp32_Impl.h", line 194, function: static void SmartMatrix3RefreshMultiplexed<refreshDepth, matrixWidth, matrixHeight, panelType, optionFlags>::begin(uint32_t) [with int refreshDepth = 24; int matrixWidth = 64; int matrixHeight = 96; unsigned char panelType = 0u; unsigned char optionFlags = 0u; uint32_t = unsigned int]

It means that

    matrixUpdateFrames[1] = (frameStruct *)heap_caps_malloc(sizeof(frameStruct), MALLOC_CAP_DMA);

clearly failed to get DMA memory.
I'm a bit confused as to why I can get enough DMA memory to run the same panel size and the same 48 neopixels in simpler code, but it fails in my larger code, which I'm not even sure uses DMA.

The simpler example starts with:

Starting SmartMatrix DMA Mallocs
DMA Memory Available: 246828 bytes total, 113792 bytes largest free block: 
DMA Memory Available: 197660 bytes total, 113792 bytes largest free block: 
sizeof framestruct: 0000C000
matrixUpdateFrames[0] pointer: 3FFCAF14
matrixUpdateFrames[1] pointer: 3FFE4374

The failing one starts with:

Starting SmartMatrix Mallocs
Heap Memory Available: 285616 bytes total, 113792 bytes largest free block: 
8-bit Accessible Memory Available: 199468 bytes total, 113792 bytes largest free block: 
32-bit Memory Available: 285616 bytes total, 113792 bytes largest free block: 
DMA Memory Available: 199468 bytes total, 113792 bytes largest free block: 
SmartMatrix Layers Allocated from Heap:
Heap Memory Available: 247236 bytes total, 95344 bytes largest free block: 
Starting SmartMatrix DMA Mallocs
DMA Memory Available: 159904 bytes total, 95344 bytes largest free block: 
DMA Memory Available: 110736 bytes total, 46176 bytes largest free block: 
assertion "matrixUpdateFrames[1] != NULL" failed: 
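
One way to read the two logs side by side: heap_caps_malloc needs a single contiguous block, so the number to watch is the largest free block rather than the total. The sketch below only replays the numbers printed above on a desktop compiler. fitsInLargestBlock is a made-up helper name for illustration, not part of SmartMatrix or ESP-IDF.

```cpp
#include <cstddef>

// sizeof(frameStruct) as printed in the logs: 0x0000C000 bytes.
constexpr std::size_t kFrameStructBytes = 0xC000; // 49152

// Hypothetical helper: an allocation can only succeed if one contiguous
// free block is at least as large as the request.
constexpr bool fitsInLargestBlock(std::size_t largestFreeBlock,
                                  std::size_t request) {
    return largestFreeBlock >= request;
}

// Largest free DMA block reported just before the second frameStruct malloc.
constexpr std::size_t kSimpleExampleLargest = 113792; // simpler example log
constexpr std::size_t kFailingDemoLargest   = 46176;  // failing log

constexpr bool kSimpleFits  = fitsInLargestBlock(kSimpleExampleLargest, kFrameStructBytes);
constexpr bool kFailingFits = fitsInLargestBlock(kFailingDemoLargest, kFrameStructBytes);
```

In the failing log there are still 110736 bytes of DMA memory free in total, more than twice the request, but the largest single block (46176) is smaller than the 49152 bytes one frameStruct needs.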

Well, to answer myself it was a simple out of memory condition. Simply adding neopixels into the mix somehow took enough extra memory that it threw me over the edge.
I removed a malloc somewhere else and that took care of the DMA allocation failure. I was confused about DMA memory being a special pool, but in fact I simply ran out of normal memory.

Interestingly, my build says
Global variables use 97692 bytes (29%) of dynamic memory, leaving 229988 bytes for local variables. Maximum is 327680 bytes.

and runtime is below. Do I read correctly that SmartMatrix has 291KB before it starts and leaves me with a mere 4KB?
("17172 available, leaving 4884 free: ")
or maybe a bit more
DMA Memory Available: 61548 bytes total, 15456 bytes largest free block:

Starting SmartMatrix Mallocs
Heap Memory Available: 291772 bytes total, 113792 bytes largest free block: 
8-bit Accessible Memory Available: 205624 bytes total, 113792 bytes largest free block: 
32-bit Memory Available: 291772 bytes total, 113792 bytes largest free block: 
DMA Memory Available: 205624 bytes total, 113792 bytes largest free block: 
SmartMatrix Layers Allocated from Heap:
Heap Memory Available: 253392 bytes total, 113792 bytes largest free block: 
Starting SmartMatrix DMA Mallocs
DMA Memory Available: 166060 bytes total, 113792 bytes largest free block: 
DMA Memory Available: 116892 bytes total, 64624 bytes largest free block: 
sizeof framestruct: 0000C000
matrixUpdateFrames[0] pointer: 3FFE4374
matrixUpdateFrames[1] pointer: 3FFF0384
Frame Structs Allocated from Heap:
Heap Memory Available: 153872 bytes total, 86148 bytes largest free block: 
8-bit Accessible Memory Available: 67724 bytes total, 17172 bytes largest free block: 
32-bit Memory Available: 153872 bytes total, 86148 bytes largest free block: 
DMA Memory Available: 67724 bytes total, 17172 bytes largest free block: 
Allocating refresh buffer:
DMA Memory Available: 67724 bytes total, 17172 bytes largest free block: 
lsbMsbTransitionBit of 0 requires 49152 RAM, 17172 available, leaving -31980 free: 
lsbMsbTransitionBit of 1 requires 24576 RAM, 17172 available, leaving -7404 free: 
lsbMsbTransitionBit of 2 requires 12288 RAM, 17172 available, leaving 4884 free: 
Raised lsbMsbTransitionBit to 2/7 to fit in RAM
lsbMsbTransitionBit of 2 gives 100 Hz refresh, 120 requested: 
lsbMsbTransitionBit of 3 gives 191 Hz refresh, 120 requested: 
Raised lsbMsbTransitionBit to 3/7 to meet minimum refresh rate
Descriptors for lsbMsbTransitionBit 3/7 with 16 rows require 6144 bytes of DMA RAM
SmartMatrix Mallocs Complete
Heap Memory Available: 147696 bytes total, 86148 bytes largest free block: 
8-bit Accessible Memory Available: 61548 bytes total, 15456 bytes largest free block: 
32-bit Memory Available: 147696 bytes total, 86148 bytes largest free block: 
DMA Memory Available: 61548 bytes total, 15456 bytes largest free block: 

@Louis, I'm still trying to make sense of this.
It seems unexpected that SmartMatrix is using almost all of 291772 bytes.
I need 96 x 64 x 3 = 18432 bytes for that matrix in 24bpp.

I have 3 copies I think

  1. matrixLayer
  2. backgroundLayer
  3. my SmartMatrix::GFX framebuffer that gets memcopied into backgroundlayer:
void show_callback() {
    memcpy(backgroundLayer.backBuffer(), matrixleds, kMatrixHeight*kMatrixWidth*3);
    backgroundLayer.swapBuffers(false);}

So, that's 55296 bytes used just for 3 24bpp framebuffers.
This leaves me with 291772 - 55296 - 4884 = 231,592 bytes that seem to be used by the library in addition to the framebuffers.
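
The arithmetic above, written out as a small sketch so the numbers can be checked. The constant and function names are made up for illustration; the inputs are the figures quoted in this post.

```cpp
#include <cstdint>

// Bytes for one framebuffer: width x height x bytes per pixel.
constexpr uint32_t framebufferBytes(uint32_t width, uint32_t height,
                                    uint32_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

constexpr uint32_t kOneFrame    = framebufferBytes(96, 64, 3); // 24bpp = 3 bytes
constexpr uint32_t kThreeFrames = 3 * kOneFrame;               // three copies

// Figures taken from the runtime log: heap total before init, and the
// bytes left over after the lsbMsbTransitionBit 2 refresh buffer.
constexpr uint32_t kHeapBeforeInit = 291772;
constexpr uint32_t kLeftOver       = 4884;

constexpr uint32_t kUnexplained = kHeapBeforeInit - kThreeFrames - kLeftOver;
```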

  1. this is huge
  2. does it seem correct?
  3. can I tweak lib/driver settings to use less RAM than this?

Thanks.

What the ESP32 API reports is misleading. It says you have 291772 bytes total in the heap, but some of that is 32-bit memory that can't be used by SmartMatrix Library or most other libraries. Pay more attention to the 8-bit accessible memory, which you can consider "usable memory".

SmartMatrix Library tries to allocate as much RAM as possible to refresh the panels. I'd call matrix.begin() as late as possible if you need to malloc other memory first.

If you switch to the circuit using the latch, you'll double the efficiency of the RAM used for refreshing the panels. That may leave some more free RAM, or improve the refresh quality (brightness or color depth, can't recall). I can't recall if I added something to limit the amount of RAM that's allocated for refreshing, seems like a good thing to add. e.g. matrix.begin(MAX_RAM)

There's a THT version of the ESP32 SmartLED Shield that has the magic RAM-reducing Latch: SmartMatrix/extras/hardware/SmartLEDShield_ESP32_THT_V0_brd.pdf at teensylc · pixelmatix/SmartMatrix · GitHub

Long term, my fix for all these malloc issues isn't in software, it's to move refresh to a separate CPU. I've actually started this project. Trying to figure out how to make the ESP32 receive SPI slave data (APA102 format) continuously is my current challenge.

Thanks @Louis. I'm actually already calling matrix.begin somewhat late. Note from my first report how it's actually

     matrixUpdateFrames[1] = (frameStruct *)heap_caps_malloc(sizeof(frameStruct), MALLOC_CAP_DMA);

that failed, way before the later lsbMsbTransitionBit mallocs.
It does work if I reduce my matrix to 90x64 instead of 96x64, so it doesn't fail by much.
Contrary to your suggestion, I actually fixed my problem by removing a malloc that ran before the lib's init and caused it to fail on that assert; that malloc still fits when it is done later.

I get your point about needing to look at 8-bit memory instead.
So, that gives me 205624 bytes, or 205624 - 55296 - 61548 = 88780 bytes used by the extra buffers, which is 4.8 times what's needed for 96 x 64 x 3 = 18432 bytes.
Does that sound plausible?
Can it be reduced further in software?

I'm happy to switch to the latch if I can buy the shield from someone (I'm not equipped to get them made). Thanks for the link to it.

Moving to a 2nd CPU sounds interesting. I would be able to have 2 CPUs if that helps, and especially if that allows going all the way to 128x128.

Where's the gap? Ordering the board? Ordering the parts? Soldering?

Getting PCBs made is really easy now: drag the .brd file into Oshpark.com, glance at the test plots to make sure they look reasonable, press "order".

Ok, maybe it is that I've never done this :)
Yes, I can solder without issues and I'm guessing ordering the parts shouldn't be too hard.
I was happy to give money to @Jason or you so far :) (I guess Jason doesn't actually log into this board anymore, not sure if he gets notifications).

Back to the memory issues, I'm thinking about factoring out your memory status print into a function that can be called multiple times instead of cutting and pasting those bits over and over again.
I'll send you a PR if you're ok with that.
I can also send you a separate PR for my wiring compatible with Jason's shield.
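
A rough sketch of what such a helper could look like. It is only the formatting half, written so it runs on a desktop compiler; formatMemLine is a made-up name and not what is in the library. On the ESP32 the two numbers would presumably come from heap_caps_get_free_size() and heap_caps_get_largest_free_block() with a capability flag such as MALLOC_CAP_DMA or MALLOC_CAP_8BIT.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not the library's code): builds one status line in
// the same shape as the log output above. The caller supplies the numbers,
// which keeps the function testable off-target.
std::string formatMemLine(const std::string& label,
                          std::size_t totalFree,
                          std::size_t largestFreeBlock) {
    return label + " Memory Available: " + std::to_string(totalFree) +
           " bytes total, " + std::to_string(largestFreeBlock) +
           " bytes largest free block";
}
```

Calling it once per memory class (Heap, 8-bit Accessible, 32-bit, DMA) at each stage would replace the repeated print blocks.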

PRs sound good. The parts should be in the BOM excel sheet, I think I made a column for the THT parts, with Digikey order numbers. LMK if you have questions

Also, I did add a way to save memory: call matrix.begin() with a number. I think you're already doing this if you followed the AnimatedGIFs sketch; you can increase that number.

done

Back to that bit, I actually wasn't doing this in my own code, so I am now, thanks for reminding me to look at it (I had seen it but forgot).
In my existing code, matrixLayer.begin(40000) returns this:

Starting SmartMatrix Mallocs
Heap Memory Available: 291308 bytes total, 113792 bytes largest free block: 
8-bit Accessible Memory Available: 205560 bytes total, 113792 bytes largest free block: 
32-bit Memory Available: 291308 bytes total, 113792 bytes largest free block: 
DMA Memory Available: 205560 bytes total, 113792 bytes largest free block: 
SmartMatrix Layers Allocated from Heap:
Heap Memory Available: 252928 bytes total, 113792 bytes largest free block: 
Starting SmartMatrix DMA Mallocs
sizeof framestruct: 0000C000
DMA Memory Available before ptr1 alloc: 67660 bytes total, 17172 bytes largest free block
matrixUpdateFrames[0] pointer: 3FFE4374
DMA Memory Available before ptr2 alloc: 67660 bytes total, 17172 bytes largest free block
matrixUpdateFrames[1] pointer: 3FFF0384
Frame Structs Allocated from Heap:
Heap Memory Available: 153408 bytes total, 85748 bytes largest free block
8-bit Accessible Memory Available: 67660 bytes total, 17172 bytes largest free block
32-bit Memory Available: 153408 bytes total, 85748 bytes largest free block
DMA Memory Available: 67660 bytes total, 17172 bytes largest free block
Allocating refresh buffer:
lsbMsbTransitionBit of 0 requires 49152 RAM, 17172 available, leaving -31980 free: 
lsbMsbTransitionBit of 1 requires 24576 RAM, 17172 available, leaving -7404 free: 
lsbMsbTransitionBit of 2 requires 12288 RAM, 17172 available, leaving 4884 free: 
lsbMsbTransitionBit of 3 requires 6144 RAM, 17172 available, leaving 11028 free: 
lsbMsbTransitionBit of 4 requires 3072 RAM, 17172 available, leaving 14100 free: 
lsbMsbTransitionBit of 5 requires 1536 RAM, 17172 available, leaving 15636 free: 
lsbMsbTransitionBit of 6 requires 768 RAM, 17172 available, leaving 16404 free: 
lsbMsbTransitionBit of 7 requires 384 RAM, 17172 available, leaving 16788 free: 
Raised lsbMsbTransitionBit to 7/7 to fit in RAM
lsbMsbTransitionBit of 7 gives 813 Hz refresh, 120 requested: 
Raised lsbMsbTransitionBit to 7/7 to meet minimum refresh rate
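
The "fit in RAM" part of that log can be modeled with a few lines. This is only a guess reconstructed from the printed numbers, not the library's code: the requirement halves with each step (49152, 24576, ... 384), and the assumption is that the number passed to begin() is the amount of RAM to leave free. pickTransitionBit and its parameter names are made up.

```cpp
#include <cstdint>

// Hypothetical model inferred from the log lines only.
// baseBytes: RAM required at lsbMsbTransitionBit 0 (49152 in the log).
// available: largest free DMA block (17172 in the log).
// reserve:   bytes that should remain free afterwards (assumed meaning of
//            the begin() argument).
inline int pickTransitionBit(int64_t baseBytes, int64_t available,
                             int64_t reserve) {
    for (int bit = 0; bit <= 7; ++bit) {
        const int64_t required = baseBytes >> bit; // halves at every step
        if (available - required >= reserve) {
            return bit; // first (lowest) bit that leaves enough free
        }
    }
    return 7; // nothing satisfies the reserve: fall back to the maximum
}
```

With 17172 bytes available, a reserve of 0 picks bit 2 (4884 bytes left), which matches the earlier run. A reserve of 40000 can never be met, so the model ends at 7, which matches this run.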

So, from what I can tell

  1. An 813 Hz refresh rate is pretty cool. I didn't realize that lowering the RAM use would increase the refresh rate.
    I don't fully understand the tradeoff here: why would I not want to have 'lsbMsbTransitionBit of 7'?
    Looks like a higher number gives me more Hz and less RAM used. What do I lose in return?

  2. Switching to FatFS is causing me a problem because I end up with not enough RAM free for FatFS to work with my code + SmartMatrix.
    I start with
    8-bit Accessible Memory Available: 205560 bytes total, 113792 bytes largest free block:
    then after the 2 matrixupdateframes are allocated, I have:
    8-bit Accessible Memory Available: 67660 bytes total, 17172 bytes largest free block

Why is it using 134KB (205560-67660) when my frame is 96 x 64 x 3 bytes per pixel = 18KB?
I see how the lsbMsbTransitionBit code is trying to then be frugal, but by the time it does this, I've already lost a lot of RAM.

Is there nothing I can do to avoid having SmartMatrix using 7 times the amount of RAM necessary for my FB?
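
Breaking the 134KB down using only the numbers in the log above (the constant names are made up): the heap total drops once when the layers are allocated and again when the two frameStructs are allocated, and the log prints sizeof framestruct as 0xC000.

```cpp
#include <cstdint>

// Heap totals copied from the begin(40000) log above.
constexpr int64_t kHeapAtStart           = 291308;
constexpr int64_t kHeapAfterLayers       = 252928;
constexpr int64_t kHeapAfterFrameStructs = 153408;

// 8-bit accessible totals from the same log.
constexpr int64_t k8BitAtStart           = 205560;
constexpr int64_t k8BitAfterFrameStructs = 67660;

constexpr int64_t kFrameStructBytes = 0xC000;      // 49152, from the log
constexpr int64_t kFramebufferBytes = 96 * 64 * 3; // 18432

constexpr int64_t kLayerStage       = kHeapAtStart - kHeapAfterLayers;           // layers
constexpr int64_t kFrameStructStage = kHeapAfterLayers - kHeapAfterFrameStructs; // refresh frames
constexpr int64_t kTwoFrameStructs  = 2 * kFrameStructBytes;
constexpr int64_t k8BitUsed         = k8BitAtStart - k8BitAfterFrameStructs;
```

So of the 137900 bytes, 38380 go to the layers (roughly two 18432-byte buffers) and 99520 go to the frameStruct stage, of which 98304 is the two 49152-byte frameStructs themselves. Each frameStruct is about 2.7 times the size of one 24bpp framebuffer.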

Of course, if I init FFat before SmartMatrix, SmartMatrix fails:

Starting SmartMatrix Mallocs
Heap Memory Available: 242436 bytes total, 85748 bytes largest free block: 
8-bit Accessible Memory Available: 156688 bytes total, 66168 bytes largest free block: 
32-bit Memory Available: 242436 bytes total, 85748 bytes largest free block: 
DMA Memory Available: 156688 bytes total, 66168 bytes largest free block: 
SmartMatrix Layers Allocated from Heap:
Heap Memory Available: 204056 bytes total, 85748 bytes largest free block:
Starting SmartMatrix DMA Mallocs
assertion "matrixUpdateFrames[1] != NULL" failed: file 

This shows that running FFat first takes 291308-242436 = 48KB
but by then the RAM left is just not enough for SmartMatrix.

Now, I see that if I run AnimatedGifs only, there is a lot more RAM free after FFat and SmartMatrix have both initialized, so I'll see if I can trim the code on my side too, but SmartMatrix using 134KB for its framebuffers when it should only need to store 2 18KB frames seems like a problem too.

AnimatedGifs with SmartMatrix Native API after init:

Heap Memory Available: 231388 bytes total, 86652 bytes largest free block
8-bit Accessible Memory Available: 144736 bytes total, 64624 bytes largest free block

AnimatedGifs with SmartMatrix driver in NeoMatrix Layer after init:

Heap Memory Available: 213412 bytes total, 86652 bytes largest free block
8-bit Accessible Memory Available: 126760 bytes total, 64624 bytes largest free block

This shows 17976 bytes missing after adding my NeoMatrix layer, which matches the 18KB used by my extra framebuffer, so that's correct.

After the gif decoding layer has taken its extra memory (using malloc), this is what's left:

Heap Memory Available: 152164 bytes total, 86652 bytes largest free block
8-bit Accessible Memory Available: 65512 bytes total, 18848 bytes largest free block
32-bit Memory Available: 152164 bytes total, 86652 bytes largest free block
DMA Memory Available: 65512 bytes total, 18848 bytes largest free block

So about 64KB free for other code and 18KB contiguous when I'm using 96x64 with SmartMatrix + NeoMatrix (extra 18KB FB) + AnimatedGifs + FFat (apparently 48KB), and no other code.

I can probably shrink my other code to fit, but it's not a lot left. If SmartMatrix could be made not to use 7-8X my FB size, that would sure help :)

I explained lsbMsbTransitionBit and its pros and cons here: https://www.esp32.com/viewtopic.php?f=17&t=3188&start=30#p22401

Also there's a bit talking about the RAM (sometimes a lot of RAM) required for storing linked lists to tell the DMA what data to move in what order.

SmartMatrix Library has to allocate memory for the actual RGB frames, the data to shift out to the GPIO lines (which isn't very efficient, as it's including LAT and OE which don't change much, and it's even more inefficient when you are driving ADDX lines from GPIO), and also the linked lists for DMA, and probably more.

If you're not already using 24-bit refresh, try that. If you can live with the artifacts from Steve's frame-based refresh (or there turns out to be a fix for the artifacts), that should be more RAM efficient because there should be a lot fewer linked lists. I haven't looked at his code to verify this, but I assume it essentially reverses the changes I described in the esp32 forum post linked above, so uses less RAM.

Thanks, so basically

  1. look at @seratosteve 's code to see if it saves me RAM
  2. look at transitioning to a shield with latch which I have to get printed, and get components for. It will take a little while
  3. I just learned FFat.begin can take an argument of how many files can be open, and that will reduce its memory use
  4. see if I can get some mallocs/arrays to use 32bit memory that I still have enough of, instead of using 8bit memory which is running out

I'd add in "1a. try Steve's code to see if the artifacts are acceptable". I'm not sure how they look with GIF content. With the scrolling text on a black background the green trails are pretty bad in my opinion. Maybe it depends on the panels. I've only tried with a couple panels here.

Ok, first the good news, I got this working with this simple patch:

This saves 36KB of RAM and puts me back under the limit.

Now, I also just took the time to get @seratosteve 's branch working for me, and to be honest, it looks fine display-wise to me. That's good news.
However, sadly, I see 0 difference between the 2 RAM-wise.

Steve's code:

Starting SmartMatrix Mallocs
Heap Memory Available: 325548 bytes total, 113792 bytes largest free block: 
8-bit Accessible Memory Available: 239800 bytes total, 113792 bytes largest free block: 
32-bit Memory Available: 325548 bytes total, 113792 bytes largest free block: 
DMA Memory Available: 239800 bytes total, 113792 bytes largest free block: 
SmartMatrix Layers Allocated from Heap:
Heap Memory Available: 287168 bytes total, 113792 bytes largest free block: 
Starting SmartMatrix DMA Mallocs
sizeof framestruct: 0000C000
DMA Memory Available before ptr1 alloc: 101900 bytes total, 39960 bytes largest free block
matrixUpdateFrames[0] pointer: 3FFE4374
DMA Memory Available before ptr2 alloc: 101900 bytes total, 39960 bytes largest free block
matrixUpdateFrames[1] pointer: 3FFF0384
Frame Structs Allocated from Heap:
Heap Memory Available: 187648 bytes total, 85748 bytes largest free block
8-bit Accessible Memory Available: 101900 bytes total, 39960 bytes largest free block
32-bit Memory Available: 187648 bytes total, 85748 bytes largest free block
DMA Memory Available: 101900 bytes total, 39960 bytes largest free block
Allocating refresh buffer:
Bitplanes take 6144 bytes, requiring 2 DMA descriptors each.
>>>> Scan by bitplane init <<<<
lsbMsbTransitionBit of 0 requires 12240 RAM, 39960 available, leaving 27720 free: 
>>>> Scan by bitplane init <<<<
lsbMsbTransitionBit of 1 requires 6144 RAM, 39960 available, leaving 33816 free: 
>>>> Scan by bitplane init <<<<
lsbMsbTransitionBit of 2 requires 3120 RAM, 39960 available, leaving 36840 free: 
>>>> Scan by bitplane init <<<<
lsbMsbTransitionBit of 3 requires 1632 RAM, 39960 available, leaving 38328 free: 
>>>> Scan by bitplane init <<<<
lsbMsbTransitionBit of 4 requires 912 RAM, 39960 available, leaving 39048 free: 
>>>> Scan by bitplane init <<<<
lsbMsbTransitionBit of 5 requires 576 RAM, 39960 available, leaving 39384 free: 
>>>> Scan by bitplane init <<<<
lsbMsbTransitionBit of 6 requires 432 RAM, 39960 available, leaving 39528 free: 
>>>> Scan by bitplane init <<<<
lsbMsbTransitionBit of 7 requires 384 RAM, 39960 available, leaving 39576 free: 
Raised lsbMsbTransitionBit to 7/7 to fit in RAM
lsbMsbTransitionBit of 7 gives 813 Hz refresh, 60 requested: 
Raised lsbMsbTransitionBit to 7/7 to meet minimum refresh rate
Descriptors for lsbMsbTransitionBit 7/7 with 16 rows require 384 bytes of DMA RAM
SmartMatrix Mallocs Complete
Heap Memory Available: 187232 bytes total, 85748 bytes largest free block
8-bit Accessible Memory Available: 101484 bytes total, 39960 bytes largest free block
32-bit Memory Available: 187232 bytes total, 85748 bytes largest free block
DMA Memory Available: 101484 bytes total, 39960 bytes largest free block
Setting up parallel I2S bus at I2S1

Scan by Row post init

Starting SmartMatrix Mallocs
Heap Memory Available: 325548 bytes total, 113792 bytes largest free block: 
8-bit Accessible Memory Available: 239800 bytes total, 113792 bytes largest free block: 
32-bit Memory Available: 325548 bytes total, 113792 bytes largest free block: 
DMA Memory Available: 239800 bytes total, 113792 bytes largest free block: 
SmartMatrix Layers Allocated from Heap:
Heap Memory Available: 287168 bytes total, 113792 bytes largest free block: 
Starting SmartMatrix DMA Mallocs
sizeof framestruct: 0000C000
DMA Memory Available before ptr1 alloc: 101900 bytes total, 39960 bytes largest free block
matrixUpdateFrames[0] pointer: 3FFE4374
DMA Memory Available before ptr2 alloc: 101900 bytes total, 39960 bytes largest free block
matrixUpdateFrames[1] pointer: 3FFF0384
Frame Structs Allocated from Heap:
Heap Memory Available: 187648 bytes total, 85748 bytes largest free block
8-bit Accessible Memory Available: 101900 bytes total, 39960 bytes largest free block
32-bit Memory Available: 187648 bytes total, 85748 bytes largest free block
DMA Memory Available: 101900 bytes total, 39960 bytes largest free block
Allocating refresh buffer:
lsbMsbTransitionBit of 0 requires 49152 RAM, 39960 available, leaving -9192 free: 
lsbMsbTransitionBit of 1 requires 24576 RAM, 39960 available, leaving 15384 free: 
lsbMsbTransitionBit of 2 requires 12288 RAM, 39960 available, leaving 27672 free: 
lsbMsbTransitionBit of 3 requires 6144 RAM, 39960 available, leaving 33816 free: 
lsbMsbTransitionBit of 4 requires 3072 RAM, 39960 available, leaving 36888 free: 
lsbMsbTransitionBit of 5 requires 1536 RAM, 39960 available, leaving 38424 free: 
lsbMsbTransitionBit of 6 requires 768 RAM, 39960 available, leaving 39192 free: 
lsbMsbTransitionBit of 7 requires 384 RAM, 39960 available, leaving 39576 free: 
Raised lsbMsbTransitionBit to 7/7 to fit in RAM
lsbMsbTransitionBit of 7 gives 813 Hz refresh, 60 requested: 
Raised lsbMsbTransitionBit to 7/7 to meet minimum refresh rate
Descriptors for lsbMsbTransitionBit 7/7 with 16 rows require 384 bytes of DMA RAM
SmartMatrix Mallocs Complete
Heap Memory Available: 187232 bytes total, 85748 bytes largest free block
8-bit Accessible Memory Available: 101484 bytes total, 39960 bytes largest free block
32-bit Memory Available: 187232 bytes total, 85748 bytes largest free block
DMA Memory Available: 101484 bytes total, 39960 bytes largest free block

Are you seeing the green trails and they're acceptable, or do they not show up? Can you try the MultipleTextLayers sketch and tell me if you see the same as the video I posted, or maybe share a recording of what you see?

I haven't looked at Steve's code to see how it's using DMA linked lists. I imagine it could be made more efficient if you're not seeing any decrease in RAM usage. I'm reluctant to spend any time on it until I see some workaround for the green trails issue.

@Louis there is a very small hint of green on panel #2, one pixel high.
See
https://photos.app.goo.gl/v6giTzohkBGCA6JD6
It's barely visible for me and only if I don't fill the panel, which my patterns usually do.

Text is scrolling in the wrong direction too, but that's always been true with your teensylc fork.

I used this

#define COLOR_DEPTH 24                  // known working: 24, 48 - If the sketch uses type `rgb24` directly, COLOR_DEPTH must be 24
const uint8_t kMatrixWidth = 64;        // known working: 32, 64, 96, 128
const uint8_t kMatrixHeight = 96;       // known working: 16, 32, 48, 64
const uint8_t kRefreshDepth = 24;       // known working: 24, 36, 48
const uint8_t kDmaBufferRows = 4;       // known working: 2-4, 

OK, thanks for the video. This is panel dependent, please see other thread.

Thanks for that explanation, it does help.
As you said, what I lose is color depth (not actually so noticeable) and brightness (much more noticeable).
"lsbMsbTransitionBit of 7" is not a happy place to be at indeed; now I'm back to 3 due to my panel size (96x64) and refresh rate (>120Hz).

lsbMsbTransitionBit of 0 requires 49152 RAM, 39960 available, leaving -9192 free: 
lsbMsbTransitionBit of 1 requires 24576 RAM, 39960 available, leaving 15384 free: 
lsbMsbTransitionBit of 2 requires 12288 RAM, 39960 available, leaving 27672 free: 
Raised lsbMsbTransitionBit to 2/7 to fit in RAM
lsbMsbTransitionBit of 2 gives 100 Hz refresh, 120 requested: 
lsbMsbTransitionBit of 3 gives 191 Hz refresh, 120 requested: 

Looks much better.