On ESP32, I can run FastLED with WS2812 (48 pixels) and an RGBPanel in direct driving mode with SmartMatrix (96x64).
This works on a very small example.
Then, when I added NeoPixel support to my much bigger demo code, it crashed with an assert.
It clearly failed to get DMA memory.
I'm a bit confused as to why I can get enough DMA memory to run the same panel size and the same 48 NeoPixels in simpler code, but it fails in my larger code, which I'm not sure uses DMA at all.
Well, to answer myself: it was a simple out-of-memory condition. Simply adding NeoPixels into the mix somehow took enough extra memory to push me over the edge.
I removed a malloc somewhere else and that took care of the DMA allocation failure. I was confused about DMA memory being a special pool, but in fact I simply ran out of normal memory.
Interestingly, my build says
Global variables use 97692 bytes (29%) of dynamic memory, leaving 229988 bytes for local variables. Maximum is 327680 bytes.
and runtime is below. Do I read correctly that SmartMatrix has 291KB before it starts and leaves me with a mere 4KB?
("17172 available, leaving 4884 free: ")
or maybe a bit more
DMA Memory Available: 61548 bytes total, 15456 bytes largest free block:
Starting SmartMatrix Mallocs
Heap Memory Available: 291772 bytes total, 113792 bytes largest free block:
8-bit Accessible Memory Available: 205624 bytes total, 113792 bytes largest free block:
32-bit Memory Available: 291772 bytes total, 113792 bytes largest free block:
DMA Memory Available: 205624 bytes total, 113792 bytes largest free block:
SmartMatrix Layers Allocated from Heap:
Heap Memory Available: 253392 bytes total, 113792 bytes largest free block:
Starting SmartMatrix DMA Mallocs
DMA Memory Available: 166060 bytes total, 113792 bytes largest free block:
DMA Memory Available: 116892 bytes total, 64624 bytes largest free block:
sizeof framestruct: 0000C000
matrixUpdateFrames[0] pointer: 3FFE4374
matrixUpdateFrames[1] pointer: 3FFF0384
Frame Structs Allocated from Heap:
Heap Memory Available: 153872 bytes total, 86148 bytes largest free block:
8-bit Accessible Memory Available: 67724 bytes total, 17172 bytes largest free block:
32-bit Memory Available: 153872 bytes total, 86148 bytes largest free block:
DMA Memory Available: 67724 bytes total, 17172 bytes largest free block:
Allocating refresh buffer:
DMA Memory Available: 67724 bytes total, 17172 bytes largest free block:
lsbMsbTransitionBit of 0 requires 49152 RAM, 17172 available, leaving -31980 free:
lsbMsbTransitionBit of 1 requires 24576 RAM, 17172 available, leaving -7404 free:
lsbMsbTransitionBit of 2 requires 12288 RAM, 17172 available, leaving 4884 free:
Raised lsbMsbTransitionBit to 2/7 to fit in RAM
lsbMsbTransitionBit of 2 gives 100 Hz refresh, 120 requested:
lsbMsbTransitionBit of 3 gives 191 Hz refresh, 120 requested:
Raised lsbMsbTransitionBit to 3/7 to meet minimum refresh rate
Descriptors for lsbMsbTransitionBit 3/7 with 16 rows require 6144 bytes of DMA RAM
SmartMatrix Mallocs Complete
Heap Memory Available: 147696 bytes total, 86148 bytes largest free block:
8-bit Accessible Memory Available: 61548 bytes total, 15456 bytes largest free block:
32-bit Memory Available: 147696 bytes total, 86148 bytes largest free block:
DMA Memory Available: 61548 bytes total, 15456 bytes largest free block:
@Louis, I'm still trying to make sense out of this.
It seems unexpected that SmartMatrix is using almost all of 291772 bytes.
I need 96*64*3 = 18432 bytes for that matrix in 24bpp.
I have 3 copies, I think:
matrixLayer
backgroundLayer
my SmartMatrix::GFX framebuffer that gets memcopied into backgroundLayer
So, that's 55296 bytes used just for 3 24bpp framebuffers.
This leaves me with 291772 - 55296 - 4884 = 231,592 bytes that seem to be used by the library in addition to the framebuffers.
this is huge
does it seem correct?
can I tweak lib/driver settings to use less RAM than this?
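The arithmetic in the post above can be double-checked with a few lines of C++. This is only a sanity check of the numbers quoted in the thread; the constant and function names are made up for illustration and are not part of SmartMatrix.

```cpp
#include <cstddef>

// Panel geometry and pixel format quoted in the thread (96x64 at 24bpp).
constexpr std::size_t kPanelWidth = 96;
constexpr std::size_t kPanelHeight = 64;
constexpr std::size_t kBytesPerPixel = 3;  // 24 bits per pixel

// Size in bytes of one full-colour framebuffer (hypothetical helper).
constexpr std::size_t framebufferBytes(std::size_t width, std::size_t height,
                                       std::size_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

constexpr std::size_t kOneFramebuffer =
    framebufferBytes(kPanelWidth, kPanelHeight, kBytesPerPixel);
constexpr std::size_t kThreeFramebuffers = 3 * kOneFramebuffer;

// Heap total reported before SmartMatrix starts, minus the three framebuffers,
// minus the 4884 bytes the log reports as left over.
constexpr long kUnexplainedBytes =
    291772L - static_cast<long>(kThreeFramebuffers) - 4884L;

static_assert(kOneFramebuffer == 18432, "one 96x64 24bpp framebuffer");
static_assert(kThreeFramebuffers == 55296, "three copies");
static_assert(kUnexplainedBytes == 231592, "remainder quoted in the post");
```

Note that this uses the "Heap Memory Available" total as the starting point, exactly as the post does.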
What the ESP32 API reports is misleading. It says you have 291772 bytes total in the heap, but some of that is 32-bit memory that can't be used by SmartMatrix Library or most other libraries. Pay more attention to the 8-bit accessible memory, which you can consider "usable memory".
SmartMatrix Library tries to allocate as much RAM as possible to refresh the panels. I'd call matrix.begin() as late as possible if you need to malloc other memory first.
If you switch to the circuit using the latch, you'll double the efficiency of the RAM used for refreshing the panels. That may leave some more free RAM, or improve the refresh quality (brightness or color depth, can't recall). I can't recall if I added something to limit the amount of RAM that's allocated for refreshing, seems like a good thing to add. e.g. matrix.begin(MAX_RAM)
Long term, my fix for all these malloc issues isn't in software, it's to move refresh to a separate CPU. I've actually started this project. Trying to figure how to make the ESP32 receive SPI slave data (APA102 format) continuously is my current challenge.
that failed, way before the later lsbMsbTransitionBit mallocs.
It does work if I reduce my matrix to 90x64 instead of 96x64, so it doesn't fail by much.
Contrary to your suggestion, I actually fixed my problem by removing a malloc that was happening before the lib ran init and caused it to fail on that assert; that malloc can fit later.
I get your point about needing to look at 8-bit memory instead.
So, that gives me 205624 bytes, or 205624 - 55296 - 61548 = 88780 bytes used by the extra buffers, which is 4.8 times what's needed for 96*64*3 = 18432 bytes.
Does that sound plausible?
Can it be reduced further in software?
I'm happy to switch to the latch if I can buy the shield from someone (I'm not equipped to get them made). Thanks for the link to it.
Moving to a 2nd CPU sounds interesting. I would be able to have 2 CPUs if that helps, and especially if that allows going all the way to 128x128.
Ok, maybe it is that I've never done this
Yes, I can solder without issues and I'm guessing ordering the parts shouldn't be too hard.
I was happy to give money to @Jason or you so far (I guess Jason doesn't actually log into this board anymore, not sure if he gets notifications).
Back to the memory issues, Iām thinking about factoring out your memory status print into a function that can be called multiple times instead of cutting and pasting those bits over and over again.
I'll send you a PR if you're ok with that.
I can also send you a separate PR for my wiring compatible with Jason's shield.
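A memory status helper along the lines proposed above might look roughly like the sketch below. It is not the code from any actual PR: `formatMemoryLine` and `printMemoryStatus` are hypothetical names, and the choice of pools is an assumption. The only ESP-IDF calls used are `heap_caps_get_free_size()` and `heap_caps_get_largest_free_block()` from `esp_heap_caps.h`, guarded so the formatting part also compiles off-target.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

#if defined(ESP_PLATFORM)
#include "esp_heap_caps.h"
#endif

// Build one status line in the same shape as the log lines in this thread.
// (Hypothetical helper, not part of SmartMatrix.)
std::string formatMemoryLine(const char* label, unsigned long totalFree,
                             unsigned long largestBlock) {
    char buffer[128];
    std::snprintf(buffer, sizeof(buffer),
                  "%s Memory Available: %lu bytes total, %lu bytes largest free block",
                  label, totalFree, largestBlock);
    return std::string(buffer);
}

#if defined(ESP_PLATFORM)
// Print the pools that matter for SmartMatrix; call it wherever a snapshot is needed.
void printMemoryStatus(const char* heading) {
    const struct { const char* label; uint32_t caps; } pools[] = {
        {"8-bit Accessible", MALLOC_CAP_8BIT},
        {"32-bit", MALLOC_CAP_32BIT},
        {"DMA", MALLOC_CAP_DMA},
    };
    std::printf("%s\n", heading);
    for (const auto& pool : pools) {
        std::printf("%s\n",
                    formatMemoryLine(pool.label,
                                     (unsigned long)heap_caps_get_free_size(pool.caps),
                                     (unsigned long)heap_caps_get_largest_free_block(pool.caps))
                        .c_str());
    }
}
#endif
```

Calling something like `printMemoryStatus("Frame Structs Allocated from Heap:")` after each allocation step would replace the repeated copy-and-paste blocks.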
PRs sound good. The parts should be in the BOM excel sheet, I think I made a column for the THT parts, with Digikey order numbers. LMK if you have questions
Also, I did add a way to save memory: call matrix.begin() with a number. I think you're already doing this if you followed the AnimatedGIFs sketch; you can increase that number.
Back to that bit, I actually wasn't doing this in my own code, so I am now, thanks for reminding me to look at it (I had seen it but forgot).
In my existing code, matrixLayer.begin(40000) returns this:
Starting SmartMatrix Mallocs
Heap Memory Available: 291308 bytes total, 113792 bytes largest free block:
8-bit Accessible Memory Available: 205560 bytes total, 113792 bytes largest free block:
32-bit Memory Available: 291308 bytes total, 113792 bytes largest free block:
DMA Memory Available: 205560 bytes total, 113792 bytes largest free block:
SmartMatrix Layers Allocated from Heap:
Heap Memory Available: 252928 bytes total, 113792 bytes largest free block:
Starting SmartMatrix DMA Mallocs
sizeof framestruct: 0000C000
DMA Memory Available before ptr1 alloc: 67660 bytes total, 17172 bytes largest free block
matrixUpdateFrames[0] pointer: 3FFE4374
DMA Memory Available before ptr2 alloc: 67660 bytes total, 17172 bytes largest free block
matrixUpdateFrames[1] pointer: 3FFF0384
Frame Structs Allocated from Heap:
Heap Memory Available: 153408 bytes total, 85748 bytes largest free block
8-bit Accessible Memory Available: 67660 bytes total, 17172 bytes largest free block
32-bit Memory Available: 153408 bytes total, 85748 bytes largest free block
DMA Memory Available: 67660 bytes total, 17172 bytes largest free block
Allocating refresh buffer:
lsbMsbTransitionBit of 0 requires 49152 RAM, 17172 available, leaving -31980 free:
lsbMsbTransitionBit of 1 requires 24576 RAM, 17172 available, leaving -7404 free:
lsbMsbTransitionBit of 2 requires 12288 RAM, 17172 available, leaving 4884 free:
lsbMsbTransitionBit of 3 requires 6144 RAM, 17172 available, leaving 11028 free:
lsbMsbTransitionBit of 4 requires 3072 RAM, 17172 available, leaving 14100 free:
lsbMsbTransitionBit of 5 requires 1536 RAM, 17172 available, leaving 15636 free:
lsbMsbTransitionBit of 6 requires 768 RAM, 17172 available, leaving 16404 free:
lsbMsbTransitionBit of 7 requires 384 RAM, 17172 available, leaving 16788 free:
Raised lsbMsbTransitionBit to 7/7 to fit in RAM
lsbMsbTransitionBit of 7 gives 813 Hz refresh, 120 requested:
Raised lsbMsbTransitionBit to 7/7 to meet minimum refresh rate
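The "to fit in RAM" step in the log above can be modelled with a small function. This is a reconstruction inferred purely from the log output, not taken from the SmartMatrix source: it assumes the RAM requirement halves with each step (49152, 24576, ... 384, as printed above) and that the number passed to begin() acts as an amount of RAM to keep free. Both function names are hypothetical.

```cpp
// Bytes needed at a given lsbMsbTransitionBit, assuming the requirement
// halves with each step, as the log shows for this 96x64 configuration.
long requiredBytes(long bytesAtBitZero, int bit) {
    return bytesAtBitZero >> bit;
}

// Return the lowest bit whose allocation still leaves bytesToKeepFree in the
// largest free block; fall back to maxBit if nothing leaves that much.
// (Assumed selection rule, inferred from the log.)
int pickTransitionBit(long largestFreeBlock, long bytesToKeepFree,
                      long bytesAtBitZero, int maxBit = 7) {
    for (int bit = 0; bit < maxBit; ++bit) {
        long left = largestFreeBlock - requiredBytes(bytesAtBitZero, bit);
        if (left >= bytesToKeepFree) {
            return bit;
        }
    }
    return maxBit;
}
```

With 17172 bytes in the largest block and nothing reserved, this model picks bit 2 (leaving 4884 bytes), which matches the earlier log. Asking to keep 40000 bytes free can never be satisfied from a 17172-byte block, so the model falls through to 7, which matches the log directly above.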
So, from what I can tell:
An 813 Hz refresh rate is pretty cool. I didn't realize that lowering the RAM use would increase the refresh rate.
I don't fully understand the tradeoff here: why would I not want "lsbMsbTransitionBit of 7"?
It looks like a higher number gives me more Hz and less RAM used. What do I lose in return?
Switching to FatFS is causing me a problem because I end up with not enough RAM free for FatFS to work with my code + SmartMatrix.
I start with
8-bit Accessible Memory Available: 205560 bytes total, 113792 bytes largest free block:
then after the 2 matrixUpdateFrames are allocated, I have:
8-bit Accessible Memory Available: 67660 bytes total, 17172 bytes largest free block
Why is it using 134KB (205560-67660) when my frame is 96 x 64 x 3 bytes per pixel = 18KB?
I see how the lsbMsbTransitionBit code then tries to be frugal, but by the time it does this, I've already lost a lot of RAM.
Is there nothing I can do to avoid having SmartMatrix use 7 times the amount of RAM necessary for my FB?
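The numbers in the matrixLayer.begin(40000) log earlier in the thread suggest where those 134KB go. The sketch below only subtracts values copied from that log; attributing each drop to the step named in the preceding log heading is an assumption, and the constant names are illustrative.

```cpp
// Values copied from the matrixLayer.begin(40000) log in this thread.
constexpr long kHeapAtStart = 291308;
constexpr long kHeapAfterLayers = 252928;        // "SmartMatrix Layers Allocated from Heap"
constexpr long kHeapAfterFrameStructs = 153408;  // "Frame Structs Allocated from Heap"
constexpr long k8BitAtStart = 205560;
constexpr long k8BitAfterFrameStructs = 67660;
constexpr long kFrameStructSize = 0xC000;        // "sizeof framestruct: 0000C000"

// Drop attributed to the layer buffers.
constexpr long kLayerBytes = kHeapAtStart - kHeapAfterLayers;
// Drop attributed to the two matrixUpdateFrames structures.
constexpr long kFrameStructBytes = kHeapAfterLayers - kHeapAfterFrameStructs;
// Total drop in 8-bit accessible memory (the "134KB" in the question).
constexpr long k8BitDrop = k8BitAtStart - k8BitAfterFrameStructs;

static_assert(kLayerBytes == 38380, "layer buffers");
static_assert(kFrameStructBytes == 99520, "frame structs");
static_assert(kLayerBytes + kFrameStructBytes == k8BitDrop, "both steps account for the drop");
static_assert(2 * kFrameStructSize == 98304, "two 49152-byte frame structs");
```

Under that reading, about 97KB of the drop is the two matrixUpdateFrames structures (49152 bytes each, plus some overhead), and about 37KB is the layer buffers, which is consistent with two 18432-byte framebuffers plus overhead.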
This shows that running FFat first takes 291308-242436 = 48KB
but by then the RAM left is just not enough for SmartMatrix.
Now, I see that if I run AnimatedGifs only, there is a lot more RAM free after FFat and SmartMatrix have both initialized, so I'll see if I can trim the code on my side too, but SmartMatrix using 134KB for its framebuffers, when it should only need to store 2 18KB frames, seems like a problem too.
AnimatedGifs with SmartMatrix Native API after init:
So about 64KB free for other code and 18KB contiguous when I'm using 96x64 with SmartMatrix + NeoMatrix (extra 18KB FB) + AnimatedGifs + FFat (apparently 48KB), and no other code.
I can probably shrink my other code to fit, but it's not a lot left. If SmartMatrix could be made not to use 7-8X my FB size, that would sure help.
Also there's a bit talking about the RAM (sometimes a lot of RAM) required for storing linked lists to tell the DMA what data to move in what order.
SmartMatrix Library has to allocate memory for the actual RGB frames, the data to shift out to the GPIO lines (which isn't very efficient, as it's including LAT and OE which don't change much, and it's even more inefficient when you are driving ADDX lines from GPIO), and also the linked lists for DMA, and probably more.
If you're not already using 24-bit refresh, try that. If you can live with the artifacts from Steve's frame-based refresh (or there turns out to be a fix for the artifacts), that should be more RAM efficient because there should be a lot fewer linked lists. I haven't looked at his code to verify this, but I assume it essentially reverses the changes I described in the esp32 forum post linked above, so uses less RAM.
I'd add in "1a. try Steve's code to see if the artifacts are acceptable". I'm not sure how they look with GIF content. With the scrolling text on a black background the green trails are pretty bad in my opinion. Maybe it depends on the panels. I've only tried with a couple panels here.
Ok, first the good news, I got this working with this simple patch:
This saves 36KB of RAM and puts me back under the limit.
Now, I also just took the time to get @seratosteve's branch working for me, and to be honest, it looks fine display-wise to me. That's good news.
However, sadly, I see 0 difference between the 2 RAM-wise.
Steve's code:
Starting SmartMatrix Mallocs
Heap Memory Available: 325548 bytes total, 113792 bytes largest free block:
8-bit Accessible Memory Available: 239800 bytes total, 113792 bytes largest free block:
32-bit Memory Available: 325548 bytes total, 113792 bytes largest free block:
DMA Memory Available: 239800 bytes total, 113792 bytes largest free block:
SmartMatrix Layers Allocated from Heap:
Heap Memory Available: 287168 bytes total, 113792 bytes largest free block:
Starting SmartMatrix DMA Mallocs
sizeof framestruct: 0000C000
DMA Memory Available before ptr1 alloc: 101900 bytes total, 39960 bytes largest free block
matrixUpdateFrames[0] pointer: 3FFE4374
DMA Memory Available before ptr2 alloc: 101900 bytes total, 39960 bytes largest free block
matrixUpdateFrames[1] pointer: 3FFF0384
Frame Structs Allocated from Heap:
Heap Memory Available: 187648 bytes total, 85748 bytes largest free block
8-bit Accessible Memory Available: 101900 bytes total, 39960 bytes largest free block
32-bit Memory Available: 187648 bytes total, 85748 bytes largest free block
DMA Memory Available: 101900 bytes total, 39960 bytes largest free block
Allocating refresh buffer:
Bitplanes take 6144 bytes, requiring 2 DMA descriptors each.
>>>> Scan by bitplane init <<<<
lsbMsbTransitionBit of 0 requires 12240 RAM, 39960 available, leaving 27720 free:
>>>> Scan by bitplane init <<<<
lsbMsbTransitionBit of 1 requires 6144 RAM, 39960 available, leaving 33816 free:
>>>> Scan by bitplane init <<<<
lsbMsbTransitionBit of 2 requires 3120 RAM, 39960 available, leaving 36840 free:
>>>> Scan by bitplane init <<<<
lsbMsbTransitionBit of 3 requires 1632 RAM, 39960 available, leaving 38328 free:
>>>> Scan by bitplane init <<<<
lsbMsbTransitionBit of 4 requires 912 RAM, 39960 available, leaving 39048 free:
>>>> Scan by bitplane init <<<<
lsbMsbTransitionBit of 5 requires 576 RAM, 39960 available, leaving 39384 free:
>>>> Scan by bitplane init <<<<
lsbMsbTransitionBit of 6 requires 432 RAM, 39960 available, leaving 39528 free:
>>>> Scan by bitplane init <<<<
lsbMsbTransitionBit of 7 requires 384 RAM, 39960 available, leaving 39576 free:
Raised lsbMsbTransitionBit to 7/7 to fit in RAM
lsbMsbTransitionBit of 7 gives 813 Hz refresh, 60 requested:
Raised lsbMsbTransitionBit to 7/7 to meet minimum refresh rate
Descriptors for lsbMsbTransitionBit 7/7 with 16 rows require 384 bytes of DMA RAM
SmartMatrix Mallocs Complete
Heap Memory Available: 187232 bytes total, 85748 bytes largest free block
8-bit Accessible Memory Available: 101484 bytes total, 39960 bytes largest free block
32-bit Memory Available: 187232 bytes total, 85748 bytes largest free block
DMA Memory Available: 101484 bytes total, 39960 bytes largest free block
Setting up parallel I2S bus at I2S1
Scan by Row post init
Starting SmartMatrix Mallocs
Heap Memory Available: 325548 bytes total, 113792 bytes largest free block:
8-bit Accessible Memory Available: 239800 bytes total, 113792 bytes largest free block:
32-bit Memory Available: 325548 bytes total, 113792 bytes largest free block:
DMA Memory Available: 239800 bytes total, 113792 bytes largest free block:
SmartMatrix Layers Allocated from Heap:
Heap Memory Available: 287168 bytes total, 113792 bytes largest free block:
Starting SmartMatrix DMA Mallocs
sizeof framestruct: 0000C000
DMA Memory Available before ptr1 alloc: 101900 bytes total, 39960 bytes largest free block
matrixUpdateFrames[0] pointer: 3FFE4374
DMA Memory Available before ptr2 alloc: 101900 bytes total, 39960 bytes largest free block
matrixUpdateFrames[1] pointer: 3FFF0384
Frame Structs Allocated from Heap:
Heap Memory Available: 187648 bytes total, 85748 bytes largest free block
8-bit Accessible Memory Available: 101900 bytes total, 39960 bytes largest free block
32-bit Memory Available: 187648 bytes total, 85748 bytes largest free block
DMA Memory Available: 101900 bytes total, 39960 bytes largest free block
Allocating refresh buffer:
lsbMsbTransitionBit of 0 requires 49152 RAM, 39960 available, leaving -9192 free:
lsbMsbTransitionBit of 1 requires 24576 RAM, 39960 available, leaving 15384 free:
lsbMsbTransitionBit of 2 requires 12288 RAM, 39960 available, leaving 27672 free:
lsbMsbTransitionBit of 3 requires 6144 RAM, 39960 available, leaving 33816 free:
lsbMsbTransitionBit of 4 requires 3072 RAM, 39960 available, leaving 36888 free:
lsbMsbTransitionBit of 5 requires 1536 RAM, 39960 available, leaving 38424 free:
lsbMsbTransitionBit of 6 requires 768 RAM, 39960 available, leaving 39192 free:
lsbMsbTransitionBit of 7 requires 384 RAM, 39960 available, leaving 39576 free:
Raised lsbMsbTransitionBit to 7/7 to fit in RAM
lsbMsbTransitionBit of 7 gives 813 Hz refresh, 60 requested:
Raised lsbMsbTransitionBit to 7/7 to meet minimum refresh rate
Descriptors for lsbMsbTransitionBit 7/7 with 16 rows require 384 bytes of DMA RAM
SmartMatrix Mallocs Complete
Heap Memory Available: 187232 bytes total, 85748 bytes largest free block
8-bit Accessible Memory Available: 101484 bytes total, 39960 bytes largest free block
32-bit Memory Available: 187232 bytes total, 85748 bytes largest free block
DMA Memory Available: 101484 bytes total, 39960 bytes largest free block
Are you seeing the green trails and they're acceptable, or do they not show up? Can you try the MultipleTextLayers sketch and tell me if you see the same as the video I posted, or maybe share a recording of what you see?
I haven't looked at Steve's code to see how it's using DMA linked lists. I imagine it could be made more efficient if you're not seeing any decrease in RAM usage. I'm reluctant to spend any time on it until I see some workaround for the green trails issue.
@Louis there is a very small hint of green on panel #2, one pixel high.
See https://photos.app.goo.gl/v6giTzohkBGCA6JD6
It's barely visible for me and only if I don't fill the panel, which my patterns usually do.
Text is scrolling in the wrong direction too, but that's always been true with your teensylc fork.
I used this
#define COLOR_DEPTH 24 // known working: 24, 48 - If the sketch uses type `rgb24` directly, COLOR_DEPTH must be 24
const uint8_t kMatrixWidth = 64; // known working: 32, 64, 96, 128
const uint8_t kMatrixHeight = 96; // known working: 16, 32, 48, 64
const uint8_t kRefreshDepth = 24; // known working: 24, 36, 48
const uint8_t kDmaBufferRows = 4; // known working: 2-4,
Thanks for that explanation, it does help.
As you said, what I lose is color depth (not actually so noticeable) and brightness (much more noticeable).
"lsbMsbTransitionBit of 7" is not a happy place to be at indeed, now I'm back to 3 due to my panel size (96x64) and refresh rate (>120Hz)
lsbMsbTransitionBit of 0 requires 49152 RAM, 39960 available, leaving -9192 free:
lsbMsbTransitionBit of 1 requires 24576 RAM, 39960 available, leaving 15384 free:
lsbMsbTransitionBit of 2 requires 12288 RAM, 39960 available, leaving 27672 free:
Raised lsbMsbTransitionBit to 2/7 to fit in RAM
lsbMsbTransitionBit of 2 gives 100 Hz refresh, 120 requested:
lsbMsbTransitionBit of 3 gives 191 Hz refresh, 120 requested: