Let me try to clear up some of these questions…
First, when we refer to “48-bit color”, this actually means 16 bitplanes. Each bitplane always carries 3 color planes (R, G and B), so 48 / 3 = 16. That’s also why you have to adjust the color depth in multiples of 3.
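As a quick illustration of that relationship, here is a small Python sketch. The function names are made up for this post and are not part of the SmartMatrix API.

```python
def bitplanes_for_color_depth(color_depth_bits):
    """Return the number of bitplanes for a given total color depth.

    Each bitplane holds one bit for each of R, G and B, so the
    color depth must be a multiple of 3.
    """
    if color_depth_bits <= 0 or color_depth_bits % 3 != 0:
        raise ValueError("color depth must be a positive multiple of 3")
    return color_depth_bits // 3


def levels_per_channel(bitplanes):
    """Number of distinct brightness levels each color channel can show."""
    return 2 ** bitplanes


if __name__ == "__main__":
    for depth in (24, 36, 48):
        planes = bitplanes_for_color_depth(depth)
        print(depth, "bit color ->", planes, "bitplanes,",
              levels_per_channel(planes), "levels per channel")
```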
For SmartMatrix, we use BCM (binary code modulation) in addition to PWM on the OE (Output Enable) pin. That allows us to control the brightness of each bitplane very finely. Each bitplane gets an independent OE duty cycle.
When the driver is operating near its maximum frequency, the LSB bitplanes are limited by data bandwidth, so they all take the same amount of time to output to the panel. The OE signal is modulated at the same time, so the apparent brightness still doubles with each successive bitplane. We are not dropping bits; generally we are able to achieve good bit accuracy up to 16 bitplanes. By default the PWM timer clock frequency is 75 MHz on Teensy 4, so there’s plenty of resolution.
Adjusting the OE duty cycle is also how we are able to achieve panel dimming without losing accuracy.
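To make the OE idea concrete, here is a rough Python sketch of the bandwidth-limited case, where every bitplane occupies the same time slot and only the OE duty cycle sets its weight. This is a simplified model, not the actual driver code: it assumes the MSB bitplane gets the full duty cycle at maximum brightness, and `oe_duty_cycles` and `dimming` are hypothetical names.

```python
def oe_duty_cycles(n_bitplanes, dimming=1.0):
    """OE duty cycle for each bitplane, LSB first.

    Simplified model (an assumption, not the real driver): all bitplanes
    take the same time slot, the MSB is lit for the whole slot at full
    brightness, and each lower bitplane is lit for half as long as the
    one above it. `dimming` scales every duty cycle by the same factor,
    so the 2:1 ratio between bitplanes is preserved.
    """
    if n_bitplanes < 1:
        raise ValueError("need at least one bitplane")
    if not 0.0 <= dimming <= 1.0:
        raise ValueError("dimming must be between 0 and 1")
    msb = n_bitplanes - 1
    return [dimming * 2.0 ** (k - msb) for k in range(n_bitplanes)]


if __name__ == "__main__":
    print(oe_duty_cycles(4))               # full brightness
    print(oe_duty_cycles(4, dimming=0.5))  # panel dimmed to half
```

Because dimming multiplies every duty cycle by the same factor, the relative weights of the bits do not change, which is why dimming this way does not cost color accuracy.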
For a 128x64 display at 18.462 MHz clock (Teensy 4) the theoretical maximum rate to clock one bitplane to the whole panel is 4507 Hz, as sutaburosu correctly calculated. With extra overhead for the latch pulse, the max available rate is closer to 3600 Hz. With 36-bit color depth there are 12 bitplanes, so the max frame rate is 1/12 of that: 3600 / 12 = 300 Hz. (The 340 Hz number was from older code that used less overhead and a faster clock, but was less reliable.) In this scenario, the speed is so high that all 12 bitplanes are bandwidth limited and each takes the same time to display, so all of the brightness modulation comes from the OE PWM.
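The arithmetic behind those numbers can be reproduced with a few lines of Python. This is only a sketch: the function names are made up, and `rows_in_parallel=2` is an assumption (typical HUB75 panels shift in an upper and a lower row at the same time), chosen because it reproduces the 4507 Hz figure. The 3600 Hz value is the practical rate quoted above, not something this code derives.

```python
def bitplane_rate_hz(width, height, clock_hz, rows_in_parallel=2):
    """Theoretical maximum rate for clocking one bitplane to the whole panel.

    Assumes one pixel column is shifted per clock and that the panel
    takes `rows_in_parallel` rows of data at once (assumed 2 here).
    Latch and other overhead is ignored.
    """
    clocks_per_bitplane = width * (height // rows_in_parallel)
    return clock_hz / clocks_per_bitplane


def max_frame_rate_hz(practical_bitplane_rate_hz, color_depth_bits):
    """Max frame rate when every bitplane takes the same (minimum) time."""
    n_bitplanes = color_depth_bits // 3
    return practical_bitplane_rate_hz / n_bitplanes


if __name__ == "__main__":
    print(bitplane_rate_hz(128, 64, 18.462e6))  # theoretical, no overhead
    print(max_frame_rate_hz(3600, 36))          # with latch overhead included
```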
Max speed does come with a drawback, however: the maximum brightness will be much lower, since the OE modulation keeps the screen dark most of the time. You can get much better brightness by reducing either the frame rate or the number of bitplanes. I find that staying below about 80% of the max frame rate gives good results.
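Here is a rough Python model of that tradeoff. It is a sketch under stated assumptions, not how the driver actually schedules things: each bitplane is lit for `lsb_on_s * 2**k` seconds, but its time slot can never be shorter than `min_slot_s` (the time needed to shift one bitplane out). All names are hypothetical, and real-world overhead will shift the numbers.

```python
def frame_stats(n_bitplanes, min_slot_s, lsb_on_s):
    """Return (frame_rate_hz, lit_fraction) for a simplified BCM model.

    on-time of bitplane k = lsb_on_s * 2**k
    slot of bitplane k    = max(min_slot_s, on-time)   (bandwidth floor)
    lit_fraction is the share of the frame during which OE can be
    enabled, i.e. an upper bound on brightness relative to always-on.
    """
    on_times = [lsb_on_s * 2 ** k for k in range(n_bitplanes)]
    slots = [max(min_slot_s, t) for t in on_times]
    frame_s = sum(slots)
    return 1.0 / frame_s, sum(on_times) / frame_s


if __name__ == "__main__":
    min_slot = 1.0 / 3600            # practical bitplane time from above
    fastest_lsb = min_slot / 2 ** 11  # MSB of 12 bitplanes just fills its slot
    for stretch in (1, 2, 4, 8):
        rate, lit = frame_stats(12, min_slot, fastest_lsb * stretch)
        print(f"{rate:6.1f} Hz  lit fraction {lit:.3f}")
```

In this model, 12 bitplanes at the full 300 Hz leave the panel lit for only about 17% of the time. Doubling the LSB on-time lengthens just the MSB slot, so the frame rate drops to about 277 Hz while the lit fraction rises to about 31%, which is the same direction as the 80% rule of thumb above.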
I should also note that sometimes the driver is CPU limited instead of bandwidth limited, so the maximum achievable frame rate can be lower.