I'm not sure when stty started accepting high values; maybe I was working with an older version at the time. I just tried stty with high values and it accepts them (I did a firmware upgrade recently).
I think I can explain what is going on (even without reading the source code). The driver (or stty) accepts the high baud values, but the resulting rates are inaccurate at these low divisor values because the high speed feature does not appear to be used. I read the high speed register after setting the uart to a high value with stty, and it was still 0, which means x16.
If you calculate what the divisor latch should be for 921600, it is about 2.7, I think. Round that up to the integer 3 and you get 833333 baud, which is close to what you are seeing. This is the opposite of what I was going through: you have an accurate micro, and the Omega is the one with the problem.
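The divisor arithmetic above can be checked with a short sketch. It assumes a 40 MHz UART clock, which is only inferred from the 833333 figure (40000000 / 16 / 3) and not confirmed from a datasheet; the function name `x16_baud` is made up for illustration.

```python
import math

# Assumed UART input clock; inferred from the 833333 baud figure, not confirmed.
UART_CLOCK_HZ = 40_000_000


def x16_baud(target_baud, clock_hz=UART_CLOCK_HZ):
    """Return (ideal divisor, integer divisor, actual baud, percent error) for x16 mode."""
    ideal = clock_hz / (16 * target_baud)
    divisor = max(1, math.ceil(ideal))  # rounded up to an integer, as described above
    actual = clock_hz / (16 * divisor)
    error_pct = (actual - target_baud) / target_baud * 100
    return ideal, divisor, actual, error_pct


ideal, divisor, actual, error_pct = x16_baud(921600)
print(f"ideal={ideal:.2f} divisor={divisor} actual={actual:.0f} error={error_pct:.1f}%")
# prints: ideal=2.71 divisor=3 actual=833333 error=-9.6%
```

An error of nearly 10% is far more than a UART link normally tolerates, which would explain the bad data at that setting.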
The solution is to set the high speed mode to 3, which uses the sample_count and sample_point registers. The fixed x16 factor is then eliminated, and you can choose the effective divisor with more precision.
In my case, 1000000 baud is the max I can do from my micro (the next value up for my micro is 1200000, which does not work). I don't know why it won't go faster. My micro is running on its internal oscillator with USB corrections, which should make it good, but still not as good as a crystal. I don't know how accurate the Omega clock is, so that is another possible error source. At these higher rates the sample_count also gets lower (0x28 for 1000000), so you start losing accuracy there, too.
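To see how the accuracy changes in high speed mode 3, here is a rough sketch. It assumes a 40 MHz UART clock, a divisor of 1, and that the baud rate is simply clock / (divisor * sample_count). That relationship matches the 0x28 value for 1000000, but the exact register semantics (for example, whether the register holds the count or the count minus one) are an assumption and should be checked against the datasheet. The function name `mode3_settings` is hypothetical, and sample_point is not modeled.

```python
# Assumed UART input clock; not confirmed from a datasheet.
UART_CLOCK_HZ = 40_000_000


def mode3_settings(target_baud, divisor=1, clock_hz=UART_CLOCK_HZ):
    """Return (sample_count, actual baud, percent error) under the assumed mode 3 formula."""
    # Assumption: baud = clock / (divisor * sample_count)
    sample_count = round(clock_hz / (divisor * target_baud))
    actual = clock_hz / (divisor * sample_count)
    error_pct = (actual - target_baud) / target_baud * 100
    return sample_count, actual, error_pct


for baud in (921600, 1000000, 1200000):
    count, actual, err = mode3_settings(baud)
    print(f"{baud}: sample_count={count:#x} actual={actual:.0f} error={err:+.2f}%")
```

Under these assumptions, 1000000 divides evenly (sample_count 0x28), while rates that do not divide evenly into the clock pick up a rounding error that grows as sample_count shrinks.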
I think my uart.sh script is mostly readable, so you can see what I'm doing to get 1000000 baud. If anything in it is unclear, I can explain it a little better.