If there's no hardware handshaking, have you considered software handshaking? Instead of firing, say, a 20 kB buffer at your device in one go, first send a small "write request" over the UART (including the length), wait for a "write acknowledgement" from the device, and only then send the buffer. You could even do it chunk-wise with a known chunk size, acknowledging each chunk (and resending any chunk that was lost or corrupted). Preferably, on the receiver side the transfer from the UART peripheral into RAM should be interrupt-driven (in an ISR) or DMA-accelerated. Because the device then only receives data it has explicitly agreed to accept, I think this would eliminate most of the problems you're seeing right now. How does it currently work? Does some application thread read the data from the UART into RAM?