A single bulk USB endpoint can only carry 64 bytes per packet at full speed. You could make use of multiple endpoints (8 are available in total) or switch to isochronous endpoints, which can carry up to 1023 bytes per frame.
Most hardware support for isochronous transfers requires DMA on the MCU side, so it tends to be a pain unless your vendor has a library that handles it for you.
You can in general send up to 19 bulk packets in a single full-speed frame (even on a single endpoint), but again, vendor libraries differ wildly in their support for this.
Note that isochronous transfers can be awkward on the host side: depending on the OS and driver stack, you may need a kernel-mode driver, in which case libusb may not be an option. Bulk transfers are the way to go if you want high throughput.
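As a rough sanity check on the bulk-vs-isochronous comparison, the arithmetic below computes best-case full-speed payload throughput. It assumes an otherwise idle bus and uses the 19-packets-per-frame best case for 64-byte bulk packets; real throughput will be lower, and the constant and function names are just for illustration.

```python
# Theoretical full-speed (12 Mbps) limits, assuming nothing else is using the bus.
FRAME_MS = 1                 # full-speed frames are 1 ms long
BULK_MAX_PACKET = 64         # largest full-speed bulk packet, in bytes
BULK_PACKETS_PER_FRAME = 19  # best-case number of 64-byte bulk packets per frame
ISO_MAX_PACKET = 1023        # largest full-speed isochronous packet (one per endpoint per frame)


def bytes_per_second(packet_size: int, packets_per_frame: int) -> int:
    """Payload bytes per second for a given packet size and packets per frame."""
    frames_per_second = 1000 // FRAME_MS
    return packet_size * packets_per_frame * frames_per_second


bulk = bytes_per_second(BULK_MAX_PACKET, BULK_PACKETS_PER_FRAME)
iso = bytes_per_second(ISO_MAX_PACKET, 1)
print(f"bulk: {bulk} B/s, isochronous: {iso} B/s")
```

With these numbers bulk comes out at 1,216,000 B/s versus 1,023,000 B/s for one isochronous endpoint, which is why bulk wins on raw throughput if the host actually schedules all 19 packets.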
Isn't the restriction only on the receiving side? That is, if an MCU is talking to a PC or to another MCU, it can send as many bytes as it wants but can't receive more than 64 at a time? I ask because (A) the reference manual only mentions this limit for reception, and (B) I've only run into it on reception: PCs seem able to send and receive messages larger than 64 bytes, and STM32s seem able to send messages larger than 64 bytes but not receive them (without isochronous transfers or anything else special).
The limits on packet size are based on the transfer type (control, interrupt, isochronous, or bulk) and whether the connection is low speed (1.5 Mbps), full speed (12 Mbps) or high speed (480 Mbps). The USB module on the MCU will be designed for the largest possible packet, which IIRC is a full-speed 1023-byte isochronous packet. (MCUs usually aren't fast enough to reach high speed.)
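For reference, the per-packet payload limits from the USB 2.0 specification can be written as a small lookup table. This is a sketch for illustration only; the names are made up, and it ignores details such as high-speed interrupt and isochronous endpoints that may send more than one packet per microframe.

```python
from enum import Enum


class Speed(Enum):
    LOW = "low"    # 1.5 Mbps
    FULL = "full"  # 12 Mbps
    HIGH = "high"  # 480 Mbps


class TransferType(Enum):
    CONTROL = "control"
    INTERRUPT = "interrupt"
    ISOCHRONOUS = "isochronous"
    BULK = "bulk"


# Largest data payload of a single packet, in bytes, per USB 2.0.
# None means the transfer type is not allowed at that speed.
MAX_PACKET_SIZE = {
    Speed.LOW: {
        TransferType.CONTROL: 8,
        TransferType.INTERRUPT: 8,
        TransferType.ISOCHRONOUS: None,
        TransferType.BULK: None,
    },
    Speed.FULL: {
        TransferType.CONTROL: 64,
        TransferType.INTERRUPT: 64,
        TransferType.ISOCHRONOUS: 1023,
        TransferType.BULK: 64,
    },
    Speed.HIGH: {
        TransferType.CONTROL: 64,
        TransferType.INTERRUPT: 1024,
        TransferType.ISOCHRONOUS: 1024,
        TransferType.BULK: 512,
    },
}


def max_packet_size(speed: Speed, kind: TransferType) -> int:
    """Return the per-packet payload limit, or raise if the combination is not allowed."""
    size = MAX_PACKET_SIZE[speed][kind]
    if size is None:
        raise ValueError(f"{kind.value} transfers are not allowed at {speed.value} speed")
    return size
```

For example, `max_packet_size(Speed.FULL, TransferType.BULK)` returns 64, and `max_packet_size(Speed.FULL, TransferType.ISOCHRONOUS)` returns 1023, the largest packet a full-speed module has to handle.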
Data larger than one packet can be sent as a multi-packet "transfer". This is where bulk transfers get their throughput: at full speed the largest bulk packet is only 64 bytes, but you can send up to 19 bulk packets per 1 ms frame, which gives 1216 bytes per millisecond, more than the 1023 bytes per millisecond possible with a single isochronous endpoint.
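To illustrate how a transfer longer than 64 bytes travels as several packets, here is a sketch of the splitting and reassembly logic. It assumes the usual convention that a packet shorter than the maximum packet size ends the transfer; when the length is an exact multiple of the packet size, a zero-length packet (ZLP) is often appended so the receiver still sees that short packet, but whether one is sent depends on the protocol in use, so it is a parameter here. The function names are hypothetical.

```python
def split_into_packets(data: bytes, max_packet: int = 64, zlp: bool = True) -> list[bytes]:
    """Split one transfer into packets of at most max_packet bytes.

    If len(data) is an exact multiple of max_packet and zlp is True,
    a zero-length packet is appended to mark the end of the transfer.
    """
    if max_packet <= 0:
        raise ValueError("max_packet must be positive")
    packets = [data[i:i + max_packet] for i in range(0, len(data), max_packet)]
    if zlp and len(data) % max_packet == 0:
        packets.append(b"")  # empty data also ends up as a single ZLP
    return packets


def reassemble(packets: list[bytes], max_packet: int = 64) -> bytes:
    """Receiver side: collect packets until a short (or zero-length) one arrives."""
    buf = bytearray()
    for pkt in packets:
        buf += pkt
        if len(pkt) < max_packet:
            break  # short packet: the transfer is complete
    return bytes(buf)
```

A 200-byte buffer becomes packets of 64, 64, 64 and 8 bytes; a 128-byte buffer becomes 64, 64 and a ZLP. This is likely why a PC appears to send "messages" larger than 64 bytes: the host stack splits them into packets, and receiving firmware has to keep reading until the short packet arrives.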
You might be able to force the hardware to send nonstandard packets, but then it's not really USB any more.