Actually, there is good reason to stay within a 30 ms ping (99th percentile). Half of that, plus the 5 ms algorithmic delay of Opus (in CELT-only restricted low-delay mode), gives 20 ms one-way, which is the lower end of the uncanny valley for real-time interactive audio (certainly for musicians in a band, but I'll presume it becomes relevant for verbal communication at the same psychoacoustic threshold).
If executing the encoder/decoder pair adds any latency of its own, you'll have to subtract double that latency from your ping allowance.
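To make the arithmetic concrete, here is a rough sketch of that budget in Python. The function and parameter names are made up for illustration; the 20 ms and 5 ms defaults are just the figures from above, and the model assumes the ping is symmetric (one-way network delay is half the round trip).

```python
def max_ping_ms(one_way_budget_ms: float = 20.0,
                codec_delay_ms: float = 5.0,
                processing_ms: float = 0.0) -> float:
    """Largest round-trip ping that still fits the one-way latency budget.

    one_way_budget_ms: mouth-to-ear target (20 ms, per the figure above).
    codec_delay_ms:    algorithmic delay of the codec (5 ms for Opus here).
    processing_ms:     extra time spent running the encoder/decoder pair
                       (hypothetical parameter, measured on your own setup).
    """
    # Whatever is left after codec and processing delay is available
    # for the one-way network trip.
    network_one_way_ms = one_way_budget_ms - codec_delay_ms - processing_ms
    # Ping is a round trip, so every millisecond of one-way overhead
    # costs two milliseconds of ping allowance.
    return 2.0 * network_one_way_ms


print(max_ping_ms())                   # no processing overhead
print(max_ping_ms(processing_ms=2.0))  # 2 ms spent in encode + decode
```

With no processing overhead this gives back the 30 ms figure; 2 ms of encode/decode time cuts the allowance to 26 ms.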
If you want correctly lip-synced audio in a video call, I only know of one setup that offers similarly low video latency: a rolling shutter in the camera, a line-by-line display (a CRT should do well), and up to a few lines of algorithmic delay for e.g. running non-buffering JPEG, i.e. an 8x8 DCT plus an online entropy coder with no pre-analysis for optimal Huffman tables or such, to save something like 80-90% of the bandwidth. Analog TV camera and screen hardware should also work, but it's really inefficient and not easy to emulate with digital hardware.
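For a feel of how small "a few lines" of delay is, here is a back-of-the-envelope sketch in Python. The resolution and frame rate in the example are assumptions picked for illustration, blanking intervals are ignored, and the 8-line default just reflects that an 8x8 DCT needs eight scanlines before it can transform a block row.

```python
def line_buffer_delay_ms(lines_per_frame: int,
                         fps: float,
                         buffered_lines: int = 8) -> float:
    """Delay from holding back `buffered_lines` scanlines before coding.

    Assumes scanlines arrive evenly spaced over the frame period
    (no blanking intervals), which is a simplification.
    """
    # Time it takes the sensor to read out a single scanline.
    line_time_ms = 1000.0 / (lines_per_frame * fps)
    return buffered_lines * line_time_ms


# Hypothetical 480-line, 60 fps stream: 8 lines is well under a millisecond.
print(round(line_buffer_delay_ms(480, 60.0), 3))
```

Compared with the 20 ms budget, buffering one block row is negligible; it is whole-frame buffering anywhere in the chain that breaks the budget (a full frame at 60 fps is already about 16.7 ms).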