At such a high volume of requests it probably makes sense to consider going one abstraction level lower and replacing HTTPS with communication over plain SSL/TLS sockets, for further cost reduction.
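To make "plain SSL sockets" concrete, here is a minimal Go sketch; the hostname, port, and the length-prefixed framing are hypothetical, just one way it could look:

```go
package main

import (
	"crypto/tls"
	"encoding/binary"
	"log"
)

func main() {
	// Still pays for the TLS handshake, but skips HTTP framing entirely.
	conn, err := tls.Dial("tcp", "ingest.example.com:9000", &tls.Config{})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// The catch: you now have to invent your own wire format,
	// e.g. a 4-byte length prefix followed by the payload.
	payload := []byte(`{"event":"click"}`)
	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], uint32(len(payload)))
	if _, err := conn.Write(append(hdr[:], payload...)); err != nil {
		log.Fatal(err)
	}
}
```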
I think using HTTPS is fine. But there is probably some value in using gRPC+proto by default instead of REST+JSON. With client-side streaming, you set up and tear down the connection less frequently, which means you negotiate TLS and send the initial headers less frequently. And the protobuf messages themselves are smaller than their JSON equivalents, especially small ones.
gRPC streaming is almost as efficient as a raw TCP stream, but saves you writing the protocol glue code. There are already working clients and servers, and you just write your protocol definition as a protocol buffer. Worth a look for this use case.
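A minimal sketch of what that looks like in Go, assuming a hypothetical proto like `service Ingest { rpc Push(stream Event) returns (Ack); }` and its generated `pb` package:

```go
package main

import (
	"context"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"

	pb "example.com/ingest/pb" // hypothetical generated code
)

func main() {
	creds, err := credentials.NewClientTLSFromFile("ca.pem", "")
	if err != nil {
		log.Fatal(err)
	}
	// One connection, one TLS handshake.
	conn, err := grpc.Dial("ingest.example.com:443", grpc.WithTransportCredentials(creds))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	stream, err := pb.NewIngestClient(conn).Push(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	for i := 0; i < 1000; i++ {
		// Every Send reuses the same HTTP/2 stream: no new handshake,
		// no repeated headers, compact binary encoding.
		if err := stream.Send(&pb.Event{Seq: int64(i)}); err != nil {
			log.Fatal(err)
		}
	}
	if _, err := stream.CloseAndRecv(); err != nil {
		log.Fatal(err)
	}
}
```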
(Also, the clients know how to do load balancing, so you don't have to pay Amazon to do it for you. Unlike browsers, most languages' gRPC clients are happy to take a list of IP addresses from DNS and only send requests to the healthy endpoints. Browsers, if you're lucky, will try another address when the TCP connection fails, but will happily keep the same IP address even if it 503s on every request. Chrome, Firefox, and Safari all do different things.)
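For example, in Go (service name hypothetical), you point the client at a DNS name that resolves to several addresses and ask it to spread requests across them:

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

func main() {
	creds, err := credentials.NewClientTLSFromFile("ca.pem", "")
	if err != nil {
		log.Fatal(err)
	}
	// "dns:///" tells the client to resolve every A/AAAA record itself;
	// round_robin spreads RPCs across all of them instead of pinning to
	// the first address the way a browser would.
	conn, err := grpc.Dial(
		"dns:///ingest.example.com:443",
		grpc.WithTransportCredentials(creds),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close() // hand conn to the generated client stubs as usual
}
```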
That is of course true, but they won't be able to omit failed or overloaded nodes, whereas a load balancer might be able to do so. On the other hand, the client could be programmed to just use another IP from the list and resend the request if one node fails to answer, but that would increase the total time the client needs to complete a successful request.
I also realise that non-responsive nodes might be rare enough for this to be a negligible problem - just playing devil's advocate here.
No, you can do all of that with gRPC. You can use active health checks (grpc.health.v1) to add or remove nodes from the pool. (You can configure the algorithm used to select a healthy channel for the next request, too.) You can also talk to a central load balancer that provides your client with a list of endpoints it's allowed to talk to.
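Sketch of the client side in Go: a blank import of the health package enables the grpc.health.v1 checks, and the service config turns them on (names hypothetical; note pick_first doesn't support health checking, so you need round_robin):

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
	_ "google.golang.org/grpc/health" // registers the client-side health-check function
)

func main() {
	creds, err := credentials.NewClientTLSFromFile("ca.pem", "")
	if err != nil {
		log.Fatal(err)
	}
	conn, err := grpc.Dial(
		"dns:///ingest.example.com:443",
		grpc.WithTransportCredentials(creds),
		// Endpoints whose health service reports NOT_SERVING are taken
		// out of the rotation until they report SERVING again.
		grpc.WithDefaultServiceConfig(`{
		  "loadBalancingConfig": [{"round_robin":{}}],
		  "healthCheckConfig": {"serviceName": ""}
		}`),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
}
```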
When you control the client, you don't have to resort to L3 hacks to distribute load. You can just tell the client which replicas are healthy. (And both ends can report back, giving the central load balancer information on whether the supposedly healthy endpoints actually are.)
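The server half of that, sketched in Go with the stock grpc.health.v1 implementation (port hypothetical):

```go
package main

import (
	"log"
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/health"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	lis, err := net.Listen("tcp", ":9000")
	if err != nil {
		log.Fatal(err)
	}
	srv := grpc.NewServer()
	h := health.NewServer()
	healthpb.RegisterHealthServer(srv, h)
	h.SetServingStatus("", healthpb.HealthCheckResponse_SERVING)
	// Flip to NOT_SERVING when overloaded or draining; health-checking
	// clients drop this endpoint until it recovers:
	//   h.SetServingStatus("", healthpb.HealthCheckResponse_NOT_SERVING)
	log.Fatal(srv.Serve(lis))
}
```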
L3 load balancing actually works somewhat poorly for HTTP/2 and gRPC anyway: it only balances TCP connections, but you really want to balance requests. That is why people put proxies like Envoy in the middle; a browser client isn't smart enough to balance per request, but the proxy is. If you control the client, though, you can skip all that and do the right thing with very few resources.
Nice deep dive into the S of HTTPS anyway.