The latency of 5s for the basic tier search request is very confusing to me. Is ...

pegasus · 2025-11-06T20:42:27 1762461747

This is a search agent available in the cloud. The site mentions that they don't optimize for being "done in milliseconds and as cheaply as possible", and that they do a lot more work, like extracting relevant paragraphs, "Single-call resolution for complex queries that normally require multiple search hops", and more. It's geared to be consumed by other agents, so the latency may be tolerable. They also have the advantage of running the agent code close to the index, which makes searches less expensive. Basically, this is something in between a simple Google search and a "deep research" or at least "thinking" LLM call.

paragagrawal · 2025-11-06T21:07:51 1762463271

In agentic use cases, we save on end-to-end latency by spending more time and compute on individual searches. This happens because agents do fewer searches, use fewer tokens, and end up using fewer thinking tokens when using the Parallel Search API.
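
To make the arithmetic behind that trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes a simplified agent loop in which every search is preceded by one LLM reasoning step. The function name and all numbers except the 5 s search latency mentioned above are hypothetical, chosen only for illustration; they are not measurements of any real system.

```python
def end_to_end_latency(num_searches, search_latency_s, reasoning_latency_s):
    """Wall-clock seconds for an agent that runs one reasoning step per search.

    Simplified, hypothetical model: each hop costs one LLM reasoning step
    plus one search call, and hops run sequentially.
    """
    return num_searches * (search_latency_s + reasoning_latency_s)


# Hypothetical scenario A: fast searches, but the agent needs five hops.
fast_multi_hop = end_to_end_latency(
    num_searches=5, search_latency_s=0.5, reasoning_latency_s=4.0
)

# Hypothetical scenario B: one slow (5 s) search that resolves the query in one call.
slow_single_call = end_to_end_latency(
    num_searches=1, search_latency_s=5.0, reasoning_latency_s=4.0
)

print(f"fast search, 5 hops: {fast_multi_hop:.1f} s")
print(f"slow search, 1 hop:  {slow_single_call:.1f} s")
```

Under these assumed numbers the single slow search finishes sooner overall, because the per-hop reasoning time dominates. With different numbers (for example, very short reasoning steps) the comparison could go the other way.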