The latency of 5s for the basic tier search request is very confusing to me. Is that 5s per request or 5s per 1k requests? If it is indeed 5s per request that seems like a deal breaker
This is a search agent available in the cloud. The site mentions that they doesn't optimize for being "done in milliseconds and as cheaply as possible", and that they do a lot more work like extracting relevant paragraphs and "Single-call resolution for complex queries that normally require multiple search hops" and more. Geared to be consumed by other agents, hence the latency may be tolerable. They have the advantage of running the agent code close to the index so less expensive searches. Basically, this is something in between a simple google search and a "deep research" or at least "thinking" LLM call.
In agentic use cases, we save on end-to-end latency by spending more time and compute on individual searches. This happens because agents do fewer searches, use fewer tokens, and end up using fewer thinking tokens when using the Parallel Search API.