This is so much cheaper than re-prompting each tool use.
I wish this were extended further: for example, you could give the model an API endpoint it can call to execute JS code, the only requirement being that your API responds within 5 seconds (maybe less, actually).
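A toy sketch of that constraint, assuming the provider enforces a hard response deadline on the user's endpoint (using a Python subprocess as a stand-in for a sandboxed JS runtime; names are illustrative):

```python
import subprocess
import sys

def execute_code(code: str, timeout_s: float = 5.0) -> str:
    """Run submitted code in a subprocess, enforcing the provider's
    response deadline. Toy sketch only: a real service would sandbox
    a JS runtime rather than spawn a Python interpreter."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return result.stdout
    except subprocess.TimeoutExpired:
        return "error: execution exceeded deadline"

print(execute_code("print(2 + 2)"))  # prints "4"
```

If the endpoint misses the deadline, the provider could return an error string to the model instead of blocking the generation loop.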
I wonder if this is what OpenAI is planning to do in the upcoming API update to support tools in o3.
I imagine there wouldn’t be much of a cost to the provider on that API call, so much longer timeouts may be possible. It’s not like this would hold up the LLM in any way: execution would be suspended while the call is made, and the TPU/GPU would serve another request.
They need to keep the KV cache to avoid reprocessing the prompt, so during longer API calls they would have to move it to RAM/NVMe to free the GPU for another request.
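The scheduling idea in that comment can be sketched as follows. This is a toy model, not a real serving framework: all names are illustrative, and the "offload"/"reload" methods stand in for actual DMA copies of the KV cache between HBM and host RAM/NVMe.

```python
import asyncio

class Request:
    """Toy stand-in for an in-flight generation request."""
    def __init__(self, rid: int):
        self.rid = rid
        self.kv_location = "gpu"  # "gpu" | "ram"

    def offload(self):
        # Stand-in for copying the KV cache to host RAM/NVMe,
        # freeing accelerator memory for other requests.
        self.kv_location = "ram"

    def reload(self):
        # Stand-in for paging the KV cache back before decoding resumes.
        self.kv_location = "gpu"

async def tool_call(req: Request, call):
    req.offload()          # free HBM during the wait
    result = await call()  # external API runs; GPU serves other requests
    req.reload()           # restore KV cache, resume decoding
    return result

async def demo():
    req = Request(1)
    async def slow_api():
        await asyncio.sleep(0.01)  # pretend external tool latency
        return "tool result"
    out = await tool_call(req, slow_api)
    return out, req.kv_location

print(asyncio.run(demo()))  # prints "('tool result', 'gpu')"
```

The point is just that the wait is pure I/O from the serving side, so the cost is the two cache copies, not idle accelerator time.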
This common feature requires the user of the API to implement the tool; in this case, the user is responsible for running the code the API outputs. The post you replied to suggests that Gemini will run the code for the user behind the API call.
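The distinction can be seen in the standard client-side function-calling loop: the API only *emits* a tool call, and the user's own code executes it and sends the result back. A minimal sketch with a stubbed model (no real provider SDK; the message shapes are illustrative):

```python
# The user implements and runs the tools themselves.
TOOLS = {"add": lambda a, b: a + b}

def model_step(messages):
    # Stub: a real API would return either text or a tool call here.
    last = messages[-1]
    if last["role"] == "user":
        return {"tool_call": {"name": "add", "args": {"a": 2, "b": 3}}}
    return {"text": f"The result is {last['content']}."}

def chat(user_msg: str) -> str:
    messages = [{"role": "user", "content": user_msg}]
    reply = model_step(messages)
    while "tool_call" in reply:
        call = reply["tool_call"]
        # This line is the point: execution happens on the *user's* side.
        result = TOOLS[call["name"]](**call["args"])
        messages.append({"role": "tool", "content": result})
        reply = model_step(messages)
    return reply["text"]

print(chat("What is 2 + 3?"))  # prints "The result is 5."
```

With provider-side execution (as described for Gemini), the `while` loop above would live behind the API instead of in user code.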
Could you elaborate? I thought function calling was a common feature among models from different providers.