If you haven’t already: Start to store question and answer pairs and reuse the answer if the same question is asked multiple times.
You could also compute embeddings for the questions (don’t have to be OpenAI embeddings), and reuse the answer if the question is sufficiently similar to a prevously asked question.
I'm not sure it's practical and if it will result in any savings.
Wouldn't it be almost impossible to hit a duplicate when the users each form their own question?
Another issue I see is that these chat AIs usually have "history", so the question might be the same, but the context is different: the app might have received "when was he born", but in one context, the user talks about Obama and in another, she talks about Tom Brady.
If there are ways around these issues, I'd love to hear it, but it sounds like this will just increase costs via cache hardware costs and any dedup logic instead of saving money.
You could also compute embeddings for the questions (don’t have to be OpenAI embeddings), and reuse the answer if the question is sufficiently similar to a prevously asked question.