Assuming this is GlasgowGPT (based on the timing of that bot taking off on /r/glasgow relative to this post, and your username), one challenge you'd likely have with advertising is that the bot's content, whilst funny, is not very SFW.
One option to get the costs down would be to use a local LLM. It isn't likely to be as fast/good as GPT-3.5 but at least would give you a fixed cost per month (hosting).
Given that traffic is going to be pretty low, if you have decent bandwidth at home and some spare hardware, you could even host it there.
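For a bot like this, something like llama-cpp-python would probably do the job. Rough sketch only; the model file, context size, and sampling settings below are illustrative assumptions, not recommendations:

```python
# Rough sketch of swapping the hosted API call for a local model via
# llama-cpp-python. Model path and parameters are placeholders.
from llama_cpp import Llama

# Any quantised GGUF chat model your hardware can handle.
llm = Llama(model_path="./models/7b-chat.Q4_K_M.gguf", n_ctx=2048)

def reply(prompt: str) -> str:
    out = llm.create_completion(prompt, max_tokens=256, temperature=0.8)
    return out["choices"][0]["text"]
```

The quality won't match GPT-3.5, but the point is the cost structure: once the box is paid for, each reply is effectively free.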
> One option to get the costs down would be to use a local LLM
It's not that much cheaper if you want a similar level of quality and inference speed. And for commercial usage, OP would need to train their own model, since no good commercially usable alternative exists.
Not sure why you think a local LLM would be a fixed cost. These things are very GPU-heavy, and you need to run multiple instances to serve concurrent users, scaling up and down with traffic.
Fixed cost in that if you host on your own machine, you know exactly how much you'll pay each month; usage just caps out at whatever the hardware can handle.
As opposed to API-based pricing, where you pay more per use.
Remember this isn't a commercial service OP is creating; it's an amusing chatbot with no revenue, so fixed costs are likely to be the better model.
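Back-of-envelope, with made-up numbers (none of these are real quotes):

```python
# All numbers below are illustrative assumptions, not real pricing.
hosting_per_month = 40.0       # flat rate for a small GPU box, assumed
price_per_1k_tokens = 0.002    # roughly GPT-3.5-turbo-era API pricing
tokens_per_reply = 300
replies_per_month = 50_000

api_cost = replies_per_month * tokens_per_reply / 1000 * price_per_1k_tokens
print(f"API: ${api_cost:.2f}/mo, hosting: ${hosting_per_month:.2f}/mo")
# API: $30.00/mo, hosting: $40.00/mo. The crossover depends entirely on
# traffic, which is the point: hosting is flat, the API scales with use.
```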
A local LLM won't give you "fixed costs" any more than an API will. If they want to serve the users of their amusing chatbot, they'll have to scale the local LLM with demand, and then they won't have "fixed costs" anymore. If they don't want to scale up and are fine denying service to users at peak hours, they can do that with an API too; there's no law that says using an API means allowing unlimited use. They can just as easily cap their API usage.
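E.g. a dead-simple spend cap in front of the API call, sketch only (the budget figure and helper name are hypothetical):

```python
# Minimal sketch of capping API usage: refuse once a monthly token
# budget is spent. The budget number is an arbitrary illustration.
MONTHLY_TOKEN_BUDGET = 2_000_000
tokens_used = 0

def within_budget(estimated_tokens: int) -> bool:
    """Return False (and skip the API call) once the budget is gone."""
    global tokens_used
    if tokens_used + estimated_tokens > MONTHLY_TOKEN_BUDGET:
        return False
    tokens_used += estimated_tokens
    return True
```

That gives you exactly the same "known maximum per month" property as self-hosting, without buying hardware.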
That's if you can find an advertiser who's happy with that. You'll have a much smaller pool of potential partners for that kind of content, and I'd guess many of the mainstream ad networks just won't cater to it.