It works really well: you can tell it to implement new features or mutate parts of the code, and having the entire code (or a lot of it) in its context really improves the output.
The biggest caveat: shit is expensive! A full 32k token request will run you like $2, and if you go back and forth in a dialog you can rack up quite the bill quickly.
If it were 10x cheaper, I would use nothing else; having a large context window is that much of a game changer. As it stands, I _very_ carefully construct the prompt and move the conversation out of the 32k model into the 8k model as fast as I can to save cost.
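For a rough sense of where that ~$2 comes from, here is a minimal sketch of the arithmetic, assuming the published GPT-4-32k rates at the time (about $0.06 per 1K prompt tokens and $0.12 per 1K completion tokens; the token counts are illustrative):

```python
# Rough cost sketch for a single full-context request to the 32k model.
# Prices and token counts are assumptions for illustration, not exact figures.
PROMPT_PRICE_PER_1K = 0.06       # assumed $ per 1K prompt tokens
COMPLETION_PRICE_PER_1K = 0.12   # assumed $ per 1K completion tokens

prompt_tokens = 31_000     # nearly the whole 32k window filled with code + instructions
completion_tokens = 1_000  # the model's reply

cost = (prompt_tokens / 1000) * PROMPT_PRICE_PER_1K \
     + (completion_tokens / 1000) * COMPLETION_PRICE_PER_1K
print(f"${cost:.2f} for one request")  # ≈ $1.98, i.e. "like $2"
```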
How does it calculate the price? I thought that once you load the content (a 32k token request / $2), it would remember the context so you could ask follow-up questions much more cheaply.
It does not have memory outside the context window. If you want to have a back-and-forth with it about a document, that document (along with the relevant chat history) must be included in the context with every request.
This is why it's so easy to burn up lots of tokens very fast.
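A minimal sketch of why that adds up, assuming the same illustrative GPT-4-32k rates as above and made-up token counts: every turn resends the document plus all prior turns, so the prompt (and the bill) grows with each request.

```python
# Each request must carry the full document plus the prior Q&A, so prompt
# tokens grow every turn. All numbers are illustrative assumptions.
PROMPT_PRICE_PER_1K = 0.06       # assumed $ per 1K prompt tokens
COMPLETION_PRICE_PER_1K = 0.12   # assumed $ per 1K completion tokens

document_tokens = 20_000  # the code/document included in every request
question_tokens = 200     # rough size of each user question
answer_tokens = 500       # rough size of each model reply

total = 0.0
history = 0               # tokens of prior Q&A that must be resent
for turn in range(1, 6):
    prompt_tokens = document_tokens + history + question_tokens
    total += (prompt_tokens / 1000) * PROMPT_PRICE_PER_1K \
           + (answer_tokens / 1000) * COMPLETION_PRICE_PER_1K
    history += question_tokens + answer_tokens
    print(f"turn {turn}: running cost ${total:.2f}")
# Roughly $6.80 after just five short follow-ups about the same document.
```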