It's probably a smaller, updated (distilled?) version of the GPT-4 model, given the price decrease, the speed increase, and the "turbo" name. Why wouldn't you expect it to be slightly worse? We saw the same thing going from davinci-003 to 3.5-turbo.
I'm not going off pure feelings either. I have benchmarks in place comparing pipeline outputs to ground truth. But like I said, it's close enough to GPT-4 at a much lower price, which makes it a great model.
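Roughly the shape of the benchmark, for anyone curious. This is a sketch, not my actual harness: the exact-match metric, the model names, and the tiny case list are placeholders.

```python
# Sketch: run the same prompts through each model and score against ground truth.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

def score(model: str, cases: list[tuple[str, str]]) -> float:
    """Fraction of cases where the model's answer exactly matches ground truth."""
    hits = sum(ask(model, q).strip().lower() == a.strip().lower() for q, a in cases)
    return hits / len(cases)

# e.g. same eval set against both models:
# cases = [("What year did Apollo 11 land?", "1969"), ...]
# print(score("gpt-4", cases), score("gpt-4-1106-preview", cases))
```

My real eval uses a fuzzier scoring function than exact match, but the comparison setup is the same idea.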
Edit: After the outage the outputs are better, wtf. Nvm, there's some variance even at temp = 0. I should use a fixed seed.
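Something like this is what I mean by a fixed seed (model name, prompt, and seed value are just placeholders): temperature=0 plus the `seed` parameter OpenAI added with the 1106 models, with `system_fingerprint` to check whether the backend config changed between runs. Even then it's only best-effort determinism.

```python
# Sketch: pin the determinism knobs and log the backend fingerprint.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "Summarize this ticket in one line: ..."}],
    temperature=0,
    seed=42,  # fixed seed for (mostly) reproducible sampling
)
print(resp.system_fingerprint)       # if this differs between runs, outputs may too
print(resp.choices[0].message.content)
```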