Yeah, sometimes there are definitely artifacts. Technically they can be removed pretty easily with another model (like the denoiser from FB), but for now we wanted to keep it simple so we can learn to control these things better through prompt engineering. For example, when given a high-quality input prompt, it generally continues at high quality.
At least in the last example, with the man and woman and the expensive oat milk, the background noise seemed to fit a likely public conversation scenario. I wasn't sure if it was accidental or not.