Ensembling is okay, but it doesn't scale great. Like you can put in a bit more compute to draw more samples and get a better result, but you can't keep doing it without hitting a plateau, because eventually the model's majority vote stabilizes. So it's a perfectly reasonable thing to do, as long as we temper our expectations.