(1) Why would they train on the existing categorizations? You would train on a smaller, known-good dataset instead.
(2) Outlier detection is a thing, even if they did include the bad data. It would be trivial to detect that a small percentage of things categorized as "musical instruments" have facets/descriptions/images that are extremely dissimilar to everything else in that category, and very similar to things in a completely unrelated category.
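
To make (2) concrete, here is a rough sketch of one way such a check could look. It assumes each listing has already been turned into a numeric feature vector (from facets, description text, or images; how that is done is out of scope here), and it simply compares each item against per-category centroids. The function names, the `margin` threshold, and the toy vectors are all made up for illustration; this is not a claim about what any real marketplace does.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroids(items):
    """Mean feature vector per category label."""
    groups = defaultdict(list)
    for label, vec in items:
        groups[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


def suspicious(items, margin=0.1):
    """Flag items that look more like another category than their own.

    `margin` is a hypothetical threshold: how much closer the item must be
    to some other category's centroid before it gets flagged.
    """
    cents = centroids(items)
    flagged = []
    for idx, (label, vec) in enumerate(items):
        own = cosine(vec, cents[label])
        best_label, best_sim = max(
            ((l, cosine(vec, c)) for l, c in cents.items() if l != label),
            key=lambda pair: pair[1],
            default=(None, -1.0),
        )
        if best_label is not None and best_sim - own > margin:
            flagged.append((idx, label, best_label))
    return flagged


# Toy 2-D "embeddings" (invented data): the fourth item is labeled as a
# musical instrument but sits right next to the kitchen items.
items = [
    ("musical instruments", [0.9, 0.1]),
    ("musical instruments", [1.0, 0.0]),
    ("musical instruments", [0.95, 0.05]),
    ("musical instruments", [0.05, 0.95]),
    ("kitchen", [0.0, 1.0]),
    ("kitchen", [0.1, 0.9]),
    ("kitchen", [0.05, 1.0]),
]

print(suspicious(items))
```

On this toy data only the fourth item is flagged, with "kitchen" as the category it most resembles. Note that the bad item is included when computing its own category's centroid, which is the "even if they included the bad data" case: as long as the miscategorized items are a small fraction, the centroid barely moves and they still stand out.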