Sports have so much structured data, and such a high bar for describing it accurately (especially for a brand like ESPN), that there are significant risks from the hallucinations that can develop when a multi-hour transcript is fed into an LLM, especially with commentators getting excited about potential goals and other events that never end up happening.

On the other hand, the rather simple task of "here's a set of goals, their times, who made them, who assisted... turn that into prose" could be done without LLMs at all, using a deterministic algorithm, and may very well have been in this case. Some of the grammar issues in the OP feel very pre-LLM in nature, like a combination of substitution rules gone awry.
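
For what it's worth, a minimal sketch of that kind of rule-based generation might look like the following (the Goal fields, phrasing templates, and player names are my own assumptions, not anything ESPN has described):

    from dataclasses import dataclass

    @dataclass
    class Goal:
        minute: int
        scorer: str
        assist: str | None = None  # None for unassisted goals

    def ordinal(n: int) -> str:
        # 11th-13th are irregular; otherwise the suffix follows the last digit
        if 10 <= n % 100 <= 20:
            suffix = "th"
        else:
            suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
        return f"{n}{suffix}"

    def goal_sentence(g: Goal) -> str:
        s = f"{g.scorer} scored in the {ordinal(g.minute)} minute"
        if g.assist:
            s += f", assisted by {g.assist}"
        return s + "."

    goals = [Goal(23, "Saka", "Odegaard"), Goal(78, "Havertz")]
    print(" ".join(goal_sentence(g) for g in goals))
    # Saka scored in the 23rd minute, assisted by Odegaard.
    # Havertz scored in the 78th minute.

The ordinal handling is exactly the kind of substitution rule that produces those pre-LLM-feeling grammar glitches when an edge case slips through.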

Now, could you create a system that repeatedly interrogates the statements made by a first pass of an LLM summarizing a long transcript, and compares those results against structured data you know to be accurate? Would this lead to richer content and accessible error rates relative to the simpler approach? Would this be the type of thing that the best machine learning engineers in the world could prototype over a hackathon? The answer is very possibly yes to all three. But it's far from low-hanging fruit for any sizable, risk-averse organization. It's very difficult to fight against "the thing we have is imperfect, but at least it never gets the facts wrong."
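
To make the interrogation step concrete, here's a rough sketch of what the verification layer might look like. The regex is just a stand-in for a second LLM pass that extracts structured claims, and every name here is hypothetical:

    import re
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class GoalClaim:
        minute: int
        scorer: str

    # Stand-in extractor: a production system would use a second LLM pass that
    # emits structured (minute, scorer) tuples rather than a brittle regex.
    CLAIM_RE = re.compile(r"(\w+) scored in the (\d+)(?:st|nd|rd|th) minute")

    def extract_claims(summary: str) -> set[GoalClaim]:
        return {GoalClaim(int(m), s) for s, m in CLAIM_RE.findall(summary)}

    def verify_summary(summary: str, box_score: set[GoalClaim]) -> list[str]:
        """List every discrepancy between the summary and the structured data."""
        claims = extract_claims(summary)
        issues = [f"unsupported: {c}" for c in claims - box_score]
        issues += [f"omitted: {c}" for c in box_score - claims]
        return issues  # empty means every checkable claim matched

    box_score = {GoalClaim(23, "Saka"), GoalClaim(78, "Havertz")}
    draft = "Saka scored in the 23rd minute and Jesus scored in the 90th minute."
    print(verify_summary(draft, box_score))
    # flags the phantom 90th-minute goal and the missing Havertz goal

Loop that until the issue list is empty (or a pass budget runs out and you fall back to the deterministic recap), and you have the "repeatedly interrogates" part.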

s/accessible/acceptable/ - guess I should have run my comment through the kind of check-for-typos LLM step that I described above!
