I was actually assuming the input representation would just be a video stream, w...

Arwill · on Aug 10, 2017

Using just the video feed, the AI would be required to reconstruct an overview of the strategic situation, and then develop a forward strategy on top of that involving individual units. Even for a much simpler game like doom, video-only input is enough for strategies like "see an enemy, target and shoot it as fast as possible".

For an AI to be able to effectively compete in a complex game like SC2, preparing high-level inputs is important. Look at these like shortcuts, heuristic approximations of task that would be hard to represent and train with deep learning. I would guess an implementation would need multiple independent nets for various tasks, combined with heuristics. Then each could be separately trained to do the given task.