Not a bad guess, but I'd be interested to actually know the details. Making it all by hand would be problematic for reasons beyond the sheer volume. How would one find a deep, diverse set of images of people pointing? How did the creators ensure they had images to cover all possible pixel positions within a certain proximity? It seems like it would take reviewing many more than 900 images to produce a final set of 900 that gives even coverage.
Another guess here, but it could have been crowdsourced using Amazon Mechanical Turk. Assuming a conservative $0.05 per picture, the total cost would be 901 * $0.05 = $45.05.
Yeah, this seems likely. Here's how I'd do it: create a webapp similar to this one to show to your Turkers. The difference would be that it would show an image at random, and the Turker would decide whether or not someone is pointing in the image, and if so they would click where the person was pointing. For each image, take the mean position of the mouse clicks. That mean becomes the image's seed in the Voronoi diagram.
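To make that concrete, here's a rough Python sketch of the idea. It's only a guess at how this could work, not how the site actually does it. The function names and the (image_id, x, y) click format are made up for illustration. Picking the nearest seed to the cursor is the same as finding which Voronoi cell the cursor is in, so no explicit diagram is needed for the lookup.

```python
from collections import defaultdict
from math import hypot


def mean_clicks(clicks):
    """Average the Turkers' clicks per image.

    clicks: iterable of (image_id, x, y) tuples (hypothetical format).
    Returns {image_id: (mean_x, mean_y)}, one Voronoi seed per image.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running sum_x, sum_y, count
    for image_id, x, y in clicks:
        s = sums[image_id]
        s[0] += x
        s[1] += y
        s[2] += 1
    return {img: (sx / n, sy / n) for img, (sx, sy, n) in sums.items()}


def nearest_image(seeds, px, py):
    """Return the image whose seed is closest to the cursor at (px, py).

    The nearest seed is, by definition, the Voronoi cell containing the cursor.
    """
    return min(seeds, key=lambda img: hypot(seeds[img][0] - px, seeds[img][1] - py))


clicks = [("a.jpg", 10, 10), ("a.jpg", 20, 30), ("b.jpg", 100, 100)]
seeds = mean_clicks(clicks)
print(nearest_image(seeds, 90, 90))
```

A linear scan over ~900 seeds is cheap enough to run on every mouse move; a spatial index would only matter for a much larger set.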
You can tell he/she "cheated" a bit for some parts of the canvas that didn't get many seeds. For example, if you move your mouse within one of the large cells in the bottom left, the image just moves to keep up with the pointer. :P
You don't need a different picture for every single pixel: if the picture is large enough, you can easily crop it and use it to cover a few thousand pixels.
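A quick sketch of the cropping arithmetic, assuming the picture is shown unscaled through a fixed-size viewport and the fingertip position in the picture is known. All names here are hypothetical; the real site may scale or position images differently.

```python
def crop_origin(img_w, img_h, finger, view_w, view_h, cursor):
    """Top-left corner of the crop that puts the fingertip under the cursor.

    finger: (x, y) of the fingertip in image coordinates.
    cursor: (x, y) of the mouse in viewport coordinates.
    Returns None if the crop window would fall outside the image.
    """
    left = finger[0] - cursor[0]
    top = finger[1] - cursor[1]
    if 0 <= left <= img_w - view_w and 0 <= top <= img_h - view_h:
        return left, top
    return None


def coverage(img_w, img_h, finger, view_w, view_h):
    """Count the cursor positions a single image can serve by cropping alone."""
    fx, fy = finger
    xs = range(max(0, fx - (img_w - view_w)), min(view_w - 1, fx) + 1)
    ys = range(max(0, fy - (img_h - view_h)), min(view_h - 1, fy) + 1)
    return len(xs) * len(ys)


# A 1200x900 picture behind a 1000x800 viewport, fingertip at (600, 450).
print(crop_origin(1200, 900, (600, 450), 1000, 800, (500, 400)))
print(coverage(1200, 900, (600, 450), 1000, 800))
```

Even with only 200 pixels of horizontal slack and 100 of vertical slack, that one picture covers 201 * 101 = 20301 cursor positions in this example.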
They could have had help from friends. Also, the design of the application makes it very easy to add new images to the set (just throw a new entry into the JSON file containing the x and y coordinates of the finger, and the script does the rest whenever the user loads the page).
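For illustration, adding an image could be as small as this. The field names ("file", "x", "y") and the list-of-objects layout are my assumptions, not the site's actual JSON schema.

```python
import json


def add_image(manifest_text, filename, x, y):
    """Append one image entry to a JSON manifest and return the new JSON text.

    manifest_text: JSON array of {"file", "x", "y"} objects (assumed layout).
    x, y: pixel coordinates of the fingertip in that image.
    """
    entries = json.loads(manifest_text)
    entries.append({"file": filename, "x": x, "y": y})
    return json.dumps(entries, indent=2)


manifest = '[{"file": "a.jpg", "x": 10, "y": 20}]'
print(add_image(manifest, "b.jpg", 300, 150))
```

The page script would then pick up the new entry the next time it loads the file, with no other changes needed.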