Images to Text – Toronto Deep Learning Demos

JacobEdelman · on Dec 3, 2014

Looks amazing. The fact that it just returns the "Cannot connect to server of image2text models" error makes me very sad.

YoukaiCountry · on Dec 3, 2014

So far I keep getting the error "Cannot connect to server of image2text models".

Anyone having any luck?

bootynuke · on Dec 3, 2014

I think it must be getting slammed; I was able to get a couple of descriptions out of it, but that was balanced by probably 2 times as many instances of the above error.

finin · on Dec 4, 2014

http://www.skunkieacres.com/images/rabbit_box.jpg

A picture of a rabbit in a wooden box => "a cat looking into a bin full of apples"

Mistaking a rabbit for a cat is not too bad. A bin is like a box, I suppose. I'm not sure where the apples came from.

thomasahle · on Dec 4, 2014

Perhaps it's been trained with pictures of apples in boxes...

tly_alex · on Dec 4, 2014

Rekognition released a similar image-to-text API, and it's much more reliable than this. At least the demo works smoothly and responds fast. https://rekognition.com/demo/concept

teraflop · on Dec 4, 2014

Even leaving aside the reliability issue (which can be chalked up to the fact that this one is a demo of a non-commercial project that got overloaded), you're comparing two entirely different things.

Check out the "static demo" pages, e.g. http://www.cs.toronto.edu/~nitish/nips2014demo/results/79133...

For this image, the University of Toronto software generates sentences like "a cow is standing in the grass by a car", whereas Rekognition only produces a ranked list of categories. ("sports_car", "car_wheel", etc.)

EDIT: this is an even better example: http://www.cs.toronto.edu/~nitish/nips2014demo/results/89407... I'm cherry-picking the cases where the algorithm does well, of course. But even if it's unreliable, the fact that this works at all is impressive.

modeless · on Dec 4, 2014

The errors are fascinating. "a cow and a car are looking at the camera." "a band plays a group of music [...]". You could almost call them metaphors instead of errors.

vonnik · on Dec 4, 2014

What a lovely way of thinking about it.

CardinalAgnelo · on Dec 4, 2014

The demo is clearly designed for the small community of machine learning researchers to play around with it to better evaluate the papers they wrote. They aren't selling a product and probably have a hard time justifying using a lot of computing resources to host the demo. Furthermore, the models are probably optimized for result quality, not speed.

CardinalAgnelo · on Dec 3, 2014

Doesn't look to be designed for a lot of traffic; be gentle.

tonydiv · on Dec 4, 2014

We are using this research to help people learn languages in VR.

Take a look here: http://learnimmersive.com

misiti3780 · on Dec 3, 2014

Very cool:

Comment: If you click on "source code" right now, it gives me two JavaScript alerts that are trying to print out JSON objects.

vonnik · on Dec 4, 2014

I'm curious to hear how much this is read as a sign of strong AI.

cmyr · on Dec 4, 2014

My brief survey suggests that their training sample did not include very much hardcore pornography.

"a man and a girl are learning to play with a small pool", while poetic, is a stretch in this case.

JacobEdelman · on Dec 4, 2014

Already, after 1 hour of this being posted on HN... Reminders abound of how evolution only made us good toolmakers to help us reproduce more.

dzordzduan · on Dec 4, 2014

This is why I love hn.