Images to Text – Toronto Deep Learning Demos (toronto.edu)
75 points by benanne on Dec 3, 2014 | 17 comments



Looks amazing. The fact that it just returns "Cannot connect to server of image2text models" makes me very sad.


So far I keep getting the error "Cannot connect to server of image2text models"

Anyone having any luck?


I think it must be getting slammed; I was able to get a couple of descriptions out of it, but that was balanced by probably 2 times as many instances of the above error.


http://www.skunkieacres.com/images/rabbit_box.jpg

A picture of a rabbit in a wooden box => "a cat looking into a bin full of apples"

Mistaking a rabbit for a cat is not too bad. A bin is like a box, I suppose. I'm not sure where the apples came from.


Perhaps it's been trained with pictures of apples in boxes...


Rekognition released a similar image-to-text API, and it's much more reliable than this. At least the demo works smoothly and responds fast. https://rekognition.com/demo/concept


Even leaving aside the reliability issue (which can be chalked up to the fact that this one is a demo of a non-commercial project that got overloaded), you're comparing two entirely different things.

Check out the "static demo" pages, e.g. http://www.cs.toronto.edu/~nitish/nips2014demo/results/79133...

For this image, the University of Toronto software generates sentences like "a cow is standing in the grass by a car", whereas Rekognition only produces a ranked list of categories. ("sports_car", "car_wheel", etc.)

EDIT: this is an even better example: http://www.cs.toronto.edu/~nitish/nips2014demo/results/89407... I'm cherry-picking the cases where the algorithm does well, of course. But even if it's unreliable, the fact that this works at all is impressive.


The errors are fascinating. "a cow and a car are looking at the camera." "a band plays a group of music [...]". You could almost call them metaphors instead of errors.


What a lovely way of thinking about it.


The demo is clearly designed for the small community of machine learning researchers to play around with it to better evaluate the papers they wrote. They aren't selling a product and probably have a hard time justifying using a lot of computing resources to host the demo. Furthermore, the models are probably optimized for result quality, not speed.


It doesn't look like it was designed for a lot of traffic, so be gentle.


We are using this research to help people learn languages in VR.

Take a look here: http://learnimmersive.com


Very cool:

Comment: If you click on "source code" right now, it gives me two JavaScript alerts that are trying to print out JSON objects.


I'm curious to hear how much this is read as a sign of strong AI.


My brief survey suggests that their training sample did not include very much hardcore pornography.

"a man and a girl are learning to play with a small pool", while poetic, is a stretch in this case.


Already after 1 hour of this being posted on hn... Reminders abound of how evolution only made us good tool makers to help us to reproduce more.


This is why I love hn.



