Show HN: Using computer vision to detect birds in parks (flickr.net)
195 points by sawthat on Oct 20, 2014 | 33 comments



xkcd is a pretty great programming language. You draw the feature that you want, and then the internets implements it somewhere, somehow.


But the difference is only one person can use it.


My professors used to joke about the "Grad Student" programming language all the time.


Speaking of CV + birds, I knew the guys behind a small company called Ornicept.

Basically, wind turbines kill birds. To protect birds, you have to collect data on birds in the area. The standard way of doing this was literally putting a guy in a lawn chair for a few hours each day to count birds, and then extrapolating the numbers.

They put some cameras in the field and run CV on the footage to count the birds instead. Cameras are cheaper than people in lawn chairs, apparently. I don't know if they still do this, but it was their initial product.




http://i.imgur.com/Svn2bZ4.jpg

Also, Matt Zeiler runs an image recognition startup whose software generated the image above. Check them out at http://clarifai.com/



Heh.. the peacock does a fine job protecting itself... even from computers.


Nice. They use deep convolutional neural nets, the key technique that dominated ILSVRC 2014. That computer vision contest included, among other challenges, recognizing different bird (and dog and cat and spider and...) breeds. There is a blog post about ILSVRC that even uses the same xkcd comic ;) http://blog.a9t9.com/2014/10/amazing-progress-of-computer-vi...


I put in a picture of some chicken breasts and it didn't recognize them. Is that a feature?

Joking aside, incredible work here. Did you do everything from scratch or use some library to get faster results?


I'm seeing a lot of pictures of flowers and bees in the bird pictures. I'm guessing this is due to the way the training is done. Can anyone from Flickr comment on that?


Some pretty clear bird photos aren't working either; I guess it has to be a close up, which kind of ruins the usefulness for me. http://rideintobirdland.com/wp-content/uploads/2012/04/Littl...


Also, the exact question you're trying to answer makes a difference. If the question is, "Is this a picture of a bird?" where "of a bird" means that the primary focus of the picture is a single bird, then your picture should return "No". That's different than, "Are there one or more birds in this picture?"

I'm not sure which question Flickr's team actually seeks to answer here.
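
To make the difference concrete, here is a rough sketch of the two questions as two separate checks. Everything in it is an assumption for illustration: the `Detection` record stands in for the output of some hypothetical object detector, and the 0.25 area threshold is arbitrary. It is not how Flickr's system works.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """Hypothetical detector output: a label plus how much of the image it covers."""
    label: str
    area_fraction: float  # bounding-box area divided by image area, between 0 and 1


def contains_bird(detections: List[Detection]) -> bool:
    """Answers: are there one or more birds in this picture?"""
    return any(d.label == "bird" for d in detections)


def is_picture_of_a_bird(detections: List[Detection], min_area: float = 0.25) -> bool:
    """Answers: is a single bird the primary subject? (min_area is an arbitrary cutoff)"""
    birds = [d for d in detections if d.label == "bird"]
    return len(birds) == 1 and birds[0].area_fraction >= min_area


# A distant flock over a park versus a close-up portrait.
flock = [Detection("bird", 0.02), Detection("bird", 0.01), Detection("tree", 0.40)]
portrait = [Detection("bird", 0.60)]

print(contains_bird(flock), is_picture_of_a_bird(flock))
print(contains_bird(portrait), is_picture_of_a_bird(portrait))
```

The flock photo passes the first check but not the second, which is the situation described above for a photo where the birds are small and far away.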


It's cool that a team of deep learning researchers can pull this off quickly. Anyone know of an "image2vec" (word2vec for images) that would empower others to try out similar things? (unfortunately it would need a better name, because "vectorize" means something different for images.)


Caffe (and other frameworks) provide exactly this. It's basically:

1) To set up, load a pre-trained AlexNet/Overfeat/other architecture model (e.g. trained on ILSVRC2012)

2) To get a vector from an image, run a forward pass on the image, and extract the activations at a given layer (e.g. fc7) as the output vector.

http://caffe.berkeleyvision.org/gathered/examples/feature_ex... is a step-by-step walkthrough.

There's a lot of mystique around deep learning and these kinds of problems, but it's not _that_ difficult to use these tools.
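
To show the shape of the two steps without pulling in Caffe, here is a toy sketch in plain Python. It is only an illustration under loud assumptions: `build_toy_net` is a hypothetical stand-in for loading a pre-trained model (its weights are random, not trained), the "image" is a list of 12 made-up numbers, and the layer names fc6/fc7/fc8 are borrowed from AlexNet purely as labels. None of this is Caffe's API.

```python
import math
import random


def dense(weights, bias, inputs):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def build_toy_net(seed=0):
    """Step 1 stand-in: 'load' a model. Weights here are random, NOT trained."""
    rng = random.Random(seed)
    sizes = [("fc6", 12, 8), ("fc7", 8, 4), ("fc8", 4, 2)]
    net = []
    for name, n_in, n_out in sizes:
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                   for _ in range(n_out)]
        net.append((name, weights, [0.0] * n_out))
    return net


def forward(net, pixels):
    """Run a forward pass and keep every layer's activations by name."""
    activations = {}
    values = pixels
    for index, (name, weights, bias) in enumerate(net):
        values = dense(weights, bias, values)
        if index < len(net) - 1:  # ReLU on hidden layers; last layer stays raw scores
            values = relu(values)
        activations[name] = values
    return activations


def image_to_vector(net, pixels, layer="fc7"):
    """Step 2: the activations at the chosen layer are the image's vector."""
    return forward(net, pixels)[layer]


def cosine_similarity(a, b):
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0  # avoid dividing by zero when a vector is all zeros
    return sum(x * y for x, y in zip(a, b)) / norm


net = build_toy_net()
image_a = [0.1 * i for i in range(12)]         # fake 12-"pixel" image
image_b = [1.0 - 0.05 * i for i in range(12)]  # another fake image
vec_a = image_to_vector(net, image_a)
vec_b = image_to_vector(net, image_b)
print(len(vec_a), cosine_similarity(vec_a, vec_b))
```

With a real pre-trained network the vectors would be much longer and, unlike here, similar images would tend to land close together, which is what makes them useful for search or for training a small classifier on top.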


I think you'd need something slightly more complex than a "word2vec" since images already have a well-defined "word vector", i.e. a pixel. What you want is a "parser" that can take in an image and spit out the significant parts of it? Stanford might have the code from this paper ( http://machinelearning.wustl.edu/mlpapers/paper_files/ICML20...) up on their site.


Do you just want the neural net, or the tags too? This might not be what you want, but it looks like it could be fun to play with: http://clarifai.com/


Does not recognize Big Bird. I proclaim this to be a failure.


Ha. I tried Big Bird too. Under "For Bird?", it said "It certainly wants to be."


Is this something that could be accomplished with the help of CCV?

https://github.com/liuliu/ccv


Totally not what I was expecting!

What would be very cool would be some kind of trigger app where you leave your phone in a tree for an hour, and it takes pictures of all the birds that come within range.

Detecting whether a given image has a bird in it, while certainly difficult and "interesting" from a CS point of view, is not very interesting from a user perspective? (Photographers can always tag their images when they submit them).


The advantage of doing it from a user perspective is that people in practice don't tag their images, but others want to find e.g. CC licensed pictures of blue jays in flight.


XKCD requests Park or Park with Bird.

Flickr built service as requested. Web page titled Park or Bird but works as Park or Park with Bird. Who named this thing?

Cool project regardless.


Does not work on tablet :(


looks like either a typo or a missed edit of a previous figure: http://imgur.com/FUQru1m


[deleted]


I'm not sure this refutes the main point - it does appear to be a team of research scientists, they have likely been working on it for year++ and (since upthread mentions it not id'ing Big Bird or a peacock) it's not finished yet.

Someone else mentioned in another thread that "check whether they're in a national park" is only easy now, thanks to 30 years of monumental effort (launching satellites etc). To bring this back to Hacker News, I wonder how long until vision processing will be in the same place? Where it can be taken for granted?


[dead]


You didn't click the link, did you?


LOL! Actually I did, but I was at the laptop at the time and didn't scroll so I missed it... :D


I had this xkcd in mind before I even clicked the topic / comment section..

But then I opened the link, guess what's on page 1...



[dead]


They actually reference this in the article as the source of inspiration. It's clear that neither of you read it.


>"Hey, yeah! I went to Rocky Mountain once!" //

Hate this jokey style of response. Is it stating that the image is from Rocky Mountain? Or perhaps it's just a random comment-bot statement about national parks? Bleurgh.

Yes, I'm sure there's a demographic that enjoys inane comments written as if the presentation layer was conscious.



