
I'm not too familiar with ML. Do you have to "train" TensorFlow on which audio in the data is "definitely a chainsaw" and which is "definitely a logging truck", etc.? And do you have to keep doing this every now and then as you get more and more data, at least in the sense of flagging false positives (so that the model unlearns that particular sound)?



Yes. In the video there is a short snippet where the audio is shown on screen as a Fourier-transformed image and a user annotates the sounds in it with red boxes. This is part of the process of training the ML model to recognize chainsaw sounds vs. other sounds.
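For readers who haven't seen how such an image is made, here is a minimal sketch of a short-time Fourier transform in plain Python. It is not the project's actual pipeline: the function name, frame size, hop, and the 1 kHz test tone are all assumptions chosen for illustration, and a real system would use an FFT library instead of this slow direct DFT.

```python
import cmath
import math

def stft_magnitudes(signal, frame_size=64, hop=32):
    """Return one list of DFT magnitudes (bins 0..frame_size//2) per frame."""
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size]
        # A Hann window reduces spectral leakage at the frame edges.
        windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_size - 1)))
                    for n, x in enumerate(frame)]
        mags = []
        for k in range(frame_size // 2 + 1):
            # Direct DFT of bin k (O(N^2) overall; fine for a demo).
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n, x in enumerate(windowed))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# Hypothetical test signal: a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

spec = stft_magnitudes(tone)  # rows = time frames, columns = frequency bins
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64  # each bin is sample_rate / frame_size wide
```

Plotting `spec` with time on one axis and frequency on the other gives the kind of image an annotator could draw boxes on.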


Thanks for noticing the spectral analysis. We put quite a bit of work into the training system. Besides the base-level Fourier-transformed images, we also have a UI where partners can easily report whether an alert was correct, which also feeds back into the system.


I also work with non-speech audio and I'm curious: do you use pure DFTs as inputs to your models, or do you use mel energies or MFCCs? What kind of models do you use? Since there is not that much variation in the sound of a chainsaw, I suppose either a regular fully connected or a convolutional neural network?

Love what you are doing and I would love to see a technical blog post about how you work with audio!
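For anyone wondering what separates mel energies from raw DFT bins: the mel scale warps frequency so that bands are spaced roughly the way pitch is perceived. A minimal sketch of one common variant of the conversion (the HTK-style formula) follows. The function names are my own, and nothing here says which features the project actually uses.

```python
import math

def hz_to_mel(hz):
    """Convert frequency in Hz to mels (HTK-style formula, one common variant)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(low_hz, high_hz, n_bands):
    """Band edges in Hz, evenly spaced on the mel scale (n_bands + 2 points)."""
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (n_bands + 1)
    return [mel_to_hz(low + i * step) for i in range(n_bands + 2)]

# Hypothetical settings: 10 bands between 0 Hz and 4 kHz.
edges = mel_band_edges(0.0, 4000.0, 10)
```

The edges come out narrow at low frequencies and wide at high ones. Mel energies sum the DFT power inside each band, and MFCCs additionally take a log and a discrete cosine transform of those energies.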


But can it identify lyrebirds?


That's a great question. Actually, one of the sounds that comes pretty close to a chainsaw is mosquitoes circling around our microphones, due to the Doppler effect. We found ways of dealing with signals that are close to chainsaws by aggregating multiple models and adding a time-based analysis. The system can draw causal/correlative conclusions, such as that a vehicle is usually present before a chainsaw. If there's no vehicle, the likelihood of a chainsaw goes down, and the chainsaw model must be highly confident before we sound an alert.
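The vehicle-before-chainsaw logic described above could be sketched roughly like this. This is only an illustration of the idea, not the real system: the function names, the one-hour window, and both thresholds are made-up assumptions, and averaging is just one simple way to aggregate several models.

```python
def ensemble_score(scores):
    """Average the chainsaw scores from several models (hypothetical aggregation)."""
    return sum(scores) / len(scores)

def should_alert(chainsaw_score, seconds_since_vehicle,
                 window_s=3600, base_threshold=0.6, strict_threshold=0.95):
    """Alert on a chainsaw score, demanding more confidence without a recent vehicle.

    seconds_since_vehicle is None when no vehicle has been heard at all.
    All default values are illustrative assumptions.
    """
    vehicle_recent = (seconds_since_vehicle is not None
                      and seconds_since_vehicle <= window_s)
    threshold = base_threshold if vehicle_recent else strict_threshold
    return chainsaw_score >= threshold

# Three hypothetical models score the same audio clip.
score = ensemble_score([0.8, 0.7, 0.9])

alert_with_vehicle = should_alert(score, seconds_since_vehicle=600)
alert_without_vehicle = should_alert(score, seconds_since_vehicle=None)
alert_very_confident = should_alert(0.97, seconds_since_vehicle=None)
```

With these numbers the same moderately confident score raises an alert ten minutes after a vehicle was heard, but not when no vehicle was heard, unless the score is very high.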


How do you quantify the confidence of your model? Do you use a Bayesian model or just the log-likelihood? Because the latter can act strangely in some cases.


I know this is a digression from the current discussion on how well the devices work, but as a stats student who just learned about estimating using log-likelihoods, could you give some more info on how that is inferior to the Bayesian model (since I've heard the exact opposite is true)?


The problem is that neural networks trained using maximum LL do not return calibrated probabilities. Using e.g. the softmax output as the 'confidence' of a model tends to result in overconfident predictions. Take a look at adversarial attacks on neural networks for an extreme example: https://blog.openai.com/adversarial-example-research/
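To make the overconfidence point concrete, here is a small sketch with made-up logits. It also shows temperature scaling, one simple post-hoc calibration technique (my addition, not something the project has said it uses). The class labels and the temperature of 4.0 are assumptions. In practice the temperature would be fitted on held-out validation data.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits; temperature > 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw network outputs for: chainsaw, mosquito, truck.
logits = [8.0, 2.0, 1.0]

raw = softmax(logits)                       # top class gets over 0.99
cooled = softmax(logits, temperature=4.0)   # top class drops to about 0.72
```

Dividing by a temperature never changes which class wins, only how much probability it claims, so accuracy stays the same while the reported confidence can be brought closer to the observed hit rate.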


Logger-likelyhood ;)


If a lyrebird is mimicking a chainsaw or truck, wouldn't that indicate the presence of those chainsaws and trucks?


I had to look this up because I thought you were jesting. Turns out lyrebirds can mimic nearly any complex sound: https://youtu.be/VjE0Kdfos4Y


I imagine it would depend on the lifespan and travel patterns of a lyrebird, no? E.g., a bird could be making the chainsaw noise weeks or years after the loggers have gone, possibly in an entirely different area.


Lyrebirds are native to Australia.


The swallow may fly south with the sun or the house martin or the plumber may seek warmer climes in winter yet these are not strangers to our land.



