I'm not too familiar with ML, do you have to "train" tensorflow as to what audio...

osipov · on March 23, 2018

Yes. In the video there is a short snippet where the audio is shown on screen as a Fourier-transformed image (a spectrogram) and a user annotates the image of the sound with red boxes. This is part of the process of training the ML model to recognize chainsaw sounds vs. other sounds.
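For readers who have not seen audio turned into an image before, here is a minimal sketch of the idea using only the Python standard library. It is not the project's actual pipeline: the frame size, hop, window choice, and the synthetic 1 kHz test tone are all assumptions made for illustration.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Magnitude spectrogram: one row per time frame, one column per frequency bin."""
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        row = []
        # Naive DFT, keeping only the non-negative frequency bins.
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            row.append(abs(acc))
        rows.append(row)
    return rows

# Hypothetical input: a 1 kHz sine sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(512)]
spec = spectrogram(tone)

# The brightest column of the first frame corresponds to the tone's frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 64
```

Plotting `spec` as a heat map (time on one axis, frequency on the other) gives the kind of image an annotator could draw boxes on. A real system would use an FFT rather than this slow naive DFT.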

stefanRfcx · on March 23, 2018

Thanks for noticing the spectral analysis. We put quite a bit of work into the training system. Besides the base-level Fourier-transformed images, we also have a UI where partners can easily report whether an alert was correct, which also feeds back into the system.

colinnordin · on March 25, 2018

I also work with non-speech audio and I'm curious: do you use pure DFTs as inputs to your models, or do you use mel-energies or MFCCs? What kind of models do you use? Since there is not that much variation in the sound of a chainsaw, I suppose either a regular fully connected or a convolutional neural network?

Love what you are doing and I would love to see a technical blog post about how you work with audio!

starchand · on March 23, 2018

But can it identify lyrebirds?

stefanRfcx · on March 23, 2018

That's a great question. Actually, one of the sounds closest to a chainsaw is that of mosquitoes circling our microphones, because of the Doppler effect. We found ways of dealing with signals that are close to chainsaws by aggregating multiple models and by adding a time-based analysis. The system can draw causal/correlative conclusions, such as that a vehicle is usually present before a chainsaw. If there's no vehicle, the likelihood of a chainsaw goes down and the chainsaw model must be highly confident before we sound an alert.
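To make the vehicle-before-chainsaw idea concrete, here is a rough sketch of one way such a rule could look. It is only a guess at the general shape, not the system described above: the function names, the averaging of model scores, the window length, and every threshold are hypothetical.

```python
def ensemble_score(model_scores):
    """Aggregate several models' chainsaw scores by simple averaging (an assumption)."""
    return sum(model_scores) / len(model_scores)

def alert_decisions(chainsaw_scores, vehicle_scores, window=1,
                    vehicle_threshold=0.5, base_threshold=0.6,
                    strict_threshold=0.9):
    """Per-frame alert decisions with a context-dependent threshold.

    If a vehicle was detected in the current frame or the `window` frames
    before it, the normal threshold applies. Otherwise the chainsaw score
    must clear a stricter threshold before an alert is raised.
    """
    alerts = []
    for i, score in enumerate(chainsaw_scores):
        recent = vehicle_scores[max(0, i - window):i + 1]
        vehicle_seen = any(v >= vehicle_threshold for v in recent)
        threshold = base_threshold if vehicle_seen else strict_threshold
        alerts.append(score >= threshold)
    return alerts

# Made-up scores: a vehicle is heard only in frame 1.
chainsaw = [0.7, 0.7, 0.7, 0.7, 0.95]
vehicle = [0.0, 0.8, 0.0, 0.0, 0.0]
alerts = alert_decisions(chainsaw, vehicle)
# alerts == [False, True, True, False, True]
```

A moderate score of 0.7 only triggers an alert near the vehicle detection, while the 0.95 score in the last frame is confident enough to alert with no vehicle in sight.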

tonic_section · on March 23, 2018

How do you quantify the confidence of your model? Do you use a Bayesian model or just the log-likelihood? Because the latter can act strangely in some cases.

coolio2657 · on March 23, 2018

I know this is a digression from the current discussion on how well the devices work, but as a stats student who just learned about estimating using log-likelihoods, could you give some more info on how that is inferior to the Bayesian model (since I've heard the exact opposite is true)?

tonic_section · on March 23, 2018

The problem is that neural networks trained using maximum LL do not return calibrated probabilities. Using, e.g., the softmax output as the 'confidence' of a model tends to result in overconfident predictions. Take a look at adversarial attacks on neural networks for an extreme example: https://blog.openai.com/adversarial-example-research/
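One common post-hoc fix for this kind of overconfidence is temperature scaling: divide the logits by a scalar T chosen to minimize the negative log-likelihood on held-out data. It is named here only as an illustration and is not something this thread says the project uses. The sketch below uses a tiny made-up validation set and a coarse grid search instead of a proper optimizer.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, labels, candidates):
    """Pick the candidate temperature with the lowest validation NLL."""
    return min(candidates, key=lambda t: nll(logit_rows, labels, t))

# Made-up validation set: the model is very sure of class 0 every time,
# but it is right only 3 times out of 4.
val_logits = [[4.0, 0.0]] * 4
val_labels = [0, 0, 0, 1]

candidates = [0.5 + 0.5 * i for i in range(20)]  # 0.5, 1.0, ..., 10.0
best_t = fit_temperature(val_logits, val_labels, candidates)

raw_conf = softmax([4.0, 0.0])[0]                  # about 0.98
calibrated_conf = softmax([4.0, 0.0], best_t)[0]   # much closer to the 0.75 accuracy
```

Temperature scaling does not change which class is predicted, only how confident the reported probability is, and it does nothing against adversarial inputs.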

ada1981 · on March 23, 2018

Logger-likelihood ;)

mongoosled · on March 23, 2018

If a lyrebird is mimicking a chainsaw or truck, wouldn't that indicate the presence of those chainsaws and trucks?

arbie · on March 23, 2018

I had to look this up because I thought you were jesting. Turns out lyrebirds can mimic nearly any complex sound: https://youtu.be/VjE0Kdfos4Y

notheguyouthink · on March 23, 2018

I imagine it would depend on the lifespan and travel patterns of a lyrebird, no? E.g., the bird is making the chainsaw noise weeks or years after the loggers have gone, possibly in an entirely different area.

osipov · on March 23, 2018

Lyrebirds are native to Australia.

emaginniss · on March 23, 2018

The swallow may fly south with the sun, or the house martin or the plover may seek warmer climes in winter, yet these are not strangers to our land.