If anyone is interested, Sonic Visualiser with its Vamp plugins is pretty state of the art when it comes to analyzing music. The Chordino NNLS Chroma plugin, for instance, can extract even jazz chords fairly accurately.
Interesting! I just signed up for Chordify [0], which does extraction of chords from YouTube videos, and then plays them while you play along to either a grid of chords, or an animation showing the current chord and the next ones coming up. Has chord diagrams for Guitar, Ukulele, and Piano.
Really nice, frequently accurate (it's not perfect :-).
Is there a market for "mathematically superior" tools similar to this? I've been developing this sort of math for more than half my life, and would love to "productize" it. I just never thought there'd be a market...
Go for it. Let your users make your market, you should just make a great tool/service.
If you're better than the competition you should be profitable, and if your maths is better you're better than the competition.
I would pay for it. More importantly I would use it. Not everything has to have a price as long as everything has a value.
It's funny because I was trying to sing a melody into Musescore just the other day and had to apply a bit of Audacity and then remember theory from secondary school to make it work. My middle-age voice sounds different to a computer than to my ears. If you could make this easy and accurate you would do every amateur song-writer on this earth a huge favor, though maybe not every amateur listener :)
Chordify is great, but for some reason the chords are displayed late when running alongside the YouTube video. A simple timed delay would improve this frustrating quirk, but it hasn't been fixed in years.
But so far none of the vamp plugins seem to be showing anything interesting. (I have tried a random handful)
EDIT: The spectrum view is fascinating. I can see a little bit of additional information on the edges of the FLAC vs. an MP3. Is there a plugin that can separate the instruments?
All I can say is that I don't really look for sheet music anymore, or guitar tabs / chords. When I find interesting music on youtube I just use youtube-dl and Sonic Visualiser / NNLS Chroma and jam along. It's upped my song-writing abilities and general understanding and feel for music tremendously.
For anyone else who had never heard of "cepstrum" before, this is what I found on Wikipedia:
"The cepstrum is the result of computing the inverse Fourier transform of the logarithm of the estimated signal spectrum. The method is a tool for investigating periodic structures in frequency spectra. The power cepstrum has applications in the analysis of human speech.
The term cepstrum was derived by reversing the first four letters of spectrum. Operations on cepstra are labelled quefrency analysis (or quefrency alanysis), liftering, or cepstral analysis."
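The definition quoted above is easy to try for yourself with nothing but numpy. A toy sketch (the signal, sample rate, and search range here are all invented for illustration): synthesize a harmonic tone, then recover its fundamental period as a peak in the cepstrum, i.e. the inverse FFT of the log magnitude spectrum.

```python
import numpy as np

sr, f0 = 8000, 200                      # sample rate and fundamental, in Hz
t = np.arange(sr) / sr                  # one second of samples
# harmonic-rich test signal: ten harmonics with 1/k amplitudes
x = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 11))

log_mag = np.log(np.abs(np.fft.fft(x)) + 1e-12)   # epsilon avoids log(0)
cepstrum = np.fft.ifft(log_mag).real              # "spectrum of the log spectrum"

# Harmonics spaced f0 apart produce a ripple in the log spectrum, which
# shows up as a cepstral peak at quefrency sr/f0 samples. Search only a
# plausible pitch range (120-300 Hz here) to skip the low-quefrency
# region, which holds the spectral envelope.
lo, hi = sr // 300, sr // 120
peak = lo + np.argmax(cepstrum[lo:hi])
print(peak, sr / peak)                  # quefrency in samples, implied pitch in Hz
```

For this signal the peak lands at quefrency 40 samples, i.e. 8000/40 = 200 Hz, matching the fundamental.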
Wikipedia's sections on signal processing and data compression have jumped to near-textbook quality in the past few years.
The cepstrum is awesome; it comes from the source-filter model of human speech. By looking at the periodicities in the frequencies, it attempts to capture the resonance of the filter that models the vocal tract.
>Wikipedia's sections on signal processing and data compression have jumped to near-textbook quality in the past few years.
I pretty much used Wikipedia as my main resource when I was learning DSP at college.
I get maybe 80% of my EE information from Wikipedia nowadays, across a ton of different areas. A couple of days ago I was reading Meindl's paper on boron implantation in MOSFETs, and I don't recall exactly what it was, but one topic in the paper was so obscure that my colleagues had quite a bit of trouble with it and ultimately couldn't find any other resources on it.
There was literally a whole Wikipedia section dedicated to the concept and it saved my goddamn ass in the presentation I had the next day.
I absolutely love Wikipedia and I owe a lot of my education to the contributors.
Cepstral analysis has a pretty good statistical basis (no pun intended) away from the source/filter model.
There's a bit of magic in the MFCC computation where you apply the discrete cosine transform (DCT). That's all about reducing correlation between components in the cepstrum and makes no sense unless you "get" the way the change of basis has high energy compaction (more information in fewer values).
However this has nothing to do with a physiological understanding of human speech.
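The energy-compaction point above is easy to see numerically. A small numpy-only illustration (the smooth test vector here just stands in for a log mel spectrum; it's made up for the example): apply an orthonormal DCT-II and check how much of the energy lands in the first few coefficients, which is why MFCC pipelines typically keep only the first ~13.

```python
import numpy as np

N = 40
n = np.arange(N)
# a smooth, slowly varying vector, standing in for a log mel spectrum
x = np.exp(-n / 10) + 0.3 * np.cos(2 * np.pi * n / N)

# orthonormal DCT-II basis matrix: rows are cosines of increasing frequency
k = n[:, None]
C = np.cos(np.pi * (n + 0.5) * k / N) * np.sqrt(2 / N)
C[0] *= np.sqrt(0.5)                     # DC row scaled for orthonormality

coeffs = C @ x
total = np.sum(coeffs ** 2)              # equals sum(x**2) by Parseval
first5 = np.sum(coeffs[:5] ** 2)
print(first5 / total)                    # nearly all the energy is in 5 of 40 values
```

Because the basis is orthonormal, total energy is preserved, yet for smooth inputs almost all of it concentrates in the lowest-order coefficients; that compaction, not any vocal-tract physiology, is what the DCT step buys you.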
Track beats using time series input
>>> y, sr = librosa.load(librosa.ex('choice'), duration=10)
>>> tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
>>> tempo
135.99917763157896
No doubt because the author doesn't want it to be used in proprietary software that doesn't respect the user's freedoms. At least, not without taking a cut.
The LGPL was never meant for all libraries, only certain specific ones (which the FSF thought were better licensed more permissively, for practical reasons). That's one reason they renamed it from "Library GPL" to "Lesser GPL".
I've used its python bindings. I've been quite pleased with how well it works, honestly. I wrote some software for a client with multiple radio stations. It's a service that listens to one web stream per station and shoots the client an email/text if it detects problematic audio (e.g. static, silence, etc.). Some of the stations are talk and others are music, so it needed to be robust. Aubio made it really easy to test different detection algorithms, and its documentation was tremendously helpful. IRC room was active as well when I had questions.
aubio is the one python library I found that provided a ready-to-use pitch model, which I used for a simple script to listen for my doorbell. This worked much better than the more common approaches I saw elsewhere on the internet that did simple frequency detection.