Not to be dismissive, but as far as I can tell, the heavy work is done by Facebook's Demucs, and this is an Electron front end that runs the demucs CLI (and, I guess, searches YouTube for videos to download). The Demucs project page has more information.
I'm curious, what is the business justification for funding development of Demucs, if you don't mind me asking? It doesn't seem very related to FB's core business.
Solving problems like audio source separation (e.g. distinguishing multiple speakers in a noisy environment, or picking speech out of a background where music is playing) seems very much in FB's wheelhouse.
The goal of Meta AI Research is to do open research, not necessarily with direct applications at the time we start it. Indeed, the architecture, or the lessons learnt working on it, can become useful later for the company, for instance for remote presence in VR, to isolate the main speaker from their environment (https://arxiv.org/pdf/2206.15423.pdf).
Just a guess here, but I wouldn't be surprised if it's used to better spy on your messenger audio conversations. They already listen in and will pick up keywords to populate your FB ad stream.
If I can reconstruct your conversation (through other meta information), without listening to sounds of your voices, have I not listened to your conversation?
Hi, I just downloaded demucs yesterday and started using it. It's amazing! I really appreciate all the work you put into making it easy to install and understand.
Is there any chance you can disentangle guitar and keyboard? I work a lot with Grateful Dead music and I'd like to be able to pull Jerry's guitar out from the keyboard in live shows. Similarly, it would be cool if you could parse Shpongle into its constituent tracks, but I think that's probably impossible.
There’s no need to be dismissive, since they say this in the first sentence. Preparing an easy-to-use app for all platforms probably does get this into more creative hands, and that’s a net-positive contribution I can appreciate.
It does seem rather disingenuous that the product page makes no mention that the author didn't do the heavy lifting, and that at the same time it features a prominent donation button.
If I didn't know who did the real work and benefitted a lot from this tool, I'd give to StemRoller in proportion to my gratitude -- which I'm sure others are liable to do.
It's in the first paragraph of the README in the GitHub repo and the second paragraph on the website. I'm not sure what more can be asked of the author.
Thanks for clarifying. I did think saying "Thanks" to the person who corrected me is a straightforward admission that I was wrong... if not - yes, I was wrong.
To the downvoters above: is a "thanks" to a correcting comment not enough on HN?
Also, was it rude to say hiding content "below the fold" is a design problem?
This is a very odd thread to me - like I was being chased down by pedants who are calling me pedantic.
Tried splitting a complex arrangement (Chicago by Sufjan Stevens). Drums, bass, and vocals come out fairly well, though the drums stem seems to lack percussion elements outside of the core rock drum kit (e.g. tambourine), and cymbal hits are clipped rather than ringing. The 'other' stem, the rest of the instrumentation, keeps a fair bit of the percussion, and there's bleed from the vocal melody.
The backing vocals seem to have disappeared for the most part, and are only audible in the vocals stem when the lead vocal is present (like they're reverse-ducked? Been a while since I did any production, the terms have escaped me...).
Not much use with complex arrangements to be honest, I was hoping to get things like the strings section separated from the rest of the arrangement.
YouTube audio is optimized for bitrate, not quality (roughly 128 kbps). You will get better results with a higher-bitrate MP3 (320 kbps would be good), and better still with a lossless format like FLAC or WAV.
Makes sense. MP3 tries to compress without losing information in the audible range of human hearing, but the discarded information can still be useful to algorithms.
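If you're preparing inputs yourself, something like this keeps the source as clean as possible (a sketch assuming yt-dlp and ffmpeg are installed; the filenames and URL are placeholders):

    # Grab the best audio stream YouTube offers and unpack it to WAV
    # (this won't restore what the lossy encode discarded, but it
    # avoids a second generation of loss before separation).
    $ yt-dlp -f bestaudio -x --audio-format wav -o 'input.%(ext)s' '<video URL>'

    # Or decode a local lossy file to WAV with ffmpeg:
    $ ffmpeg -i input.m4a input.wav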
I'd like to take a moment to mention how great dropbox's audio seeking thing is. It's super fast and works as intended. Great work whoever implemented this.
Just a follow-up: in my two conversions so far, Lalal.ai has been better, especially at separating drums from instruments. I'll give StemRoller a few more tries, as I am always looking for options.
Update number three. I now just use both lalal and stemroller because each one seems to do better in certain cases. If I hadn’t paid for lalal, I’d probably just use stemroller as it’s way better than RX9
This seems to run just fine under Linux as well, though not completely out of the box: it's basically missing builds and config for Linux, which can be built analogously to the existing Win/Mac ones.
It's almost eerie how well this works with electronic music. Coming from an age where your best try to separate a track was using equalizers, I didn't have high hopes.
Trying it out with Alan Walker's Alone, it separates the vocals and drums almost perfectly. Bass is really good as well; only the instrumental and 'other' stems were a bit mixed up in my try.
"Download and extract the latest ffmpeg snapshot from evermeet.cx and place the ffmpeg executable inside"
Why? Why can't this just point to the location where ffmpeg is, rather than making a copy of it? A symlink might work, but just do a $(which ffmpeg), or ask the user for the path (~/bin/ffmpeg, /usr/local/bin/ffmpeg, etc.).
ffmpeg has not had a stable command line interface for some time. It can be a problem to assume that the system-installed version accepts the arguments you plan to give it.
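That said, if you'd rather not keep a copy around, a symlink does work in practice (a sketch; the destination is a guess at wherever StemRoller expects to find the executable):

    # Link the system ffmpeg into the app folder instead of copying it.
    # The destination path here is hypothetical - use wherever the app
    # actually looks for the binary.
    $ ln -s "$(command -v ffmpeg)" /path/to/StemRoller/ffmpeg

Of course, that reintroduces the version-drift problem described above; bundling a known-good build sidesteps it.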
Open Culture recently posted a link to Abbey Road with only Paul's bass lines, though the actual content has since been taken down. [0] It was really cool, in part because it's not nearly as precise as I would have thought, which made it feel really organic.
I imagine studio-era Beatles in particular would be difficult.
Microphone bleed, lots of overdubs (especially vocals), and repeated re-layering of tracks on tape due to channel limitations. They really were doing crazy stuff with limited tech.
I think this would be hard for bands that really fill the spectrum and don’t have that clean treble/mid/bass separation, or for recordings heavily compressed into a narrow frequency range.
Now this makes me want to see what happens with like My Bloody Valentine and Husker Du :).
Been using demucs for a couple of weeks now, mostly taking early tracks I produced, for which I have since lost the project files, and giving them a remix and update. Gotta say, I have been blown away by how good demucs is. I installed it following the repo instructions and then created a zsh alias to run it on any file name, e.g. $ ai_split mySong.mp3
Wait fifteen minutes and out pop four stems. Flawless so far; I've even been messing around with mainstream tracks and using Ableton with warp applied to quickly build out remixes. Demucs is going to be / is already a game changer!
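For anyone who wants the same setup, the function version looks roughly like this (a sketch; ai_split is just my name for it, and -o only changes where the stems land, since demucs writes to ./separated/ by default):

    # In ~/.zshrc: run demucs on a file, putting stems under ~/stems
    ai_split() {
      demucs -o ~/stems "$1"
    }

    $ ai_split mySong.mp3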
This testimonial almost has me wanting to try it on an “album”[1] I recorded when I was in a “band”[2] in high school. I too lost all of the source files[3].
1: On second thought maybe not. It has not aged well.
2: Me and another kid, with a guitar, a pre-OS X Mac, a pirated copy of Rebirth, a pirated copy of SoundEdit 16, and literally the mic that Apple used to include with (some?) Macs. I’d back-reference[1], but our equipment was not the problem. Well, except for [3].
3: I learned my lesson: I should have been older and had a job that would afford me a backup drive, so I could sample the sounds of that dying HDD and retcon the samples into my “album”[1].
That’s awesome! I wonder if there are projects to create a repository of pre-split public domain music? Seems like something the internet archive could host once created.
With a tool like this, you could get back into the animutation scene.
(Edit: I guess it's a bit of a non-sequitur, but I enjoyed Suzukisan, so there's that.)
I just did a quick test of demucs vs spleeter:4stems. Demucs is significantly slower, but the output is better.
In a semi-blind comparison, I prefer demucs for all 4 tracks (drums, bass, vocals, and other). Bass and other stand out the most, so let me say a couple of words about them.
Bass - the demucs bass has less bleed from other instruments, and the volume is consistent throughout. With spleeter, the volume varies a lot and there are multiple sections of 1-2 bars where it just drops out completely. In Capo, the demucs spectrogram is nice and clear, whereas spleeter's tends to look like pencil smudges for the most part.
Other - with spleeter, whenever there are vocals, the other instruments turn to mush. Demucs is much better: you can tell people are singing -- the instruments get muffled -- but you can still hear them.
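If anyone wants to reproduce the comparison, the invocations are roughly these (a sketch; check each project's README, since spleeter's CLI in particular has changed between versions):

    # Demucs: four stems (drums, bass, vocals, other) under ./separated/
    $ demucs song.mp3

    # Spleeter: the equivalent 4-stem model
    $ spleeter separate -p spleeter:4stems -o output song.mp3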
I dabble in audio production in my free time outside of work, and I typically will use iZotope RX 9 or Neural Mix Pro for isolating vocals or stems. However, these are paid products, and it's encouraging to see more open source projects being built around this space.
I like the opportunity to view the source code and learn from it, as opposed to most paid products which are typically closed-source and a bit of a "black box".
Sure - this is mostly just an accessible frontend for Demucs, but that's still okay. The author clearly indicates that in his repo, giving credit where credit is due. Additionally, this helps less-technical creators be creative in new ways.
Honestly, this sort of thing is cool; but why (in general) is it necessary in the first place?
If the elements of the song are recorded in isolation - which they are in all studio versions - why can't we just move to a format that supports the layering?
Musicians and studios generally don't offer the public access to original stems for songs (why would they?)
Say that you want to make a remix, mashup, or otherwise use sound bites from a song. The easiest thing to do is use a tool like Spleeter/Demucs to separate the source layers so that you can then further process them in your DAW.
This is what I do, but I just use the Demucs CLI because it's simple enough.
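If you only need an a cappella or an instrumental rather than the full 4-stem split, the CLI has a shortcut for that (assuming a reasonably recent demucs; --two-stems is documented in the repo):

    # Produces vocals.wav and no_vocals.wav instead of four stems
    $ demucs --two-stems=vocals song.mp3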
Many niches in electronic music have tight-knit communities of creators and producers that regularly remix each other's stuff. But it is not an open community; you have to have a decent standing (from making your own music or prior remixes) before someone is willing to send you their stems. For any musician that has a label/publisher, the label also needs to be in the loop to handle the royalties.
So sharing stems happens regularly in the music industry, but it is not easily accessible, which makes tools like the one mentioned very useful for everyone else who would like to participate.
It isn't really in the best interest of the artist to provide this. The final mix is part of the overall product / work of art. Providing all of the individual tracks (there could be 30 or more in total) would also take up a lot of space / increase processing requirements while benefiting very few.
I usually use this kind of tool to work out the bass part of a song, for example. With the isolated elements it is much easier to know exactly which notes are sounding (I don't have a good ear). The same goes for drums or synth notes.
Since sound quality doesn't matter much for this purpose, I usually use iZotope RX, but I will try this tool.
For all who are looking for something like this: iZotope RX (the audio repair software) has a function called "Music Rebalance" which is great for reducing spill or changing the balance in a live recording.
I've always wanted a way to extract just the kick drums in realtime, but I don't understand this field well enough to know whether it would be remotely possible.
Do you want just the beat, i.e. the time markers of each kick? Or the isolated sound (i.e. audio) of each kick?
Both are generally possible today, though the approach will differ a little bit.
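Offline, for instance, you could do both with off-the-shelf tools (a sketch assuming demucs and the aubio command-line utilities are installed; true realtime is harder, since demucs processes whole files):

    # Isolated drum audio: split into drums.wav and no_drums.wav
    $ demucs --two-stems=drums song.mp3

    # Onset timestamps (in seconds) from the isolated drum stem;
    # the output path depends on the demucs model/version.
    $ aubioonset -i separated/htdemucs/song/drums.wav

Filtering those onsets down to just the kicks would still take some extra work (e.g. low-passing the stem first).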
Demucs did a much better job of isolating the bass on a blues track than LALAL. The bass actually sounded like a bass. LALAL got the note pitches but lost their attacks.
https://github.com/facebookresearch/demucs