StemRoller – Isolate vocals, drums, bass, and other stems from any song (github.com/stemrollerapp)
391 points by nikolay on Aug 5, 2022 | 97 comments



Not to be dismissive, but as far as I can tell, the heavy work is done by facebook's demucs and this is an electron front end to run the demucs cli (and I guess search youtube for videos to download). The demucs project page has more information.

https://github.com/facebookresearch/demucs


Original Demucs author here. Thanks for putting forward our research!

I’m definitely happy to see more front ends for Demucs being developed and to read that it has been useful to other musicians!

We are working on the next iteration of the model, and with more sources, hopefully released by the end of the year :)

If you are interested in this research you can follow my Twitter (@honualx) or star the Demucs repo.


I'm curious, what is the business justification for funding development of Demucs, if you don't mind me asking? It doesn't seem very related to FB's core business.


Solving problems like audio source separation (e.g. distinguishing multiple speakers in a noisy environment, or picking speech out of a background where music is playing) seems very much in FB's wheelhouse.


The goal of Meta AI Research is to do open research, not necessarily with direct applications at the time we start it. Indeed, the architecture, or the lessons learnt working on it, can become useful later for the company, for instance for remote presence with VR, to isolate the main speaker from their environment ( https://arxiv.org/pdf/2206.15423.pdf ).


Just a guess here, but I wouldn't be surprised if it's used to better spy on your messenger audio conversations. They already listen in and will pick up keywords to populate your FB ad stream.


That’s absolutely not true. Facebook does not listen to your conversation: https://twitter.com/jspujji/status/1474797770871615497?s=20&...


If I can reconstruct your conversation (through other meta information), without listening to sounds of your voices, have I not listened to your conversation?


Hi, I just downloaded demucs yesterday and started using it. It's amazing! I really appreciate all the work you put into making it easy to install and understand.

Is there any chance you can disentangle guitar and keyboard? I work a lot with Grateful Dead music and I'd like to be able to pull Jerry's guitar out from the keyboard in live shows. Similarly, it would be cool if you could parse Shpongle into its constituent tracks, but I think that's probably impossible.


Is there something similar for separating different voices from spoken audio?


Yes there are, you can have a look at https://github.com/etzinis/sudo_rm_rf for instance for 2 speakers separation. There is also this one for 3 speakers: https://huggingface.co/speechbrain/sepformer-whamr


There’s no need to be dismissive since they say this in the first sentence. Preparing an easy to use app for all platforms probably does get this into more creative hands, and that’s a net-positive contribution I can appreciate.


They did not prepare it for all platforms though. Linux is missing.



It should be in the title.


It does seem rather disingenuous that the product page makes no mention that the author didn't do the heavy lifting, and that at the same time it features a prominent donation button.

If I didn't know who did the real work and benefitted a lot from this tool, I'd give to StemRoller in proportion to my gratitude -- which I'm sure others are liable to do.


It's in the first paragraph of the README in the GitHub repo and the second paragraph on the website. I'm not sure what more can be asked of the author.


Thanks - I didn't even notice there was content below the fold on the web site (I'm on a desktop browser).

How about saying so above the fold?


How about you stop being so pedantic and admit it when you're wrong instead of digging a deeper, dumber hole.


This is a real design problem - many web sites do this.

Saying so and suggesting an alternative isn't being pedantic.


> It does seem rather disingenuous that the product page makes no mention that the author didn't do the heavy lifting,

It does seem rather disingenuous that your replies make no mention of admission for making a factually incorrect statement in the first reply.

Not a big deal to me personally, but it is not surprising that some people see this as being petty.


Thanks for clarifying. I did think saying "Thanks" to the person who corrected me is a straightforward admission that I was wrong... if not - yes, I was wrong.

To the downvoters above: is a "thanks" to a correcting comment not enough on HN?

Also, was it rude to say hiding content "below the fold" is a design problem?

This is a very odd thread to me - like I was being chased down by pedants who are calling me pedantic.


Seems you think everyone besides yourself is always the issue...maybe self reflect a bit


Tried splitting a complex arrangement (Chicago by Sufjan Stevens). Drums, bass, and vocals come out fairly well, though the drums stem seems to lack percussion elements outside of the core rock drum kit (e.g. tambourine), and cymbal hits are clipped rather than ringing. The 'other' stem, the rest of the instrumentation, keeps a fair bit of the percussion and there's bleed from the vocal melody.

The backing vocals seem to have disappeared for the most part, and are only audible in the vocals stem when the lead vocal is present (like they're reverse-ducked? Been a while since I did any production, the terms have escaped me...).

Not much use with complex arrangements to be honest, I was hoping to get things like the strings section separated from the rest of the arrangement.

Original: https://www.youtube.com/watch?v=tWX3El-slpY

Output: https://file.io/etpOQt57ziKe


Did you use a FLAC/WAV file? That should yield the best results.

(Only asking because you linked to YouTube, and I'm not sure if you used the YouTube audio for your source.)


Perhaps you're right, I'd have to check.

I typed the song in the search and pressed the first likely result, which is the youtube video I linked. Using the software as intended I believe.


YouTube audio is optimized for bit rate, not quality (128K MP3). You will get better results with a higher-bitrate MP3 (320K would be good), better still with an uncompressed format like FLAC or WAV.


Makes sense. MP3 tries to compress without losing information in the audible spectrum of a human, but that information can still be processed by algorithms.


I can't find any way to do this.


I tried throwing some underground rap artists at this app, as stem splitters usually struggle with them

I split https://www.youtube.com/watch?v=DDaL7KBjkDI

And it gave me this https://www.dropbox.com/sh/inyk38n2jrp5i45/AACpB0xXNFxamEmP3... I noticed some weird hissing with the 808s, but other than that it sounded pretty good

For more of a challenge, I inputted https://www.youtube.com/watch?v=uAwQ3njiU4M

and it came up with https://www.dropbox.com/sh/97lzke0puh9dzeo/AACE75vsbNS43UqqH... It was able to separate some of the kicks from the 808s, which is really impressive to me!

Overall, I'm very impressed! This sounds much better than lalal.ai to me


I'd like to take a moment to mention how great dropbox's audio seeking thing is. It's super fast and works as intended. Great work whoever implemented this.


I’ve found Lala to be my go to. If this is better, then I’m very interested in trying it out.


Just a follow-up. In my two conversions so far, Lalal.ai has been better, especially at separating drums from instruments. I'll give StemRoller a few more tries as I am always looking for options.


Update number three. I now just use both lalal and stemroller because each one seems to do better in certain cases. If I hadn’t paid for lalal, I’d probably just use stemroller as it’s way better than RX9


what genre of music, may i ask?


RNB, NeoSoul, Trap


Why do vocals.wav, other.wav, and instrumental.wav all start out the exact same (with vocal sounds)?


Super impressive splitting there, wow. Just curious, was your source a lossless or compressed file?


The second file was lossless, the first was ripped from a CD.


This seems to run just fine under Linux as well, though not completely out of the box: it's basically missing builds and config for Linux, which can be built analogously to the existing Win/Mac stuff.

You also have to build the demucs-cxfreeze dependency (as described in its repo, https://github.com/stemrollerapp/demucs-cxfreeze).


It's almost eerie how well this works with electronic music. Coming from an age where your best try to separate a track was using equalizers, I didn't have high hopes.

Trying it out with Alan Walker's Alone, it separates the vocals and drums almost perfectly. Bass is really fine as well; only instrumental and 'other' were a bit mixed up in my attempt.


Whenever I see an "##Installation" section with more than one step, I immediately call DOCKER!


"Download and extract the latest ffmpeg snapshot from evermeet.cx and place the ffmpeg executable inside"

Why? Why can't this just point to the location where ffmpeg is rather than making a copy of ffmpeg? symlink might work, but just do a $(which ffmpeg) or ask the user for the path ~/bin/ffmpeg /usr/local/bin/ffmpeg etc
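A minimal sketch of that idea in Python, not a description of how StemRoller actually locates ffmpeg: check the PATH first (the equivalent of `which ffmpeg`), then fall back to a few explicit locations. The function name `find_ffmpeg` and the default candidate list are made up for illustration.

```python
import os
import shutil


def find_ffmpeg(candidates=("~/bin/ffmpeg", "/usr/local/bin/ffmpeg"),
                which=shutil.which):
    """Return a path to an ffmpeg executable, or None if none is found."""
    # Equivalent of `which ffmpeg`: search the directories on PATH.
    found = which("ffmpeg")
    if found:
        return found
    # Fall back to explicit locations (hypothetical defaults; adjust as needed).
    for candidate in candidates:
        path = os.path.expanduser(candidate)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


if __name__ == "__main__":
    print(find_ffmpeg() or "ffmpeg not found; ask the user for a path")
```

This only finds a binary. It says nothing about whether that binary accepts the arguments the app plans to pass, which is a separate concern.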


ffmpeg has not had a stable command line interface for some time. It can be a problem to assume that the system-installed version accepts the arguments you plan to give it.


It's even easier than that. There's a few npm libs around that are dedicated to shipping a copy of ffmpeg with electron.


even easier than what i already have on my system? what are you saying here, as it makes no sense to me


Maybe there’s some feature of bleeding edge ffmpeg that’s required for the app


Open Culture recently posted a link to Abbey Road but with only Paul's bass lines, but the actual content got taken down. [0] It was really cool though, in part because it's not nearly as precise as I would have thought, which made it feel really organic.

[0] https://www.openculture.com/2022/04/hear-the-beatles-abbey-r...


In the real world where tracks are cut live, there is a fair bit of microphone bleed


I imagine studio-era Beatles in particular would be difficult.

Microphone bleed, lots of overdubs (especially vocals), and repeated re-layering tracks on tape over and over due to channel limitations. They really were doing crazy stuff with limited tech.

I think this would be hard for bands that really fill the spectrum and don’t have that clean treble, mid, bass separation. Or recordings really compressed into a frequency range.

Now this makes me want to see what happens with like My Bloody Valentine and Husker Du :).


Especially in the day and style that the Beatles recorded. Today, not so much



Been using demucs for a couple of weeks now, mostly taking my early produced music which I have since lost the project files for and giving them a remix and update. Gotta say I have been blown away by how good demucs is. I installed it following the repo instructions and then created a zsh alias to run it with any file name. Eg $ai_split mySong.mp3

Wait fifteen minutes and out pops four stems, flawless so far, even been messing around with mainstream tracks and using ableton with warp applied to quickly build out remixes. Demucs is going to be /is already a game changer!
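For anyone who wants the same convenience without a shell alias, here is a rough Python sketch of a wrapper in the spirit of the `ai_split` alias above. It assumes `demucs` is on the PATH after following the repo instructions. The `-o` and `--two-stems` flags are taken from the Demucs README as I understand it, so check `demucs --help` for your version. The function names are hypothetical.

```python
import subprocess
import sys


def build_demucs_command(track, out_dir=None, two_stems=None):
    """Build the argument list for a demucs CLI call (nothing is executed here)."""
    cmd = ["demucs"]
    if out_dir:
        cmd += ["-o", out_dir]  # folder where the separated stems are written
    if two_stems:
        cmd += ["--two-stems", two_stems]  # e.g. "vocals": that stem plus the rest
    cmd.append(track)
    return cmd


def ai_split(track, **options):
    """Run demucs on one track; raises CalledProcessError if demucs fails."""
    subprocess.run(build_demucs_command(track, **options), check=True)


if __name__ == "__main__":
    # Usage (assumed script name): python ai_split.py mySong.mp3 other.wav
    for name in sys.argv[1:]:
        ai_split(name)
```

Keeping the command construction separate from the call makes it easy to inspect what would be run before actually waiting on a long separation job.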


This testimonial almost has me wanting to try it on an “album”[1] I recorded when I was in a “band”[2] in high school. I too lost all of the source files[3].

1: On second thought maybe not. It has not aged well.

2: Me and another kid, with a guitar, a pre-OS X Mac, a pirated copy of Rebirth, a pirated copy of SoundEdit 16, and literally the mic that Apple used to include with (some?) Macs. I’d back-reference[1], but our equipment was not the problem. Well, except for [3].

3: I learned my lesson: I should have been older and had a job that would afford me a backup drive, so I could sample the sounds of that dying HDD and retcon the samples into my “album”[1].


That’s awesome! I wonder if there are projects to create a repository of pre-split public domain music? Seems like something the internet archive could host once created.


Are there any public examples of the split audio files?



That playlist cover can definitely pass as album art.


Let's see how long it takes for some new Neil Cicierega remixes to appear now.


With a tool like this, you could get back into the animutation scene. (Edit: I guess it's a bit of a non-sequitur, but I enjoyed Suzukisan, so there's that.)


Is there a way to process my own audio file rather than choosing one from YouTube?


A couple of commenters have mentioned using lossless files, but so far no one has said HOW.



Thanks. The comments seemed specific to this front-end, but maybe.


How is this similar/different than the Deezer one?


I just did a quick test of demucs vs spleeter:4stems. demucs is significantly slower but the output is better.

in a semi blind comparison, I prefer demucs for all 4 tracks (drum, bass, vocals, and other). bass and other stand out the most so let me say a couple words about them.

bass - the demucs bass has less bleed from other instruments and the volume is consistent throughout. with spleeter, the volume varies a lot and there are multiple sections of 1-2 bars where it just drops out completely. In Capo, the demucs spectrogram is nice and clear whereas spleeter tends to look like pencil smudges for the most part.

other - with spleeter, whenever there are vocals, the other instruments turn to mush. demucs is much better. Oh, you can tell people are singing -- the instruments get muffled -- but you can still hear them.


It's pretty decent. I threw a drum'n'bass track at it to see how it would cope with heavily produced material and the results were surprisingly good.


I'd also be interested in how it compares to iZotope RX's Music Rebalance (examples from earlier releases here: https://www.izotope.com/en/learn/stem-isolation-music-rebala...).


I'd be interested to know how it compares to iZotope as well as phonicmind.


I just checked "Californication" (used for all their other examples here: https://soundcloud.com/honualx/sets/source-separation-in-the...) in RX9 Music Rebalance with the setting to "best", and I wasn't very impressed.

Seems like this tool might be better than Izotope's.


I dabble in audio production in my free time outside of work, and I typically will use iZotope RX 9 or Neural Mix Pro for isolating vocals or stems. However, these are paid products, and it's encouraging to see more open source projects being built around this space.

I like the opportunity to view the source code and learn from it, as opposed to most paid products which are typically closed-source and a bit of a "black box".

Sure - this is mostly just an accessible frontend for Demucs, but that's still okay. The author clearly indicates that in his repo, giving credit where credit is due. Additionally, this helps less-technical creators be creative in new ways.

Thanks to all who contributed.


Honestly, this sort of thing is cool; but why (in general) is it necessary in the first place?

If the elements of the song are recorded in isolation (which they are in all studio versions), why can't we just move to a format that supports the layering?


Musicians and studios don't generally tend to offer the public access to original stems for songs (why would they?)

Say that you want to make a remix, mashup, or otherwise use sound bites from a song. The easiest thing to do is use a tool like Spleeter/Demucs to separate the source layers so that you can then further process them in your DAW.

This is what I do, but I just use the Demucs CLI because it's simple enough.

https://github.com/facebookresearch/demucs


Are there no communities of "open source" music? It sounds like the stems are part of the "source code" for tracks.


Many niches in electronic music have small, close-knit communities of creators and producers that regularly remix each other's stuff. But it is not an open community: you have to have decent standing (from making your own music or prior remixes) before someone is willing to send you their stems. For any musician that has a label/publisher, they also need to be in the loop to handle the royalties. So sharing stems happens regularly in the music industry, but it is not easily accessible. Which makes tools like the one mentioned very useful for everyone else who would like to participate.


It isn't really in the best interest of the artist to provide this. The final mix is part of the overall product / work of art. Providing all of the individual tracks (there could be 30 or more in total) would also take up a lot of space / increase processing requirements while benefiting very few.


I usually use this kind of tool to get the bass score of some songs, for example. With the isolated elements it is much easier to know exactly what notes are sounding (I don't have a good ear). The same for drums or synth notes.

As after all the sound quality doesn't interest me too much to do this, I usually use iZotope RX, but I will try this tool.


This is like asking why we need decompilers.


> (In general)

Yes, I agree.


For all who look for something like this, iZotope RX (the audio retouching software) has a function called "Music Rebalance" which is great for reducing spill or changing the balance in a live recording.


talk about a missed opportunity without examples. did I miss them somewhere?


I've always wanted a way to extract just the kick drums in realtime but I don't understand this field well enough to understand whether it would be remotely possible or not.


You want just the beat, ie the time markers of each kick? Or you want the isolated sound (ie audio) of each kick? Both are generally possible today, though the approach will differ a little bit.
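For the time-marker case, a very crude sketch in plain Python: an energy-based onset detector that flags the moment a frame's mean power rises above a threshold. This is not kick-specific and is not how Demucs or any tool in this thread works. A real setup would first band-limit the signal to low frequencies or run on a separated drum stem. The frame size, threshold, and minimum gap are arbitrary assumptions, and `synthetic_kicks` only generates a test signal.

```python
import math


def detect_onsets(samples, sample_rate, frame_size=512, threshold=0.01, min_gap=0.1):
    """Return onset times (seconds) where frame energy rises above `threshold`."""
    onsets = []
    was_loud = False
    last_onset = -min_gap
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(x * x for x in frame) / frame_size  # mean power of this frame
        is_loud = energy > threshold
        t = start / sample_rate  # time of the frame's first sample
        # Rising edge: quiet -> loud, and far enough from the previous onset.
        if is_loud and not was_loud and t - last_onset >= min_gap:
            onsets.append(t)
            last_onset = t
        was_loud = is_loud
    return onsets


def synthetic_kicks(sample_rate=8000, duration=2.0, kick_times=(0.5, 1.5), kick_len=0.05):
    """Silence with short 60 Hz bursts standing in for kicks (test signal only)."""
    samples = [0.0] * int(sample_rate * duration)
    for kick_time in kick_times:
        begin = int(kick_time * sample_rate)
        for i in range(int(kick_len * sample_rate)):
            samples[begin + i] = math.sin(2 * math.pi * 60 * i / sample_rate)
    return samples


if __name__ == "__main__":
    print(detect_onsets(synthetic_kicks(), 8000))
```

Because the loop only ever looks at the current and previous frame, the same logic could in principle be fed audio frame by frame, with the reported time accurate to about one frame length.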


Just wow! There were methods extracting acapellas from tracks, but this tool here is another level. Fascinating how good the results are.


This is awesome! Tried it out on Rush's Tom Sawyer and it splits out the vocals great! I can see this being super useful!


Would appreciate an easier way to download and run this! The steps on the readme are pretty long, at least for me (Mac user)


How does it compare to lalal.ai ?


It's free.


And otherwise identical?


Demucs did a much better job of isolating the bass on a blues track than LALAL. The bass actually sounded like a bass. LALAL got the note pitches but lost their attacks.


Anyone else just getting 'failed' on every song they try?


How do you load a local file?


There is no support for such a thing; this is software in the year 2022: never local, online first.


Hahah, I know, right? People actually believe that shit... until they get jacked by a service provider.


Is there a VST front end?


Wow


How does it perform compared to Deezer Spleeter or lalal.ai

Else who cares



