This is known as Blind Source Separation [1], and it's been a field of study for decades. The specific problem here seems to be the "cocktail party problem", where you want to isolate a single speaker (or in this case 5?) in a room full of conversations.
When I was in grad school, I knew an EE research group in the building next to mine working on this problem using ICA (independent component analysis). This was ca. 2004, before the resurgence of deep learning, and even plain ICA gave useful results.
The results of the FB work [2] with RNNs are pretty impressive (see the audio samples).
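For anyone curious what ICA looks like in practice, here is a minimal NumPy sketch of a FastICA-style iteration on a toy two-signal mix. It is only an illustration of the general idea: it is not the method that research group used and it has nothing to do with the RNN approach in [2]. The function names (`fast_ica`, `sym_decorrelate`), the tanh nonlinearity, the test signals, and the mixing matrix are all my own assumptions. It also assumes as many recordings as sources and an instantaneous (no echo, no delay) mix, which real rooms do not give you.

```python
import numpy as np

def sym_decorrelate(W):
    # Symmetric decorrelation: W <- (W W^T)^(-1/2) W
    d, E = np.linalg.eigh(W @ W.T)
    return (E / np.sqrt(d)) @ E.T @ W

def fast_ica(X, n_iter=200, tol=1e-6, seed=0):
    """X has shape (n_signals, n_samples). Returns estimated sources,
    recovered only up to ordering, sign, and scale."""
    rng = np.random.default_rng(seed)
    # Center, then whiten so the data has identity covariance.
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))
    Z = (E / np.sqrt(d)).T @ X
    n, n_samples = Z.shape
    W = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)  # nonlinearity g(u) = tanh(u)
        # Fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w, for all rows at once
        W_new = (G @ Z.T) / n_samples - np.diag((1 - G**2).mean(axis=1)) @ W
        W_new = sym_decorrelate(W_new)
        # Converged when each row barely rotates between iterations.
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1)) < tol
        W = W_new
        if converged:
            break
    return W @ Z

# Toy "cocktail party": two sources, two microphones (all values made up).
t = np.linspace(0, 8, 2000)
sources = np.vstack([np.sin(2 * t), np.sign(np.sin(3 * t))])
mixing = np.array([[1.0, 0.5], [0.5, 1.0]])
mixed = mixing @ sources

recovered = fast_ica(mixed)
# Absolute correlation between each true source (rows) and each estimate (columns).
corr = np.abs(np.corrcoef(np.vstack([sources, recovered]))[:2, 2:])
print(corr.round(2))
```

Each row of `corr` should have one entry close to 1, meaning each source was matched by one of the estimates. ICA cannot tell you which output is which speaker, or its sign or volume, so that matching has to happen afterwards.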
[1] https://en.wikipedia.org/wiki/Signal_separation
[2] https://enk100.github.io/speaker_separation/