Voice-to-text is inaccurate enough (especially because of the many different languages that can be used) that you have to keep the original recording and not just the text; but yes, you can easily do things like "find all conversations where the word 'nakamoto' is heard" with mass-scale voice/speech analysis.
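A minimal sketch of why the recording still matters, assuming the speech recognizer emits word-level timestamps and confidence scores (many do, but that is an assumption here). The names `Word`, `Transcript`, `find_keyword` and `recording_id` are hypothetical. The keyword search runs over the cheap text, but each hit points back to an offset in the stored audio so a human can check it:

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start_s: float     # offset into the original recording, in seconds
    confidence: float  # recognizer's confidence for this word, 0.0 to 1.0


@dataclass
class Transcript:
    recording_id: str  # hypothetical key pointing back to the stored audio file
    language: str
    words: list[Word]


def find_keyword(transcripts, keyword, min_confidence=0.0):
    """Return (recording_id, start_s, confidence) for every hit on keyword."""
    needle = keyword.casefold()
    hits = []
    for t in transcripts:
        for w in t.words:
            # Case-insensitive match, ignoring trailing punctuation.
            if w.text.casefold().strip(".,!?") == needle and w.confidence >= min_confidence:
                hits.append((t.recording_id, w.start_s, w.confidence))
    return hits


# Made-up example data, for illustration only.
calls = [
    Transcript("call-001", "en", [Word("about", 3.1, 0.97), Word("Nakamoto", 3.5, 0.62)]),
    Transcript("call-002", "ja", [Word("nakamoto", 10.0, 0.41)]),
    Transcript("call-003", "en", [Word("hello", 0.2, 0.99)]),
]

for rec, start, conf in find_keyword(calls, "nakamoto"):
    print(f"{rec} @ {start:.1f}s (conf {conf:.2f}) -> re-check the audio")
```

The low-confidence hits (such as the one from the non-English call) are exactly the ones you can only confirm or reject by listening to the original audio at that offset.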