According to the article, this was an employee in Hong Kong on a video call with people supposedly in the UK.
Power distance might matter, depending on the nationality of the participants.
Also if English is a second language, then perhaps the sound quality of the synthetic voices wouldn't need to be as good - we are surely better at recognising voices in our mother tongue.
Scammers fooled countless mothers into believing the voice on the line belonged to one of their children long before text-to-speech was a thing. (Just to say it's not incredibly hard. I'm not suggesting that being able to automate it wouldn't have a huge impact.)