Your Voice is My Passport

Financial institutions, home automation products, and hi-tech offices have increasingly used voice fingerprinting as a method for authentication. Recent advances in machine learning have shown that text-to-speech systems can generate synthetic, high-quality audio of subjects using audio recordings of their speech. Are current techniques for audio generation enough to spoof voice authentication algorithms? We demonstrate, using freely available machine learning models and limited budget, that standard speaker recognition and voice authentication systems are indeed fooled by targeted text-to-speech attacks. We further show a method which reduces data required to perform such an attack, demonstrating that more people are at risk for voice impersonation than previously thought.

Presented by