How I Cloned My Voice With AI
12 June 2023
AI
Voice Cloning
Voice cloning is one of the more controversial aspects of Artificial Intelligence, with a host of security and ethical concerns about its use. That said, I was curious about how well it would work, and whether a machine could really replicate the sound of my voice accurately.
1. How Does AI Voice Cloning Work?
I tested two voice cloning services, Play.ht and Descript. There are others, but these two stood out for ease of use. For both of them, I had to submit samples of my voice for the AI to learn from.

Play.ht has two options: ‘High Fidelity’ and ‘Instant’. High Fidelity requires at least 30 minutes of recording, whereas Instant needs only 30 seconds. I tried both.
Descript needs a minimum of 10 minutes of audio but recommends 30.
Luckily, I have a lot of recorded content from training courses I developed, so it was just a matter of uploading the files.
Processing time varied by platform. Play.ht took 2 hours for High Fidelity, while Instant was, well, instant. Descript took almost 24 hours to build the voice clone.
2. The Results
The results were definitely surprising. Of the audio tracks below, three are AI-cloned versions of my voice and one is my real voice, for reference. See if you can guess which is which from:
My real voice
Play.ht High fidelity
Play.ht Instant
Descript
I asked a few different people, and the feedback was mixed. While a couple of the voices are obviously not quite right, one is fairly convincing.
3. The Summary
The progress of AI voice cloning is both amazing and disconcerting. While it brings numerous possibilities for positive applications, its potential misuse for nefarious purposes cannot be ignored.
It definitely has applications in automation (think content creation, sales processes, and training), but it’s still imperfect.
The Answers
Here is which recording is which:
Play.ht Instant
Play.ht High fidelity
Descript
My real voice