A voice synthesis company in the Middle East has published a fictional interview between Joe Rogan and Steve Jobs. It is the first episode of "Podcast.ai," a series created by Play.ht, a company that sells voice services.
In the interview, voice cloning technology recreates Rogan's voice, a technique previously covered on Ars. The case of Darth Vader in Disney's Obi-Wan Kenobi TV series is a good example of how deep learning technology can be used to replicate distinctive voices.
The effect is achieved by training a model on existing samples of the voice to be cloned. Rogan is a good candidate for deep learning voice training because his podcasts contain a large amount of his isolated voice. His voice was also cloned in 2019, as a PR stunt by an artificial intelligence company.
What makes this instance of artificial intelligence tomfoolery more interesting is that Play.ht also cloned the voice of Steve Jobs. His voice is choppy at times, but it is reminiscent of his Apple keynotes and All Things Digital interviews from the 2000s. According to Play.ht, the text of the interview may have been generated by a large language model similar to GPT-3.
Play.ht says that transcripts are generated with fine-tuned language models. For the Steve Jobs episode, the model was trained on his biography and on recordings of him available online, with the aim of bringing him back to life.
The 19-minute interview doesn't make sense. Parts of it sound like conceptual mashups of common Jobs talking points, including aesthetics, revolutionary products, competitors, and the triumphs of the original Macintosh.
If you compare the voice of the fake Jobs to that of the real Jobs in the Triumph of the Nerds interview, you will hear that it is not a carbon copy. "That's the problem I've always had with Microsoft. They've done good work, but they haven't had a chance to eat. They haven't had any sense of aesthetics."
It's not clear whether it's legal to use Jobs' or Rogan's vocal likenesses in this way. Despite the PR-stunt nature of the program, the idea of fictional celebrity podcasts got our attention. As voice synthesis becomes more widespread, we are looking at a future where media artifacts from any era will be completely fluid and shapeable to fit any narrative. In the episode, the fake Jobs is a big fan of Rogan.
He says it's nice to sit in the car and listen to what Rogan is saying.