15.ai - Misplaced Pages

This is an old revision of this page, as edited by Alalch E. (talk | contribs) at 09:02, 27 December 2024 (→Reception and legacy: rm, not necessary, similar to the Zwiezen quote). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Real-time text-to-speech AI tool

An editor has nominated this article for deletion.
You are welcome to participate in the deletion discussion, which will decide whether or not to retain it. Feel free to improve the article, but do not remove this notice before the discussion is closed. For more information, see the guide to deletion.

15.ai
Type of site	Artificial intelligence, speech synthesis
Available in	English
Founder(s)	15
URL	15.ai
Commercial	No
Registration	None
Launched	March 2020; 4 years ago
Current status	Inactive

15.ai was a free non-commercial web application that used artificial intelligence to generate text-to-speech voices of fictional characters from popular media. Conceived by an artificial intelligence researcher known as "15" during their time at the Massachusetts Institute of Technology and developed following their successful exit from a startup venture, the application allowed users to make characters from various media speak custom text with emotional inflections faster than real-time.

Launched in March 2020, the service gained widespread attention in early 2021 when it went viral on social media platforms like YouTube and Twitter, and quickly became popular among Internet fandoms, including the My Little Pony: Friendship Is Magic, Team Fortress 2, and SpongeBob SquarePants fandoms. 15.ai is credited as the first mainstream platform to popularize AI voice cloning (audio deepfakes) in memes and content creation.

In January 2022, Voiceverse NFT sparked controversy when it was discovered that the company, which had partnered with voice actor Troy Baker, had misappropriated 15.ai's work for their own platform. The service was ultimately taken offline in September 2022. Its shutdown led to the emergence of various commercial alternatives in subsequent years.

History

Background

The field of artificial speech synthesis underwent a significant transformation with the introduction of deep learning approaches. In 2016, DeepMind's publication of the seminal paper WaveNet: A Generative Model for Raw Audio marked a pivotal shift toward neural network-based speech synthesis, demonstrating unprecedented audio quality through direct waveform modeling. WaveNet achieved this through dilated causal convolutions operating directly on raw audio waveforms at 16,000 samples per second, modeling the conditional probability distribution of each audio sample given all previous ones. Previously, concatenative synthesis–which worked by stitching together pre-recorded segments of human speech–was the predominant method for generating artificial speech, but it often produced robotic-sounding results with noticeable artifacts at the segment boundaries.
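The dilated causal convolution described above can be illustrated with a minimal pure-Python sketch. The kernel size of 2, the doubling dilation schedule, and the function names are assumptions chosen for illustration and are not taken from the sources cited here; an actual WaveNet additionally uses learned weights, gated activations, and residual connections.

```python
def causal_dilated_conv(signal, weights, dilation):
    """Apply a 1-D causal convolution with the given dilation.

    Output sample t depends only on signal[t], signal[t - dilation],
    signal[t - 2 * dilation], ... and never on future samples.
    Positions before the start of the signal are treated as zero.
    """
    output = []
    for t in range(len(signal)):
        total = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                total += w * signal[idx]
        output.append(total)
    return output


def receptive_field(kernel_size, dilations):
    """Count the present and past samples that can influence one output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Assumed schedule: the dilation doubles at each layer (1, 2, 4, ..., 512).
dilations = [2 ** i for i in range(10)]
print(receptive_field(2, dilations))  # 1024
```

Under these assumed settings, ten stacked layers see 1,024 past samples, which corresponds to 64 milliseconds of audio at 16,000 samples per second. This shows how dilation lets the context grow exponentially with depth while each layer stays cheap.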

An example of synthesized audio using HiFi-GAN. HiFi-GAN represented a breakthrough in synthesizing natural-sounding speech and demonstrated that generative adversarial networks (GANs) could be used to develop high-quality vocoders.

Two years later, in 2018, Google AI's Tacotron demonstrated that neural networks could produce highly natural speech synthesis but required substantial training data (typically tens of hours of audio) to achieve acceptable quality. Tacotron employed an encoder-decoder architecture with attention mechanisms to convert input text into mel-spectrograms, which were then converted to waveforms using a separate neural vocoder. When trained on smaller datasets, such as 2 hours of speech, output quality degraded but the speech remained intelligible; with just 24 minutes of training data, Tacotron failed to produce intelligible speech.

In 2019, Microsoft Research introduced FastSpeech, which addressed speed limitations in autoregressive models like Tacotron. FastSpeech utilized a non-autoregressive architecture that enabled parallel sequence generation, significantly reducing inference time while maintaining audio quality. Its feedforward transformer network with length regulation allowed for one-shot prediction of the full mel-spectrogram sequence, avoiding the sequential dependencies that bottlenecked previous approaches. In 2020, Glow-TTS introduced a flow-based approach that allowed for both fast inference and voice style transfer capabilities. The same year saw the emergence of HiFi-GAN, a generative adversarial network-based vocoder that improved the efficiency of waveform generation while producing high-fidelity speech.
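The length regulation step mentioned for FastSpeech can be sketched as follows: each phoneme's hidden state is repeated once per predicted output frame, so the whole frame sequence is laid out at once rather than generated step by step. The function name, the placeholder state labels, and the duration values below are hypothetical and serve only as an illustration.

```python
def length_regulate(phoneme_states, durations):
    """Repeat each phoneme's hidden state once per predicted output frame."""
    if len(phoneme_states) != len(durations):
        raise ValueError("need exactly one duration per phoneme state")
    frames = []
    for state, n_frames in zip(phoneme_states, durations):
        frames.extend([state] * n_frames)
    return frames


# Hypothetical example: three phonemes with placeholder hidden states.
states = ["h_K", "h_AE", "h_T"]
durations = [2, 4, 1]  # assumed frame counts from a duration predictor
expanded = length_regulate(states, durations)
print(expanded)
```

Because the expanded sequence already has the length of the target mel-spectrogram, a non-autoregressive decoder can process all frames in parallel.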

Chinese tech companies also made significant contributions to the field. Baidu and ByteDance developed proprietary text-to-speech frameworks that further advanced the state of the art, though specific technical details of their implementations remained largely undisclosed.

Development, release, and operation

15.ai was conceived in 2016 as a research project in deep learning speech synthesis by a developer known as "15" (at the age of 18) during their freshman year at the Massachusetts Institute of Technology (MIT). The developer was inspired by DeepMind's WaveNet paper, with development continuing through their studies as Google AI released Tacotron the following year. By 2019, the developer had demonstrated at MIT their ability to replicate WaveNet and Tacotron's results using 75% less training data than previously required. The name 15 is a reference to the creator's claim that a voice can be cloned with as little as 15 seconds of data.

The developer had originally planned to pursue a doctorate based on their undergraduate research, but opted to work in the tech industry instead after their startup was accepted into the Y Combinator accelerator in 2019. After their departure in early 2020, the developer returned to their voice synthesis research, implementing it as a web application. Instead of using conventional voice datasets like LJSpeech that contained simple, monotone recordings, they sought out more challenging voice samples that could demonstrate the model's ability to handle complex speech patterns and emotional undertones. The "Pony Preservation Project," a fan initiative that had compiled voice clips from My Little Pony: Friendship Is Magic, played a crucial role in the implementation. The project's contributors had manually trimmed, denoised, transcribed, and emotion-tagged every line from the show–work that was unprecedented among fan communities at the time, especially because this was completed before such tasks could be automated. This carefully curated dataset of highly emotional data provided ideal training material for 15.ai's deep learning model.

15.ai was released in March 2020 with a limited selection of characters, including those from My Little Pony: Friendship Is Magic and Team Fortress 2. More voices were added to the website in the following months. A significant technical advancement came in late 2020 with the implementation of a multi-speaker embedding in the deep neural network, enabling simultaneous training of multiple voices rather than requiring individual models for each character voice. This allowed rapid expansion from eight to over fifty character voices.

In early 2021, the application went viral on Twitter and YouTube, with people generating skits, memes, and fan content using voices from popular games and shows. Viral videos made with 15.ai included recreations of the popular Source Filmmaker video Heavy is Dead, The RED Bread Bank, and Among Us Struggles, which have amassed millions of views on social media. Content creators, YouTubers, and TikTokers have also used 15.ai for voiceovers in their videos. At its peak, the platform incurred operational costs of US$12,000 per month from the AWS infrastructure needed to handle millions of daily voice generations; the website was funded through the developer's previous startup earnings.

Voiceverse NFT controversy

On January 14, 2022, a controversy ensued after it was discovered that Voiceverse NFT, a company with which video game and anime dub voice actor Troy Baker had announced a partnership, had misappropriated voice lines generated with 15.ai as part of its marketing campaign. Log files showed that Voiceverse had generated audio of characters from My Little Pony: Friendship Is Magic using 15.ai and pitched it up to make the voices unrecognizable, then used the audio to market its own platform, in violation of 15.ai's terms of service. Voiceverse claimed that someone in their marketing team used the voice without properly crediting 15.ai; in response, 15 tweeted "Go fuck yourself."

Inactivity

In September 2022, 15.ai was taken offline due to legal issues surrounding artificial intelligence and copyright. The creator has suggested a potential future version that would better address copyright concerns from the outset, though the website remains inactive as of 2024.

Features

The platform was non-commercial and operated without requiring user registration or accounts. Users generated speech by inputting text and selecting a character voice, with optional parameters for emotional contextualizers and phonetic transcriptions. Each request produced three audio variations with distinct emotional deliveries. Characters available included multiple characters from Team Fortress 2 and My Little Pony: Friendship Is Magic; GLaDOS and Wheatley from the Portal series; SpongeBob SquarePants; Rise Kujikawa from Persona 4; Daria Morgendorffer and Jane Lane from Daria; Carl Brutananadilewski from Aqua Teen Hunger Force; Steven Universe from Steven Universe; Sans from Undertale; the Tenth Doctor from Doctor Who; the Narrator from The Stanley Parable; and HAL 9000 from 2001: A Space Odyssey. Certain "silent" characters like Chell and Gordon Freeman could be selected as a joke, and would emit silent audio files when any text was submitted.

The deep learning model's nondeterministic properties produced variations in speech output, creating different intonations with each generation, similar to how voice actors produce different takes. 15.ai introduced the concept of "emotional contextualizers," which allowed users to specify the emotional tone of generated speech through guiding phrases. The emotional contextualizer functionality utilized DeepMoji, a sentiment analysis neural network developed at the MIT Media Lab. Introduced in 2017, DeepMoji processed emoji embeddings from 1.2 billion Twitter posts (2013-2017) to analyze emotional content. Testing showed the system could identify emotional elements, including sarcasm, more accurately than human evaluators.

The application provided support for a simplified version of ARPABET, a set of English phonetic transcriptions originally developed by the Advanced Research Projects Agency in the 1970s. This feature allowed users to correct mispronunciations or specify the desired pronunciation between heteronyms – words that have the same spelling but have different pronunciations. Users could invoke ARPABET transcriptions by enclosing the phoneme string in curly braces within the input box (for example, "{AA1 R P AH0 B EH2 T}" to specify the pronunciation of the word "ARPABET", /ˈɑːrpəˌbɛt/ AR-pə-beht). The interface displayed parsed words with color-coding to indicate pronunciation certainty: green for words found in the existing pronunciation lookup table, blue for manually entered ARPABET pronunciations, and red for words where the pronunciation had to be algorithmically predicted.
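The curly-brace syntax and the three-color scheme can be illustrated with a short sketch. It is not the site's actual parser, whose implementation is not described in the sources; the lookup table contents, the regular expression, and the function name are all hypothetical.

```python
import re

# Hypothetical lookup table; the service's real pronunciation dictionary is not documented here.
KNOWN_WORDS = {"the", "cake", "is", "a", "lie"}

# Match either a {phoneme string} in curly braces or an ordinary word.
TOKEN = re.compile(r"\{([^{}]*)\}|([A-Za-z']+)")


def classify(text):
    """Return (token, color) pairs following the color scheme described above."""
    result = []
    for match in TOKEN.finditer(text):
        phonemes, word = match.groups()
        if phonemes is not None:
            result.append((phonemes.split(), "blue"))  # manually entered ARPABET
        elif word.lower() in KNOWN_WORDS:
            result.append((word, "green"))  # found in the lookup table
        else:
            result.append((word, "red"))  # pronunciation would have to be predicted
    return result


for token, color in classify("The {AA1 R P AH0 B EH2 T} is a glorp"):
    print(color, token)
```

In this sketch the braced phoneme string bypasses the dictionary entirely, which mirrors how a manual transcription lets a user choose between heteronyms or override a wrong prediction.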

Later versions of 15.ai introduced multi-speaker capabilities. Rather than training separate models for each voice, 15.ai used a unified model that learned multiple voices simultaneously through speaker embeddings–learned numerical representations that captured each character's unique vocal characteristics. Along with the emotional context conferred by DeepMoji, this neural network architecture enabled the model to learn shared patterns across different characters' emotional expressions and speaking styles, even when individual characters lacked examples of certain emotional contexts in their training data.
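The idea of a single model conditioned on speaker embeddings can be sketched in a few lines. Concatenating the embedding to every text feature vector is one common conditioning scheme and is an assumption here, since the sources do not say how 15.ai's network consumed its embeddings. The embedding size, the random initial values (which training would normally adjust), and all names are likewise hypothetical.

```python
import random

EMBED_DIM = 4  # assumed embedding size, chosen only for illustration


def make_embedding_table(speakers, dim=EMBED_DIM, seed=0):
    """Create one vector per speaker; in a real model these values are learned."""
    rng = random.Random(seed)
    return {name: [rng.uniform(-1, 1) for _ in range(dim)] for name in speakers}


def condition_on_speaker(text_features, speaker, table):
    """Append the chosen speaker's embedding to every per-step feature vector."""
    embedding = table[speaker]
    return [frame + embedding for frame in text_features]


table = make_embedding_table(["GLaDOS", "Wheatley"])
text_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # placeholder encoder outputs
conditioned = condition_on_speaker(text_features, "GLaDOS", table)
print(len(conditioned), len(conditioned[0]))  # 3 6
```

Since the rest of the network is shared, adding a voice in this scheme means adding one row to the table rather than training a separate model.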

The interface included technical metrics and graphs, which, according to the developer, served to highlight the research aspect of the website. As of v23, released in September 2021, the interface displayed comprehensive model analysis information, including word parsing results and emotional analysis data. The flow and generative adversarial network (GAN) hybrid denoising function, introduced in an earlier version, was streamlined to remove manual parameter inputs.

Reception and legacy

Critics described 15.ai as easy to use and generally able to convincingly replicate character voices, with occasional mixed results. Natalie Clayton of PC Gamer wrote that SpongeBob SquarePants' voice was replicated well, but noted challenges in mimicking the Narrator from The Stanley Parable: "the algorithm simply can't capture Kevan Brighting's whimsically droll intonation." Zack Zwiezen of Kotaku reported that "[his] girlfriend was convinced it was a new voice line from GLaDOS' voice actor, Ellen McLain". Taiwanese newspaper United Daily News also highlighted 15.ai's ability to recreate GLaDOS's mechanical voice, alongside its diverse range of character voice options. Yahoo! News Taiwan reported that "GLaDOS in Portal can pronounce lines nearly perfectly", but also criticized that "there are still many imperfections, such as word limit and tone control, which are still a little weird in some words." Eugenio Moto of Spanish news website Qore.com wrote that "while the results are already exceptional, they can certainly get better." Shaun Scotellaro, the founder of Equestria Daily (also known by his online moniker "Sethisto"), called 15.ai "neat"; he also wrote that "some of [the voices] aren't great due to the lack of samples to draw from, but many are really impressive still anyway."

Multiple other critics found the character limit, prosody options, and English-only nature of the application not entirely satisfactory. Peter Paltridge of anime and superhero news outlet Anime Superhero opined that "voice synthesis has evolved to the point where the more expensive efforts are nearly indistinguishable from actual human speech," but also noted that "In some ways, SAM is still more advanced than this. It was possible to affect SAM’s inflections by using special characters, as well as change his pitch at will. With 15.ai, you’re at the mercy of whatever random inflections you get." Conversely, Lauren Morton of Rock, Paper, Shotgun praised the depth of pronunciation control: "if you're willing to get into the nitty gritty of it". Takayuki Furushima of Den Fami Nico Gamer highlighted the "smooth pronunciations", and Yuki Kurosawa of AUTOMATON noted its "rich emotional expression" as a major feature; both Japanese authors noted the lack of Japanese-language support. Renan do Prado of the Brazilian gaming news outlet Arkade pointed out that users could create amusing results in Portuguese, although generation primarily performed well in English. Chinese gaming news outlet GamerSky called the app "interesting", but also criticized the character limit of the text and the lack of intonations. South Korean video game outlet Zuntata wrote that "the surprising thing about 15.ai is that , there's only about 30 seconds of data, but it achieves pronunciation accuracy close to 100%". Machine learning professor Yongqiang Li wrote in his blog that he was surprised to see that the application was free.

15.ai was an early pioneer of audio deepfakes, leading to the emergence of AI speech synthesis-based memes. Its influence has been noted in the years since it became defunct, and several commercial alternatives, such as ElevenLabs and Speechify, have since emerged. The original claim that only 15 seconds of data is required to clone a human's voice was corroborated by OpenAI in 2024.

Explanatory footnotes

The term "faster than real-time" in speech synthesis means that the system can generate audio more quickly than the actual duration of the speech – for example, generating 10 seconds of speech in less than 10 seconds would be considered faster than real-time.
ElevenLabs uses "11.ai" as a legal byname for its web domain.

References

Notes

遊戲 2021; Yoshiyuki 2021.
Kurosawa 2021; Ruppert 2021; Clayton 2021; Morton 2021; Temitope 2024.
Ng 2020.
Zwiezen 2021; Chandraseta 2021; Temitope 2024.
GamerSky 2021.
Temitope 2024; Anirudh VK 2023; Wright 2023.
"Audio samples from "Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis"". August 30, 2018. Archived from the original on November 11, 2020. Retrieved June 5, 2022.
Ren 2019; Temitope 2024.
Ren 2019.
Kong 2020.
Kim 2020.
Temitope 2024.
"The past and future of 15.ai". Twitter. Archived from the original on December 8, 2024. Retrieved December 19, 2024.
Chandraseta 2021; Temitope 2024.
Chandraseta 2021; Button 2021.
- "About". fifteen.ai (Official website). February 19, 2020. Archived from the original on February 23, 2020. Retrieved December 23, 2024. 2020-02-19: The web app isn't fully ready just yet
- "About". fifteen.ai (Official website). March 2, 2020. Archived from the original on March 3, 2020. Retrieved December 23, 2024.
Scotellaro 2020a; Scotellaro 2020b.
Zwiezen 2021; Clayton 2021; Ruppert 2021; Yoshiyuki 2021.
遊戲 2021.
Kurosawa 2021.
Morton 2021.
Play.ht 2024.
Lawrence 2022; Williams 2022; Wright 2022.
Phillips 2022; Lopez 2022.
Wright 2022; Phillips 2022; fifteenai 2022.
ElevenLabs 2024a; Play.ht 2024.
Williams 2022.
Phillips 2022.
Chandraseta 2021.
Zwiezen 2021; Clayton 2021; Morton 2021; Ruppert 2021.
Morton 2021; 遊戲 2021.
Yoshiyuki 2021.
Knight 2017.
www.equestriacn.com 2022; Kurosawa 2021; Temitope 2024.
www.equestriacn.com 2022.
Kurosawa 2021; Temitope 2024.
Clayton 2021; Ruppert 2021; Moto 2021; Scotellaro 2020c; Villalobos 2021.
Clayton 2021.
Zwiezen 2021.
MrSun 2021.
Moto 2021.
Paltridge 2021.
Yoshiyuki 2021: 日本語入力には対応していないが、ローマ字入力でもなんとなくそれっぽい発音になる。; 15.aiはテキスト読み上げサービスだが、特筆すべきはそのなめらかな発音と、ゲームに登場するキャラクター音声を再現している点だ。 (transl. It does not support Japanese input, but even if you input using romaji, it will somehow give you a similar pronunciation.; 15.ai is a text-to-speech service, but what makes it particularly noteworthy is its smooth pronunciation and the fact that it reproduces the voices of characters that appear in games.)
do Prado 2021.
zuntata.tistory.com 2021.
Li 2021.
MrSun 2021: 大家是否都曾經想像過，假如能讓自己喜歡的遊戲或是動畫角色說出自己想聽的話，不論是名字、惡搞或是經典名言，都是不少人的夢想吧。不過來到 2021 年，現在這種夢想不再是想想而已，因為有一個網站通過 AI 生成的技術，讓大家可以讓不少遊戲或是動畫角色，說出任何你想要他們講出的東西，而且相似度與音調都有相當高的準確度 (transl. Have you ever imagined what it would be like if your favorite game or anime characters could say exactly what you want to hear? Whether it's names, parodies, or classic quotes, this is a dream for many. However, as we enter 2021, this dream is no longer just a fantasy, because there is a website that uses AI-generated technology, allowing users to make various game and anime characters say anything they want with impressive accuracy in both similarity and tone).
Anirudh VK 2023.
Wright 2023.
ElevenLabs 2024b.
OpenAI 2024; Temitope 2024.

Works cited

遊戲, 遊戲角落 (January 20, 2021). "這個AI語音可以模仿《傳送門》GLaDOS講出任何對白！連《Undertale》都可以學" [This AI Voice Can Imitate Portal's GLaDOS Saying Any Dialog! It Can Even Learn Undertale]. United Daily News (in Chinese (Taiwan)). Archived from the original on December 19, 2024. Retrieved December 18, 2024.
Yoshiyuki, Furushima (January 18, 2021). "『Portal』のGLaDOSや『UNDERTALE』のサンズがテキストを読み上げてくれる。文章に込められた感情まで再現することを目指すサービス「15.ai」が話題に" [Portal's GLaDOS and UNDERTALE's Sans Will Read Text for You. "15.ai" Service Aims to Reproduce Even the Emotions in Text, Becomes Topic of Discussion]. Den Fami Nico Gamer (in Japanese). Archived from the original on January 18, 2021. Retrieved December 18, 2024.
Kurosawa, Yuki (January 19, 2021). "ゲームキャラ音声読み上げソフト「15.ai」公開中。『Undertale』や『Portal』のキャラに好きなセリフを言ってもらえる" [Game Character Voice Reading Software "15.ai" Now Available. Get Characters from Undertale and Portal to Say Your Desired Lines]. AUTOMATON (in Japanese). Archived from the original on January 19, 2021. Retrieved December 18, 2024. 英語版ボイスのみなので注意。;もうひとつ15.aiの大きな特徴として挙げられるのが、豊かな感情表現だ。 [Please note that only English voices are available.;Another major feature of 15.ai is its rich emotional expression.]
Ruppert, Liana (January 18, 2021). "Make Portal's GLaDOS And Other Beloved Characters Say The Weirdest Things With This App". Game Informer. Archived from the original on January 18, 2021. Retrieved December 18, 2024.
Clayton, Natalie (January 19, 2021). "Make the cast of TF2 recite old memes with this AI text-to-speech tool". PC Gamer. Archived from the original on January 19, 2021. Retrieved December 18, 2024.
Morton, Lauren (January 18, 2021). "Put words in game characters' mouths with this fascinating text to speech tool". Rock, Paper, Shotgun. Archived from the original on January 18, 2021. Retrieved December 18, 2024.
Ng, Andrew (April 1, 2020). "Voice Cloning for the Masses". DeepLearning.AI. Retrieved December 22, 2024.
Zwiezen, Zack (January 18, 2021). "Website Lets You Make GLaDOS Say Whatever You Want". Kotaku. Archived from the original on January 17, 2021. Retrieved December 18, 2024.
"这个网站可用AI生成语音让ACG角色"说"出你输入的文本" [This Website Can Use AI to Generate Voice, Making ACG Characters "Say" the Text You Input]. GamerSky (in Chinese). January 18, 2021. Archived from the original on December 11, 2024. Retrieved December 18, 2024.
Chandraseta, Rionaldi (January 21, 2021). "Generate Your Favourite Characters' Voice Lines using Machine Learning". Towards Data Science. Archived from the original on January 21, 2021. Retrieved December 18, 2024.
Williams, Demi (January 18, 2022). "Voiceverse NFT admits to taking voice lines from non-commercial service". NME. Archived from the original on January 18, 2022. Retrieved December 18, 2024.
Wright, Steve (January 17, 2022). "Troy Baker-backed NFT company admits to using content without permission". Stevivor. Archived from the original on January 17, 2022. Retrieved December 18, 2024.
Phillips, Tom (January 17, 2022). "Troy Baker-backed NFT firm admits using voice lines taken from another service without permission". Eurogamer. Archived from the original on January 17, 2022. Retrieved December 18, 2024.
"15.ai已经重新上线，版本更新至v23" [15.ai has been re-launched, version updated to v23] (in Chinese). October 1, 2021. Retrieved December 22, 2024.
MrSun (January 19, 2021). "讓你喜愛的ACG角色說出任何話！ AI生成技術幫助你實現夢想" [Let your favorite ACG characters say anything! AI generation technology helps you realize your dreams] (in Chinese). Retrieved December 22, 2024.
do Prado, Renan (January 19, 2021). "Faça GLaDOS, Bob Esponja e outros personagens falarem textos escritos por você!" [Make GLaDOS, SpongeBob and other characters speak texts written by you!]. Arkade (in Brazilian Portuguese). Retrieved December 22, 2024.
"15.AI: Everything You Need to Know & Best Alternatives". ElevenLabs. 2024a. Archived from the original on July 15, 2024. Retrieved December 18, 2024.
"Everything You Need to Know About 15.ai: The AI Voice Generator". Play.ht. September 12, 2024. Retrieved December 18, 2024.
Button, Chris (January 19, 2021). "Make GLaDOS, SpongeBob and other friends say what you want with this AI text-to-speech tool". Byteside. Archived from the original on June 25, 2024. Retrieved December 18, 2024.
Scotellaro, Shaun (2020a). "Rainbow Dash Voice Added to 15.ai". Equestria Daily. Archived from the original on December 1, 2024. Retrieved December 18, 2024.
Scotellaro, Shaun (2020b). "15.ai Adds Tons of New Pony Voices". Equestria Daily. Retrieved December 21, 2024.
Lawrence, Briana (January 19, 2022). "Shonen Jump Scare Leads to Company Reassuring Fans That They Aren't Getting Into NFTs". The Mary Sue. Retrieved December 23, 2024.
Lopez, Ule (January 16, 2022). "Voiceverse NFT Service Reportedly Uses Stolen Technology from 15ai [UPDATE]". Wccftech. Archived from the original on January 16, 2022. Retrieved June 7, 2022.
@fifteenai (January 14, 2022). "Go fuck yourself" (Tweet) – via Twitter.
Knight, Will (August 3, 2017). "An Algorithm Trained on Emoji Knows When You're Being Sarcastic on Twitter". MIT Technology Review. Archived from the original on June 2, 2022. Retrieved December 18, 2024.
Moto, Eugenio (January 20, 2021). "15.ai, el sitio que te permite usar voces de personajes populares para que digan lo que quieras". Qore (in Spanish). Retrieved December 21, 2024. Si bien los resultados ya son excepcionales, sin duda pueden mejorar más [While the results are already exceptional, without a doubt they can improve even more]
Scotellaro, Shaun (2020c). "Neat "Pony Preservation Project" Using Neural Networks to Create Pony Voices". Equestria Daily. Archived from the original on June 23, 2021. Retrieved December 18, 2024.
Villalobos, José (January 18, 2021). "Descubre 15.AI, un sitio web en el que podrás hacer que GlaDOS diga lo que quieras" [Discover 15.AI, a Website Where You Can Make GlaDOS Say What You Want]. LaPS4 (in Spanish). Archived from the original on January 18, 2021. Retrieved January 18, 2021. La dirección es 15.AI y funciona tan fácil como parece. [The address is 15.AI and it works as easy as it looks.]
Paltridge, Peter (January 18, 2021). "This Website Will Say Whatever You Type In Spongebob's Voice". Retrieved December 22, 2024.
"게임 캐릭터 음성으로 영어를 읽어주는 소프트 15.ai 공개" [Software 15.ai Released That Reads English in Game Character Voices]. Tistory (in Korean). January 20, 2021. Retrieved December 18, 2024.
Li, Yongqiang (2021). "语音开源项目优选：免费配音网站15.ai" [Voice Open Source Project Selection: Free Voice Acting Website 15.ai]. Zhihu (in Chinese). Retrieved December 18, 2024.
Anirudh VK (March 18, 2023). "Deepfakes Are Elevating Meme Culture, But At What Cost?". Analytics India Magazine. Retrieved December 18, 2024. While AI voice memes have been around in some form since '15.ai' launched in 2020,
Wright, Steven (March 21, 2023). "Why Biden, Trump, and Obama Arguing Over Video Games Is YouTube's New Obsession". Inverse. Archived from the original on December 20, 2024. Retrieved December 18, 2024. AI voice tools used to create "audio deepfakes" have existed for years in one form or another, with 15.ai being a notable example.
"Can I publish the content I generate on the platform?". ElevenLabs (Official website). 2024b. Retrieved December 23, 2024.
"Navigating the Challenges and Opportunities of Synthetic Voices". OpenAI. March 9, 2024. Archived from the original on November 25, 2024. Retrieved December 18, 2024.
Temitope, Yusuf (December 25, 2024). "15.ai Creator reveals journey from MIT Project to internet phenomenon". The Guardian. Retrieved December 25, 2024.
Ren, Yi (2019). "FastSpeech: Fast, Robust and Controllable Text to Speech". arXiv:1905.09263.
Kong, Jungil (2020). "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis". arXiv:2010.05646.
Kim, Jaehyeon (2020). "Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search". arXiv:2005.11129.
