This is an old revision of this page, as edited by .hecko (talk | contribs) at 15:24, 11 April 2022 (more citation fixing). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
Real-time text-to-speech tool using artificial intelligence
Developer(s) | 15 |
---|---|
Initial release | March 2020 |
Stable release | v24.2.1 / September 2021 |
Written in | Vue.js, Python, Julia |
Available in | English |
Type | Artificial intelligence, speech synthesis, machine learning, deep learning |
Website | 15.ai |
15.ai is a non-commercial freeware artificial intelligence web application that generates natural, emotive, high-fidelity text-to-speech voices for an assortment of fictional characters from a variety of media sources. Developed by an anonymous MIT researcher under the pseudonym 15, from which the site takes its name, the project combines audio synthesis algorithms, speech synthesis deep neural networks, and sentiment analysis models to generate and serve emotive character voices faster than real time, even for characters with very little available training data.
Features
Available characters include GLaDOS and Wheatley from Portal, characters from Team Fortress 2, Twilight Sparkle and a number of main, secondary, and supporting characters from My Little Pony: Friendship Is Magic, SpongeBob from SpongeBob SquarePants, Daria Morgendorffer and Jane Lane from Daria, the Tenth Doctor from Doctor Who, HAL 9000 from 2001: A Space Odyssey, the Narrator from The Stanley Parable, the Wii U/3DS/Switch Super Smash Bros. Announcer (formerly), Sans from Undertale, and Carl Brutananadilewski from Aqua Teen Hunger Force.
The deep learning model used by the application is nondeterministic: each time speech is generated from the same text string, the intonation of the speech is slightly different. The application supports English phonetic transcriptions (such as ARPABET) to correct mispronunciations or to account for heteronyms, words that are spelled the same but pronounced differently (such as the word read, which can be pronounced either /ˈrɛd/ or /ˈriːd/ depending on its tense). The app also supports altering the emotion of a generated line using emotional contextualizers (a term coined by the project): a sentence or phrase conveying the emotion of the desired take, which guides the model during inference.
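To illustrate how a phonetic override can resolve a heteronym, the following Python sketch looks each word up in a pronunciation dictionary unless the user has supplied an explicit ARPABET transcription for it. It is a minimal sketch under stated assumptions: the curly-brace override syntax, the toy `LEXICON`, and the `to_phonemes` function are hypothetical and are not taken from 15.ai's code or documentation.

```python
import re

# Hypothetical toy lexicon mapping lowercase words to ARPABET phonemes.
# Entries follow the CMU Pronouncing Dictionary; a real lexicon is far larger.
LEXICON = {
    "i": ["AY1"],
    "read": ["R", "IY1", "D"],  # stores only the present-tense reading /ˈriːd/
    "the": ["DH", "AH0"],
    "book": ["B", "UH1", "K"],
    "yesterday": ["Y", "EH1", "S", "T", "ER0", "D", "EY2"],
}

# Assumed input syntax: an ARPABET override is enclosed in curly braces,
# e.g. "{R EH1 D}"; everything else is split into plain words.
TOKEN = re.compile(r"\{([^}]*)\}|([A-Za-z']+)")


def to_phonemes(text: str) -> list[list[str]]:
    """Return one ARPABET phoneme list per word, honoring {...} overrides."""
    result = []
    for match in TOKEN.finditer(text):
        override, word = match.groups()
        if override is not None:
            # A user-supplied transcription takes precedence over the lexicon.
            result.append(override.split())
        elif word.lower() in LEXICON:
            result.append(list(LEXICON[word.lower()]))
        else:
            # Unknown word: a real system would infer a pronunciation with a
            # learned model; this sketch only marks it.
            result.append([f"<UNK:{word}>"])
    return result


if __name__ == "__main__":
    # Past-tense "read" needs an override, since the lexicon stores /ˈriːd/.
    print(to_phonemes("I {R EH1 D} the book yesterday"))
```

Without the braces, "I read the book yesterday" would receive the dictionary's present-tense reading, which is exactly the kind of mispronunciation the override is meant to correct.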
15.ai uses a multi-speaker model: hundreds of characters are trained concurrently rather than sequentially, which significantly reduces the required training time and enables the model to learn shared emotional context and generalize it even to voices whose training data contains no examples of that emotion. The lexicon used by 15.ai has been scraped from a variety of Internet sources, including Oxford Dictionaries, Wiktionary, the CMU Pronouncing Dictionary, 4chan, Reddit, and Twitter. Pronunciations of unfamiliar words are automatically deduced using phonological rules learned by the deep learning model.
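Combining a lexicon from several sources requires some policy for entries on which the sources disagree. The Python sketch below assumes one simple policy (sources are merged in priority order and every distinct pronunciation is kept, so heteronyms retain all their readings) and separates out words that no source covers, which is where a learned model would have to infer a pronunciation. The functions `merge_lexicons` and `split_known` and the sample entries are hypothetical; the article does not describe how 15.ai actually merges its sources.

```python
# Type alias: one pronunciation is a tuple of ARPABET phonemes.
Pron = tuple[str, ...]


def merge_lexicons(sources: list[dict[str, list[Pron]]]) -> dict[str, list[Pron]]:
    """Merge pronunciation dictionaries listed from highest to lowest priority.

    Keys are lowercased. Every distinct pronunciation is kept, so heteronyms
    retain all readings; variants from higher-priority sources come first.
    """
    merged: dict[str, list[Pron]] = {}
    for source in sources:
        for word, prons in source.items():
            variants = merged.setdefault(word.lower(), [])
            for pron in prons:
                if pron not in variants:
                    variants.append(pron)
    return merged


def split_known(words: list[str], lexicon: dict[str, list[Pron]]) -> tuple[list[str], list[str]]:
    """Separate words the lexicon covers from those needing a learned fallback."""
    known = [w for w in words if w.lower() in lexicon]
    unknown = [w for w in words if w.lower() not in lexicon]
    return known, unknown


if __name__ == "__main__":
    # Hypothetical sources; entries are illustrative, not real scraped data.
    curated = {"read": [("R", "IY1", "D"), ("R", "EH1", "D")]}
    scraped = {"Read": [("R", "IY1", "D")], "sans": [("S", "AE1", "N", "Z")]}
    lexicon = merge_lexicons([curated, scraped])
    print(lexicon["read"])
    print(split_known(["read", "sans", "Wheatley"], lexicon))
```

Keeping all variants rather than overwriting lower-priority entries means that the downstream model, or a user's ARPABET override, can still choose between readings such as /ˈriːd/ and /ˈrɛd/.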
Background
Speech synthesis
Main article: Deep learning speech synthesis
In 2016, with the proposal of DeepMind's WaveNet, deep-learning-based models for speech synthesis began to gain popularity as a method of modeling waveforms and generating human-like speech. Tacotron 2, a neural network architecture for speech synthesis developed by Google AI, was published in 2018 and required tens of hours of audio data to train. For years, reducing the amount of data needed to train a realistic, high-quality text-to-speech model has been a primary goal of researchers in deep learning speech synthesis.
The developer of 15.ai claims that as little as 15 seconds of data is sufficient to clone a voice to a human-like standard, a significant reduction from the tens of hours of audio required by earlier models.
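To put the claim in perspective, take "tens of hours" at its lower end of 10 hours (an assumption made only for this illustration). The claimed requirement is then roughly 2,400 times smaller:

```latex
\frac{10\ \text{h} \times 3600\ \text{s/h}}{15\ \text{s}}
  = \frac{36\,000\ \text{s}}{15\ \text{s}}
  = 2400
```

Any larger figure for the earlier training sets makes the ratio correspondingly larger.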
Copyrighted material in deep learning
Main article: Authors Guild, Inc. v. Google, Inc.
A landmark 2013 ruling in the case between Google and the Authors Guild held that Google Books, a service that searches the full text of printed copyrighted books, met all requirements for fair use. The case set an important legal precedent for deep learning and artificial intelligence: using copyrighted material to train a discriminative model or a non-commercial generative model was deemed legal.
Development
15.ai was designed and created by an anonymous research scientist affiliated with the Massachusetts Institute of Technology who goes by the alias 15, ostensibly in reference to the minimum amount of data (15 seconds) required to convincingly clone a voice. Development began while the developer was an undergraduate. Although the application costs several thousand dollars a month to run and maintain, the developer has stated that they can pay the cost of running the site out of pocket.
The algorithm the project uses to clone voices with minimal viable data has been dubbed DeepThroat (a double entendre referring both to speech synthesis using deep neural networks and to the sexual act of deep-throating). The project and algorithm, initially conceived as part of MIT's Undergraduate Research Opportunities Program, had been in development for years before the application's first release.
The developer has also worked closely with the Pony Preservation Project from /mlp/, the My Little Pony board of 4chan. The Pony Preservation Project is a "collaborative effort by /mlp/ to build and curate pony datasets" with the aim of creating applications in artificial intelligence. According to the developer, the collective efforts and constructive criticism of the Pony Preservation Project have been integral to the development of 15.ai. During months-long hiatuses in the site's public availability in late 2020 and early 2021, several test sites were put up for exclusive use by /mlp/ and the Pony Preservation Project for testing and content generation.
Reception
15 @fifteenai I've been informed that the aforementioned NFT vocal synthesis is actively attempting to appropriate my work for their own benefit. After digging through the log files, I have evidence that some of the voices that they are taking credit for were indeed generated from my own site.
January 14, 2022
Voiceverse Origins @VoiceverseNFT Hey @fifteenai we are extremely sorry about this. The voice was indeed taken from your platform, which our marketing team used without giving proper credit. Chubbiverse team has no knowledge of this. We will make sure this never happens again.
January 14, 2022
15 @fifteenai Go fuck yourself.
January 14, 2022
Over 3 million lines were generated within the first two weeks of the September 2021 release of the application. As of February 2022, 15.ai's Patreon was raising over $1,100 per month.
Fandom content creation
15.ai has been frequently used for content creation in various fandoms, including the My Little Pony: Friendship Is Magic fandom, the Team Fortress 2 fandom, the Portal fandom, and the SpongeBob SquarePants fandom. Numerous videos and projects featuring 15.ai-synthesized speech have gone viral. However, many viral videos and projects instead use speech from other synthesis software, some without properly crediting the source of their AI-synthesized voice clips. As a result, works made with other speech synthesis software have been mistaken for 15.ai creations, and vice versa. Because of this misattribution and lack of proper credit, 15.ai's terms of service forbid mixing 15.ai-synthesized and non-15.ai-synthesized speech in the same video or project.
The My Little Pony: Friendship Is Magic fandom has seen a resurgence in video and musical content creation as a direct result of 15.ai, and the project has also been used as a creative tool in pornography. For instance, the Pony Zone videos are a series of erotic music videos that rely heavily on 15.ai for their vocals; their creators make frequent use of salacious emotional contextualizers and punctuation and ARPABET tricks to induce the models to grunt, sigh, and moan convincingly.
Troy Baker / Voiceverse NFT scandal
On January 14, 2022, it was discovered that Voiceverse NFT, a company with which video game voice actor Troy Baker had announced a partnership, had taken voice lines generated with 15.ai and used them in its marketing campaign without permission. Log files showed that Voiceverse had generated audio of Twilight Sparkle and Rainbow Dash from My Little Pony: Friendship Is Magic using 15.ai, pitched the clips up so that they were unrecognizable as the original voices, and used them without proper credit to falsely market its own platform, in violation of 15.ai's terms of service.
The partnership between Troy Baker and Voiceverse was met with severe backlash and a universally negative reception from the moment it was announced. Critics highlighted the environmental impact of NFT sales and their potential for exit scams. About two weeks later, on January 31, Baker announced that he would discontinue his partnership with Voiceverse.
Resistance from voice actors
Some voice actors have publicly decried the use of voice cloning technology. Cited reasons include concerns about impersonation and fraud, the unauthorized use of an actor's voice in pornography, and the possibility that AI could make voice actors obsolete.
See also
- Speech synthesis
- Deep learning speech synthesis
- Virtual actor
- 4chan
- My Little Pony: Friendship Is Magic fandom
References
- Notes
- "15.ai - About". 15.ai. 2022-02-20. Retrieved 2022-02-20.
- Chandraseta, Rionaldi (2021-01-19). "Generate Your Favourite Characters' Voice Lines using Machine Learning". Towards Data Science. Archived from the original on 2021-01-21. Retrieved 2021-01-23.
- Zwiezen, Zack (2021-01-18). "Website Lets You Make GLaDOS Say Whatever You Want". Kotaku. Archived from the original on 2021-01-17. Retrieved 2021-01-18.
- Ruppert, Liana (2021-01-18). "Make Portal's GLaDOS And Other Beloved Characters Say The Weirdest Things With This App". Game Informer. Archived from the original on 2021-01-18. Retrieved 2021-01-18.
- Clayton, Natalie (2021-01-19). "Make the cast of TF2 recite old memes with this AI text-to-speech tool". PC Gamer. Archived from the original on 2021-01-19. Retrieved 2021-01-19.
- Morton, Lauren (2021-01-18). "Put words in game characters' mouths with this fascinating text to speech tool". Rock, Paper, Shotgun. Archived from the original on 2021-01-18. Retrieved 2021-01-18.
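- Yoshiyuki, Furushima (2021-01-18). "『Portal』のGLaDOSや『UNDERTALE』のサンズがテキストを読み上げてくれる。文章に込められた感情まで再現することを目指すサービス「15.ai」が話題に" [GLaDOS from Portal and Sans from UNDERTALE will read your text aloud: "15.ai", a service that aims to reproduce even the emotion behind the words, becomes a hot topic]. Denfaminicogamer. Archived from the original on 2021-01-18. Retrieved 2021-01-18.
- Kurosawa, Yuki (2021-01-19). "ゲームキャラ音声読み上げソフト「15.ai」公開中。『Undertale』や『Portal』のキャラに好きなセリフを言ってもらえる" [Game character voice-reading software "15.ai" now available: have characters from Undertale and Portal say any line you like]. AUTOMATON. Archived from the original on 2021-01-19. Retrieved 2021-01-19.
- Villalobos, José (2021-01-18). "Descubre 15.AI, un sitio web en el que podrás hacer que GlaDOS diga lo que quieras" [Discover 15.AI, a website where you can make GLaDOS say whatever you want]. LaPS4. Archived from the original on 2021-01-18. Retrieved 2021-01-18.
- Moto, Eugenio (2021-01-20). "15.ai, el sitio que te permite usar voces de personajes populares para que digan lo que quieras" [15.ai, the site that lets you use the voices of popular characters to say whatever you want]. Yahoo! Finance. Archived from the original on 2022-03-08. Retrieved 2021-01-20.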
- "15.ai - Guide". 15.ai. 2022-02-20. Retrieved 2022-02-20.
- Cooper, Erica (2020). "Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings". arXiv:1910.10838.
- Hsu, Wei-Ning (2018). "Hierarchical Generative Modeling for Controllable Speech Synthesis". arXiv:1810.07217.
- Habib, Raza (2019). "Semi-Supervised Generative Modeling for Controllable Speech Synthesis". arXiv:1910.01709.
- Shen, Jonathan; Pang, Ruoming; Weiss, Ron J.; Schuster, Mike; Jaitly, Navdeep; Yang, Zongheng; Chen, Zhifeng; Zhang, Yu; Wang, Yuxuan; Skerry-Ryan, RJ; Saurous, Rif A.; Agiomyrgiannakis, Yannis; Wu, Yonghui (2018). "Natural TTS Synthesis by Conditioning WaveNet on Mel-Spectrogram Predictions". arXiv:1712.05884.
- Chung, Yu-An (2018). "Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis". arXiv:1808.10128.
- Ren, Yi (2019). "Almost Unsupervised Text to Speech and Automatic Speech Recognition". arXiv:1905.06791.
- "15.ai - FAQ". 15.ai. 2021-01-18. Retrieved 2021-01-18.
- Stewart, Matthew (2019-10-31). "The Most Important Court Decision For Data Science and Machine Learning". Towards Data Science. Archived from the original on 2022-02-21. Retrieved 2022-02-21.
- "Pony Preservation Project (Thread 108)". 4chan. Desuarchive. 2022-02-20. Retrieved 2022-02-20.
- "15.ai - Thanks". 15.ai. 2022-02-20. Retrieved 2022-02-20.
- "15.ai is creating natural emotive high-fidelity TTS with minimal viable data". Patreon. 2022-02-20. Archived from the original on 2022-02-18. Retrieved 2022-02-20.
- Williams, Demi (2022-01-18). "Voiceverse NFT admits to taking voice lines from non-commercial service". NME. Archived from the original on 2022-01-18. Retrieved 2022-01-18.
- Wright, Steve (2022-01-17). "Troy Baker-backed NFT company admits to using content without permission". Stevivor. Archived from the original on 2022-01-17. Retrieved 2022-01-17.
- Henry, Joseph (2022-01-18). "Troy Baker's Partner NFT Company Voiceverse Reportedly Steals Voice Lines From 15.ai". Tech Times. Archived from the original on 2022-01-26. Retrieved 2022-02-14.
- Yea, Yong (2022-01-14). "Troy Baker Faces Mass Backlash For Supporting Shady AI Voice NFTs With Company That Has Stolen Work". YouTube. Archived from the original on 2022-01-30. Retrieved 2022-01-30.
- Phillips, Tom (2022-01-17). "Troy Baker-backed NFT firm admits using voice lines taken from another service without permission". Eurogamer. Archived from the original on 2022-01-17. Retrieved 2022-01-17.
- Phillips, Tom (2022-01-14). "Video game voice actor Troy Baker is now promoting NFTs". Eurogamer. Archived from the original on 2022-01-14. Retrieved 2022-01-14.
- Strickland, Derek (2022-01-31). "Last of Us actor Troy Baker heeds fans, abandons NFT plans". Tweaktown. Archived from the original on 2022-01-31. Retrieved 2022-01-31.
- Peterson, Danny (2022-01-31). "'The Last of Us' actor Troy Baker reverses course on NFTs amid fan backlash". We Got This Covered. Archived from the original on 2022-02-14. Retrieved 2022-02-14.
- Ng, Andrew (2021-03-07). "Weekly Newsletter Issue 83". The Batch. Archived from the original on 2022-02-26. Retrieved 2021-03-07.
- Tweets
- @fifteenai (January 14, 2022). "I've been informed that the aforementioned NFT vocal synthesis is actively attempting to appropriate my work for their own benefit. After digging through the log files, I have evidence that some of the voices that they are taking credit for were indeed generated from my own site" (Tweet) – via Twitter.
- @VoiceverseNFT (January 14, 2022). "Hey @fifteenai we are extremely sorry about this. The voice was indeed taken from your platform, which our marketing team used without giving proper credit. Chubbiverse team has no knowledge of this. We will make sure this never happens again" (Tweet) – via Twitter.
- @fifteenai (January 14, 2022). "Go fuck yourself" (Tweet) – via Twitter.