15.ai: Difference between revisions

Revision as of 16:29, 9 April 2022 by 128.135.98.15 (talk): re-added the text about 15's usage of a multi-speaker model (Tag: Manual revert)
Revision as of 18:14, 10 April 2022 by 128.135.98.48 (talk): updated logo (Tag: Reverted)
Line 2:		Line 2:
	{{Infobox software		{{Infobox software
	\| name = 15.ai		\| name = 15.ai
	\| logo =		\| logo = File:15 ai Transparent.png
	\| screenshot =		\| screenshot =
	\| caption =		\| caption =

Revision as of 18:14, 10 April 2022

Real-time text-to-speech tool using artificial intelligence

15.ai
File:15 ai Transparent.png
Developer(s)	15
Initial release	March 2020

Stable release	v24.2.1 / September 2021

Written in	Vue.js, Python, Julia
Available in	English
Type	Artificial intelligence, speech synthesis, machine learning, deep learning
Website	15.ai

15.ai is a non-commercial freeware artificial intelligence web application that generates natural, emotive, high-fidelity text-to-speech voices for an assortment of fictional characters from a variety of media sources. Developed by an anonymous MIT researcher under the pseudonym 15, the project uses a combination of audio synthesis algorithms, speech synthesis deep neural networks, and sentiment analysis models to generate and serve emotive character voices faster than real time, even for characters with very little training data.

Features

Available characters include GLaDOS and Wheatley from Portal, characters from Team Fortress 2, Twilight Sparkle and a number of main, secondary, and supporting characters from My Little Pony: Friendship is Magic, SpongeBob from SpongeBob SquarePants, Daria Morgendorffer and Jane Lane from Daria, the Tenth Doctor from Doctor Who, HAL 9000 from 2001: A Space Odyssey, the Narrator from The Stanley Parable, the Wii U/3DS/Switch Super Smash Bros. Announcer (formerly), Sans from Undertale, and Carl Brutananadilewski from Aqua Teen Hunger Force.

The deep learning model used by the application is nondeterministic: each time that speech is generated from the same text string, the intonation of the speech will be slightly different. The application supports English phonetic transcriptions (such as ARPABET) to correct mispronunciations or to account for heteronyms, words that are spelled the same but are pronounced differently (such as the word read, which can be pronounced as either /ˈrɛd/ or /ˈriːd/ depending on its tense). The app also supports altering the emotion of a generated line using emotional contextualizers (a term coined by this project): sentences or phrases that convey the emotion of the take and serve as a guide for the model during inference.
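The heteronym problem described above can be illustrated with a minimal Python sketch. It is not 15.ai's implementation: the three-word lexicon is hand-written, the curly-brace override syntax is assumed purely for illustration, and the function name `to_phonemes` is hypothetical. The ARPABET strings follow the conventions of the CMU Pronouncing Dictionary.

```python
import re

# Tiny hand-written lexicon in CMU-dictionary-style ARPABET (illustrative only).
# Heteronyms carry more than one pronunciation; the first entry is the default.
LEXICON = {
    "i": ["AY1"],
    "read": ["R IY1 D", "R EH1 D"],  # present tense first, past tense second
    "books": ["B UH1 K S"],
}

# Assumed inline syntax: {R EH1 D} forces an exact ARPABET pronunciation.
TOKEN = re.compile(r"\{([^}]*)\}|([A-Za-z']+)")


def to_phonemes(text):
    """Return one ARPABET string per token, honoring inline overrides."""
    phonemes = []
    for override, word in TOKEN.findall(text):
        if override:
            # The user supplied the pronunciation explicitly.
            phonemes.append(override.strip())
        else:
            candidates = LEXICON.get(word.lower())
            # Fall back to a visible marker for out-of-lexicon words.
            phonemes.append(candidates[0] if candidates else f"<unk:{word}>")
    return phonemes


default_take = to_phonemes("I read books")
past_tense_take = to_phonemes("I {R EH1 D} books")
```

Without the override, the lookup cannot tell which tense is meant and falls back to the default entry; the explicit transcription removes the ambiguity.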

15.ai uses a multi-speaker model: hundreds of character voices are trained concurrently rather than sequentially, which significantly reduces the required training time and enables the model to learn and generalize shared emotional context, even for voices that were never exposed to such context. The lexicon used by 15.ai was scraped from a variety of Internet sources, including Oxford Dictionaries, Wiktionary, the CMU Pronouncing Dictionary, 4chan, Reddit, and Twitter. Pronunciations of unfamiliar words are automatically deduced using phonological rules learned by the deep learning model.
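One common way to train many voices concurrently is to tag each utterance with a speaker ID and shuffle all speakers into the same batches, so that a single set of shared weights sees every voice. The sketch below shows only that batching step. It is an assumption about how such a setup might look, not a description of 15.ai's pipeline; the function names and the placeholder transcripts are hypothetical.

```python
import random


def build_speaker_index(utterances_by_speaker):
    """Map each speaker name to a stable integer ID (e.g. an embedding row)."""
    return {name: i for i, name in enumerate(sorted(utterances_by_speaker))}


def make_mixed_batches(utterances_by_speaker, batch_size, seed=0):
    """Shuffle utterances from all speakers together and cut them into batches.

    Each item is (speaker_id, text), so a shared model can be conditioned on
    the speaker while its weights are updated by every voice in the batch.
    """
    speaker_index = build_speaker_index(utterances_by_speaker)
    pool = [
        (speaker_index[name], text)
        for name, texts in utterances_by_speaker.items()
        for text in texts
    ]
    random.Random(seed).shuffle(pool)
    return [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]


# Placeholder transcripts, not real training data.
data = {
    "GLaDOS": ["placeholder line a", "placeholder line b"],
    "Wheatley": ["placeholder line c"],
    "Twilight Sparkle": ["placeholder line d", "placeholder line e"],
}
batches = make_mixed_batches(data, batch_size=2)
```

Sequential training would instead process all of one speaker's data before moving to the next, which gives the model no opportunity to share what it learns across voices within a batch.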

Background

Speech synthesis

Main article: Deep learning speech synthesis

In 2016, with the proposal of DeepMind's WaveNet, deep-learning-based models for speech synthesis began to gain popularity as a method of modeling waveforms and generating human-like speech. Tacotron2, a neural network architecture for speech synthesis developed by Google AI, was published in 2018 and required tens of hours of audio data to train. For years, reducing the amount of data required to train a realistic high-quality text-to-speech model has been a primary goal of scientific researchers in the field of deep learning speech synthesis.

The developer of 15.ai claims that as little as 15 seconds of data is sufficient to clone a voice up to human standards, a significant reduction in the amount of data required.

Copyrighted material in deep learning

Main article: Authors Guild, Inc. v. Google, Inc.

In a landmark 2013 case between Google and the Authors Guild, the court ruled that Google Books, a service that searches the full text of printed copyrighted books, met all requirements for fair use. This case set an important legal precedent for the field of deep learning and artificial intelligence: using copyrighted material to train a discriminative model or a non-commercial generative model was deemed legal.

Development

15.ai was designed and created by an anonymous research scientist affiliated with the Massachusetts Institute of Technology known by the alias 15, ostensibly in reference to the minimum amount of data (in seconds) required to convincingly clone a voice. The project began development while the developer was an undergraduate. Although the application costs several thousand dollars a month to maintain, the developer has stated that they are able to pay the high cost of running the site out of pocket.

The algorithm used by the project to facilitate the cloning of voices with minimal viable data has been dubbed DeepThroat (a double entendre in reference to speech synthesis using deep neural networks and the sexual act of deep-throating). The project and algorithm—initially conceived as part of MIT's Undergraduate Research Opportunities Program—had been in development for years before the first release of the application.

The developer has also worked closely with the Pony Preservation Project from /mlp/, the My Little Pony board of 4chan. The Pony Preservation Project is a "collaborative effort by /mlp/ to build and curate pony datasets" with the aim of creating applications in artificial intelligence. According to the developer, the collective efforts and constructive criticism from the Pony Preservation Project have been integral to the development of 15.ai. During months-long hiatuses of the public release of the site in late 2020 and early 2021, a number of test sites were put up for exclusive use by /mlp/ and the Pony Preservation Project for the purposes of testing and generating content.

Reception

15 @fifteenai

I've been informed that the aforementioned NFT vocal synthesis is actively attempting to appropriate my work for their own benefit. After digging through the log files, I have evidence that some of the voices that they are taking credit for were indeed generated from my own site.
January 14, 2022

Voiceverse Origins @VoiceverseNFT

Hey @fifteenai we are extremely sorry about this. The voice was indeed taken from your platform, which our marketing team used without giving proper credit. Chubbiverse team has no knowledge of this. We will make sure this never happens again.
January 14, 2022

15 @fifteenai

Go fuck yourself.
January 14, 2022

Over 3 million lines were generated within the first two weeks of the September 2021 release of the application. As of February 2022, 15.ai's Patreon was raising over $1,100 per month.

Fandom content creation

15.ai has been frequently used for content creation in various fandoms, including the My Little Pony: Friendship Is Magic fandom, the Team Fortress 2 fandom, the Portal fandom, and the SpongeBob SquarePants fandom. Numerous videos and projects featuring 15.ai-synthesized speech have gone viral. However, videos and projects featuring speech synthesized by other tools have also gone viral, some of them without properly crediting the source of their AI-synthesized voice clips. As a consequence, many videos and projects made with other speech synthesis software have been mistaken for 15.ai output, and vice versa. Because of this misattribution and lack of proper credit, 15.ai's terms of service forbid mixing 15.ai-synthesized speech with speech from other synthesizers in the same video or project.

The My Little Pony: Friendship Is Magic fandom has seen a resurgence in video and musical content creation as a direct result; moreover, the project has been used as a creative tool in pornography. For instance, the Pony Zone videos are a series of erotic musical videos that heavily sample 15.ai output for their vocals. The creators of such videos make frequent use of salacious emotional contextualizers and punctuation/ARPABET tricks to induce the models to grunt, sigh, and moan convincingly.

Troy Baker / Voiceverse NFT scandal

On January 14, 2022, it was discovered that Voiceverse NFT, a company with which video game voice actor Troy Baker had announced a partnership, had taken voice lines generated from 15.ai and used them in its marketing campaign without permission. Log files showed that Voiceverse had generated audio of Twilight Sparkle and Rainbow Dash from the show My Little Pony: Friendship Is Magic using 15.ai, pitched the clips up to make them unrecognizable as the original voices, and appropriated them without proper credit to falsely market its own platform, a violation of 15.ai's terms of service.

The initial partnership between Troy Baker and Voiceverse was met with severe backlash and universally negative reception. Critics highlighted the potential environmental impact of NFT sales and the potential for exit scams associated with them. Two weeks later, on January 31, Baker announced that he would discontinue his partnership with Voiceverse.

Resistance from voice actors

Some voice actors have publicly decried the use of voice cloning technology. Cited reasons include concerns about impersonation and fraud, unauthorized use of an actor's voice in pornography, and the potential of AI being used to make voice actors obsolete.