Misplaced Pages

15.ai: Difference between revisions

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.
Revision as of 05:22, 10 June 2022

Real-time text-to-speech tool using artificial intelligence
15.ai
Developer(s): 15
Initial release: March 2020
Stable release: v24.2.1 / September 2021
Written in: Vue.js, Python, Julia
Available in: English
Type: Artificial intelligence, speech synthesis, machine learning, deep learning
Website: 15.ai

15.ai is a non-commercial freeware artificial intelligence web application that generates natural, emotive, high-fidelity text-to-speech voices for an assortment of fictional characters from a variety of media sources. Developed by an anonymous MIT researcher under the eponymous pseudonym 15, the project uses a combination of audio synthesis algorithms, speech synthesis deep neural networks, and sentiment analysis models to generate and serve emotive character voices faster than real time, even for characters with very little training data.

Launched in early 2020, 15.ai began as a proof of concept of the democratization of voice acting and dubbing using technology. Its gratis and non-commercial nature (with the only stipulation being that the project be properly credited when used), ease of use, and substantial improvements to current text-to-speech implementations have been lauded by users; however, some critics and voice actors have questioned the legality and ethicality of leaving such technology publicly available and readily accessible.

Several commercial alternatives have emerged with the rising popularity of 15.ai, leading to cases of misattribution and theft. In January 2022, it was discovered that Voiceverse NFT, a company with which voice actor Troy Baker had announced a partnership, had plagiarized 15.ai's work as part of its platform.

Features

Available characters include GLaDOS and Wheatley from Portal, characters from Team Fortress 2, Twilight Sparkle and a number of main, secondary, and supporting characters from My Little Pony: Friendship Is Magic, SpongeBob from SpongeBob SquarePants, Daria Morgendorffer and Jane Lane from Daria, the Tenth Doctor from Doctor Who, HAL 9000 from 2001: A Space Odyssey, the Narrator from The Stanley Parable, the Wii U/3DS/Switch Super Smash Bros. Announcer (formerly), Sans from Undertale, and Carl Brutananadilewski from Aqua Teen Hunger Force.

Emotional contextualizers (represented as a distribution of emojis) guide the AI into delivering lines with different emotions.

The deep learning model used by the application is nondeterministic: each time speech is generated from the same string of text, the intonation of the speech will be slightly different. The application also supports manually altering the emotion of a generated line using emotional contextualizers (a term coined by this project): sentences or phrases that convey the emotion of the take and serve as a guide for the model during inference. Emotional contextualizers are representations of the emotional content of a sentence, deduced via transfer-learned emoji embeddings using DeepMoji, a deep neural network sentiment analysis algorithm developed by the MIT Media Lab in 2017. DeepMoji was trained on 1.2 billion emoji occurrences in Twitter data from 2013 to 2017, and has been found to outperform human subjects in correctly identifying sarcasm in Tweets and other online modes of communication.
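As a rough illustration of the two ideas in the paragraph above, the toy sketch below represents an emotional contextualizer as a probability distribution over emotion labels and then samples from it, so that repeated calls can differ. The labels, the scores, and the function names are hypothetical; this is not DeepMoji's model or 15.ai's pipeline, neither of which is described in enough detail here to reproduce.

```python
import math
import random


def softmax(scores):
    """Turn raw per-label scores into a probability distribution."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - peak) for label, value in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def sample_emotion(distribution, rng):
    """Draw one label at random, weighted by its probability."""
    labels = list(distribution)
    weights = [distribution[label] for label in labels]
    return rng.choices(labels, weights=weights, k=1)[0]


# Hypothetical scores for one contextualizer sentence (assumed values).
scores = {"joy": 0.2, "anger": 2.5, "sadness": 1.0, "love": -1.0}
distribution = softmax(scores)

# An unseeded generator: two takes of the same line may get different labels.
rng = random.Random()
take = sample_emotion(distribution, rng)
```

Sampling rather than always taking the most likely label is one simple way a system can produce a slightly different result on each run; whether 15.ai works this way is not stated in the text.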

15.ai uses a multi-speaker model—hundreds of voices are trained concurrently rather than sequentially, significantly reducing the required training time and enabling the model to learn and generalize shared emotional context, even for voices with no exposure to such emotional context. Consequently, the entire lineup of characters in the application is powered by a single trained model, as opposed to multiple single-speaker models trained on different datasets. The lexicon used by 15.ai has been scraped from a variety of Internet sources, including Oxford Dictionaries, Wiktionary, the CMU Pronouncing Dictionary, 4chan, Reddit, and Twitter. Pronunciations of unfamiliar words are automatically deduced using phonological rules learned by the deep learning model.

The application supports a simplified version of a set of English phonetic transcriptions known as ARPABET to correct mispronunciations or to account for heteronyms, words that are spelled the same but are pronounced differently (such as the word read, which can be pronounced as either /ˈrɛd/ or /ˈriːd/ depending on its tense). While the original ARPABET codes developed in the 1970s by the Advanced Research Projects Agency support 50 unique symbols to designate and differentiate between English phonemes, the CMU Pronouncing Dictionary's ARPABET convention (the set of transcription codes followed by 15.ai) reduces the symbol set to 39 phonemes by eliminating phonemes not regularly found in American English (e.g. Q; NX; DX), combining allophonic phonetic realizations into a single standard (e.g. AXR/ER; UX/UW), and using multiple common symbols together to replace syllabic consonants (e.g. EN/AH0 N). ARPABET strings can be invoked in the application by wrapping the string of phonemes in curly braces within the input box (e.g. {AA1 R P AH0 B EH2 T} to denote /ˈɑːrpəˌbɛt/, the pronunciation of the word ARPABET).
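The curly-brace convention described above can be sketched as a small parser. The following is a minimal illustration, not 15.ai's actual input handling: the function names are hypothetical, and the rule that every vowel must carry a stress digit (0, 1, or 2) is an assumption based on the CMU Pronouncing Dictionary convention.

```python
import re

# The 39 phonemes of the CMU Pronouncing Dictionary convention.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}
CONSONANTS = {"B", "CH", "D", "DH", "F", "G", "HH", "JH", "K", "L", "M", "N",
              "NG", "P", "R", "S", "SH", "T", "TH", "V", "W", "Y", "Z", "ZH"}

# Matches one {...} group that contains no nested braces.
BRACED = re.compile(r"\{([^{}]*)\}")


def is_valid_symbol(symbol):
    """Accept a consonant, or a vowel followed by a stress digit 0-2."""
    if symbol in CONSONANTS:
        return True
    return len(symbol) == 3 and symbol[:2] in VOWELS and symbol[2] in "012"


def extract_arpabet(text):
    """Return one list of phoneme symbols per {...} group found in text."""
    groups = []
    for match in BRACED.finditer(text):
        symbols = match.group(1).split()
        bad = [s for s in symbols if not is_valid_symbol(s)]
        if bad:
            raise ValueError(f"unknown ARPABET symbols: {bad}")
        groups.append(symbols)
    return groups
```

For example, `extract_arpabet("I {R EH1 D} about {AA1 R P AH0 B EH2 T}")` returns the two phoneme lists, while a group containing a dropped symbol such as `Q` raises a `ValueError`.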

The following is a table of phonemes used by 15.ai and the CMU Pronouncing Dictionary:

Vowels
ARPABET   Respelling   IPA     Example
AA        ah           ɑ       odd
AE        a            æ       at
AH0       ə            ə       about
AH        u, uh        ʌ       hut
AO        aw           ɔ       ought
AW        ow           aʊ      cow
AY        eye          aɪ      hide
EH        e, eh        ɛ       Ed
ER        ur, ər       ɝ, ɚ    hurt
EY        ay           eɪ      ate
IH        i, ih        ɪ       it
IY        ee           i       eat
OW        oh           oʊ      oat
OY        oy           ɔɪ      toy
UH        uu           ʊ       hood
UW        oo           u       two

Stress
ARPABET   Description
0         No stress
1         Primary stress
2         Secondary stress

Consonants
ARPABET   Respelling   IPA     Example
B         b            b       be
CH        ch, tch      tʃ      cheese
D         d            d       dee
DH        dh           ð       thee
F         f            f       fee
G         g            ɡ       green
HH        h            h       he
JH        j            dʒ      gee
K         k            k       key
L         l            l       lee
M         m            m       me
N         n            n       knee
NG        ng           ŋ       ping
P         p            p       pee
R         r            r       read
S         s, ss        s       sea
SH        sh           ʃ       she
T         t            t       tea
TH        th           θ       theta
V         v            v       vee
W         w, wh        w       we
Y         y            j       yield
Z         z            z       zee
ZH        zh           ʒ       seizure
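The phoneme table can also be read as a lookup from ARPABET symbols to IPA. The sketch below is a simplified illustration with hypothetical names: it drops stress marks entirely (placing them correctly would require syllabification), uses the standard IPA values for the diphthongs and affricates, and assumes that ER maps to ɝ when stressed and ɚ when unstressed.

```python
# ARPABET-to-IPA lookup following the CMU Pronouncing Dictionary convention.
VOWEL_IPA = {
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "AO": "ɔ", "AW": "aʊ", "AY": "aɪ",
    "EH": "ɛ", "ER": "ɝ", "EY": "eɪ", "IH": "ɪ", "IY": "i", "OW": "oʊ",
    "OY": "ɔɪ", "UH": "ʊ", "UW": "u",
}
CONSONANT_IPA = {
    "B": "b", "CH": "tʃ", "D": "d", "DH": "ð", "F": "f", "G": "ɡ",
    "HH": "h", "JH": "dʒ", "K": "k", "L": "l", "M": "m", "N": "n",
    "NG": "ŋ", "P": "p", "R": "r", "S": "s", "SH": "ʃ", "T": "t",
    "TH": "θ", "V": "v", "W": "w", "Y": "j", "Z": "z", "ZH": "ʒ",
}
# Unstressed variants: AH0 is schwa; unstressed ER is assumed to be ɚ.
UNSTRESSED_IPA = {"AH": "ə", "ER": "ɚ"}


def to_ipa(symbols):
    """Convert ARPABET symbols to a broad IPA string without stress marks."""
    pieces = []
    for symbol in symbols:
        if symbol in CONSONANT_IPA:
            pieces.append(CONSONANT_IPA[symbol])
            continue
        base, stress = symbol[:-1], symbol[-1]
        if base not in VOWEL_IPA or stress not in "012":
            raise ValueError(f"unknown ARPABET symbol: {symbol}")
        if stress == "0" and base in UNSTRESSED_IPA:
            pieces.append(UNSTRESSED_IPA[base])
        else:
            pieces.append(VOWEL_IPA[base])
    return "".join(pieces)
```

With this mapping, the two readings of "read" come out as distinct strings: `to_ipa(["R", "EH1", "D"])` and `to_ipa(["R", "IY1", "D"])` differ only in the vowel. Vowel length marks such as ː are not produced, since the table does not list them.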

Background

Speech synthesis

Main article: Deep learning speech synthesis
A stack of dilated causal convolutional layers used in DeepMind's WaveNet.

In 2016, with the proposal of DeepMind's WaveNet, deep-learning-based models for speech synthesis began to gain popularity as a method of modeling waveforms and generating human-like speech. Tacotron2, a neural network architecture for speech synthesis developed by Google AI, was published in 2018 and typically required tens of hours of audio data to produce speech of acceptable quality; when trained on 2 hours of speech, the model was able to produce intelligible speech with mediocre quality, and when trained on 36 minutes of speech, the model was unable to produce intelligible speech. For years, reducing the amount of data required to train a realistic high-quality text-to-speech model has been a primary goal of researchers in the field of deep learning speech synthesis.
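The figure caption in this section refers to the dilated convolutional layers used in WaveNet. As a hedged illustration of that building block only, the sketch below implements a one-dimensional causal convolution with a dilation factor in plain Python, along with the receptive field of a stack of such layers. The function names are hypothetical, and real implementations use tensor libraries, learned weights, and nonlinearities that are omitted here.

```python
def causal_dilated_conv(signal, weights, dilation):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before the start of the signal are treated as zero, so each
    output depends only on the current and earlier inputs (causality).
    """
    output = []
    for t in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(weights):
            index = t - k * dilation
            if index >= 0:
                total += weight * signal[index]
        output.append(total)
    return output


def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence one output sample."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

Doubling the dilation at each layer (1, 2, 4, 8) lets a stack of four layers with kernel size 2 cover 16 input samples, which is why dilation is attractive for modeling long audio waveforms.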

The developer of 15.ai claims that as little as 15 seconds of data is sufficient to clone a voice up to human standards, a significant reduction in the amount of data required.

Copyrighted material in deep learning

Main article: Authors Guild, Inc. v. Google, Inc.

In a landmark 2013 case between Google and the Authors Guild, the court ruled that Google Books, a service that searches the full text of printed copyrighted books, was transformative, thus meeting all requirements for fair use. This case set an important legal precedent for the field of deep learning and artificial intelligence: using copyrighted material to train a discriminative model or a non-commercial generative model, as is the case with 15.ai, was deemed legal. The legality of commercial generative models trained using copyrighted material is still under debate; due to the black-box nature of machine learning models, any allegations of copyright infringement via direct competition would be difficult to prove.

Development

The Pony Preservation Project from 4chan's /mlp/ board has been integral to the development of 15.ai.

15.ai was designed and created by an anonymous research scientist affiliated with the Massachusetts Institute of Technology known by the alias 15. The project began development while the developer was an undergraduate. Although the application costs several thousand dollars a month to maintain, the developer has stated that they are able to pay the cost of running the site out of pocket.

The algorithm used by the project to facilitate the cloning of voices with minimal viable data has been dubbed DeepThroat (a double entendre in reference to speech synthesis using deep neural networks and the sexual act of deep-throating). The project and algorithm—initially conceived as part of MIT's Undergraduate Research Opportunities Program—had been in development for years before the first release of the application.

The developer has also worked closely with the Pony Preservation Project from /mlp/, the My Little Pony board of 4chan. The Pony Preservation Project is a "collaborative effort by /mlp/ to build and curate pony datasets" with the aim of creating applications in artificial intelligence. According to the developer, the collective efforts and constructive criticism from the Pony Preservation Project have been integral to the development of 15.ai.

Reception

15 (@fifteenai)

I've been informed that the aforementioned NFT vocal synthesis is actively attempting to appropriate my work for their own benefit. After digging through the log files, I have evidence that some of the voices that they are taking credit for were indeed generated from my own site.

January 14, 2022
Voiceverse Origins (@VoiceverseNFT)

Hey @fifteenai we are extremely sorry about this. The voice was indeed taken from your platform, which our marketing team used without giving proper credit. Chubbiverse team has no knowledge of this. We will make sure this never happens again.

January 14, 2022
15 (@fifteenai)

Go fuck yourself.

January 14, 2022

15.ai has been met with largely positive reviews. Liana Ruppert of Game Informer called 15.ai "simplistically brilliant." Lauren Morton of Rock, Paper, Shotgun and Natalie Clayton of PC Gamer called it "fascinating," while José Villalobos of LaPS4 wrote that it "works as easy as it looks." Users praised the ability to easily create audio of popular characters that sounds believable to listeners unaware that the voices were generated by artificial intelligence: Zack Zwiezen of Kotaku reported that his "girlfriend was convinced it was a new voice line from GLaDOS' voice actor, Ellen McLain," while Rionaldi Chandraseta of Towards Data Science wrote that, upon watching a YouTube video featuring a character's voice generated by 15.ai, his "first thought was the video creator used cameo.com to pay for new dialogues from the original voice actors" and stated that "the quality of voices done by 15.ai is miles ahead of [...]."

Computer scientist and technology entrepreneur Andrew Ng wrote in his newsletter The Batch that the technology behind 15.ai could be "enormously productive" and could "revolutionize the use of virtual actors"; however, he also noted that "synthesizing a human actor’s voice without consent is arguably unethical and possibly illegal" and could open the door to impersonation and fraud.

Fandom content creation

15.ai has been frequently used for content creation in various fandoms, including the My Little Pony: Friendship Is Magic fandom, the Team Fortress 2 fandom, the Portal fandom, and the SpongeBob SquarePants fandom. Numerous videos and projects containing audio from 15.ai have gone viral. However, some videos and projects that contain speech audio not generated by 15.ai have also gone viral, many of which do not properly credit the source(s) of the synthetic speech audio featured in them. As a consequence, many videos and projects made with other speech synthesis software have been mistaken for 15.ai creations, and vice versa. Because of this misattribution and lack of proper credit, 15.ai's terms of service forbid mixing 15.ai-generated speech audio with speech audio from other sources in the same video or project.

The My Little Pony: Friendship Is Magic fandom has seen a resurgence in video and musical content creation as a direct result of the application, inspiring a new genre of fan-created content assisted by artificial intelligence. Some fan fiction has been adapted into fully voiced "episodes": The Tax Breaks is a 17-minute animated video rendition of a fan-written story published in 2014 that uses voices generated from 15.ai, emulating the episodic style of the early seasons of My Little Pony: Friendship Is Magic.

Troy Baker / Voiceverse NFT plagiarism scandal

See also: Non-fungible token § Plagiarism and fraud
Troy Baker (@TroyBakerVA)

I’m partnering with @VoiceverseNFT to explore ways where together we might bring new tools to new creators to make new things, and allow everyone a chance to own & invest in the IP’s they create. We all have a story to tell. You can hate. Or you can create. What'll it be?

January 14, 2022

In December 2021, the developer of 15.ai posted on Twitter that they had no interest in incorporating non-fungible tokens (NFTs) into their work.

On January 14, 2022, it was discovered that Voiceverse NFT, a company with which video game and anime dub voice actor Troy Baker had announced a partnership, had used voice lines generated from 15.ai in its marketing campaign without permission. Log files showed that Voiceverse had generated audio of Twilight Sparkle and Rainbow Dash from the show My Little Pony: Friendship Is Magic using 15.ai, pitched them up to make them unrecognizable as the original voices, and appropriated them without proper credit to falsely market its own platform, a violation of 15.ai's terms of service.

A week prior to the announcement of the partnership with Baker, Voiceverse made a (now-deleted) Twitter post directly responding to a (now-deleted) video posted by Chubbiverse, an NFT platform with which Voiceverse had partnered. The video showcased an AI-generated voice, and Voiceverse claimed that it had been generated using its own platform, remarking "I wonder who created the voice for this? ;)" A few hours after news of the partnership broke, the developer of 15.ai was alerted by another Twitter user asking for the developer's opinion on the partnership, and speculated in reply that it "sounds like a scam." The developer then posted screenshots of log files proving that a user of the website (with their IP address redacted) had submitted inputs of the exact words spoken by the AI voice in the Chubbiverse video, and subsequently responded to Voiceverse's claim directly, tweeting "Certainly not you :)".

Following the tweet, Voiceverse admitted to passing off voices from 15.ai as the product of its own platform, claiming that its marketing team had used the project without giving proper credit and that the "Chubbiverse team has no knowledge of this." In response to the admission, 15 tweeted "Go fuck yourself." The final tweet went viral, accruing over 75,000 total likes and 13,000 total retweets across multiple reposts.

The initial partnership between Baker and Voiceverse was met with severe backlash and universally negative reception. Critics highlighted the environmental impact of NFT sales and their potential for exit scams. Commentators also pointed out the irony in Baker's initial Tweet announcing the partnership, which ended with "You can hate. Or you can create. What'll it be?", hours before the public revelation that the company in question had resorted to theft instead of creating its own product. Baker responded that he appreciated people sharing their thoughts and that their responses were giving him "a lot to think about." He also acknowledged that the "hate/create" part of his initial Tweet might have been "a bit antagonistic," and asked fans on social media to forgive him. Two weeks later, on January 31, Baker announced that he would discontinue his partnership with Voiceverse.

Resistance from voice actors

Some voice actors have publicly decried the use of voice cloning technology. Cited reasons include concerns about impersonation and fraud, unauthorized use of an actor's voice in pornography, and the potential of AI being used to make voice actors obsolete.

See also

References

Notes
  1. ^ Chandraseta, Rionaldi (2021-01-19). "Generate Your Favourite Characters' Voice Lines using Machine Learning". Towards Data Science. Archived from the original on 2021-01-21. Retrieved 2021-01-23.
  2. ^ Zwiezen, Zack (2021-01-18). "Website Lets You Make GLaDOS Say Whatever You Want". Kotaku. Kotaku. Archived from the original on 2021-01-17. Retrieved 2021-01-18.
  3. ^ Ruppert, Liana (2021-01-18). "Make Portal's GLaDOS And Other Beloved Characters Say The Weirdest Things With This App". Game Informer. Game Informer. Archived from the original on 2021-01-18. Retrieved 2021-01-18.
  4. ^ Clayton, Natalie (2021-01-19). "Make the cast of TF2 recite old memes with this AI text-to-speech tool". PC Gamer. PC Gamer. Archived from the original on 2021-01-19. Retrieved 2021-01-19.
  5. ^ Morton, Lauren (2021-01-18). "Put words in game characters' mouths with this fascinating text to speech tool". Rock, Paper, Shotgun. Rock, Paper, Shotgun. Archived from the original on 2021-01-18. Retrieved 2021-01-18.
  6. ^ Ng, Andrew (2020-04-01). "Voice Cloning for the Masses". deeplearning.ai. The Batch. Archived from the original on 2020-04-08. Retrieved 2020-04-05. {{cite web}}: |archive-date= / |archive-url= timestamp mismatch; 2020-08-07 suggested (help)
  7. ^ "15.ai - FAQ". 15.ai. 2021-01-18. Retrieved 2021-01-18.
  8. ^ Ng, Andrew (2021-03-07). "Weekly Newsletter Issue 83". The Batch. The Batch. Archived from the original on 2022-02-26. Retrieved 2021-03-07.
  9. ^ Lopez, Ule (2022-01-16). "Troy Baker-backed NFT firm admits using voice lines taken from another service without permission". Wccftech. Wccftech. Retrieved 2022-06-07.{{cite web}}: CS1 maint: url-status (link)
  10. ^ Williams, Demi (2022-01-18). "Voiceverse NFT admits to taking voice lines from non-commercial service". NME. NME. Archived from the original on 2022-01-18. Retrieved 2022-01-18.
  11. ^ Wright, Steve (2022-01-17). "Troy Baker-backed NFT company admits to using content without permission". Stevivor. Archived from the original on 2022-01-17. Retrieved 2022-01-17.
  12. ^ Henry, Joseph (2022-01-18). "Troy Baker's Partner NFT Company Voiceverse Reportedly Steals Voice Lines From 15.ai". Tech Times. Archived from the original on 2022-01-26. Retrieved 2022-02-14.
  13. ^ Yea, Yong (2022-01-14). "Troy Baker Faces Mass Backlash For Supporting Shady AI Voice NFTs With Company That Has Stolen Work". YouTube. YouTube. Archived from the original on 2022-01-30. Retrieved 2022-01-30.
  14. Yoshiyuki, Furushima (2021-01-18). "『Portal』のGLaDOSや『UNDERTALE』のサンズがテキストを読み上げてくれる。文章に込められた感情まで再現することを目指すサービス「15.ai」が話題に". Denfaminicogamer. Archived from the original on 2021-01-18. Retrieved 2021-01-18.
  15. ^ Kurosawa, Yuki (2021-01-19). "ゲームキャラ音声読み上げソフト「15.ai」公開中。『Undertale』や『Portal』のキャラに好きなセリフを言ってもらえる". AUTOMATON. AUTOMATON. Archived from the original on 2021-01-19. Retrieved 2021-01-19.
  16. ^ Villalobos, José (2021-01-18). "Descubre 15.AI, un sitio web en el que podrás hacer que GlaDOS diga lo que quieras". LaPS4. LaPS4. Archived from the original on 2021-01-18. Retrieved 2021-01-18.
  17. Moto, Eugenio (2021-01-20). "15.ai, el sitio que te permite usar voces de personajes populares para que digan lo que quieras". Yahoo! Finance. Yahoo! Finance. Archived from the original on 2022-03-08. Retrieved 2021-01-20.
  18. Felbo, Bjarke (2017). "Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm". arXiv:1708.00524 .
  19. Corfield, Gareth (2017-08-07). "A sarcasm detector bot? That sounds absolutely brilliant. Definitely". The Register. The Register. Retrieved 2022-06-02.
  20. "An Algorithm Trained on Emoji Knows When You're Being Sarcastic on Twitter". MIT Technology Review. MIT Technology Review. 2017-08-03. Retrieved 2022-06-02.
  21. "Emojis help software spot emotion and sarcasm". BBC. BBC. 2017-08-07. Retrieved 2022-06-02.
  22. Lowe, Josh (2017-08-07). "Emoji-Filled Mean Tweets Help Scientists Create Sarcasm-Detecting Bot That Could Uncover Hate Speech". Newsweek. Newsweek. Retrieved 2022-06-02.
  23. Valle, Rafael (2020). "Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens". arXiv:1910.11997 .
  24. Cooper, Erica (2020). "Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings". arXiv:1910.10838 .
  25. ^ "15.ai - About". 15.ai. 2022-02-20. Retrieved 2022-02-20.
  26. Klautau, Aldebaro (2001). "ARPABET and the TIMIT alphabet" (PDF). Archived from the original (PDF) on June 3, 2016. Retrieved September 8, 2017.
  27. ^ "The CMU Pronouncing Dictionary". CMU Pronouncing Dictionary. CMU Pronouncing Dictionary. 2015-07-16. Archived from the original on 2022-06-03. Retrieved 2022-06-04.
  28. ^ van den Oord, Aäron; Li, Yazhe; Babuschkin, Igor (2017-11-12). "High-fidelity speech synthesis with WaveNet". DeepMind. Retrieved 2022-06-05.
  29. Hsu, Wei-Ning (2018). "Hierarchical Generative Modeling for Controllable Speech Synthesis". arXiv:1810.07217 .
  30. Habib, Raza (2019). "Semi-Supervised Generative Modeling for Controllable Speech Synthesis". arXiv:1910.01709 .
  31. "Audio samples from "Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis"". 2018-08-30. Retrieved 2022-06-05.
  32. Shen, Jonathan; Pang, Ruoming; Weiss, Ron J.; Schuster, Mike; Jaitly, Navdeep; Yang, Zongheng; Chen, Zhifeng; Zhang, Yu; Wang, Yuxuan; Skerry-Ryan, RJ; Saurous, Rif A.; Agiomyrgiannakis, Yannis; Wu, Yonghui (2018). "Natural TTS Synthesis by Conditioning WaveNet on Mel-Spectrogram Predictions". arXiv:1712.05884 .
  33. Chung, Yu-An (2018). "Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis". arXiv:1808.10128 .
  34. Ren, Yi (2019). "Almost Unsupervised Text to Speech and Automatic Speech Recognition". arXiv:1905.06791 .
  35. - F.2d – (2d Cir, 2015). (temporary cites: 2015 U.S. App. LEXIS 17988; Slip opinion (October 16, 2015))
  36. Stewart, Matthew (2019-10-31). "The Most Important Court Decision For Data Science and Machine Learning". Towards Data Science. Archived from the original on 2022-02-21. Retrieved 2022-02-21.
  37. "15". Twitter. 2022-06-09. Retrieved 2022-06-09.
  38. "Pony Preservation Project (Thread 108)". 4chan. Desuarchive. 2022-02-20. Retrieved 2022-02-20.
  39. "15.ai - Thanks". 15.ai. 2022-02-20. Retrieved 2022-02-20.
  40. Scotellaro, Shaun (2022-05-15). "Full Simple Animated Episode - The Tax Breaks (Twilight)". Equestria Daily. Retrieved 2022-05-28.
  41. "The Terribly Taxing Tribulations of Twilight Sparkle". FimFiction.net. 2014-04-27. Retrieved 2022-05-28.
  42. Phillips, Tom (2022-01-17). "Troy Baker-backed NFT firm admits using voice lines taken from another service without permission". Eurogamer. Archived from the original on 2022-01-17. Retrieved 2022-01-17.
  43. Phillips, Tom (2022-01-14). "Video game voice actor Troy Baker is now promoting NFTs". Eurogamer. Archived from the original on 2022-01-14. Retrieved 2022-01-14.
  44. McWhertor, Michael (2022-01-14). "The Last of Us voice actor wants to sell 'voice NFTs,' drawing ire". Polygon. Retrieved 2022-01-14.
  45. "Last Of Us Voice Actor Pisses Everyone Off With NFT Push". Kotaku. 2022-01-14. Retrieved 2022-01-14.
  46. Purslow, Matt (2022-01-14). "Troy Baker Is Working With NFTs, but Fans Are Unimpressed". IGN. Retrieved 2022-01-14.
  47. Strickland, Derek (2022-01-31). "Last of Us actor Troy Baker heeds fans, abandons NFT plans". Tweaktown. Archived from the original on 2022-01-31. Retrieved 2022-01-31.
  48. Peterson, Danny (2022-01-31). "'The Last of Us' actor Troy Baker reverses course on NFTs amid fan backlash". We Got This Covered. Archived from the original on 2022-02-14. Retrieved 2022-02-14.
  49. Peters, Jay (2022-01-31). "The voice of Joel from The Last of Us steps away from NFT project after outcry". The Verge. Retrieved 2022-02-04.
Tweets
  1. @fifteenai (January 14, 2022). "I've been informed that the aforementioned NFT vocal synthesis is actively attempting to appropriate my work for their own benefit. After digging through the log files, I have evidence that some of the voices that they are taking credit for were indeed generated from my own site" (Tweet) – via Twitter.
  2. @VoiceverseNFT (January 14, 2022). "Hey @fifteenai we are extremely sorry about this. The voice was indeed taken from your platform, which our marketing team used without giving proper credit. Chubbiverse team has no knowledge of this. We will make sure this never happens again" (Tweet) – via Twitter.
  3. @fifteenai (January 14, 2022). "Go fuck yourself" (Tweet) – via Twitter.
  4. @TroyBakerVA (January 14, 2022). "I'm partnering with @VoiceverseNFT to explore ways where together we might bring new tools to new creators to make new things, and allow everyone a chance to own & invest in the IP's they create. We all have a story to tell. You can hate. Or you can create. What'll it be?" (Tweet) – via Twitter.
  5. @fifteenai (December 12, 2021). "I have no interest in incorporating NFTs into any aspect of my work. Please stop asking" (Tweet) – via Twitter.
  6. @VoiceverseNFT (January 7, 2022). "I wonder who created the voice for this? ;)" (Tweet). Archived from the original on January 15, 2022 – via Twitter.
  7. @fifteenai (January 14, 2022). "Sounds like a scam" (Tweet) – via Twitter.
  8. @fifteenai (January 14, 2022). "Give proper credit or remove this post" (Tweet) – via Twitter.
  9. @fifteenai (January 14, 2022). "Certainly not you :)" (Tweet) – via Twitter.
  10. @fifteenai (January 14, 2022). "Go fuck yourself" (Tweet) – via Twitter.
  11. @yongyea (January 14, 2022). "The NFT scheme that Troy Baker is promoting is already finding itself in trouble after stealing and profiting off of somebody else's work. Who could've seen this coming" (Tweet) – via Twitter.
  12. @BronyStruggle (January 15, 2022). "actual" (Tweet) – via Twitter.

External links
