Amazon's latest breakthrough in artificial intelligence (AI) has shaken the technology world with the unveiling of the largest text-to-speech model to date. This colossal model, developed by a team of AI researchers at Amazon AGI, has an impressive 980 million parameters and was trained on a massive 100,000 hours of recorded speech, mostly in English. Called Big Adaptive Streamable TTS with Emergent abilities (BASE TTS), this innovative model represents a significant leap in speech synthesis technology.
Let's break down its most captivating features:
Architecture
- 1-billion-parameter autoregressive Transformer: At the core of BASE TTS is a massive autoregressive Transformer. This neural network converts raw text into discrete codes known as "speechcodes".
- Convolution-based decoder: Once the speechcodes are generated, a convolution-based decoder converts them into waveforms. Its strength lies in its incremental, streaming approach, which enables real-time synthesis.
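To make the two-stage pipeline concrete, here is a minimal toy sketch in Python, not Amazon's implementation. It shows (1) an autoregressive loop in which each new code is predicted from the text plus all previously emitted codes, and (2) incremental decoding, where small chunks of codes are turned into samples by a causal 1-D convolution that carries its left context across chunk boundaries, so audio can be emitted before generation finishes. Everything concrete here is an assumption for illustration: `toy_next_code` stands in for the Transformer, `code_to_frame` is a hypothetical codebook lookup, and the kernel, chunk size, and `EOS` value are arbitrary.

```python
from typing import Callable, Iterator, List, Sequence

EOS = 0  # hypothetical end-of-speech code ID


def generate_codes(text: str,
                   next_code: Callable[[Sequence[int], Sequence[int]], int],
                   max_len: int = 1000) -> Iterator[int]:
    """Autoregressive loop: each code is predicted from the text plus all codes emitted so far."""
    text_ids = [ord(ch) for ch in text]  # toy text encoding (assumption)
    history: List[int] = []
    for _ in range(max_len):
        code = next_code(text_ids, history)
        if code == EOS:
            return
        history.append(code)
        yield code  # hand each code downstream as soon as it exists


def code_to_frame(code: int, frame_len: int = 4) -> List[float]:
    """Hypothetical codebook lookup: maps one discrete code to a short frame of samples."""
    return [((code * (i + 1)) % 7 - 3) / 3.0 for i in range(frame_len)]


class StreamingCausalConv:
    """Causal 1-D convolution that carries the last len(kernel)-1 inputs across chunks,
    so chunk-by-chunk output matches a single pass over the whole signal."""

    def __init__(self, kernel: Sequence[float]):
        self.kernel = list(kernel)
        self.state = [0.0] * (len(self.kernel) - 1)  # zero left-padding before the first chunk

    def process(self, chunk: Sequence[float]) -> List[float]:
        k = len(self.kernel)
        buf = self.state + list(chunk)
        # y[n] = sum_j kernel[j] * x[n - j]; x[n] sits at buf[n + k - 1]
        out = [sum(self.kernel[j] * buf[n + k - 1 - j] for j in range(k))
               for n in range(len(chunk))]
        self.state = buf[len(buf) - (k - 1):]  # keep context for the next chunk
        return out


def stream_synthesize(text: str,
                      next_code: Callable[[Sequence[int], Sequence[int]], int],
                      kernel: Sequence[float],
                      chunk_codes: int = 2) -> Iterator[List[float]]:
    """Decode audio in small chunks while codes are still being generated."""
    conv = StreamingCausalConv(kernel)
    pending: List[int] = []
    for code in generate_codes(text, next_code):
        pending.append(code)
        if len(pending) == chunk_codes:
            yield conv.process([s for c in pending for s in code_to_frame(c)])
            pending = []
    if pending:  # flush the final partial chunk
        yield conv.process([s for c in pending for s in code_to_frame(c)])


def toy_next_code(text_ids: Sequence[int], history: Sequence[int]) -> int:
    """Stand-in for the Transformer: one code per input character, then EOS."""
    return EOS if len(history) >= len(text_ids) else 5 + len(history)


for i, chunk in enumerate(stream_synthesize("hello", toy_next_code, kernel=[0.5, 0.3, 0.2])):
    print(f"chunk {i}: {len(chunk)} samples")  # prints chunk sizes 8, 8, 4
```

The carried-over `state` is what makes the decoding streamable in this sketch: concatenating the chunks gives exactly the same samples as convolving the full signal at once, so latency depends on the chunk size rather than on the length of the utterance.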
A novel approach to speechcodes
- Autoencoder-based speech tokens: BASE TTS introduces a novel speech tokenization technique. These speechcodes disentangle speaker identity and compress information using byte-pair encoding.
- Speaker ID disentanglement: Imagine a TTS system that can seamlessly imitate different speakers. BASE TTS achieves this by disentangling speaker characteristics from the raw audio.
- Emergent natural prosody: Echoing a phenomenon seen in large language models, BASE TTS variants built with 10K+ hours of data and 500M+ parameters begin to produce natural prosody even on textually complex sentences.
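The byte-pair encoding step mentioned above can be illustrated with a generic BPE sketch over integer token IDs. This is the textbook greedy algorithm, not a description of how BASE TTS applies it: the article does not say how merges are learned or stored. The code IDs, the assumed 1024-entry base codebook (so new merged IDs start at 1024), and the stop rule are all hypothetical choices for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def merge_pair(seq: Sequence[int], pair: Pair, new_id: int) -> List[int]:
    """Replace every non-overlapping occurrence of `pair` (scanning left to right) with `new_id`."""
    out: List[int] = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def train_bpe(corpus: Sequence[Sequence[int]], num_merges: int,
              first_new_id: int) -> List[Tuple[Pair, int]]:
    """Greedy BPE: repeatedly merge the most frequent adjacent pair of token IDs."""
    seqs = [list(s) for s in corpus]
    merges: List[Tuple[Pair, int]] = []
    next_id = first_new_id
    for _ in range(num_merges):
        counts: Counter = Counter()
        for s in seqs:
            counts.update(zip(s, s[1:]))
        if not counts:
            break
        # Most frequent pair; ties broken by the smaller pair so results are deterministic.
        pair = min(counts, key=lambda p: (-counts[p], p))
        if counts[pair] < 2:
            break  # a pair seen only once is not worth a new vocabulary entry
        merges.append((pair, next_id))
        seqs = [merge_pair(s, pair, next_id) for s in seqs]
        next_id += 1
    return merges


def encode(seq: Sequence[int], merges: List[Tuple[Pair, int]]) -> List[int]:
    """Apply learned merges in the order they were learned."""
    out = list(seq)
    for pair, new_id in merges:
        out = merge_pair(out, pair, new_id)
    return out


def decode(seq: Sequence[int], merges: List[Tuple[Pair, int]]) -> List[int]:
    """Losslessly expand merged IDs back into the original code sequence."""
    table = {new_id: pair for pair, new_id in merges}
    out: List[int] = []
    stack = list(reversed(seq))
    while stack:
        tok = stack.pop()
        if tok in table:
            left, right = table[tok]
            stack.extend([right, left])  # push right first so left is expanded first
        else:
            out.append(tok)
    return out


codes = [3, 7, 3, 7, 5, 3, 7, 5]  # hypothetical speech-code IDs
merges = train_bpe([codes], num_merges=10, first_new_id=1024)
compressed = encode(codes, merges)
print(merges)      # [((3, 7), 1024), ((1024, 5), 1025)]
print(compressed)  # [1024, 1025, 1025]
```

In this toy case eight codes shrink to three tokens, and `decode` recovers the original sequence exactly, which is the property that makes BPE attractive for shortening the sequences an autoregressive model has to generate.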
State-of-the-art naturalness
- Speech naturalness: BASE TTS sets a new benchmark for naturalness. Its output rivals that of publicly available large-scale TTS systems such as YourTTS, Bark, and TortoiseTTS.
- Compound words, emotions, and punctuation: BASE TTS handles complex vocabulary, conveys emotion, and nails punctuation. It is not merely robotic; it is expressive.
Efficiency and streamability
- Data efficiency: BASE TTS shows that data efficiency can be built into large-scale models, achieving remarkable results with fewer hours of training.
- Streamability: The incremental, streaming approach opens the door to real-time applications in voice assistants, audiobooks, and more.
The significance of BASE TTS lies not only in the scale of the model itself but also in its emergent abilities, a phenomenon in which AI models show a sudden leap in capability as they grow. Through rigorous testing, the researchers found that this leap appeared around the 150-million-parameter mark, underscoring the key role that dataset size plays in driving progress in AI capabilities.
One of the most remarkable features of BASE TTS is its versatility across a range of linguistic attributes. From compound nouns to emotional expressions, foreign-language pronunciation, and even subtle variations in intonation and punctuation, the model shows an impressive command of linguistic complexity. In addition, its ability to stress keywords correctly within a sentence and to intonate questions properly adds another layer of sophistication.
Although BASE TTS will not be made publicly available because of ethical concerns about potential misuse, the Amazon research team plans to apply what it has learned to improve the overall quality of text-to-speech applications.
In the meantime, you can already try QuData's online text-to-speech service! Enjoy our speech synthesis technology and turn written text into voice effortlessly.