Author: Marcello Politi
Originally published on Towards AI.
A gentle dive into how attention helps neural networks remember better and forget less
The attention mechanism is often associated with the transformer architecture, but it was already used in RNNs. In machine translation (MT) tasks (e.g., English to Italian), when you want to predict the next Italian word, you need your model to focus, or pay attention, on the English words that are most useful for a good translation.
I will not go into the details of RNNs, but attention helped these models mitigate the vanishing gradient problem and capture more relationships between words.
At some point, we understood that the only really important thing was the attention mechanism, and that the entire RNN architecture was overkill. Hence, Attention Is All You Need!
Classic attention indicates which words in the input sequence the words in the output sequence should focus on. This is important in sequence-to-sequence tasks such as MT.
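As a rough illustration of this idea (a minimal sketch, not the exact formulation from any particular paper), the snippet below scores a single decoder state against every encoder state, turns the scores into attention weights with a softmax, and mixes the encoder states into a context vector. All arrays here are random placeholders rather than real model outputs.

```python
# Minimal sketch of classic (cross-)attention in an RNN encoder-decoder.
# Encoder states and the decoder state are random stand-ins.
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

rng = np.random.default_rng(0)

# Hidden states for a 5-word English input sentence (one 8-dim vector per word).
encoder_states = rng.normal(size=(5, 8))

# Current decoder hidden state while predicting the next Italian word.
decoder_state = rng.normal(size=(8,))

# Dot-product scores: how relevant is each input word to this decoding step?
scores = encoder_states @ decoder_state          # shape: (5,)

# Softmax turns the scores into attention weights that sum to 1.
weights = softmax(scores)                        # shape: (5,)

# The context vector is the attention-weighted mix of the encoder states.
context = weights @ encoder_states               # shape: (8,)

print("attention weights:", np.round(weights, 3))
print("context vector shape:", context.shape)
```

In a real encoder-decoder, this context vector would then be combined with the decoder state to help predict the next output word.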
Self-attention is a specific type of attention. It operates between elements of the same sequence, and it tells us how “correlated” the words within the same sentence are.
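To make this concrete, here is a minimal NumPy sketch of scaled dot-product self-attention within a single sentence. The token embeddings and projection matrices are random placeholders, not trained weights.

```python
# Minimal sketch of self-attention: every token in a sentence scores
# every other token in the same sentence.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(42)

tokens = ["the", "cat", "sat", "down"]
d_model = 8
X = rng.normal(size=(len(tokens), d_model))      # token embeddings (placeholder)

# Query/key/value projections (learned in a real model, random here).
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Scaled dot-product attention: row i holds the attention that token i
# pays to every token in the sentence (including itself).
attn = softmax(Q @ K.T / np.sqrt(d_model), axis=-1)   # shape: (4, 4)

# New, context-aware representation of each token.
output = attn @ V                                      # shape: (4, 8)

print(np.round(attn, 2))
```

Each row of `attn` is the attention vector for one token: how strongly that token attends to every word in the sentence.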
For a given token (or word) in the sequence, self-attention generates an attention vector that covers all the other tokens in the sequence. This … Read the full blog for free on Medium.
Published via Towards AI