Manuel Gentile and Fabrizio Falchi
Transformers are a neural network architecture designed to overcome the limitations of recurrent neural networks in analysing sequences of data (in our case, words or tokens)1.
Specifically, through the self-attention mechanism, transformers can process the elements of a data sequence in parallel while capturing the dependencies between those elements and the contexts in which they occur.
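To make the mechanism concrete, the sketch below implements scaled dot-product self-attention, the core operation of the cited paper, in NumPy. The sequence length, embedding dimensions, and projection matrices are illustrative placeholders, not values taken from the original text.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of token embeddings.

    X: (seq_len, d_model) matrix of token embeddings.
    W_q, W_k, W_v: (d_model, d_k) learned projection matrices (random here).
    Returns a (seq_len, d_k) matrix in which every output row is a weighted
    mixture of all value vectors, so each token attends to the whole
    sequence in a single parallel step rather than one position at a time.
    """
    Q = X @ W_q                                  # queries
    K = X @ W_k                                  # keys
    V = X @ W_v                                  # values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # pairwise token dependencies
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

# Illustrative usage with random embeddings for a 4-token sequence.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                      # 4 tokens, d_model = 8
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)                                 # (4, 8)
```

Because every attention weight is computed from a single matrix product over the whole sequence, no step depends on the output of the previous position, which is what allows the parallelisation that recurrent networks cannot offer.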
1 Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I., Attention is all you need, Advances in neural information processing systems, 30, 2017.