Manuel Gentile and Fabrizio Falchi
The great popularity achieved in a short timeframe by recent natural language dialogue systems (such as ChatGPT, Bard and LLaMA 2-Chat), built on large language models, has sparked heated debates that remain open on several fronts. It is undoubtedly fascinating to ask how a computational system, governed by relatively simple mathematical equations, is able to generate behaviour that many call ‘intelligent’.
However, this chapter will not attempt to answer questions such as, “Do LLMs exhibit behaviour that we can define as intelligent?”, “What is the true nature of human intelligence?”, or “How can we define creativity?”. Although interesting, these questions would require a much more in-depth investigation to be answered properly.
Instead, we will try to offer an overview accessible to non-experts, in order to foster understanding of the mechanisms underlying large language models. Only through increased awareness of these mechanisms is it possible to understand their potential as well as their risks, and to promote their correct use, especially in education.
A widespread misconception that needs to be dispelled is that such systems are essentially large databases of question–answer pairs. This misconception derives from the common practices, established over the years, for building chatbot systems (we invite you to read the relevant chapter). This idea, however, does not do justice to the generative character of LLMs.
Language models are statistical models that assign a probability of occurrence to a portion of text (usually a word) as a function of a given context, which is usually defined as the set of words preceding the expected word.
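In formal terms, this definition amounts to factorising the probability of a whole word sequence into a product of next-word probabilities, each conditioned on the preceding context:

```latex
P(w_1, \dots, w_n) = \prod_{t=1}^{n} P(w_t \mid w_1, \dots, w_{t-1})
```

Generating text then means repeatedly sampling the next word from this conditional distribution and appending it to the context.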
Models built using a purely statistical approach (e.g., Markov chains, also called n-gram models) have been joined over time by language models built from neural networks1. These have evolved in both the structure and the size of the networks.
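To make the purely statistical approach concrete, here is a minimal sketch of a bigram (2-gram) model: it estimates the probability of a word given the previous word by simply counting word pairs in a toy corpus (the corpus and all names below are illustrative, not from any real system):

```python
from collections import Counter, defaultdict

# Toy corpus; a real n-gram model would be estimated on millions of sentences.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

# Count how often each word follows each preceding word.
counts = defaultdict(Counter)
for sentence in corpus:
    words = ["<s>"] + sentence.split()  # <s> marks the start of a sentence
    for prev, word in zip(words, words[1:]):
        counts[prev][word] += 1

def next_word_prob(prev, word):
    """P(word | prev): relative frequency of the pair in the corpus."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0

print(next_word_prob("the", "cat"))  # 2 of the 6 words following "the" are "cat"
```

The model assigns probabilities rather than retrieving stored answers; LLMs do the same thing, but the conditional distribution is computed by a neural network over a much longer context instead of a table of counts.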
Large language models (LLMs) are named thus because they are based on large neural networks trained on huge amounts of data.
As a result, we start our investigation with the claim that language models generate texts rather than simply retrieving them from a pre-constituted knowledge base.
The generative aspect and the essentially probabilistic nature of these models make it difficult to predict how an LLM system will respond to user input. This characteristic underlies a common distrust of such systems, given their potential to generate false or inaccurate text.
Thus, this feature is both a great technological achievement in terms of a machine’s ability to understand and produce text and, at the same time, one of the main dangers of such technologies.
Let us, however, take a closer look at such systems.
Like any technological revolution, this breakthrough rests on many factors. Simplifying somewhat, we mention the main ones, offering the reader references that can guide a subsequent in-depth study:
- The size of the network: This is measured by the number of trainable parameters within the network. Large language models are deep neural networks, characterised by a staggering number of nodes and layers. To give an order of magnitude, some experts in the field call language models ‘large’ when they have more than 10 billion parameters. Concretely, the GPT-3 model has 175 billion parameters, while the largest version of LLaMA 2 has around 70 billion.
- The network architecture: Success depends not only on the size of the network but also on how the nodes and the different layers of the neural network are interconnected. Here again, simplifying, we can identify transformer networks2 and the attention mechanism as the main architectural innovations behind the improved effectiveness.
- The amount of data available for training: The substantial availability of data is undoubtedly essential to the training of such models, but this availability long predates their introduction. The key innovation therefore lies in the training techniques and in the selection and preparation process leading from raw data to the training set. This is called self-supervised learning.
- The current computing power: Clearly, increased computing power has played a decisive role in enabling networks of this scale. Empirical experience suggests that scale itself is one of the essential parameters for these behaviours to emerge.
- The tuning mechanisms: Another element, often overlooked, is the tuning that represents the last step in building such models. In particular, we refer to reinforcement learning from human feedback and response ranking. These contribute to the definition of the model and are used to produce responses more in line with the user’s intention. To these we can add the fine-tuning processes that allow such networks to be specialised and improved in the execution of specific tasks.
- A security pipeline: Alongside the deep-learning model, there are ad hoc techniques designed to mitigate the system’s fragility on unsafe inputs and to prevent unwanted behaviour on both safe and unsafe inputs.
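To give a flavour of the attention mechanism mentioned among the factors above, here is a minimal NumPy sketch of scaled dot-product attention, the core operation of transformer networks2. The shapes and random values are purely illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query to each key
    weights = softmax(scores, axis=-1)   # each row is a probability distribution
    return weights @ V, weights

# Three tokens, embedding dimension 4 (toy values).
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))

output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.sum(axis=-1))  # each row of weights sums to 1
```

Each output row is a weighted mixture of all the value vectors, with weights that depend on how strongly each token ‘attends’ to the others; stacking many such layers, with learned projections for Q, K and V, yields the transformer architecture.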
At this point, aware of the different factors that characterise LLMs, it remains only to explore the potential of such systems by putting them to the test in our educational context. So, try talking to ChatGPT or Bard to create new exercises and adapt them to the specific needs of your students, to create new lesson plans with related content, and much more. It all depends on your creativity and on how you learn to dialogue with such systems.
Note: Each of these factors would require due elaboration. For those interested, we provide a list of references.
1 Bengio, Y., Ducharme, R., & Vincent, P. (2000). A neural probabilistic language model. Advances in Neural Information Processing Systems, 13.
2 Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.