Error-Correction Mechanisms in Language Learning: Modeling Individuals Part 1
Nov 09, 2023
Abstract:
Since its first adoption as a computational model for language learning, evidence has accumulated that Rescorla–Wagner error-correction learning (Rescorla & Wagner, 1972) captures several aspects of language processing. Whereas previous studies have provided general support for the Rescorla–Wagner rule by using it to explain the behavior of participants across a range of tasks, we focus on testing predictions generated by the model in a controlled natural language learning task and model the data at the level of the individual learner.
By adjusting the parameters of the model to fit the trial-by-trial behavioral choices of participants, rather than fitting a one-for-all model using a single set of default parameters, we show that the model accurately captures participants’ choices, time latencies, and levels of response agreement. We also show that gender and working memory capacity affect the extent to which the Rescorla–Wagner model captures language learning.
Keywords
language learning; error-correction learning; Rescorla–Wagner model; morphology; agreement.
Introduction
We humans share with other species many core learning mechanisms that allow us to adapt to our environment (Rescorla, 1988). These mechanisms include, among others, classical conditioning (i.e., Pavlovian conditioning; Pavlov, 1927), instrumental conditioning (also operant conditioning; Skinner, 1938), and forms of social learning, such as vicarious learning (Bandura, 1962).
The most uniquely defining human learning ability is language learning, which also includes efficient transgenerational transmission and is foundational for social inclusion and cohesion. However, whereas core learning mechanisms are relatively well understood, language learning remains much of a mystery (Ambridge & Lieven, 2011). An early attempt by Skinner (1957) to account for language learning using the same principles as those governing lower-level cognitive tasks was quashed by Chomsky (1959).
For much of the remainder of the 20th century, language was seen as a by-and-large innate system, governed by rules and handled by a uniquely human and specialized cognitive structure. This structure was initially conceptualized as a language acquisition device and later extended to become universal grammar.
This dominant view was challenged from two sides simultaneously. The emergence of usage-based linguistics in the 1980s (Langacker, 1987) promoted a view of language as a dynamic and probabilistic system, resulting from general cognitive capacities acting on language input (Dąbrowska & Divjak, 2015). This view meshed well with connectionist frameworks, which showed that rulelike behavior can emerge from exposure to usage alone and that language knowledge is sensitive to the properties of the input (Plaut & Gonnerman, 2000; Seidenberg & McClelland, 1989).
Connectionism, arguably, paved the way for changes in theorizing too, toward a view of language as being learned like any other skill, and the early 2000s witnessed the start of reintegration of the basic principles of learning into the body of work on language (e.g., see Bybee & McClelland, 2005; for more up-to-date works, see Ellis et al., 2016, which addresses both first and second language learning, as well as Chuang et al., 2021, which addresses lexical acquisition in second and third languages). Language was now seen as being amenable to the same general-purpose cognitive capacities and learning mechanisms that humans and animals use to navigate and adapt to their environment (cf. Ellis, 2006a; Ellis & Sagarra, 2010, 2011; Sturdy & Nicoladis, 2017).

Among these learning models, Rescorla and Wagner’s (1972) model of classical conditioning stands out for its simplicity and its ability to explain a range of empirical learning phenomena (Siegel & Allan, 1996). This model is biologically plausible (Chen et al., 2008) and has an evolutionary advantage over other more powerful learning mechanisms, in the sense that it has a higher likelihood of being naturally selected and persisting in the evolutionary process, compared to other plausible learning mechanisms (for more details, see Trimmer et al., 2012).
Background Literature
The Rescorla–Wagner Model
As a model of classical conditioning, the Rescorla–Wagner (R–W) model is concerned with situations where an entity (a human, an animal, or a machine) has to learn the predictive relationship between objects and/or events (i.e., cues and outcomes) in an environment, and where cues compete for their predictive value for an outcome while iteratively (re)calibrating the learning (or association) weights. More specifically, an association weight reflects the tendency of an outcome to occur in the presence of a certain cue. A higher positive association weight value for a particular outcome corresponds to a higher likelihood of occurrence of that outcome in the presence of the cue; conversely, a highly negative value corresponds to a greater likelihood of nonoccurrence of that outcome (the cue is said to be inhibitory in this case). Values close to zero mean low chances of observing (if the weight is positive) or inhibiting (if the weight is negative) the outcome.
The R–W model assumes that the organism computes a simple error-correcting learning rule used to update the association weights in each new learning event (e.g., each trial in a behavioral experiment). The general idea behind the correction rule is that the association between a cue and outcome is (a) strengthened if both cue and outcome are present in the learning event, (b) weakened if the cue is present but the outcome is not, and (c) kept the same if the cue itself is absent.
The updating of the association weights is driven by the discrepancy between the expected and the obtained outcome, such that the magnitude of the update—how much the association weights are adjusted—is determined by two parameters called learning rates, and the direction of the update—whether it increases the weight or decreases it—depends on the sign of the difference between the expected and the observed outcome. In this way, most broadly, for the R–W model, learning is about the outcomes, and this sets it apart from related models where learning is about the input cues (e.g., Pearce & Hall, 1980).
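The update rule described in the preceding paragraphs can be sketched in a few lines of Python. This is a minimal illustrative implementation, not the study's code; the `rw_update` helper, the default parameter values, and the `(cue, outcome)` weight keying are all assumptions made for exposition:

```python
from collections import defaultdict

def rw_update(weights, present_cues, present_outcomes, all_outcomes,
              alpha=0.1, beta=0.1, lam=1.0):
    """One Rescorla-Wagner learning event.

    weights: defaultdict(float) keyed by (cue, outcome) pairs.
    alpha, beta: the two learning rates (illustrative values).
    lam: the maximum associative strength an outcome supports.
    """
    for o in all_outcomes:
        # Expected outcome: the summed weight of all cues present in
        # this event -- this sum is what implements cue competition.
        prediction = sum(weights[(c, o)] for c in present_cues)
        # Obtained outcome: lam if o occurred in this event, 0 otherwise.
        target = lam if o in present_outcomes else 0.0
        # The magnitude of the update is set by alpha * beta; its direction
        # by the sign of (target - prediction). Weights of absent cues
        # are left untouched.
        for c in present_cues:
            weights[(c, o)] += alpha * beta * (target - prediction)
    return weights
```

Repeated pairings of a cue with an outcome drive the weight toward `lam` (strengthening, point a above); presentations of the cue without the outcome drive the weight back down (weakening, point b); weights of absent cues stay unchanged (point c).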
Another feature of the R–W model is that, although the outcomes are updated independently from each other, input cues compete for the predictivity of outcomes.
In other words, the adjustment of the weights depends not only on the single cue being updated but on all the cues present in the learning event, through the sum of their association weights. This cue competition principle allowed the R–W model to explain many of the puzzling phenomena of classical conditioning, some of which were also valuable for understanding the mechanics of language learning (see the next section for a discussion).1 One of the best-known examples of such learning phenomena is the blocking effect (Kamin, 1969). This effect occurs when a cue is trained in compound with a second cue to predict an outcome, but the second cue is already a good predictor of that outcome. In such cases, the first cue cannot form a strong association with the outcome (i.e., the first cue is blocked by the second cue). More generally, the cue competition principle often means that the best cues for an outcome prevent other cues from developing a strong association with that same outcome.
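A short simulation makes the blocking effect concrete. The two-phase design below is a schematic stand-in for Kamin's (1969) paradigm, and the `train` helper and parameter values are illustrative assumptions, not the study's materials:

```python
from collections import defaultdict

def train(trials, weights=None, alpha=0.1, beta=0.1, lam=1.0):
    """Run R-W updates for a single outcome over a list of
    (present_cues, outcome_present) trials; returns the weight dict."""
    w = weights if weights is not None else defaultdict(float)
    for cues, outcome_present in trials:
        # Prediction is the summed weight of the present cues.
        prediction = sum(w[c] for c in cues)
        error = (lam if outcome_present else 0.0) - prediction
        for c in cues:
            w[c] += alpha * beta * error
    return w

# Blocking: pretrain A -> outcome, then train the compound A+B -> outcome.
w = train([({"A"}, True)] * 300)           # phase 1: A becomes a good predictor
w = train([({"A", "B"}, True)] * 100, w)   # phase 2: compound training
blocked_B = w["B"]                         # B gains almost nothing

# Control: B trained alone (no competitor) acquires a strong weight.
control_B = train([({"B"}, True)] * 100)["B"]
```

Because A already predicts the outcome at the start of phase 2, the prediction error on compound trials is close to zero, leaving B little associative strength to acquire; the control run, with no competing cue, shows what B would have learned on its own.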
The Rescorla–Wagner Model and Language Learning
Since its first mention within a linguistic context by Ellis (2006a), evidence has accumulated showing that the R–W model can capture several aspects of language learning (e.g., Baayen et al., 2011; Ellis, 2006b; Milin, Divjak, & Baayen, 2017; Milin, Feldman, et al., 2017). So far, the available empirical evidence stems from studies that train an R–W model that uses default parameter values (here we allude to the two learning rate parameters used to update the association weights after each new event), typically on either a small sample from experiments on artificial languages or a large corpus of texts.2 Posttraining learning measures are then extracted from the simulated model and are compared against observed response measurements from an experimental task.
The first issue is that predictions for (and from) such models are typically generated independently from the experiment (with exceptions such as the studies of Ramscar & Yarlett, 2007, and Divjak et al., 2021, where the model generated the hypotheses to be tested experimentally).
The parameters are typically set to their default values, missing the opportunity to take into account the variability that can arise from simulating the model with different parameter values (though see Olejarczuk et al., 2018, who used fixed parameter values but fitted a separate model to each participant’s data using the same sequence of examples encountered by the participant). Incorporating the variability arising from the model parameters when fitting learning models to language data has the potential to improve the explainability of the individual differences observed in the experiment, especially since language usage and representation is an area that shows huge individual variation (Dąbrowska, 2018).
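Fitting the learning rates separately for each participant could, under simplifying assumptions, look like the following sketch. The toy trial sequence, the response-agreement criterion, and the grid of candidate values are all hypothetical placeholders, not the study's actual fitting procedure:

```python
import itertools
import random
from collections import defaultdict

# Toy stand-ins for one participant's trial sequence and responses;
# in a real fit these come from the experiment logs.
random.seed(0)
OUTCOMES = ["sg", "pl"]  # e.g., singular vs. plural verb form
trials = [({"noun_%d" % random.randint(0, 3)}, random.choice(OUTCOMES))
          for _ in range(100)]
responses = [o for _, o in trials]  # pretend the participant chose correctly

def simulate_agreement(trials, responses, alpha, beta, lam=1.0):
    """Fraction of trials on which the model's preferred outcome matches
    the participant's response, training on that participant's own sequence."""
    w = defaultdict(float)
    hits = 0
    for (cues, outcome), resp in zip(trials, responses):
        # The model's choice BEFORE seeing this trial's feedback.
        pred = max(OUTCOMES, key=lambda o: sum(w[(c, o)] for c in cues))
        hits += (pred == resp)
        # R-W update using this trial's feedback.
        for o in OUTCOMES:
            err = (lam if o == outcome else 0.0) - sum(w[(c, o)] for c in cues)
            for c in cues:
                w[(c, o)] += alpha * beta * err
    return hits / len(trials)

# Grid search over candidate learning rates: one best pair per participant.
grid = [0.01, 0.05, 0.1, 0.2]
best = max(itertools.product(grid, grid),
           key=lambda ab: simulate_agreement(trials, responses, *ab))
```

The key design point, in line with the argument above, is that the model is trained on the same example sequence the participant saw, so the fitted parameters absorb individual differences in both learning rate and trial order.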

Training the model on a large-scale corpus comes at an even greater cost. We leave aside here the issue of (lack of) similarity between the contents of a corpus and the input that language users receive (which plagues converging evidence studies generally; for a summary discussion, see Klavan & Divjak, 2016, and for collections of worked examples, see Divjak & Gries, 2012, and Gries & Divjak, 2012). Here we focus on another issue: Training on a corpus mutes the two main sources of variability of the model—namely, those related to the choice of model parameters and the order of training examples—which are mostly active during the early stages of learning (Shanks, 1995; also see Milin et al., 2020, for a more general discussion of the trial order effect in error-correction learning).3 These early biases, as Ellis (2006a) called them, constitute a real test for the R–W model before it can be deployed as a model of language learning at a large scale. Modeling the parameters’ variability and training the R–W model on the same examples encountered by the participants represent novel opportunities for understanding language learning not yet fully explored in previous studies.
The Present Study
The present study aims to model how individual language learners engage with the task at hand on a trial-by-trial basis, which constitutes a step-changing challenge for the application of discrimination or error-correction learning in general, and the R–W model in particular, to language learning. Whereas previous studies have provided general support for the R–W rule by using this model to explain the behavior of participants across a range of tasks (Divjak, 2019; Milin & Blevins, 2020; Milin, Feldman, et al., 2017; Pirrelli et al., 2020), we focus on testing predictions generated by the model in a controlled natural language learning task and model the data at the level of the individual language learner. In doing so, we treat each participant as a separate learning entity governed by different capacities, which are, crucially, formalized through the learning parameters of the chosen model.
Given that several studies have reported that classical conditioning performance can be affected by cognitive and personal characteristics such as working memory (Baetu et al., 2018; Sasaki, 2009), gender (Lonsdorf et al., 2015; Merz et al., 2018), and age (e.g., Mutter et al., 2012), we also investigate whether such characteristics could affect the adoption of an R–W-like mechanism of language learning.
To achieve these goals and to address these questions, we designed a simplified natural language learning task: simplified to exploit the advantage of tight empirical control, but only partly so to maintain a commitment to ecological validity by offering a more naturalistic language input experience. The task represents, to a reasonable extent, how people would learn Polish subject–verb agreement mappings through natural exposure to examples.
We trained native English speakers on a set of carefully crafted examples, which had both auditory and visual dimensions, and which incorporated some of the complexities inherent to subject–verb agreement in Polish. Next, for each participant, individually, we selected the best-fitting model (i.e., the parameters that led to the closest match between the participant's responses and the model), using the same training examples encountered by the participant. We then assessed the R–W model for its capacity to recover participants’ language choices as well as their time latencies, and compared it to other plausible, yet rule-based response strategies. Finally, we tested whether cognitive and personal characteristics such as working memory capacity, age, and gender affect the extent to which the R–W model captures language learning.
Method
Participants
Sixty-six participants (Mdn age = 20 years; range = 18–65; 41 females) took part in the experiment in exchange for a £7 Amazon voucher. Participants were university students and staff. All of them were native English speakers without knowledge of Polish or any other Slavic languages, had normal or corrected-to-normal hearing and vision, and did not declare any learning disabilities. Participants had different educational backgrounds, and many of them could speak other languages in addition to English (the distributions of education and language backgrounds are presented in Appendix S1 in the Supporting Information online).
Materials and Procedure
All our materials, including data and code, are openly available on GitHub (https://github.com/ooominds/Error-correction-mechanisms-in-languagelearning) and the University of Birmingham’s open-access repository, UBIRA (https://doi.org/10.25500/edata.bham.00000911). Participants completed three tasks and a short questionnaire in the following order: (a) a language learning task (main task), (b) an explicit knowledge and demographic questionnaire, (c) an implicit learning task, and (d) a working memory (WM) task. (A detailed description of each task is provided in the next section.)

The language learning and implicit learning tasks were implemented and presented to participants using OpenSesame (Mathôt et al., 2012; Mathôt & March, 2022). The demographic questionnaire was presented using Google Forms, and the WM task was administered using Tatool (von Bastian et al., 2013). The experiment was run either individually or, whenever possible, in pairs, in a quiet room, on Intel Core i7-8700 computers running Windows 10 and equipped with Iiyama G-Master 24.5-in. monitors running at 59 Hz with a screen resolution of 1,920 × 1,080 pixels. Participants heard the auditory stimuli via Bose QuietComfort 35 II noise-canceling headphones and registered their responses using a keyboard. The experiment took about 50 min to complete.