Error-Correction Mechanisms in Language Learning: Modeling Individuals Part 3
Nov 09, 2023
Relationship Between the Model’s Activation-Based Measures and Participants’ Choices and Response Times
In a further assessment of the quality of fit of the R–W model, we carried out a generalized linear mixed-effects modeling analysis looking at the relationship between participants’ choices and the activation-based measure derived from the fitted R–W models—that is, activation support for the np form (see the section on computational modeling for more details). We also analyzed the relationship between participants’ response times and activation support by fitting a polynomial linear mixed-effects model with both the linear and quadratic terms of activation support, since we expected activation support to have a quadratic effect on response times (Table 3). More detailed summaries of the models with the random effects structures are provided in Appendix S8 in the Supporting Information online.
As expected, activation support for the nonmasculine plural form was significantly positively associated with the likelihood of nonmasculine form choices, OR = 6.78, p < .001, 95% CI [3.82, 12.03]. Figure 4 (left pane) also shows that this relationship is asymmetrical around 0, reflecting a strong bias towards the masculine verb form that, even with (activation-based) evidence supporting the nonmasculine form, can still lead to a preference for the masculine form. Also, and in line with our hypotheses, the second-order polynomial term of activation support was a significant predictor of response time, as there was a quadratic relationship between the activation support and response time, with the slowest responses recorded for the least supported events, b = −0.20, p = .012, 95% CI [−0.35, −0.04]; see also Figure 4, right pane.
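To make the reported odds ratio concrete, the following minimal sketch shows how a plain logistic model maps activation support onto choice probability. Only the odds ratio (6.78) comes from the text. The intercept is a hypothetical placeholder chosen to mimic a bias towards the masculine form, not the fitted estimate from Table 3, and random effects are ignored.

```python
import math

ODDS_RATIO = 6.78              # reported in the text
SLOPE = math.log(ODDS_RATIO)   # log-odds change per unit of activation support
INTERCEPT = -1.0               # hypothetical: a negative value encodes a masculine bias


def p_nonmasculine(activation_support, intercept=INTERCEPT, slope=SLOPE):
    """Probability of choosing the nonmasculine form under a plain logistic model."""
    log_odds = intercept + slope * activation_support
    return 1.0 / (1.0 + math.exp(-log_odds))


def indifference_point(intercept=INTERCEPT, slope=SLOPE):
    """Activation support at which both verb forms are equally likely."""
    return -intercept / slope


if __name__ == "__main__":
    for support in (-1.0, 0.0, 0.5, 1.0):
        print(f"support={support:+.1f}  P(nonmasculine)={p_nonmasculine(support):.2f}")
    print(f"indifference point: {indifference_point():.2f}")
```

With a negative intercept, the point of indifference lies above zero, which is one way to read the asymmetry described above: some positive evidence for the nonmasculine form is needed before it becomes the more likely choice.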

These results suggest that the fitted models performed well in predicting participants’ form choices and that the information encoded in the association weights—the basic currency of an R–W model—is a good predictor of both the likelihood of choosing a particular verb form and the speed with which the response is made. Participants’ level of agreement regarding the choice of a certain verb form thus differed depending on the activation support of that particular form, with a high level of agreement expected and attested for high positive or low negative activation support values and with a high level of disagreement expected and attested for activation support around zero.
Level of Agreement Between Participants Through the Lens of the Model
We further analyzed participants’ behavior by exploring two additional questions. First, what level of agreement was there among language learners, given a particular type of event (e.g., events made up of cues from the same grammatical gender versus events intermixing cues of different grammatical genders)? Second, and crucially, can the differences in levels of agreement be adequately explained using the R–W model?
To answer these questions, we analyzed the effect of the presence of each event category on the proportion of participants who chose one of the two verb forms (for a full list of event categories used in the task, see Appendix S3 in the Supporting Information online). To obtain the model estimates of the proportions, we used the generalized linear mixed-effects model that we built in the previous analysis (Table 3, left) to model the relationship between activation support and the choice of the nonmasculine plural form. Specifically, for each event category, we averaged the model’s predicted proportions of the nonmasculine forms based on its activation support values across participants (given that each event category was encountered once by each participant). The results are summarized in Figure 5. The events in the figure are sorted in ascending order by observed choice proportions in the experiment, so the leftmost and rightmost sides represent regions of a high level of agreement between participants, whereas events situated in the middle part triggered a high level of disagreement between participants.
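The averaging step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the coefficients passed to the function and the activation support values in the toy records are hypothetical, and the random effects of the actual mixed-effects model are left out. The category labels are taken from the text.

```python
import math
from collections import defaultdict


def predicted_probability(activation_support, intercept, slope):
    """Fixed-effects-only prediction of a nonmasculine choice (logistic link)."""
    log_odds = intercept + slope * activation_support
    return 1.0 / (1.0 + math.exp(-log_odds))


def mean_predicted_by_category(records, intercept, slope):
    """Average the predicted proportions per event category across participants.

    records: iterable of (participant_id, event_category, activation_support).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _participant, category, support in records:
        sums[category] += predicted_probability(support, intercept, slope)
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}


if __name__ == "__main__":
    # Hypothetical activation support values for two participants.
    toy_records = [
        ("p1", "uFA + uFP", 1.2), ("p2", "uFA + uFP", 0.8),
        ("p1", "uMA1 + uMA2", -0.9), ("p2", "uMA1 + uMA2", -1.1),
    ]
    means = mean_predicted_by_category(toy_records, intercept=0.0, slope=1.0)
    # Sort in ascending order of proportion, as done for the events in Figure 5.
    for category, proportion in sorted(means.items(), key=lambda item: item[1]):
        print(f"{category}: {proportion:.2f}")
```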
Participants had a clear preference for the masculine plural form when the event contained a masculine personal cue (uMP) or when the event included only animate masculine cues (i.e., “uMA1 + uMA2”). Likewise, participants preferred the nonmasculine plural form when the events solely contained feminine cues, whether personal or animate (e.g., “uFA + uFP”). High levels of disagreement were mainly observed for events intermixing the inhibitory blocked cue and a feminine cue or a masculine animate and a feminine personal cue (with proportions ranging between .44 and .51).

Comparing the observed and predicted proportions, the model managed to capture the difference in the levels of agreement between participants across the different categories of events surprisingly well. The largest discrepancies between the observed and predicted proportions appear to have occurred for events involving the blocked and inhibitory blocked cues, suggesting that it was more challenging for the model to capture their effect on participants’ choices.

Relationship Between Model-Fit Quality and Individual Difference Measures
The model fit results demonstrated that the R–W model successfully accounts for the behavior of a large proportion of our participants, including their response times. However, the quality of fit varied across participants, with data from a few participants fitted very poorly by the model. In an attempt to assess whether the other measures collected during the experiment (demographic characteristics, WM span, and implicit learning) can explain the observed individual differences in the model fit quality, we ran a multiple linear regression to predict participant–model match rates (logit-transformed) based on the different individual difference measures collected. Specifically, the predictors included WM span (z-transformed), gender, and age; the time-slopes extracted for each participant from the implicit learning task did not make a significant contribution and are reported on in Appendix S5 in the Supporting Information online.
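The two transformations mentioned above are standard and can be sketched in a few lines. The clipping constant `eps` is an assumption added here so that match rates of exactly 0 or 1 do not produce infinite values; the text does not say how such cases were handled. The z-transform below uses the sample standard deviation, which is also an assumption.

```python
import math
import statistics


def logit(proportion, eps=1e-6):
    """Log-odds of a proportion, clipped away from 0 and 1 (eps is an assumption)."""
    clipped = min(max(proportion, eps), 1.0 - eps)
    return math.log(clipped / (1.0 - clipped))


def z_transform(values):
    """Center values on their mean and scale by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(value - mean) / sd for value in values]


if __name__ == "__main__":
    print(logit(0.69))             # a match rate above .5 gives positive log-odds
    print(z_transform([2, 4, 6]))  # hypothetical WM span scores
```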

From the full linear regression model that included all four individual measure variables as well as the interaction between gender and WM span, we derived a final model containing only the significant variables by using backward variable selection based on the likelihood ratio test. Data from two participants were omitted from this analysis because one did not report their gender and one was identified as an extreme outlier, as explained in the section on the WM task. In total, data from 61 participants were fed to the linear regression model. The best model after variable selection included both WM span and gender, but not their interaction (Table 4).
The model fit accuracy increased significantly with increasing WM span, b = 0.17, p = .008, 95% CI [0.05, 0.30], as illustrated in Figure 6 (left pane), and was significantly higher for female participants (M = .69, SD = .17) in comparison with male participants (M = .63, SD = .16), b = −0.28, p = .033, 95% CI [−0.54, −0.02], as shown in Figure 6 (right pane). To check that removing influential residuals did not affect our findings, we applied model criticism as described in Baayen and Milin (2010); excluding the single extreme residual with a z score greater than 2.5 in our model resulted in even stronger effects (p = .004 for WM span and p = .007 for gender). Also, WM span did not significantly correlate with gender (p = .058), which suggests that their relationship is unlikely to have influenced the effects of gender and WM span on model fit accuracy (see Appendix S9 in the Supporting Information online for more detail). Our findings thus show that having a larger WM capacity or being female increased the likelihood of a participant choosing verb forms as predicted by the R–W model in our language learning task.

Discussion
Summary of Findings
Our findings show that an R–W mechanism captures well how participants learn subject–verb agreement in a morphologically complex language and, by extension, how they might learn language through mere exposure to it. With an average fit accuracy of 68%, based on a simple activation-based decision strategy, the model explained the verb form choices made by a large proportion of participants rather well (see Note 7). More interestingly, an activation-based measure extracted from the best-fitting models correlated strongly with both the likelihood of a particular verb form choice and the time required to make that choice.
The model also provided insights as to why participants might display high or low agreement levels when choosing a verb form, depending on the nature of the subject of the clause. According to the model, this is due to the association strengths that the participants acquire, which are used to calculate the activation support for each of the possible verb forms. These association strengths are mostly affected by (a) the learner’s learning rate for the cues (the learning rate determines the magnitude of the correction of the weights, based on the estimated error in each trial) and (b) the distribution of the learning events they encountered during the learning stage (this would include the frequency of each learning event and the order of the learning events, among other things). Thus, one prediction from our study is that changing the order or the relative frequencies of the learning events during the training might lead to different choice patterns from those we observed here.
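The roles of the learning rate and of event order can be illustrated with a minimal sketch of the R–W update. Several simplifications here are assumptions rather than the paper's setup: a single learning rate shared by all cues, a maximum association strength of 1, toy training sequences, and activation support taken to be the difference between the two forms' summed weights (the paper defines its measure in the section on computational modeling). The cue names uFP and uMA are borrowed from the text.

```python
from collections import defaultdict

OUTCOMES = ("masc", "nonmasc")


def rw_train(events, learning_rate, lam=1.0):
    """Apply the R-W update trial by trial; events is a list of (cues, observed_outcome)."""
    weights = defaultdict(float)  # (cue, outcome) -> association strength
    for cues, observed in events:
        for outcome in OUTCOMES:
            activation = sum(weights[(cue, outcome)] for cue in cues)
            target = lam if outcome == observed else 0.0
            error = target - activation
            for cue in cues:
                # The learning rate scales how much of the error is corrected.
                weights[(cue, outcome)] += learning_rate * error
    return weights


def activation_support(weights, cues):
    """Assumed definition: nonmasculine activation minus masculine activation."""
    nonmasc = sum(weights[(cue, "nonmasc")] for cue in cues)
    masc = sum(weights[(cue, "masc")] for cue in cues)
    return nonmasc - masc


if __name__ == "__main__":
    single = [(("uFP",), "nonmasc")] * 5
    compound = [(("uFP", "uMA"), "masc")] * 5
    # Same events, two presentation orders.
    for label, events in (("single first", single + compound),
                          ("compound first", compound + single)):
        support = activation_support(rw_train(events, 0.2), ("uFP",))
        print(f"{label}: support for uFP alone = {support:+.3f}")
```

In this toy case the same ten events yield activation support of opposite sign for the cue uFP depending on which block comes first, which is the kind of order sensitivity the prediction above relies on.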
We also found a significant relationship between both gender and WM capacity and the participant–model match rates, which sheds light on what might have driven the observed differences in the quality of model fit. The fact that in our experiment a larger proportion of women than men behaved in accordance with an R–W mechanism is in line with findings from several previous studies that highlighted the association between gender and classical conditioning for both humans (Lonsdorf et al., 2015; Merz et al., 2018) and animals (e.g., Velasco et al., 2019). This suggests a significant difference in learning between men and women, with women being better modeled by the R–W error-correction learning rule. Women are generally known to have a small language advantage over men (see Kimura, 1999, for an extensive assessment), specifically in areas related to lexical retrieval (Balling & Baayen, 2008, 2012). It has been suggested that this might be due to women having a superior declarative memory, which they could use to generalize over stored neighboring forms (Hartshorne & Ullman, 2006).
The finding that the likelihood of a language learner behaving according to the R–W mechanism increases with WM capacity provides evidence that WM can play a role in classical conditioning by affecting the adoption of a classical conditioning mechanism such as the R–W rule. Sasaki (2009) and Baetu et al. (2018) previously provided evidence of disruption of classical conditioning performance when WM is loaded using dual-task paradigms. The present finding adds to the mounting evidence that, against the predominant belief, WM may be implicated in low-level cognitive processes such as instrumental learning, more commonly referred to as reinforcement learning within the neuroscience and machine learning communities (Collins & Frank, 2012; Ez-zizi, 2016) and in some forms of implicit learning (Medimorec et al., 2021).
Blocking and inhibitory blocking-like effects did not emerge from the R–W model for all participants. As shown, this was mainly due to the short duration of the training phase. Increasing the number of training trials not only resulted in the reemergence of blocking effects for all participants’ best-fitting parameters but also removed the variability in the association weights, thus predicting that all participants should end up behaving in the same way in the long run (see Appendix S7 in the Supporting Information online). This is not surprising in light of Danks’ (2003) work, which shows that, in many cases, the R–W system will converge to the same equilibrium regardless of the parameters used. In other words, the destination of learners is often the same (this is also desirable since we often want all learners to learn to make the same associations), but the paths to those destinations can differ substantially depending on the nature of the learning problem (e.g., amount of data available, relative frequencies of the events, number of cues and outcomes). This has many implications for language learning studies employing the R–W model, as discussed in the next section.
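The point about shared destinations and different paths can be seen in a toy simulation. This sketch is an assumption-laden illustration, not a reproduction of Appendix S7: it uses a single outcome, two hypothetical cues A and B, and a strictly alternating event sequence for which one set of weights (A = 1, B = 0) predicts every event perfectly.

```python
def rw_weights(n_cycles, learning_rate):
    """Alternate two hypothetical events: cues {A, B} with the outcome, cue {B} without it."""
    weights = {"A": 0.0, "B": 0.0}
    events = [(("A", "B"), 1.0), (("B",), 0.0)]
    for _ in range(n_cycles):
        for cues, target in events:
            error = target - sum(weights[cue] for cue in cues)
            for cue in cues:
                weights[cue] += learning_rate * error
    return weights


if __name__ == "__main__":
    for n_cycles in (5, 2000):
        for rate in (0.05, 0.3):
            w = rw_weights(n_cycles, rate)
            print(f"cycles={n_cycles:4d} rate={rate:.2f}  A={w['A']:.3f}  B={w['B']:.3f}")
```

After a handful of cycles the two learning rates give clearly different weights, whereas after long training both settle on the same values, so individual variability in this toy problem is confined to the early part of learning.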

Implications for Language Learning Studies That Use the Rescorla–Wagner Model
Several language research groups have embraced the R–W model as a valuable approach for modeling language learning phenomena due to its simplicity, its cognitive plausibility, and its successes in explaining a wide range of language learning phenomena. The present study provides further support for simulating or modeling language learning using this model, but also draws attention to the important issue of individual differences, which has so far been overlooked in studies that combine computational modeling using the R–W model and experimental data.
Given the amount of individual difference we observed in our data, it would be prudent to move away from the currently predominant approach where the R–W model is run once with its default parameter values and used to explain data from all participants. Although the effect of the learning rate parameter on model fit accuracy was not substantial for the chosen task, in practice, R–W performance will always be affected by the choice of learning rate, irrespective of the particular modeling challenge (Milin, Divjak, & Baayen, 2017, pp. 1739–1741).
Another common practice that might need to be reconsidered is training the model on one large dataset or a small subset of it. Consequently, the features that set the model apart from purely statistical classification models (namely, the possibility of choosing parameters that capture how fast an individual can learn, and the ability to account for input order effects) remain unused. It is not surprising, then, that several studies have reported only minor or no differences between the R–W model and other statistical or learning models such as logistic regression, memory-based learning, and decision trees (Baayen, 2011; Baayen et al., 2013). With such an approach, the main advantage of the R–W model over purely statistical techniques is its ability to perform incremental learning, as data become available. However, the same advantage could be achieved from any neural network model with no hidden layer as such a model can also produce weights between cues and outcomes, albeit with a different and, arguably, less plausible learning rule (here we mainly allude to the backpropagation learning rule, which is currently the predominant approach when training neural network models). With such a model, the same model fitting approach we used here can still be applied but with a different set of parameters to tune, such as the type of activation function, number of neurons, and learning rate.
Recent work in usage-based frameworks has highlighted the vast individual differences characterizing language knowledge in first-language populations. Individual differences in grammar are comparable in size to those in lexical knowledge and are related to both the quality of the input and the learner’s cognitive abilities (Dąbrowska, 2018). Our findings demonstrate that individual model fitting should be the default option when comparing the R–W model or other computational models to participants’ data. Specifically, model parameters and data inputs should be adjusted separately for each participant to allow for a better account of individual responses and obtain a more veridical picture of where knowledge of, for example, a rule resides, whether in the aggregated mind of the linguist or the individual minds of the users (Divjak, 2018), and how it is distributed across the population. Our data also support the usage-based stance that our linguistic knowledge is shaped by our personal and cognitive characteristics, as attested by the significant role of WM and gender in the quality of model fit; such factors should be considered by default when modeling language.
Limitations and Future Directions
Our study is the first to fit the R–W model to the behavior of individual learners in an actual language learning task. We used the R–W model in the form available now, but our findings, despite being very promising, show that it might be interesting to extend the model to handle WM capacity limitations. Further investigations in such a direction could be inspired by work done in reinforcement learning, a field closely related to classical conditioning, where learning has also long been assumed to occur in areas often associated with low-level cognitive functions, such as the dorsal and ventral striatum (Balleine & O’Doherty, 2010), but is now recognized to also involve high-level cognitive control via WM. This has led to the development of new learning frameworks where WM is explicitly modeled as a key component that supports learning by retaining information from previous trials (e.g., see Collins & Frank, 2012; Ez-zizi et al., 2015). This could be the approach to take for R–W and other classical conditioning models, especially because in large simulation-based language studies, learning events typically contain a large number of cues (e.g., all trigraphs or words in one sentence), which cannot be processed at once by a human learner (as the updates of the R–W model require) due to known WM capacity limitations (see Glautier, 2013, and Baayen et al., 2016, for early attempts in this direction).
Another direction for future extension of our work is to collect participants’ responses over time while they are trained on the cue–outcome associations rather than having a separate post-learning test phase. This would have the potential to improve the model fit further and to provide a broader picture of the behavior of participants while they are learning the task. In addition, this could allow the extraction of a learning measure based on time slope for the language learning task, such as we did for the implicit learning task, and thus would increase the likelihood of finding a link between implicit learning and the quality of fit of the R–W model (see also our discussion in Appendix S5 in the Supporting Information online). A link between the two measures can also be probed by fitting the R–W model to the response times collected in the implicit learning task, as was done by Notaro et al. (2018), rather than using time slopes only or a mixture of the two.
Finally, the particular structure of our language learning task favored the normative (masculine-biased) strategy, but an interesting question that remains unanswered is whether we can use the R–W model to predict the emergence of different strategies as we vary the structure of the language input and control for individual differences among language learners. The approach of using the R–W model to explain or predict the level of agreement among language users can be extended beyond Polish subject–verb agreement in the plural past tense to cover other facets of language where a lack of consensus in language use has been observed (e.g., see Geeraert et al., 2020; Milin, Divjak, & Baayen, 2017).
Conclusion
The R–W model is a very simple learning model, yet it has multiple sources of variation that can be used to explain participants’ behavior in language learning experiments. These include the model’s learning rate, the order of presentation of learning examples, and the relative frequencies of cue–outcome cooccurrences. In the present study, we systematically incorporated these sources of variation when fitting the model to participants’ data, thus enabling the model to successfully capture the choices and response latencies of most participants in a language learning task on Polish subject–verb agreement. In addition, cognitive and demographic characteristics such as WM and gender determined the extent to which language learning was driven by R–W-like learning principles.
Open Research Badges
This article has earned Open Data and Open Materials badges for making publicly available the digitally shareable data and the components of the research methods needed to reproduce the reported procedure and results. All data and materials that the authors have used and have the right to share are available at https://github.com/ooominds/Error-correction-mechanismsin-language-learning and https://doi.org/10.25500/edata.bham.00000911. All proprietary materials have been precisely identified in the manuscript.
Notes
1 The idea of cue competition is also at the core of the competition model of Bates and MacWhinney (1987) for language acquisition. Their model, however, uses mainly symbolic/linguistic cues such as word order or morphological features of words and is based on a connectionist approach requiring a much more complex architecture than the R–W model.
2 The contents of any corpus are, at best, a very rough approximation of the input that language users receive. Conversely, artificial languages are illustrative and informative for understanding natural languages but hardly a realistic reflection of the complexity found in any given natural language.
3 The early implementations of the R–W rule as the naïve discrimination learning model relied on a noniterative version of the algorithm, as provided by Danks (2003), which eliminates the possibility of any order effects emerging.
4 It is important to note that here we were not interested in testing the blocking effects per se as is typically done in behavioral experiments of classical conditioning. In those experiments, only the events relevant to blocking are included (blocking is tested separately from the other effects), and blocking is tested on a second cue rather than a third cue as in our case (e.g., Kamin, 1969). Also, the learner is usually trained for long enough to ensure that the “blocking” cue becomes a good predictor of the outcome of interest. Such a clean experimental setup would not fairly represent the “disarray” so pervasive in natural languages. As our study is about language learning, we opted to mimic a realistic learning situation as closely as possible.
5 An additional participant experienced equipment malfunction in the middle of the test phase and had to retake the test. We retained this participant’s data since they did not go through any extra training in the second run of the test phase and thus started the new run with the same knowledge as in the first run (recall that the outcome feedback is only provided in the training phase). The familiarization effect should not play a role here since all participants underwent a few practice trials in all phases before starting the actual experiment. We also reran all analyses without this participant’s data and confirmed that removing them did not alter any of the results presented in the paper.
6 The quality of model fit on unseen data was evaluated using leave-one-out cross-validation. Specifically, for each participant, we held out one event of those they encountered in the test phase and fitted an R–W model to the remaining events. The model was then evaluated only on the reserved event by assessing whether the response from the participant matched that of its best-fitting R–W model. We repeated this for all test events and then computed the average (leave-one-out) fit accuracy, that is, the proportion of matches between the responses from a given participant on each of the reserved events and their associated predictions from the participant’s best-fitting R–W model.

7 In fact, this accuracy level can be considered excellent since we assumed a very simple nonprobabilistic action selection process where a verb form is chosen if it has the highest activation. This does not take into account the variability that might arise from exploration, lapse of attention, or inherent brain noise.
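The leave-one-out procedure in Note 6 and the deterministic choice rule in Note 7 can be sketched together for a single participant. This is a hedged illustration: the learning-rate grid, the single shared learning rate, the toy events, and the use of the match rate on the remaining test items as the fitting criterion are all assumptions made here, and ties in activation are resolved in favor of the first outcome listed.

```python
from collections import defaultdict

OUTCOMES = ("masc", "nonmasc")
LEARNING_RATES = (0.01, 0.05, 0.1, 0.2, 0.5)  # hypothetical search grid


def train(training_events, rate):
    """Run the R-W update over the training events with one shared learning rate."""
    weights = defaultdict(float)
    for cues, observed in training_events:
        for outcome in OUTCOMES:
            target = 1.0 if outcome == observed else 0.0
            error = target - sum(weights[(cue, outcome)] for cue in cues)
            for cue in cues:
                weights[(cue, outcome)] += rate * error
    return weights


def predict(weights, cues):
    """Choose the verb form with the highest activation (ties go to the first outcome)."""
    return max(OUTCOMES, key=lambda o: sum(weights[(cue, o)] for cue in cues))


def match_rate(weights, test_items):
    """Proportion of test items on which model and participant agree."""
    hits = sum(predict(weights, cues) == choice for cues, choice in test_items)
    return hits / len(test_items)


def loo_accuracy(training_events, test_items):
    """Hold out each test item in turn, fit on the rest, and score the held-out item."""
    hits = 0
    for i, (held_cues, held_choice) in enumerate(test_items):
        rest = test_items[:i] + test_items[i + 1:]
        best_rate = max(LEARNING_RATES,
                        key=lambda r: match_rate(train(training_events, r), rest))
        hits += predict(train(training_events, best_rate), held_cues) == held_choice
    return hits / len(test_items)


if __name__ == "__main__":
    training = [(("uFP",), "nonmasc")] * 8 + [(("uMP",), "masc")] * 12
    responses = [(("uFP",), "nonmasc"), (("uMP",), "masc"), (("uFP", "uMP"), "masc")]
    print(loo_accuracy(training, responses))
```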
References
1. Adani, S., & Cepanec, M. (2019). Sex differences in early communication development: Behavioral and neurobiological indicators of more vulnerable communication system development in boys. Croatian Medical Journal, 60(2), 141–149. https://doi.org/10.3325/cmj.2019.60.141
2. Ambridge, B., & Lieven, E. V. M. (2011). Child language acquisition: Contrasting theoretical approaches. Cambridge University Press.
3. Baayen, R. H. (2011). Corpus linguistics and naive discriminative learning. Revista Brasileira de Linguística Aplicada, 11(2), 295–328. https://doi.org/10.1590/S1984-63982011000200003
4. Baayen, R. H., Endresen, A., Janda, L. A., Makarova, A., & Nesset, T. (2013). Making choices in Russian: Pros and cons of statistical methods for rival forms. Russian Linguistics, 37(3), 253–291. https://doi.org/10.1007/s11185-013-9118-6
5. Baayen, R. H., & Milin, P. (2010). Analyzing reaction times. International Journal of Psychological Research, 3(2), 12–28. https://doi.org/10.21500/20112084.807
6. Baayen, R. H., Milin, P., Filipović Đurđević, D., Hendrix, P., & Marelli, M. (2011). An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. Psychological Review, 118(3), 438–481. https://doi.org/10.1037/a0023851
7. Baayen, R. H., Shaoul, C., Willits, J., & Ramscar, M. (2016). Comprehension without segmentation: A proof of concept with naive discriminative learning. Language, Cognition and Neuroscience, 31(1), 106–128. https://doi.org/10.1080/23273798.2015.1065336
8. Baetu, I., Burns, N., & Child, B. (2018). Individual differences in working memory capacity predict performance on an associative learning task [Paper presentation]. Australian Psychologist, Sydney, Australia. https://doi.org/10.1111/ap.12372
9. Balleine, B. W., & O’Doherty, J. P. (2010). Human and rodent homologies in action control: Corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology, 35(1), 48–69. https://doi.org/10.1038/npp.2009.131
10. Balling, L. W., & Baayen, R. H. (2008). Morphological effects in auditory word recognition: Evidence from Danish. Language and Cognitive Processes, 23(7–8), 1159–1190. https://doi.org/10.1080/01690960802201010






