A Resource-rational Model Of Human Processing Of Recursive Linguistic Structure Part 3

Jan 23, 2024

Experiment 2: Effect of Semantic Cues

We next replicated experiment 1 on a second set of items and simultaneously tested the predicted effect of semantic compatibility.



Beyond the two manipulations from experiment 1, in the TWO and THREE conditions, we additionally varied the second-to-last verb phrase: In the COMPATIBLE condition, the first noun was a plausible subject (e.g., "annoyed the patient"); in the INCOMPATIBLE condition, it was not (e.g., "cured the patient"). In the COMPATIBLE condition, nonveridical versions such as "the report by... " should have a higher a priori probability, making prediction of the last verb less accurate. We constructed 42 stimulus items.

Fig. 3B shows predictions from the resource-rational model and previous theories for these items. In addition to the effects from experiment 1, the model predicts higher difficulty in the COMPATIBLE condition, particularly within THREE. Neither surprisal theory nor DLT predicts any effect of compatibility.

We collected reading time data from 200 participants, including both COMPATIBLE and INCOMPATIBLE variants in the TWO and THREE conditions. In all other respects, the experiment and data analysis were identical to experiment 1. Reading times are shown in Fig. 3B. 

The results of experiment 1 were replicated: First, reading times were higher in THREE than in TWO (β = 0.29, 95% CrI [0.24, 0.35], P(β < 0) < 0.0001; effect in raw reading times: 337 ms, 95% CrI [267, 411] ms). 

Second, there was an interaction between embedding bias and the presence of a "that"-clause (β = −0.06, 95% CrI [−0.10, −0.024], P(β > 0) = 0.0007). As in experiment 1, the effect of embedding bias was positive in the ONE condition (difference between "fact" and "report": 193 ms, 95% CrI [37, 357] ms), and negative across the TWO and THREE conditions (difference between "fact" and "report": −105 ms, 95% CrI [−194, −18] ms). 

Third, in agreement with the model predictions, reading times were higher in the COMPATIBLE condition than the INCOMPATIBLE condition (β = 0.083, 95% CrI [0.031, 0.136], P(β < 0) = 0.0014; effect in raw reading times: 96 ms, 95% CrI [36, 156] ms). See SI Appendix, section S3 for further analyses. 

Note that the effects of embedding bias and compatibility are numerically larger in the THREE conditions than in the TWO conditions; a meta-analysis shows that these differences are statistically meaningful in both reading times and in parts of the model's parameter space (SI Appendix, sections S2.1 and S6.6). 

Numerical differences in the slope of embedding bias between COMPATIBLE and INCOMPATIBLE were not statistically meaningful (SI Appendix, Fig. S23), nor were numerical differences in the intercept of the model predictions between the two experiments (SI Appendix, Fig. S6).

See SI Appendix, section S6 for converging evidence from preceding reading time studies (total n = 501). We further replicated the effect of embedding bias on comprehension in two ratings studies (total n = 335; SI Appendix, section S5).

Experiment 3: Production Study

So far, we have confirmed the model predictions in reading times. Difficulty measured in reading times indicates that humans' expectations are violated, but does not directly indicate what human expectations are. 

To provide a second test of human expectations, we turned to a production paradigm, Cloze completion (40, 41), that has been used in language research to evaluate what words are expected immediately following a preamble. We use this method to evaluate the complexity of multiple nested structures and to measure how many verbs humans expect following a complex preamble.

We asked participants to complete contexts of the form "The report that the doctor who the diplomat... " to a complete sentence. We expected participants to either produce grammatical completions with three verbs, such as "...mistrusted cured the patient was surprising," or ungrammatical versions with fewer verbs, such as "...mistrusted was surprising." Resource-rational lossy-context surprisal predicts that the rate of such ungrammatical completions should be lower for nouns with high embedding bias (e.g., "fact"), as these make it easier to recover the true context from imperfect memory representations (Fig. 4A). Existing expectation-based and memory-based models do not predict that the rate of grammatical completions depends on embedding bias.


We recruited 80 participants. Fig. 4 shows the rate of incomplete completions (fewer than three verbs) as a function of embedding bias. As predicted, there was an effect of embedding bias on the rate of ungrammatical responses (β = −0.32, 95% CrI [−0.60, −0.05], P(β > 0) = 0.0123) in a trial-by-trial logistic mixed-effects analysis.

We replicated this study in two more languages (Spanish and German), including one (German) where the difficulty of center embeddings is substantially weaker than in English (42). 


In Spanish, we targeted subject relative clauses (el hecho de que el director que, "the fact that the director who") to avoid the less natural subject-initial object relative clauses, simultaneously testing generalization to a different syntactic configuration. In German, we targeted embedded structures (e.g., Klaus hat erzählt, dass die Behauptung, dass der Student, den der Professor, "Klaus said that the claim that the student who the professor"), as they are known to increase difficulty to levels closer to English (35).

We recruited 60 participants in each language. In both languages, the effect of embedding bias was estimated to be negative, with estimated effect sizes comparable to the English result (Spanish: β = −0.23, 95% CrI [−0.34, −0.12], P(β > 0) < 0.0003; German: β = −0.28, 95% CrI [−0.56, −0.03], P(β > 0) = 0.01738). These results suggest that the previously undocumented effect of embedding bias on human expectations holds across different languages, even when they vary in the overall difficulty of center embeddings.

Discussion

We have introduced a model of human language processing as resource-rational prediction, scaled to arbitrary input using contemporary machine learning methods. Aiming to reconcile memory- and expectation-based perspectives on human syntactic processing, the model not only recovers predictions of those prior theories where they are correct but also predicts previously undocumented interactions between memory limitations and probabilistic expectations, which we confirmed in three behavioral experiments probing human processing of recursive structures.

Our results reveal that the well-documented difficulty of integrating long linguistic dependencies, which is at the heart of existing memory-based models (5, 7, 36), is substantially modulated by probabilistic expectations: The comparison between the ONE and THREE conditions shows that such locality effects can be weakened or even reversed when the nonlocal syntactic structure has high a priori probability, a prediction that falls out naturally from our proposed unification of memory- and expectation-based perspectives.

Our work further documents three prominent families of effects from the psycholinguistic literature in a single experiment and with a single model: locality effects (increased difficulty of THREE), predictability effects (effect of embedding bias in the ONE condition), and semantic interference effects (effect of semantic compatibility). 

There has been considerable interest in a unified theoretical treatment of these families of effects; our work showcases how a single model can describe, in detail, how they interact. One group of phenomena not targeted by our experiments is similarity-based interference (43, 44). Investigating whether it can also be accounted for with this modeling framework is an interesting problem for future research.

Our resource-rational model is formally related to models in various domains. Classical work has shown that rational analysis of retention probabilities can account for fundamental properties of human memory (28, 29). Recent work (45–48) has formalized rational models of human working memory in some domains, such as visual working memory, using rate–distortion theory, an information-theoretic framework deriving high-fidelity encodings under resource constraints. 

The key difference between rate–distortion theory and our model lies in how the resource is measured: here it is the fraction of available words, whereas in rate–distortion theory it is the number of encoded bits. Applied to sentence comprehension, rate–distortion theory would lead to fully compressed "gist" representations of past context. Such fully compressed representations do not lead to the difficulty patterns observed in our experiments (see SI Appendix, section S8 for details).

On the other hand, our model is also a simplification in that it models the recent context as a sequence of words, which may underestimate the role of memory representations of the longer context where individual words may have been forgotten but the memory of meaning remains. Further advances in machine learning may allow inferring a more sophisticated format of memory representations from resource-rational optimization.

In computer science, recursive structure is typically processed using stack-based data structures. Correspondingly, early models of human syntactic processing assumed bounds on the size of the stack, or the number of nodes that can be held in memory at the same time (2, 24). 

Such models predict that deeper embedding is more difficult, but do not predict that difficulty is modulated by statistical or semantic cues. Unlike stack-based architectures, our theory assigns a major role to probabilistic cues in establishing recursive structure. In this respect, it agrees with more recent memory-based theories assuming that humans do not maintain data structures such as stacks, and, instead, establish syntactic structures using associative cue-based retrieval (5, 7, 49, 50). Models of associative retrieval as currently implemented (7) do not account for the distinctive difficulty patterns predicted by our model and observed in our experiments. Nonetheless, we view our theory as compatible with ideas from that literature. 

Our theory provides a computational-level model that makes predictions compatible with existing memory-based models but, unlike those models, is rationally attuned to the rich statistical structure of language, enabling it to predict how memory limitations interact with probabilistic expectations. Our results suggest that identifying probabilistic versions of associative retrieval models, as algorithmic-level implementations of the resource-rational model described here, is an interesting problem for psycholinguistic research. See SI Appendix, section S7.2 for more on the implications of our results for retrieval-based memory models.

Our proposed unification of expectation-based and memory-based models rests on the idea that imperfect working memory representations are reconstructed rationally, although sometimes incorrectly, using knowledge of the statistics of the language. This idea has an important precedent in work on redintegration in verbal working memory (e.g., refs. 51–55), a process whereby degraded short-term memory is restored using knowledge from long-term memory. This has been applied to memory for word lists (e.g., refs. 52–55) and, more recently, memory for syntactic patterns (56). Our model provides an account of such processes grounded in Bayesian inference constrained by resource rationality. There are also models where working memory is treated not as a component of memory of its own but as emergent from the interaction of processing and long-term memory (57, 58). For such models, our results provide data on how long-term knowledge informs processing.

Our experiments capitalize on statistical correlates of syntactic structures to probe how probabilistic expectations interact with memory constraints. This has some parallels in prior work on expectation-based models that showed how correlations, such as between animacy and relative clause type, impact processing in ways not accounted for by existing memory-based accounts (e.g., refs. 59–61). Our work expands on this line of work by articulating an implemented theory of the interaction between memory constraints and probabilistic expectation.

Our model has a free parameter δ, the average number of retained words. We assumed a single value in deriving predictions and comparing them to human reading times. Fitting it for individual subjects and understanding its relationship to established measures of individual differences is an interesting problem for future research.

Connectionist models of human syntactic processing (8, 62–64) aim to describe human processing using expectations derived from neural network representations and have been proposed to model effects related to both memory limitations and probabilistic expectations. However, the differences between plain surprisal as computed by GPT-2 and resource-rational lossy-context surprisal show that human-like memory limitations need not emerge automatically in connectionist models.

We have shown how a model of resource-rational language processing can be scaled to the rich statistical structure of natural language. Our machine learning–based method may open the door to fitting sophisticated rational models on natural input statistics and also in other domains of human cognition.

The generality of our model also suggests that similar phenomena might exist outside of language: Whenever humans process input that is too complex for all its parts to be attended to simultaneously, processing should be impacted by the statistical structure of similar inputs.


Materials and Methods

Nouns. We collected nouns that can take a sentential complement, using the Penn Treebank (65), the English Web Treebank (66), the AnCora treebank (67) in Spanish, and the HDT Treebank (68) in German. We estimated embedding bias as the log probability that "the NOUN" was followed by "that" using English Wikipedia (2.3 billion words), German Wikipedia (800 million words), and Spanish Wikipedia (500 million words). See SI Appendix, section S11 for details. We validated the English estimates using two other large corpora of American and British English (SI Appendix, section S10.1).
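
The embedding-bias estimate described above amounts to a conditional log probability computed from corpus counts. The following is a minimal sketch only, assuming a lowercased, whitespace-tokenized corpus and no smoothing; the function name `embedding_bias` and the toy corpus are hypothetical illustrations, and the actual procedure is the one given in SI Appendix, section S11.

```python
import math
from collections import Counter

def embedding_bias(tokens, nouns):
    """Estimate log P("that" | "the NOUN") for each noun from a token list."""
    nouns = set(nouns)
    total = Counter()      # "the NOUN" followed by any token
    with_that = Counter()  # "the NOUN" followed by "that"
    for first, second, third in zip(tokens, tokens[1:], tokens[2:]):
        if first == "the" and second in nouns:
            total[second] += 1
            if third == "that":
                with_that[second] += 1
    bias = {}
    for noun in nouns:
        if with_that[noun] > 0:
            bias[noun] = math.log(with_that[noun] / total[noun])
        else:
            # Never observed with "that": no smoothing in this sketch.
            bias[noun] = float("-inf")
    return bias

# Toy corpus (made up for illustration, not data from the study).
corpus = (
    "the fact that it rained surprised us . the fact that he left hurt . "
    "the report was long . the report that she wrote was short . "
    "the report arrived late ."
).split()
bias = embedding_bias(corpus, ["fact", "report"])
```

In this toy corpus, "fact" receives a higher estimate than "report", mirroring the contrast between high and low embedding bias nouns used in the experiments. Note that a plain string count such as this one cannot tell a sentential complement from a relative clause introduced by "that".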

Model. Resource-rational lossy-context surprisal is defined by a family of retention probabilities θ = {q_{w,i} : i, w}, where w ranges over words and i = 1, ..., N, where N = 20 is the maximum context length considered, long enough to accommodate all contexts appearing in the experiments. We parameterize q_{w,i} using a neural network that combines a past word's identity and the number of intervening words to output a retention probability (SI Appendix, section S1.1). The model θ gives rise to the likelihood p(c′|c) and thus the posterior p(c|c′), where c′ denotes the lossy memory representation of the context c. It is chosen to minimize average next-word surprisal for the resulting next-word posterior p(w|c′).
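
To make the role of the retention probabilities concrete, the sketch below samples a lossy representation (written c′ here) from a context c and computes the expected number of retained words, which corresponds to the parameter δ discussed earlier. It is an illustration under stated assumptions, not the authors' implementation: the function `toy_retention_prob` is a hypothetical hand-written stand-in for the neural network parameterization, and treating each word as retained or erased independently is an assumption of this sketch.

```python
import random

N = 20  # maximum context length considered, as stated in the text

def toy_retention_prob(word, distance):
    """Hypothetical stand-in for q_{w,i}: decays with distance to the word."""
    base = 0.9 if len(word) > 3 else 0.6  # arbitrary, for illustration only
    return base * (0.85 ** (distance - 1))

def sample_lossy_context(context, rng, q=toy_retention_prob, erased="<ERASED>"):
    """Sample c' from c by keeping each of the last N words with probability q."""
    recent = context[-N:]
    lossy = []
    for position, word in enumerate(recent):
        distance = len(recent) - position  # 1 = most recent word
        keep = rng.random() < q(word, distance)
        lossy.append(word if keep else erased)
    return lossy

def expected_retained(context, q=toy_retention_prob):
    """Expected number of retained words (the quantity bounded by delta)."""
    recent = context[-N:]
    return sum(q(word, len(recent) - k) for k, word in enumerate(recent))

context = "the report that the doctor who the diplomat mistrusted".split()
lossy = sample_lossy_context(context, random.Random(0))
delta = expected_retained(context)
```

In the full model, the posterior p(c|c′) and the next-word posterior p(w|c′) are then computed from such lossy representations; that inference step is not attempted here.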


Experimental Setup for Reading Time Studies. For all studies, the experimental protocol was approved by the Institutional Review Board at Stanford University. Informed consent was obtained from all participants. 

Each participant was presented with 10 critical trials. In both experiments, two trials were in ONE, and four trials were in TWO and THREE each. In experiment 2, half of the TWO trials and half of the THREE trials were in the COMPATIBLE condition, and the other half were in the INCOMPATIBLE condition. We chose a small number of critical trials to minimize any effect of statistical adaptation to center embeddings during the task.

To maximize statistical precision, we selected 15 nouns with very high embedding bias and 15 nouns with very low embedding bias (SI Appendix, Fig. S36). For each participant, we sampled five nouns with high embedding bias and five nouns with a low value and matched these with the 10 critical trials. For each participant, we also sampled 30 fillers from a pool of 56 fillers from a prior reading time study of center embeddings (42). 

To remove semantic anomalies due to presupposition violations (e.g., "the fact was wrong"), we classified the nouns into entailing (e.g., "fact"), nonentailing neutral (e.g., "claim"), and nonentailing negative (e.g., "accusation") nouns, and classified items for compatibility with each of these three classes (SI Appendix, section S11). For each participant, we matched the 10 nouns with semantically compatible items.

For the maze task, we generated distractors automatically (39) using the Gulordava language model (69): these distractors have extremely low contextual probability while being matched with the target word in frequency and length. Distractors were matched across conditions, except within the second-to-last verb phrase in the (IN)COMPATIBLE conditions in experiment 2. In particular, distractors were matched on the critical word across all conditions.

When participants made a mistake (i.e., chose the distractor), they were prompted to retry the current word (70). Reaction times on such trials were excluded; this choice did not impact conclusions (SI Appendix, section S3.6).

For each subject, trials were presented in random order so that no two critical trials were adjacent. Participants, recruited on the Prolific academic platform, took a median of 13 min and received £2.20 (≈3 USD).
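
One way to produce a random order in which no two critical trials are adjacent is to shuffle the fillers and then drop each critical trial into a distinct gap between them. The sketch below illustrates this under the trial counts given above (10 critical trials, 30 fillers); the function name `order_trials` and the trial labels are hypothetical, and the source does not state which procedure was actually used.

```python
import random

def order_trials(critical, fillers, rng):
    """Return a shuffled trial list with no two critical trials adjacent."""
    if len(critical) > len(fillers) + 1:
        raise ValueError("not enough fillers to separate critical trials")
    critical = critical[:]
    fillers = fillers[:]
    rng.shuffle(critical)
    rng.shuffle(fillers)
    # Gap g is the slot just before filler g; the last gap is the end.
    gaps = set(rng.sample(range(len(fillers) + 1), len(critical)))
    order = []
    next_critical = iter(critical)
    for g in range(len(fillers) + 1):
        if g in gaps:
            order.append(next(next_critical))  # at most one critical per gap
        if g < len(fillers):
            order.append(fillers[g])
    return order

critical = [f"critical_{k}" for k in range(10)]
fillers = [f"filler_{k}" for k in range(30)]
order = order_trials(critical, fillers, random.Random(1))
```

Because each gap holds at most one critical trial, at least one filler always separates any two critical trials.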

Data Analysis for Reading Times. We excluded trials 1) with an incorrect answer, 2) from participants who made errors on more than 20% of words, and 3) below or above 99% of all reading times. See SI Appendix, section S3.6 for robustness to condition 1, and see SI Appendix, section S3.7 for robustness to condition 3. We then analyzed log-transformed reading times on the final verb using Bayesian mixed-effects models implemented in Stan (71) using brms (72). See SI Appendix, section S3.3 for priors and robustness to prior choices. We used contrast coding with the presence of a "that"-clause (ONE vs. TWO/THREE), depth (TWO vs. THREE), and compatibility manipulation (COMPATIBLE vs. INCOMPATIBLE) as contrasts. Embedding bias was centered, and all nonvacuous binary interactions were added as fixed effects (SI Appendix, section S3.2).

We included the maximal random effects structure justified by the experimental design, entering items, nouns, and participants as random effects. To estimate effects in raw reading times (milliseconds), we first computed the predicted log-transformed reading time in both conditions (e.g., COMPATIBLE and INCOMPATIBLE), then transformed both into milliseconds by exponentiating, and computed the difference (see SI Appendix, section S3.4 for further details). In Fig. 3, we plot the posterior mean of the predicted reading time in all conditions for nouns with embedding bias matching "fact" or "report." Error bars represent the posterior SD.
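
The back-transformation from the log scale to milliseconds can be sketched as follows. This is an illustration only: it assumes the difference is taken separately for each posterior draw, uses a crude index-based quantile for the credible interval, and runs on synthetic draws generated here rather than on the fitted model. The names `raw_effect_ms` and `summarize` are hypothetical; SI Appendix, section S3.4 describes the actual computation.

```python
import math
import random
import statistics

def raw_effect_ms(log_rt_a, log_rt_b):
    """Per-draw difference in milliseconds between two conditions."""
    return [math.exp(a) - math.exp(b) for a, b in zip(log_rt_a, log_rt_b)]

def summarize(draws, mass=0.95):
    """Posterior mean and a crude central interval from sorted draws."""
    ordered = sorted(draws)
    lo_idx = int((1 - mass) / 2 * (len(ordered) - 1))
    hi_idx = int((1 + mass) / 2 * (len(ordered) - 1))
    return statistics.mean(ordered), ordered[lo_idx], ordered[hi_idx]

# Synthetic posterior draws of predicted log reading times (made up).
rng = random.Random(0)
compatible = [rng.gauss(7.10, 0.03) for _ in range(4000)]
incompatible = [rng.gauss(7.02, 0.03) for _ in range(4000)]

effect = raw_effect_ms(compatible, incompatible)
mean_ms, lower_ms, upper_ms = summarize(effect)
```

Exponentiating each condition before subtracting matters because a fixed difference on the log scale corresponds to a larger millisecond difference when baseline reading times are longer.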

Details for Production Study. We constructed 28 items of the form "The XXX that the diplomat who the senator," and selected 12 nouns, 6 each with very high or very low embedding bias. For each participant, we randomly paired items and nouns. The 12 critical trials were presented in random order with 27 fillers. A linguist manually annotated, for each completion provided, whether the correct number of verb phrases (three) was produced. The annotator was blind to the identity of the noun.

In Spanish and German, we selected 20 nouns with very high or very low embedding bias in each language, sampling 6 high and 6 low embedding bias nouns for each participant. As in the English version, we randomly matched 12 items with the 12 sampled nouns for each participant. Fillers were translated from the English experiment.

In German, we further constructed 12 matrix sentences (e.g., "Klaus said that"), and randomly matched them with items and nouns for each participant. We conducted a Bayesian trial-by-trial logistic mixed-effects analysis with embedding bias as a fixed effect, and random effects of nouns, items, participants, and (in German) matrix sentences. See SI Appendix, section S4 for details.

Data, Materials, and Software Availability. Fitted retention probabilities and model predictions have been deposited in Zenodo (https://zenodo.org/record/6602698) (73), (https://zenodo.org/record/6988696) (74). Anonymized reading times, language production data, and source code have been deposited in GitLab (https://gitlab.com/m-hahn/resource-rational-surprisal) (75).

ACKNOWLEDGMENTS. We thank the editor and the reviewers for their constructive feedback, which helped improve the manuscript. We are also grateful to Judith Degen, Tiwalayo Eisape, Hailin Hao, Jennifer Hu, Dan Jurafsky, Peng Qian, Cory Shain, Shravan Vasishth, Tom Wasow, Ethan Wilcox, and the audience at the 2020 CUNY Conference on Sentence Processing for helpful discussion and feedback.



References

1. N. Chomsky, Syntactic Structures (Mouton, The Hague, 1957). 

2. G. A. Miller, N. Chomsky, "Finitary models of language users" in Handbook of Mathematical Psychology, R. D. Luce, R. R. Bush, E. Galanter, Eds. (John Wiley, 1963), pp. 269–321.

3. L. Frazier, "Syntactic complexity" in Natural Language Parsing: Psychological, Computational, and Theoretical Perspectives, D. R. Dowty, L. Karttunen, A. M. Zwicky, Eds. (Cambridge University Press, New York, 1985), pp. 129–189.

4. E. Gibson, Linguistic complexity: Locality of syntactic dependencies. Cognition 68, 1–76 (1998).

5. B. McElree, S. Foraker, L. Dyer, Memory structures that subserve sentence comprehension. J. Mem. Lang. 48, 67–91 (2003).

6. W. Tabor, B. Galantucci, D. C. Richardson, Effects of merely local syntactic coherence on sentence processing. J. Mem. Lang. 50, 355–370 (2004).

7. R. L. Lewis, S. Vasishth, An activation-based model of sentence processing as skilled memory retrieval. Cogn. Sci. 29, 375–419 (2005). 

8. M. H. Christiansen, M. C. MacDonald, A usage-based approach to recursion in sentence processing. Lang. Learn. 59, 126–161 (2009). 

9. J. Hale, "A probabilistic Earley parser as a psycholinguistic model" in Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics, NAACL 2001, L. Levin, K. Knight, Eds. (Association for Computational Linguistics, Stroudsburg, PA, 2001), pp. 1–8.

10. R. Levy, Expectation-based syntactic comprehension. Cognition 106, 1126–1177 (2008). 

11. K. Rayner, A. D. Well, Effects of contextual constraint on eye movements in reading: A further examination. Psychon. Bull. Rev. 3, 504–509 (1996). 

12. A. Staub, The effect of lexical predictability on eye movements in reading: Critical review and theoretical interpretation. Lang. Linguist. Compass 9, 311–327 (2015).

