Context Memory Encoding And Retrieval Temporal Dynamics Are Modulated By Attention Across The Adult Lifespan

Mar 25, 2022


Soroush Mirjalili, Patrick Powell, Jonathan Strunk, Taylor James, and Audrey Duarte

https://doi.org/10.1523/ENEURO.0387-20.2020

Department of Psychology, Georgia Institute of Technology, Atlanta, GA 30318


Abstract

Episodic memories are multidimensional, including simple and complex features. How we successfully encode and recover these features in time, whether these temporal dynamics are preserved across age even under conditions of reduced memory performance, and what role attention plays in these temporal dynamics are unknown. In the current study, we applied time-resolved multivariate decoding to oscillatory electroencephalography (EEG) in an adult lifespan sample to investigate the temporal order of successful encoding and recognition of simple and complex perceptual context features. At encoding, participants studied pictures of black and white objects presented with both color (low-level/simple) and scene (high-level/complex) context features and subsequently made context memory decisions for both features. Attentional demands were manipulated by having participants attend to the relationship between the object and either the color or the scene while ignoring the other context feature. Consistent with hierarchical visual perception models, simple visual features (color) were successfully encoded earlier than were complex features (scenes). These features were successfully recognized in the reverse temporal order. Importantly, these temporal dynamics were both dependent on whether these context features were in the focus of one's attention and preserved across age, despite age-related context memory impairments. These novel results support the idea that episodic memories are encoded and retrieved successively, likely dependent on the input and output pathways of the medial temporal lobe (MTL) and attentional influences that bias activity within these pathways across age.

Keywords: aging; attention; context memory; episodic memory; multivariate pattern analyses

Significance Statement

The events we learn and remember in our lives consist of simple context details like color and more complex ones like scenes. Whether we learn and recognize these memory details successively or simultaneously, and whether attending to some features but not others impacts when we encode and retrieve them is unknown. Using high temporal resolution neural activity patterns, we found color details were successfully encoded earlier than scene ones but recognized in the reverse order. Importantly, these temporal dynamics depended on which feature was in the focus of one's attention and were preserved across age. These findings elucidate the successive manner in which the features that constitute our memories are encoded and retrieved and the conditions that impact these dynamics.

Introduction

Numerous episodic memory studies have investigated the neural underpinnings of successful encoding and retrieval of different kinds of context features including color, spatial, and various semantic attributes (Uncapher et al., 2006; Awipi and Davachi, 2008; Duarte et al., 2011; Staresina et al., 2011; Park et al., 2014; Liang and Preston, 2017). Although several regions support successful episodic encoding and/or retrieval regardless of the nature of the context features, others are content-selective. Little is known about the time course with which different context features are successfully encoded and retrieved.

Why would the temporal dynamics of successful context encoding and/or retrieval be impacted by context feature type? Numerous perception studies have established that simple features like color are discriminated earlier in time and by earlier visual cortical regions than more complex features like scenes (Carlson et al., 2013; Kravitz et al., 2014; Clarke et al., 2015). Some regions supporting feature perception also support successful encoding of the features to which they are sensitive (Hayes et al., 2007; Awipi and Davachi, 2008; Preston et al., 2010; Dulas and Duarte, 2011). It is, therefore, possible that simple context features may be successfully encoded into memory before complex ones.

Context features may not be retrieved in the same order in which they are perceived. In one recent study, researchers used multivariate pattern analyses (MVPAs) of electroencephalography (EEG) activity to decode the times at which perceptual and high-level conceptual information was discriminated and later reconstructed from memory (Linde-Domingo et al., 2019). Consistent with feed-forward visual processing hierarchies (Carlson et al., 2013; Kravitz et al., 2014), perceptual details were discriminated earlier than were more complex, conceptual ones. Interestingly, these temporal dynamics were reversed during recall. These results, together with intracranial EEG evidence showing reversed information flow within the medial temporal lobe (MTL) between encoding and retrieval (Fell et al., 2016), support the idea that remembering may proceed in reversed order from perception.

The reversal of information flow between perception and remembering is intriguing, but several questions remain. First, it stands to reason that simple features that are perceived earlier would also be successfully encoded into memory earlier than those perceived later. If complex features are reactivated earlier than simple ones (Linde-Domingo et al., 2019), one's ability to successfully recognize a complex feature should also occur earlier. Second, normal aging is associated with neurocognitive slowing (Salthouse, 1996), with EEG and MEG studies showing processing delays for multiple neural components (Onofrj et al., 2001; Zanto et al., 2010; Clarke et al., 2015). Whether this slowing might also be observed for the time courses of simple and complex context feature encoding and/or retrieval is unknown. Third, in real-world situations, one's attention may be directed to the processing of some features over others. If attention is directed to high-level episodic features over low-level ones, for example, it is not clear that low-level features would be prioritized to the same extent during encoding. Indeed, ample evidence from event-related potential (ERP) studies of attention shows earlier ERP latencies for attended than unattended visual stimuli (Hillyard and Anllo-Vento, 1998; Woodman, 2010).

This work was supported by the National Science Foundation Grant 1125683 (to A.D.), the Ruth L. Kirschstein National Research Service Award Institutional Research Training Grant from the National Institutes of Health National Institute on Aging Grant 5T32AG000175, and National Institute on Aging Grant 1R21AG064309-01.

Acknowledgments: We thank all of our research participants.

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution, and reproduction in any medium provided that the original work is properly attributed.

Here, we investigated the time courses of successful encoding and recognition of simple and complex perceptual features, and how attention might impact these temporal dynamics across the adult lifespan. Attentional demands were manipulated by having participants attend to the relationship between an object and either color or scene while ignoring the other context feature. For both encoding and retrieval, we trained multivariate pattern classifiers to distinguish successful from unsuccessful context memory separately for color and scene features from oscillatory EEG. We assessed context memory classification accuracy through time for each feature as a function of whether or not they were attended to during encoding. We explored the fit of our data to one of three models (Fig. 1).

If the feed-forward processing hierarchy at encoding and the reversed temporal dynamics at retrieval are unalterable and independent of one's current goals, we predict that results will fit the hierarchical model across age. However, if attention modulates these dynamics, we predict that the fit to either the attention or hybrid model will be reduced with age: because the ability to selectively attend to task-relevant features declines with age (Hasher and Zacks, 1988; Campbell et al., 2010), any attentional modulation of the temporal dynamics should be reduced, potentially contributing to age-related context memory impairments (James et al., 2016a; Powell et al., 2018).


Materials and Methods

Participants

The participants consisted of 52 right-handed adults (21 women) from ages 18 to 74. Data from an additional five older adults (61–76 years) were excluded: two for lack of understanding of task procedures, two for noisy EEG (i.e., DC drift, movement), and one for computer malfunction. Data from one young adult (21 years) were excluded because of noisy EEG. A subset of the young and older, but not middle-aged, adults' data were included in prior published studies examining different research questions (James et al., 2016b; Strunk et al., 2017; Powell et al., 2018). All subjects were native English speakers and had normal or corrected vision. Participants were compensated with course credit or $10/h and were recruited from the Georgia Institute of Technology and surrounding community. None of the participants reported any neurologic or psychiatric disorders, vascular disease, or use of any medications that impact the central nervous system. Participants completed a battery of standardized neuropsychological tests consisting of subtests from the memory assessment scale (Williams, 1991), including list learning, recognition, verbal span forward and backward, immediate and delayed recall, visual recognition, recall, reproduction, and delayed recognition. Participants who scored more than 2 SDs outside the sample mean were excluded. Moreover, older adults were administered the Montreal cognitive assessment (MoCA; Nasreddine et al., 2005) to test further for mild cognitive impairment. Only participants scoring 26 or above on the MoCA were included. All participants signed consent forms approved by the Georgia Institute of Technology Institutional Review Board.

Figure 1. Three hypothesized model fits for low-order feature (color) and high-order feature (scene) context encoding and retrieval temporal dynamics. Predicted data for each model represent the earliest local classification peaks of context memory success decoding (correct vs incorrect) across the participants. In the hierarchical processing model (a), low-level context features, in this case color, processed by earlier visual cortical areas, are encoded before and retrieved following high-level ones, in this case scene, regardless of whether they were attended to during encoding. For this model, if the scene were the target context, the encoding and retrieval histograms would be identical to those shown. Alternatively, in the attention-based processing model (b), the attended context feature will be encoded and retrieved earlier than the feature which is ignored. As shown in part b, color encoding precedes scene encoding when color is the attended "target" feature, and the same temporal dynamic would hold at retrieval. If the scene were the target, the order of the histograms would be reversed from those shown. Lastly, in the hybrid processing model (c), the temporal dynamics of encoding and retrieval are based both on the complexity of the context features and on whether they were the target or the distractor. In the example in part c, scene encoding follows color encoding by a longer delay when color is the target than when it is the distractor, while scene retrieval precedes color retrieval by a shorter delay. If the scene were the target, the distance between the peaks would decrease for encoding and increase for retrieval compared with what is shown.
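The 2-SD inclusion rule above is a simple computation; a minimal sketch (function and variable names are illustrative, not the authors' code) might look like:

```python
import statistics

def within_2sd(scores):
    """Flag participants whose composite neuropsychological score falls
    within 2 SDs of the sample mean (the inclusion rule described above).
    Returns a parallel list of booleans; True = include."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [abs(s - mean) <= 2 * sd for s in scores]
```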

Materials

A total of 432 grayscale images of objects were selected from the Hemera Technologies Photo-Object DVDs and Google Images. At encoding, 288 of these objects were presented; in half of the trials, participants' attention was directed to a color and in the other half to a scene. Each grayscale object was presented in the center of the screen, and a color square and a scene were presented to the left or right of the object. For all trials in a block, the same context feature type was presented on the same side of the object; piloting showed that this minimized participant confusion and eye movement artifacts. The locations of these context features were counterbalanced across blocks so that they were shown an equal number of times on the right-hand and left-hand sides of the object in the center. For each encoding trial, participants were instructed to focus on associations between the object and either the colored square or the scene, which served as the target context for that trial. The potential scenes included a studio apartment, cityscape, or island; the scenes were taken from Creative Commons. The potential colored squares were green, brown, or red. Each of the context and object pictures spanned a maximum vertical and horizontal visual angle of ~3°. During retrieval, all 288 objects were included in the memory test, in addition to 144 new object images that were not presented during encoding. Study and test items were counterbalanced across subjects.

Figure 2. Experimental design. During study, participants were asked to make a subjective yes/no assessment about the relationship between the object and either the colored square (i.e., "is this color likely for this object?"), where one of three possible colors was presented (red, green, brown), or the scene (i.e., "is this object likely to appear in this scene?"), where one of three possible scenes was presented (cityscape, studio apartment, island). Participants were directed to pay attention to one context and ignore the other context. During test, participants made up to three responses for each test trial (item recognition, and color and scene context memory decisions).

Experimental design and statistical analyses

Figure 2 illustrates the procedure used during the study and test stages. Before the beginning of each phase, participants were provided instructions and given 10 trials for practice. For the study stage, participants were asked to make a subjective yes/no assessment about the relationship between the object and either the colored square (i.e., "is this color likely for this object?") or the scene (i.e., "is this object likely to appear in this scene?"). Instructions for the task specified that on any specific trial, the participant should pay attention to one context and ignore the other context. Within the study phase, there were four blocks; each block consisted of four mini-blocks, and each mini-block included 18 trials. In advance of beginning each mini-block, participants were provided a prompt (e.g., "Now you will assess how likely the color is for the object" or "Now you will assess how likely the scene is for the object"). Since prior evidence has suggested that memory performance in older adults is more disrupted when they have to switch between two distinct kinds of tasks (Kray and Lindenberger, 2000), mini-blocks were used to orient the participant to the context they should pay attention to in the upcoming trials. This also decreased the task demands of having to switch from judging one context (e.g., color) to judging the other (e.g., scene). Each trial in a mini-block had a reminder prompt presented below the pictures during study trials (Fig. 2).

During test, participants were presented with both old and new objects. Similar to the study phase, each object was flanked by both a scene and a colored square. For each object, the participant initially decided whether it was an old or a new image. If the participant decided the object was new, the next trial began after 2000 ms. If participants stated that it was old, they were asked to make two additional assessments, one about each context feature, together with their certainty about each judgment (i.e., one about the colored square and another about the scene). The order of the second and third questions was counterbalanced across participants. For old items, the pairing was set so that an equal number of old objects were presented with (1) both context images matching those presented at the encoding stage, (2) only the color matching, (3) only the scene matching, and (4) neither context image matching. Responses to the context questions were made on a confidence scale. Young and middle-aged adults finished all four study blocks before the four test blocks. For older adults (over 60), to better equate item memory performance with young adults and to allow us to explore age effects in the EEG temporal dynamics unconfounded by large age effects in general memory ability (Rugg and Morcom, 2005), the memory load was halved so that they completed a two-block study-test cycle twice (two study, two test, two study, two test). All participants completed a short practice of both the study and test blocks before starting the first study block. Thus, participants knew of the upcoming memory test, although they were told to focus on their encoding decisions and not to memorize for the upcoming test.
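The equal four-way pairing of old test objects described above can be sketched as follows. This is an illustrative reconstruction of the counterbalancing logic only: the condition names, the seeded shuffle, and the function itself are assumptions, not the authors' implementation.

```python
import random

def assign_pairings(old_object_ids, seed=0):
    """Assign old test objects to the four context-pairing conditions
    in equal numbers (72 each for the 288 old objects above).
    Condition labels are illustrative, not from the original code."""
    conditions = ["both_match", "color_match_only",
                  "scene_match_only", "neither_match"]
    ids = list(old_object_ids)
    assert len(ids) % len(conditions) == 0
    per_condition = len(ids) // len(conditions)
    rng = random.Random(seed)
    rng.shuffle(ids)  # randomize which object lands in which condition
    assignment = {}
    for i, cond in enumerate(conditions):
        for obj in ids[i * per_condition:(i + 1) * per_condition]:
            assignment[obj] = cond
    return assignment

pairing = assign_pairings(range(288))
# each of the four conditions receives 72 of the 288 old objects
```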

Data collection

Continuous scalp EEG data were recorded from 32 Ag-AgCl electrodes using an ActiveTwo amplifier system (BioSemi). Electrode position was based on the extended 10–20 system (Nuwer et al., 1998). Electrode positions consisted of: AF3, AF4, FC1, FC2, FC5, FC6, FP1, FP2, F7, F3, Fz, F4, F8, C3, Cz, C4, CP1, CP2, CP5, CP6, P7, PO3, PO4, P3, Pz, P4, P8, T7, T8, O1, Oz, and O2. External left and right mastoid electrodes were used for referencing offline. Two additional electrodes recorded horizontal electrooculogram (HEOG) at the lateral canthi of the left and right eyes, and two electrodes placed superior and inferior to the right eye recorded vertical EOG (VEOG). The EEG was sampled at 1024 Hz with 24-bit resolution, without high- or low-pass filtering.

EEG preprocessing

Offline analysis of the EEG data was performed in MATLAB 2015b using the EEGLAB, ERPLAB, and FieldTrip toolboxes. The continuous data were downsampled to 256 Hz, referenced to the average of the left and right mastoid electrodes, and band-pass filtered between 0.5 and 125 Hz. The data were then epoched from –1000 ms before stimulus onset to 3000 ms. The time range of interest was from stimulus onset to 2000 ms, but a longer time interval is required to account for signal loss at both ends of the epoch during wavelet transformation. Each epoch was baseline corrected to the average of the whole epoch, and an automatic rejection process deleted epochs in which a blink occurred during stimulus onset or epochs with extreme voltage shifts that spanned two or more electrodes. The automated rejection processes identified epochs with the following parameters in the raw data: (1) the voltage range was greater than the 99th percentile of all epoch voltage ranges within a 400-ms time interval (shifting in 100-ms intervals across each epoch); (2) the linear trend slope was higher than the 95th percentile of all epoch ranges with a minimum R² value of 0.30; (3) the voltage range was larger than the 95th percentile of all epoch voltage ranges within a 100-ms time interval (shifting in 25-ms intervals across each epoch), between –150 and 150 ms from stimulus onset, for frontal and eye electrodes only. An independent component analysis (ICA) was then run on all head electrodes to identify ocular artifacts (i.e., blinks and horizontal eye movements). Components related to ocular artifacts were removed from the data after visually inspecting the topographic component maps and component time courses alongside the ocular electrodes. Each epoch was re-baselined to the –300 to –100 ms period before stimulus onset, since the epochs were no longer baselined to a specific time period after deleting the components related to ocular activity. If a dataset had a noisy electrode (e.g., >30% of its data required rejection), the electrode was deleted from the processing stream and interpolated using nearby channels to estimate its activity before running the time-frequency procedure. After all processing stages, ~13% (SD = 8%) of the epochs were removed.
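Criterion (1) above, rejecting epochs whose voltage range within any sliding 400-ms window exceeds the 99th percentile of all epochs' ranges, can be sketched as follows. This is a simplified NumPy illustration; the published pipeline used EEGLAB/ERPLAB routines, and the function and parameter names here are assumptions.

```python
import numpy as np

def reject_by_voltage_range(epochs, sfreq=256, win_ms=400, step_ms=100,
                            percentile=99):
    """Flag epochs whose worst peak-to-peak voltage range, measured in
    sliding windows, exceeds the given percentile across epochs.
    epochs: array (n_epochs, n_channels, n_times). Returns a boolean
    mask where True = reject."""
    win = int(win_ms * sfreq / 1000)
    step = int(step_ms * sfreq / 1000)
    n_times = epochs.shape[-1]
    ranges = []
    for start in range(0, n_times - win + 1, step):
        seg = epochs[:, :, start:start + win]
        # peak-to-peak range per epoch: max over channels in this window
        ranges.append((seg.max(-1) - seg.min(-1)).max(-1))
    worst = np.max(ranges, axis=0)            # worst window per epoch
    threshold = np.percentile(worst, percentile)
    return worst > threshold
```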

Frequency decomposition

Each epoch was transformed into a time-frequency representation using Morlet wavelets with 78 linearly spaced frequencies from 3 to 80 Hz, at five cycles. During the wavelet transformation, each epoch was cropped to the time interval of interest and downsampled to 50.25 Hz. For the following MVPAs, we examined only trials in which participants correctly recognized objects as old (item hits). The decision to select only item hit trials was based on the assumption that correct recognition of the associated contexts was contingent on correct recognition of the centrally presented object. The average numbers of trials for younger, middle-aged, and older adults were as follows: younger (M = 190.50, SD = 41.01), middle-aged (M = 187.31, SD = 40.24), older (M = 177.06, SD = 38.56).
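The five-cycle Morlet transformation described above can be sketched by direct convolution with complex Morlet wavelets. This is a generic single-channel illustration under the paper's parameters (five cycles, 78 linearly spaced frequencies); the analysis itself used the FieldTrip implementation.

```python
import numpy as np

def morlet_power(signal, sfreq, freqs, n_cycles=5):
    """Time-frequency power via Morlet wavelet convolution.
    signal: 1-D array (one channel, one epoch).
    Returns an array of shape (n_freqs, n_times)."""
    power = np.empty((len(freqs), signal.size))
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2 * np.pi * f)        # temporal SD of wavelet
        t = np.arange(-3 * sigma_t, 3 * sigma_t, 1 / sfreq)
        wavelet = (np.exp(2j * np.pi * f * t)       # complex sinusoid
                   * np.exp(-t**2 / (2 * sigma_t**2)))  # Gaussian taper
        wavelet /= np.sqrt((np.abs(wavelet) ** 2).sum())  # unit energy
        conv = np.convolve(signal, wavelet, mode="same")
        power[i] = np.abs(conv) ** 2
    return power

freqs = np.linspace(3, 80, 78)  # 78 linearly spaced frequencies, 3-80 Hz
```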

Time-resolved classification

We were interested in classifying the earliest time at which color and scene context features were successfully encoded and retrieved. To maximize the number of trials available to train the classifier, we collapsed across confidence levels for both correct and incorrect trial types at both encoding and retrieval; some participants had very few trials for specific confidence conditions (e.g., correct context with high confidence), making it difficult to include confidence in classification analyses including all participants. Similarly, for retrieval, we collapsed across all trial types (i.e., both context images matching those presented at the encoding stage, only the color matching, only the scene matching, and neither context image matching) to increase power to detect the effects of interest. It is important to note that the proportions of these trial types were roughly equivalent for context correct and incorrect trials (context correct: 29.5% both contexts match, 23.2% only color match, 22.1% only scene match, 25.2% neither context match; context incorrect: 20.7% both contexts match, 27.5% only color match, 28.0% only scene match, 23.8% neither context match). These proportions were also roughly similar across the attention conditions (i.e., attend color vs attend scene). For each classification analysis, we selected a specific 300-ms sliding time interval and shifted the time window by one time point (20 ms) over the initial 2-s period of the encoding and the item memory portion of the retrieval epochs (i.e., starting at stimulus onset at both encoding and retrieval). This 300-ms time interval was chosen to maximize the information available for the classifier to separate correct from incorrect trials while also allowing sufficient temporal resolution to detect peak latency differences between conditions.
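The sliding-window scheme above (300-ms windows stepped by 20 ms over the first 2 s of each epoch) yields 86 intervals; a minimal sketch:

```python
def sliding_windows(total_ms=2000, win_ms=300, step_ms=20):
    """Generate the 300-ms classification windows, stepped by one
    20-ms time point over the first 2 s of each epoch."""
    windows = []
    start = 0
    while start + win_ms <= total_ms:
        windows.append((start, start + win_ms))
        start += step_ms
    return windows

windows = sliding_windows()
# 86 windows: (0, 300), (20, 320), ..., (1700, 2000)
```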
The first 2 s was chosen for classification analysis to be consistent with previous EEG studies, including ones using this same task, showing episodic memory effects within this time range (Rugg and Curran, 2007; James et al., 2016a; Powell et al., 2018). That is, even during the item recognition period, EEG activity is sensitive to context memory accuracy. Second, sampling of later time periods of the trial produced similar and/or less significant effects than those presented. Third, because the color and scene context recognition questions were presented and responded to later in the trial, we aimed to reduce the potential influence of color and scene perception on memory success effects. Subsequently, for each 300-ms interval, we extracted features based on common spatial patterns (CSPs) from the data at each frequency band separately, including δ (3–4 Hz), θ (4–7 Hz), α (8–14 Hz), β (14–30 Hz), and γ (30–80 Hz). The CSP algorithm aims to increase discriminability by learning spatial filters that maximize the power of the filtered signal for one class while minimizing it for the other class (Herbert et al., 2000). Briefly, the average covariance matrices of the trials of each class are computed, producing C1 and C2 for the two classes. Subsequently, using eigenvalue decomposition, the optimization problem w = argmax_w (wᵀC1w)/(wᵀC2w) is solved to find the optimal spatial filters. In other words, the spatial filters optimally project the signals from the original space (i.e., across the original electrodes) into a new space in which the signal at each projected electrode is a linear combination of the signals across all original electrodes and the variances of these signals are highly discriminable between the trials of the two classes (i.e., context correct vs context incorrect).
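The CSP computation described above can be sketched with a standard whitening-plus-eigendecomposition implementation. This is a generic CSP sketch using the paper's C1/C2 notation, not the authors' exact code; filters with the largest and smallest eigenvalues maximize the variance ratio for one class over the other.

```python
import numpy as np

def csp_filters(epochs_a, epochs_b, n_filters=2):
    """Common spatial patterns for two classes of epochs
    (e.g., context correct vs context incorrect).
    epochs_*: arrays (n_epochs, n_channels, n_times).
    Returns (2*n_filters, n_channels) spatial filters."""
    C1 = np.mean([np.cov(e) for e in epochs_a], axis=0)  # class-A covariance
    C2 = np.mean([np.cov(e) for e in epochs_b], axis=0)  # class-B covariance
    # Whiten the composite covariance C1 + C2 ...
    evals, evecs = np.linalg.eigh(C1 + C2)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T
    # ... then diagonalize whitened C1; eigenvalues d give the variance
    # share of class A (class B gets 1 - d), so extreme d = discriminative
    d, B = np.linalg.eigh(W @ C1 @ W.T)
    filters = B.T @ W                      # rows are spatial filters
    order = np.argsort(d)
    picks = np.concatenate([order[:n_filters], order[-n_filters:]])
    return filters[picks]
```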
Next, once the spatial filters across the different frequency bands were extracted, we applied Fisher's criterion to select the best features for each individual, reducing the feature space for training the classifier (Phan and Cichocki, 2010). To be consistent across all analyses and participants, and to avoid the risk of overfitting or underfitting based on the number of trials, we selected the five features with the highest Fisher scores for each analysis. Finally, we trained a naive Bayesian classifier to distinguish correct from incorrect context trials (Fukunaga, 1993). We used 5-fold cross-validation average accuracy as our criterion for evaluating the classifier's performance. As a result, for each participant, we obtained one classifier accuracy value for each of the 86 300-ms intervals (with a resolution of 20-ms sliding time points, i.e., [0, 300 ms], [20, 320 ms], [40, 340 ms], ..., [1700, 2000 ms]) for each phase of the experiment (encoding, retrieval), attention condition (target, distractor), and context feature (color, scene). While the theoretical chance level for binary classification problems is 50%, some studies have shown that the true level of chance performance can be remarkably different from the theoretical value (Combrisson and Jerbi, 2015; Jamalabadi et al., 2016). As a result, we used permutation tests (Nichols and Holmes, 2002), repeating the classification analysis to obtain an empirical null distribution for the classifier's performance. To be more specific, for each separate analysis and participant, we conducted the same time-resolved 5-fold cross-validation classification procedure as for the real data but with labels that were randomly shuffled at each repetition. This process was conducted 500 times per participant for each classification analysis, with random label assignment on each repetition, establishing an empirical null distribution of classification performance scores. Subsequently, we set the accuracy that was higher than 95% of the performance values in the null distribution as the threshold for determining the significance of a classifier's performance for each subject. It is important to note that each time interval has its own empirical null distribution, and the 95th percentile of the null distribution differs across time intervals; to be more conservative, we selected the highest 95th percentile across the time intervals as the threshold for that subject and analysis.
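The permutation-based empirical chance threshold can be sketched as follows. This is a generic illustration: `score_fn` stands in for the full 5-fold CSP + naive Bayes pipeline, and shuffling here is simplified relative to shuffling within cross-validation folds.

```python
import numpy as np

def permutation_threshold(features, labels, score_fn, n_perm=500,
                          percentile=95, seed=0):
    """Empirical chance threshold: re-run classification with shuffled
    labels n_perm times and take the given percentile of the null
    accuracy distribution. score_fn(features, labels) -> accuracy."""
    rng = np.random.default_rng(seed)
    shuffled = labels.copy()
    null_scores = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # break the label-feature pairing
        null_scores.append(score_fn(features, shuffled))
    return np.percentile(null_scores, percentile)
```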

To show that classification performance was significantly above chance across subjects, and to show the general time periods of memory success decodability through time, we subtracted the time course of each participant's empirical chance level from that individual's actual classification performance time course. We then averaged these difference time courses across the attend color and attend scene conditions. Finally, we averaged these individual difference time courses across participants. These across-participant, average actual-minus-chance classification time courses for encoding and retrieval, with 95% confidence intervals, are shown in Figure 3. As can be seen in Figure 3, classification performance was significantly greater than chance, across subjects, for much of the encoding and retrieval time intervals. Context memory success was maximally decodable between 680 and 980 ms at encoding (midpoint of 830 ms; Fig. 3a) and between 340 and 640 ms at retrieval (midpoint of 490 ms; Fig. 3b).

Finally, we plotted the classifier accuracy values on a diagram where each point (Fig. 4) represents the midpoint of a 300-ms time interval. In each of these diagrams, there are multiple time intervals whose classification accuracy is higher than that of the adjacent time intervals (i.e., the time intervals right before and right after the current time interval, with a 20-ms midpoint difference). However, since many moments would qualify under this criterion, we expanded the adjacency interval to 60 ms: only the time intervals that had higher classification accuracy than all of the time intervals within their 60-ms temporal neighborhood were selected as potential peak moments. For instance, in Figure 4, while A has higher performance than the time intervals right before and after it, it cannot be selected as a potential peak, since B is in its defined neighborhood and has higher performance. Moreover, the selected peak moments had to perform significantly above the chance level, so any potential peak moments with performance below the significance threshold were not considered. Again, in Figure 4, B will not be considered, since it performed below the empirical chance threshold. Lastly, if there were multiple peaks that performed above the empirical chance level, the earliest was selected as the "peak moment" that determined for the first time whether a context feature would be encoded/retrieved successfully. As can be seen in Figure 4, some peaks, including C, D, and E, qualify under both of the mentioned criteria, and we would select C as the peak moment in that particular analysis.

Figure 3. The time course of actual-minus-chance context memory success classification performance, averaged across attention conditions and participants, at (a) encoding and (b) retrieval, with 95% confidence intervals. Each time point in these diagrams represents the midpoint of the associated 300-ms time interval. Since the first time interval includes 0–300 ms, the diagrams start from 150 ms and end with 1850 ms, the midpoint of the last time interval (1700–2000 ms). The gray area in each figure indicates the 95% confidence interval of the actual-minus-chance context memory success classification performance across participants. If the gray area of a specific time point reaches 0%, the actual performance is not significantly different from chance, across participants, for the associated 300-ms time interval. For example, at encoding, the confidence interval associated with time point 1030 ms, the midpoint of the 880–1180 ms interval, reaches zero.

Figure 4. An example of the results of time-resolved context memory accuracy classification from a representative subject and classification analysis. Each time point in the diagram represents the midpoint of the associated 300-ms time interval. Since the first time interval includes 0–300 ms, the diagram starts from the midpoint of this time interval, as shown by the left vertical dashed line. Moreover, the diagram ends with the midpoint of the last time interval (1700–2000 ms), as shown by the right vertical dashed line. Note that the threshold is set as the highest 95th percentile value through time in the time-resolved null distribution for each subject to be more conservative (see Materials and Methods).

Figure 5. Item, color, and scene context memory discriminability.
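The peak-selection rule above (a local maximum within its 60-ms neighborhood, above the empirical chance threshold, earliest qualifying interval) can be sketched as:

```python
def earliest_peak(accuracy, threshold, midpoints, neighborhood_ms=60):
    """Return the midpoint of the earliest 'peak moment': an interval
    whose accuracy exceeds the empirical chance threshold and exceeds
    every other interval within its 60-ms temporal neighborhood.
    accuracy and midpoints are parallel sequences; returns None if no
    interval qualifies."""
    for i, acc in enumerate(accuracy):
        if acc <= threshold:
            continue                       # below empirical chance level
        neighbors = [a for j, a in enumerate(accuracy)
                     if j != i
                     and abs(midpoints[j] - midpoints[i]) <= neighborhood_ms]
        if all(acc > a for a in neighbors):
            return midpoints[i]            # earliest qualifying peak
    return None
```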

Code accessibility

The custom code that we used in this study is available from https://doi.org/10.17605/OSF.IO/FVUZX.

Data accessibility

The data and results that support the findings of this study are available from https://doi.org/10.17605/OSF.IO/FVUZX.

