Syntactic Chunking Reveals A Core Syntactic Representation Of Multi-digit Numbers, Which Is Generative And Automatic Part 1

Oct 26, 2023

Abstract

Representing the base-10 structure of numbers is a challenging cognitive ability, unique to humans, but it is yet unknown how precisely this is done. Here, we examined whether and how literate adults represent a number’s full syntactic structure. In 5 experiments, participants repeated number-word sequences and we systematically varied the order of words within each sequence. Repetition on grammatical sequences (e.g., two hundred ninety-seven) was better than on non-grammatical ones (hundred seven two ninety). 

We conclude that the participants represented the number’s full syntactic structure and used it to merge number words into chunks in short-term memory. Accuracy improved monotonically for sequences with increasingly longer grammatical segments, up to a limit on the number of words per segment that held regardless of the number of digits, and worsened thereafter.

Namely, short chunks improved memorization, whereas oversized chunks disrupted memorization. This chunk size limit suggests that the chunks are not based on predefined structures, whose size limit is not expected to be so low, but are created ad hoc by a generative process, such as the hierarchical syntactic representation hypothesized in Michael McCloskey’s number-processing model. Chunking occurred even when it disrupted performance, as in the oversized chunks, and even when external cues for chunking were controlled for or removed. We conclude that this generative process operates automatically rather than voluntarily. To date, this is the most detailed account of the core representation of the syntactic structure of numbers, a critical aspect of numerical literacy and of the ability to read and write numbers.

Keywords: Number syntax, Chunking, Symbolic numbers, Multi-digit number comprehension.

Significance statement

The ability to read and write numbers is a critical aspect of numerical literacy and a major predictor of elementary school math achievements. An underappreciated fact is that reading and writing numbers is also very hard: Even literate adults make many errors in these tasks, and about 8% never become good at it and have dysnumeria, a prevalent learning disorder in number reading or writing. The central origin of these difficulties lies in handling the number’s syntactic structure, i.e., combining digits or words into a multi-digit number or decomposing a multi-digit number into its elements. It is perhaps not surprising that syntax is the crux of the difficulty, as number syntax is hypothesized to reflect a more general ability, which is cognitively demanding and may be unique to humans, to represent complex structured information recursively or hierarchically.

Here, we examined this syntactic processing in detail. We show that literate adults can form a cognitive representation of the syntactic structure of a whole number, even for numbers as long as 6 digits, and that to do so they use an automatic process (as opposed to applying a learned strategy) that creates the syntactic representation in a step-by-step manner (as opposed to just retrieving a predefined representation). These conclusions can help improve how we teach numbers in elementary school, and how we identify and treat individuals with dysnumeria.

Introduction

Numerical literacy is extremely important in modern society. It is useful in everyday life, it is crucial for most academic and scientific disciplines, and it predicts academic achievements, unemployment, salaries, and mental and physical health (Duncan et al., 2007; Ritchie & Bates, 2013). There are many aspects to being proficient with numbers and mathematics, and a central one is the ability to read and write numbers. In elementary school, this skill turns out to be a main predictor of arithmetic abilities (Habermann et al., 2020). 

Later in life, most educated adults can read and write numbers accurately and without difficulties, but a surprisingly large number of people find it quite hard even as adults. For example, a recent study examined 120 literate adults and found that 9 of them (7.5%) had considerable difficulties in reading multi-digit numbers: they erred in more than 14% of the numbers they were asked to read (Dotan & Handelsman, in prep.). These people are likely to satisfy the criteria for dysnumeria, a learning disorder that disrupts number reading (Dotan & Friedmann, 2018).

As it turns out, the difficulties in reading and writing numbers are not random but follow a consistent pattern, linking them to specific cognitive mechanisms of number processing. A central classification of the number processing mechanisms is into lexical processes, which handle the identity of each digit or number word, and syntactic processes, which handle the relations among lexical items. 

For example, identifying a digit or retrieving a number word are lexical processes, whereas detecting how many digits a number has, and the decimal role of each digit, are syntactic processes (Cappelletti et al., 2005; Cipolotti, 1995; Cipolotti et al., 1994; Deloche & Willmes, 2000; Dotan & Friedmann, 2018; Furumoto, 2006; McCloskey et al., 1986; Noël & Seron, 1993). Of these two, it is the syntax that poses the bigger challenge. Learning to process the syntax of numbers during childhood takes years and continues long after the lexical knowledge (the digits and the number-word names) has been acquired (Cheung & Ansari, 2020; Dotan & Dehaene, 2016; Shalit & Dotan, 2022).

Moreover, when reading numbers, children (Moura et al., 2013; Power & Dal Martello, 1990, 1997; Shalit & Dotan, 2022; Steiner et al., 2021) and adults (Dotan & Friedmann, 2018; Dotan & Handelsman, in prep.) make more syntactic than lexical errors. Finally, the main reason for dysnumeria, the learning disorder that disrupts number reading, is the inability to process the number’s syntactic structure properly: In a study that examined the locus of deficit for 40 randomly selected adults with dysnumeria, all except one were impaired in a syntactic process, whereas only 14 of them (35%) were impaired in a lexical process (some participants had both impairments; Dotan & Handelsman, in prep.).

Understanding the cognitive underpinnings of syntax, not only that of numbers but also in general, is important not only for its real-world impact but also as a central theoretical question in cognitive psychology. Representing complex syntactic information, which encodes not only the identity of each item but also the relations among items, seems to be a considerable cognitive challenge in several different domains. Cognitive representations of syntactic relations exist in numbers; in language, to represent the grammatical inter-dependencies of the words in a sentence (Chomsky, 1956); in arithmetic, to represent the hierarchical structure of algebraic expressions (Schneider et al., 2012; van de Cavey & Hartsuiker, 2016; Zeng et al., 2018); to represent the relational rules underlying arrays of shapes (Pothos & Bailey, 2000), sounds (Gentner et al., 2006; Horváth et al., 2001), spatial positions (Al Roumi et al., 2020), or other stimuli; and even to represent and plan motor action (Koechlin & Jubault, 2006; Moro, 2014).

Some forms of syntax are simpler than others, but some syntactic representations—in particular, those organized as a hierarchy of elements—seem to be quite complex, and to a large extent—human-specific. Indeed, some animal species, e.g., songbirds (Berwick et al., 2011; Gentner et al., 2006), may be able to handle even relatively complicated syntactic structures, including some hierarchical structures, but only humans can flexibly handle complex hierarchical structures and combine them with their meaning, as we do in the case of language or numbers (Dehaene et al., 2015; Hauser et al., 2002). Understanding how people process the syntactic structure of numbers may potentially illuminate how humans process syntactic information in general.

What we already know about the processing of number syntax

“Number syntax” is not a unitary cognitive construct, handled by a single process—there are several different processes that handle different aspects of number syntax. We already know quite a bit about the low-level processes that handle highly specific syntactic aspects of numbers. These processes can be roughly classified according to the type of information being handled (digits versus number words) and the processing stage (input/comprehension versus production). In the digit-input mechanisms, i.e., when parsing a visually presented digit string, there are separate processes to handle the string length (how many digits it has), the positions of 0, the grouping of digits into triplets, and the relative order of digits (Cohen & Dehaene, 1991; Dotan & Dehaene, 2020; Dotan & Friedmann, 2018; Dotan et al., 2021b). In digit production mechanisms, i.e., when writing digit strings, dedicated processes handle the positioning of 0 (Furumoto, 2006) and the order of digits (Lochy et al., 2004). 

In the oral production of verbal numbers, specific processes handle the number words’ lexical classes (ones, tens, teens, etc.), which are essentially the syntactic aspect of the verbal number (Cohen & Dehaene, 1991; Dotan & Friedmann, 2018, 2019; McCloskey et al., 1986); other processes bind each digit with the appropriate lexical class (Blanken et al., 1997; Dotan & Friedmann, 2018); and yet other processes retrieve the morphological form corresponding to each lexical class (Cohen et al., 1997; Dotan & Friedmann, 2015).

Finally, when comprehending a verbal number, specific syntactic processes handle the place-value information (Kallai & Tzelgov, 2012; Lambert & Moeller, 2019), the order of words (Hayek et al., 2020; Zuber et al., 2009), and the merging of adjacent pairs of number words into a single syntactic structure when this is grammatically possible (as in thirty-two, but not in two-thirty; Hung et al., 2015).

On top of these low-level syntactic processes, there exists a core representation of the number’s full syntactic structure. Namely, the number’s full syntactic structure is represented explicitly in the brain, and the human ability to handle number syntax is not just a by-product of other types of representations, e.g., some lower-level syntax-related processes. This representation, on which the present study focuses, was a central idea in the number-processing model of McCloskey and his colleagues (McCloskey, 1992; McCloskey et al., 1986). Specifically, they proposed that multi-digit numbers have a central abstract representation, which incorporates the full information about the number’s semantics and syntax. McCloskey’s model made an extreme assumption—that this representation incorporates both the number’s syntax and its semantics, and that it mediates any task involving any symbolic numbers (digits or words), including reading, writing, comprehension, production, and calculation. 

This extreme assumption was refuted (Campbell & Clark, 1992; Cohen & Dehaene, 1991, 2000; González & Kolers, 1982; Noël & Seron, 1997). The refutation has led several researchers to abandon McCloskey’s model in favor of other cognitive models of number processing—especially Dehaene’s triple-code model (Dehaene, 1992; Dehaene & Cohen, 1995; Dehaene et al., 2003), which focuses on the different representations of numbers and remains largely silent about the issue of number syntax and the differences between single-digit and multi-digit numbers. However, a recent study (Dotan et al., 2021a) supports a weaker version of McCloskey’s assumption. 

In this study, the participants heard, on each trial, a number between 1 and 9999 and responded by saying a random number in the same range. The syntactic structure of their responses was similar to that of the target numbers—a syntactic priming effect, which indicates that they represented the number’s syntactic structure. The researchers concluded that a representation of the number’s full syntactic structure exists—perhaps not for any number and in any task, but at least in some tasks and at least for numbers up to 4 digits long.

Another interesting idea in McCloskey’s (1992) number-processing model is that the syntactic representation of numbers has a hierarchical, tree-like structure: The units and decades are merged first; then, this pair is merged with the hundreds (thereby forming a triplet), and finally, two triplets can be merged. For example, the number 234,567 would be represented as [2 & (3 & 4)] & [5 & (6 & 7)]. Such a hierarchy resembles the way we represent sentences (Chomsky, 1956, 1995) and other types of information (Dehaene et al., 2015). At present, this hierarchical representation is still an unconfirmed hypothesis. As we shall see, the present study will bring several pieces of suggestive evidence in favor of this idea.
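As an illustration only, the merge order described above can be sketched in Python with nested tuples. The function names and the restriction to 6-digit numbers without zeros are assumptions made for this sketch; the model itself does not prescribe a data format.

```python
def merge_triplet(hundreds, tens, ones):
    """Merge decades with units first, then attach the hundreds digit."""
    return (hundreds, (tens, ones))


def syntactic_tree(number):
    """Return a nested-tuple tree for a 6-digit number (hypothetical helper)."""
    digits = [int(ch) for ch in str(number)]
    # Assumption: only 6-digit numbers with no zeros are handled, because the
    # text does not say how zeros or shorter numbers would be represented.
    if len(digits) != 6 or 0 in digits:
        raise ValueError("this sketch handles only 6-digit numbers without zeros")
    # Finally, the two triplets are merged into one structure.
    return (merge_triplet(*digits[:3]), merge_triplet(*digits[3:]))


print(syntactic_tree(234567))  # ((2, (3, 4)), (5, (6, 7)))
```

The nesting of the tuples mirrors the bracketing [2 & (3 & 4)] & [5 & (6 & 7)] given in the text.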

What we don’t yet know about the processing of number syntax

The aforementioned studies provide a relatively good picture of many peripheral syntactic processes—in particular, those involved in parsing the syntactic structure of sequences of digits or number words, and in the production of digit strings and multi-digit verbal numbers. In contrast, little is known about the core representation of number syntax. The present study aims to fill this gap: Our general goal was to identify several characteristics of a representation of the full syntactic structure of numbers and of the processes that create it.

Specifically, our first goal was to reaffirm the existence of a core representation of the syntactic structure of numbers. To our knowledge, to date, only a single study has shown that such a representation exists (Dotan et al., 2021a). Here, we will start by replicating this conclusion using another paradigm.

A second question concerns how the syntactic representation is created. An influential idea in syntactic theory is that certain types of complex syntactic structures, which are unique to humans, are not predefined rigid cognitive structures; rather, they are created in a generative manner by operating recursively on the syntactic representation (Hauser et al., 2002). Here, we examined whether the syntactic representation of numbers is created dynamically by a generative process, or is a rigid predefined representation.

According to the former view, whenever we process a number, we recreate its syntactic structure in a generative step-by-step manner. This view is in excellent agreement with the notion that the syntactic structure of numbers is represented in a hierarchical tree-like manner (McCloskey, 1992; McCloskey et al., 1986). According to the second view, the number’s syntactic structure is a predefined memorized “template,” in which we embed the digits, and this representation is retrieved from a mental lexicon of number-syntax templates. The “lexicon of templates” view is not unlikely, especially given the small number of syntactic structures: For example, based on the common definition of syntactic structure as a series of number-word lexical classes (ones, tens, teens, etc.), English numbers with 1–3 digits have only 9 different syntactic structures: ones (e.g., for 5), tens (50), teens (15), tens ones (55), ones hundred (500), ones hundred ones (505), ones hundred tens (550), ones hundred teens (515), and ones hundred tens ones (555).
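The count of 9 structures can be checked with a short enumeration. The following is a minimal sketch, not part of the study: the function name is hypothetical, and grouping "ten" with the teens is an assumption (grouping it with the tens yields the same count).

```python
def lexical_classes(n):
    """Return the lexical-class sequence of the English number word for n (1-999)."""
    if not 1 <= n <= 999:
        raise ValueError("this sketch covers only numbers with 1-3 digits")
    classes = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        # e.g. "five hundred": a ones word followed by the multiplier word
        classes += ["ones", "hundred"]
    if 10 <= rest <= 19:
        # Assumption: "ten" is treated as a teens word
        classes.append("teens")
    else:
        tens, ones = divmod(rest, 10)
        if tens:
            classes.append("tens")
        if ones:
            classes.append("ones")
    return tuple(classes)


# Collect the distinct structures over all numbers with 1-3 digits
structures = {lexical_classes(n) for n in range(1, 1000)}
print(len(structures))  # 9
```

For example, `lexical_classes(515)` returns `("ones", "hundred", "teens")`, matching the "ones hundred teens" structure listed above.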

A third question pertains to the scope of the syntactic representation. In the single study that showed a core syntactic representation (Dotan et al., 2021a), the stimuli were Hebrew and Arabic verbal numbers up to 9999. Such numbers are limited in two ways. First, their syntactic structure is relatively simple. In spoken Hebrew and Arabic, numbers up to 9999 do not make use of the multiplier words “hundred” and “thousand” as English numbers do. Rather, ones, tens, hundreds, and thousands are four different lexical classes (e.g., in Hebrew, 3=/shalosh/, three; 30=/shloshim/, thirty; 300=/shloshmeot/; 3000=/shloshtalafim/, and similarly in Arabic; see Supplementary Material for additional details about the Hebrew verbal number system). Thus, in a number up to 9999, the different words always belong to different lexical classes: the same class never appears twice. Only numbers with 5 digits or more have the English-like hierarchical structure, in which the word “thousand” separates two similarly structured phrases (e.g., “twenty-three thousand forty-five”). It thus remains to be shown whether the numbers’ core syntactic representation can handle the hierarchy-like aspect induced by the multiplier words “hundred” and “thousand,” or is limited to the simpler forms of syntax.

The second limitation of Hebrew and Arabic numbers up to 9999 is that they have up to 4 words, so they can potentially fit in a single chunk in working memory (Cowan, 2001, 2010). Can the syntactic representation exceed the size of a single chunk in working memory? Arguably, the ability to transcend a single chunk is one important advantage of hierarchical representations.

A fourth and final question is whether number syntax is created automatically and without directed attention, similar to syntactic structures in several other domains, e.g., language and music (Batterink & Neville, 2013; Maidhof & Koelsch, 2011), or must it be created voluntarily, via a process that requires our intention and attention.

The four issues above were presented here as theory-driven questions, but they also have concrete pedagogical implications. For example, if syntactic structures are rigid templates (question 2), the best way to teach children the syntax of numbers may be by memorizing the list of templates, whereas if the syntax is generative, a better method may be to teach the generative syntactic rules. If the syntax is created via attention-requiring processes (question 4), it may be best to teach overt strategies to represent syntax, but if it is created by automatic processes, training and rehearsal might be the better pedagogical approach. We revisit these pedagogical implications in the General Discussion.

The present study

We used a paradigm we called Syntactic Chunking. In each trial, the participants heard a sequence of number words and repeated it. The number of words in each stimulus (sequence) was constant, but critically, we systematically varied the stimulus grammaticality: In some conditions, the stimulus consisted of a single grammatical segment (e.g., two hundred thirty-four), and in other conditions, the stimulus included several shorter grammatical segments (thirty-four two hundred), sometimes even fragmented almost entirely into single-word segments (hundred two four thirty). If the participants represent the syntactic structure of each grammatical segment, repetition accuracy should be better in the conditions with longer grammatical segments than in the more fragmented conditions, because a syntactic representation may help merge the words of each segment into a single chunk in short-term memory, and this chunking should improve the participant’s memorization (Cowan, 2001; Miller, 1956).

Critically, chunking in working memory is typically not arbitrary but depends on the specific stimulus at least in two ways: First, the specific stimulus may affect the selection of the chunk boundaries. Second, the stimulus determines the degree of compressibility, with more compressible stimuli enabling the creation of chunks that contain more data, thereby improving memorization (Mathy & Feldman, 2012). In our case, we assumed that both chunk boundaries and compressibility would be driven by the number’s syntactic structure, which allows the creation of strong associations between the words in a grammatical segment. Such associations facilitate chunking (Cowan, 2001).

A similar manipulation was used in two previous studies (Barrouillet et al., 2010; Hung et al., 2015). Similar to us, both studies manipulated the degree of grammaticality in number-word sequences; however, they also differed from the present study in critical respects. 

Barrouillet et al. used children, whereas we focused on the automatic processing of numbers in literate adults. Hung et al. used adult participants, but there were critical differences between their methodology and analyses and ours, and consequently, their study and ours tap different stages of syntactic processing. We return to these issues in the General Discussion, where we explain in detail the similarities and differences between these studies and ours, and how the 3 studies complement each other.

General methods

Participants

The participants in all experiments were adults without any reported cognitive deficits. They were native speakers of Hebrew, and the experiments were run in this language. They were compensated for participation.

Screening

As screening, we examined each participant’s short-term memory using a digit span task (Friedmann & Gvion, 2002), in which participants repeat digit sequences of increasing length. There were 5 sequences for each length from 2 to 9 digits. The participants proceeded to the next length if they accurately repeated 3 out of the 5 sequences. The span is defined as the longest sequence length at which the participant repeated 3 sequences correctly, with an additional half point if they repeated 2 sequences of the last length. The average span of adults (age 20–30) in this task is 7.05 (SD=0.94). We included only participants with a span of 6 or higher.
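The scoring rule above can be sketched as a small function. This is an illustration under stated assumptions, not the authors' scoring code: the function name and input format are hypothetical, "3 out of the 5" is read as "at least 3", and a participant who fails the shortest length is given a base span of 0.

```python
def digit_span(correct_by_length):
    """Compute the span from a mapping {sequence length: number correct out of 5}."""
    span = 0.0  # assumption: base value if even the shortest length is failed
    for length in sorted(correct_by_length):
        correct = correct_by_length[length]
        if correct >= 3:
            # Passed this length (assumption: "3 out of 5" means at least 3)
            span = float(length)
        else:
            # Failed: testing stops; 2 correct sequences earn half a point
            if correct == 2:
                span += 0.5
            break
    return span


def is_included(span):
    """Inclusion criterion from the text: a span of 6 or higher."""
    return span >= 6


print(digit_span({2: 5, 3: 5, 4: 4, 5: 3, 6: 3, 7: 2}))  # 6.5
```

In this example the participant passes lengths 2 to 6 and repeats 2 sequences of length 7, giving a span of 6.5, which satisfies the inclusion criterion.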

Syntactic chunking task

In each trial, the participant heard a sequence of number words, said a short, fixed sentence in Hebrew (“what a nice day it is”), and then repeated the number words. Saying the sentence was aimed to “reset” the phonological short-term memory and to reduce the likelihood of phonological repetition strategies in favor of strategies based on a whole-number representation. The participants were encouraged to provide partial information about the stimulus if they did not remember it fully. Each stimulus (sequence of words) was presented only once. In case of an interruption, the trial was canceled and presented again at the end of the block.

The critical manipulation was the stimulus grammaticality. In a fully grammatical condition, each stimulus (a sequence of number words) formed a single grammatical segment (e.g., two hundred fifty-seven). In the more fragmented conditions, each stimulus consisted of several grammatical segments. For example, the stimulus fifty-seven two hundred forms two grammatical segments, fifty-seven and two hundred. Below, we use the term segment to denote a grammatically valid subsequence of the stimulus, which is also maximally valid, i.e., the segment ends when grammaticality ends. For example, the sequence fifty seven cannot be considered as two separate single-word segments, because these two words, in the given order, can be merged grammatically.
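The notion of a maximal grammatical segment can be made concrete with a short sketch. It uses English number words rather than the Hebrew words of the experiments, and the grammar (numbers up to 999,999, no "and", "ten" grouped with the teens) and all names are assumptions made for illustration only.

```python
import re

ONES = {"one", "two", "three", "four", "five", "six", "seven", "eight", "nine"}
TEENS = {"ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"}
TENS = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"}


def class_code(word):
    """Map a number word to a one-letter lexical-class code."""
    if word in ONES:
        return "o"
    if word in TEENS:
        return "e"
    if word in TENS:
        return "t"
    if word == "hundred":
        return "H"
    if word == "thousand":
        return "K"
    raise ValueError(f"not a number word: {word}")


# A triplet (1-999) and a full number (up to 999,999) as patterns over class codes
TRIPLET = r"(?:oH(?:to?|e|o)?|to?|e|o)"
NUMBER = re.compile(rf"{TRIPLET}(?:K(?:{TRIPLET})?)?")


def is_grammatical(words):
    """True if the words, in this order, form a valid number under the assumed grammar."""
    codes = "".join(class_code(w) for w in words)
    return NUMBER.fullmatch(codes) is not None


def segments(words):
    """Split a word sequence into maximal grammatical segments, left to right."""
    result = []
    i = 0
    while i < len(words):
        end = i + 1  # a word that is ungrammatical alone still forms its own segment
        for j in range(i + 1, len(words) + 1):
            if is_grammatical(words[i:j]):
                end = j  # keep the longest grammatical run starting at position i
        result.append(words[i:end])
        i = end
    return result


print(segments("thirty four two hundred".split()))
print(segments("hundred two four thirty".split()))
```

The first call yields two segments (thirty four, two hundred), and the second yields four single-word segments, in line with the examples given in the text.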

Experiment 1

Method

The participants were 20 adults aged 20;2–36;0 (mean=25;6, SD=3;9).

Syntactic chunking task

The experiment had 4 conditions, administered in 4 blocks. In condition A, each stimulus was a single grammatical segment, which included only the digits 2–9 and did not include the same digit twice. In conditions B, C, and D, each stimulus consisted of more, shorter grammatical segments (Fig. 1). All stimuli in a given condition had the same syntactic structure. To control for lexical effects, all 4 conditions included the same 20 sets of words; they differed only in the order of words within each stimulus.

The participant’s ability to remember the stimuli is presumably affected not only by the syntactic properties of the stimulus but also by their short-term memory capacity. Thus, the number of words in each stimulus was determined according to the participant’s digit span: Participants with span 6 heard 6-word stimuli (corresponding with 5-digit numbers), and those with span 7 heard 7-word stimuli (corresponding with 6-digit numbers).

The syntactic structure of number words in Hebrew is similar to that of English. The only difference relevant to this experiment is that whereas, in English, the phonological form of each hundreds word consists of two separate words (e.g., “three hundred”), in Hebrew each hundreds word is presumably a single lexical entry (e.g., 300=/shloshmeot/, “three hundred”). As a result, it is easier to create fully fragmented sequences of words in Hebrew than in English: we merely sorted the words according to their lexical classes, first the Ones words, then the Tens words, then the Hundreds words. For example, the number 234,567 would appear in the most-fragmented condition as thousand, four, seven, thirty, sixty, two hundred, five hundred. To prevent any experimenter-originated bias (e.g., differences in intonation between the conditions), each number word was recorded separately, and the single-word recordings were merged with a 200 ms gap between words into a full auditory stimulus.
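The reordering by lexical class can be sketched as a stable sort. This is an illustration only: English glosses stand in for the single-word Hebrew hundreds, the names are hypothetical, placing the word "thousand" first follows the single example in the text, and teens words are not covered because the text does not say where they would go.

```python
# Assumed class order: "thousand" first (as in the example), then ones, tens, hundreds
CLASS_ORDER = {"thousand": 0, "ones": 1, "tens": 2, "hundreds": 3}


def most_fragmented(words_with_classes):
    """Reorder (word, lexical class) pairs by class, keeping the original order within a class."""
    # sorted() is stable, so words of the same class keep their relative order
    ordered = sorted(words_with_classes, key=lambda item: CLASS_ORDER[item[1]])
    return [word for word, _ in ordered]


# The words of 234,567 in grammatical order; each hundreds item is one Hebrew word
stimulus = [
    ("two hundred", "hundreds"),
    ("thirty", "tens"),
    ("four", "ones"),
    ("thousand", "thousand"),
    ("five hundred", "hundreds"),
    ("sixty", "tens"),
    ("seven", "ones"),
]

print(most_fragmented(stimulus))
# ['thousand', 'four', 'seven', 'thirty', 'sixty', 'two hundred', 'five hundred']
```

Because no two adjacent words in the output can be merged grammatically, every word forms its own single-word segment.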

The participants of Experiment 1 also performed Experiment 2 (described below). Each participant was randomly assigned to one of two orders of the blocks and a random order of Experiment 1 versus Experiment 2. The specific orders were: ABCD2, DCBA2, 2ABCD, or 2DCBA. In Experiment 1, each block started with short training: The experimenter explicitly stated the word order of that block, and then the participant performed 2 training trials with that block’s syntactic structure.

