Vowel Speech Recognition From Rat Electroencephalography Using Long Short-Term Memory Neural Network Part 1
Dec 27, 2023
Abstract
Over the years, considerable research has been conducted to investigate the mechanisms of speech perception and recognition.
Electroencephalography (EEG) is a powerful tool for identifying brain activity; therefore, it has been widely used to determine the neural basis of speech recognition.
In particular, for the classification of speech recognition, deep learning-based approaches are in the spotlight because they can automatically learn and extract representative features through end-to-end learning.
This study aimed to identify particular components that are potentially related to phoneme representation in the rat brain and to discriminate brain activity for each vowel stimulus on a single-trial basis using a bidirectional long short-term memory (BiLSTM) network and classical machine learning methods.
Nineteen male Sprague-Dawley rats underwent microelectrode implantation surgery so that EEG signals could be recorded from the bilateral anterior auditory fields. Five different vowel speech stimuli were chosen, /a/, /e/, /i/, /o/, and /u/, which have highly different formant frequencies. EEG recorded during randomly presented vowel stimuli was minimally preprocessed and normalized by a z-score transformation to be used as input for the classification of speech recognition.
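The z-score transformation mentioned above can be sketched as follows. The abstract does not state whether normalization was applied per trial, per channel, or across the whole dataset, so this sketch assumes per-channel normalization within a single trial; the function name zscore_trial and the toy data are hypothetical.

```python
from statistics import mean, pstdev

def zscore_trial(trial):
    """Z-score each channel of one trial (a list of channels, each a list of samples).

    Assumption: normalization is done per channel within a trial; the source
    text does not specify the exact scope of the transformation.
    """
    normalized = []
    for channel in trial:
        mu = mean(channel)
        sigma = pstdev(channel)  # population standard deviation
        if sigma == 0:
            # A flat channel has no variance; map it to zeros instead of dividing by zero.
            normalized.append([0.0 for _ in channel])
        else:
            normalized.append([(x - mu) / sigma for x in channel])
    return normalized

# Hypothetical toy trial: 2 channels x 5 time points (arbitrary units).
trial = [[1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 10.0, 10.0, 10.0, 10.0]]
z = zscore_trial(trial)
```

After the transformation, each non-flat channel has zero mean and unit standard deviation, which puts recordings with different amplitudes on a common scale before classification.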
The BiLSTM network showed the best performance among the classifiers, achieving an overall accuracy, F1-score, and Cohen's κ of 75.18%, 0.75, and 0.68, respectively, under 10-fold cross-validation.
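For readers unfamiliar with the three reported metrics, the sketch below computes accuracy, macro-averaged F1, and Cohen's κ from a confusion matrix. The matrix is a made-up five-class example, not the study's data, and the choice of macro averaging for F1 is an assumption, since the abstract does not say how the F1-score was averaged.

```python
def classification_metrics(confusion):
    """Return (accuracy, macro F1, Cohen's kappa) from a square confusion matrix.

    confusion[i][j] = number of trials of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(row) for row in confusion]  # trials per true class
    col_sums = [sum(confusion[i][j] for i in range(n_classes))
                for j in range(n_classes)]      # trials per predicted class

    # Observed agreement (accuracy) and agreement expected by chance.
    observed = sum(confusion[k][k] for k in range(n_classes)) / total
    expected = sum(row_sums[k] * col_sums[k] for k in range(n_classes)) / total ** 2
    kappa = (observed - expected) / (1 - expected)

    # Per-class F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (row sum + column sum).
    f1_scores = []
    for k in range(n_classes):
        denom = row_sums[k] + col_sums[k]
        f1_scores.append(2 * confusion[k][k] / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / n_classes
    return observed, macro_f1, kappa

# Hypothetical confusion matrix: 5 vowel classes, 20 trials each, 15 correct per class.
confusion = [
    [15, 2, 1, 1, 1],
    [1, 15, 2, 1, 1],
    [1, 1, 15, 2, 1],
    [1, 1, 1, 15, 2],
    [2, 1, 1, 1, 15],
]
accuracy, macro_f1, kappa = classification_metrics(confusion)
```

With five equally frequent classes, chance agreement is 0.2, so κ rescales accuracy to the range above chance: κ = (accuracy − 0.2) / 0.8.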
These results indicate that LSTM layers can effectively model sequential data, such as EEG; hence, informative features can be derived through BiLSTM trained with end-to-end learning without any additional hand-crafted feature extraction methods.
Introduction
Speech carries vast amounts of information to the brain, and recognizing and categorizing sounds is one of the typical functions of the brain in behaving animals.
Given its importance, attempts to investigate the mechanisms of speech sound recognition have been conducted for over 100 years. One of the first neurolinguistic studies of speech recognition was conducted through an observational study in the 1870s by a German neuropsychiatrist who found the crucial role of the superior temporal gyrus in speech perception, deducing that deficits in speech recognition were associated with damage to the left superior temporal gyrus [1].
It is now known that speech recognition relies predominantly on the dorsolateral temporal lobes, including the superior temporal gyrus, which contains the primary auditory cortex (A1) and anterior auditory field (AAF) [2].

Although the manner in which phonemes are encoded and interpreted in the brain remains controversial, it has been widely accepted that the recognition of sound is categorical. That is, discrimination is better for stimuli belonging to different phonetic categories than for stimuli belonging to the same category, even if the acoustic differences are equivalent [3, 4].
The perceptual systems of animals, not only those of humans, sort continuously varying sound stimuli into a set of discrete categories [5].
With the advances in neurophysiological studies, electroencephalography (EEG) has been widely used in research involving neuroscience and neural engineering [6].
The high temporal resolution and sensitivity to different functional brain states make EEG a powerful tool for investigating real-time brain activity, and there has been increasing interest in illuminating the neural basis for categorical perception. Traditionally, EEG signals are recorded non-invasively from the scalp in human studies. At the level of sound or speech perception, mismatch negativity (MMN), a component of the auditory evoked potential (AEP) that is elicited by oddball sounds, is widely used to study neural correlates of categorical perception [7, 8]. Näätänen et al. found evidence for language-dependent vowel representations in the human brain [9].
Another study examined the categorical perception of lexical tones and found that across-category contrast elicited a larger MMN than within-category distinction [10]. In animal experiments, more accurate EEG signals were obtained through invasive procedures.
For instance, neural correlates of categorical perception and neural representations of various sounds have been studied using extracellular recording of action potentials.
Striatum-projecting neurons of songbirds display categorical auditory responses and are highly sensitive to changes in note duration [11]. In addition, Kilgard et al. studied distinct neural representations of consonant and vowel sounds using intraparenchymal recording in the rat brain. Recording the multi- and single-unit responses from the inferior colliculus and A1, they suggested that the spike count encodes vowel sounds, while spike timing encodes consonant sounds [12, 13].
The effects of sound discrimination training in a rat model of autism were also investigated based on previous findings correlating neural responses to sound stimuli with sound perception ability [14].
Moreover, a recent study demonstrated that electrocorticography recorded with a multi-channel array correlates with passive exposure to a specific sound even in the auditory cortex of anesthetized rats [15].
Machine learning approaches have been used to make practical use of EEG in a wide variety of studies. Machine learning methods make it possible to investigate rich information that is inherent in EEG signals but difficult to uncover [6].
Therefore, EEG-based classification can be performed in the following fields through conventional machine learning algorithms (e.g., support vector machine (SVM), k-nearest neighbors (KNN), and naïve Bayes (NB)): motor imagery, emotion recognition, mental illness detection, event-related potential (ERP) detection, and so on [16, 17].

Furthermore, in recent years, owing to advances in graphics processing units and the availability of large datasets, it has become possible to conduct EEG-based classification using various deep learning networks [6, 18, 19]. Compared with conventional machine learning methods, deep learning networks can automatically detect and extract appropriate representations from input data [20, 21].
Hence, even with insufficient prior expert knowledge, promising results can be obtained through deep learning algorithms that do not require an additional handcrafted feature extraction process [22, 23].
For example, in the fields of speech, images, and video, results were significantly improved by applying deep learning algorithms [24–26]. However, it is not clear whether deep learning approaches always yield such superior results over traditional machine learning methods in EEG-based classification [27].
Roy et al. showed that in most of the studies (excluding four out of 102 studies), the deep learning approach led to higher performance than the traditional machine learning approach, and the highest improvement in accuracy was 35.3% [18, 28].
Furthermore, among the various fields of EEG-based classification studies, ERP classification studies are actively conducted by applying both conventional machine learning and deep learning methods.
In early studies, the traditional grand averaging method was used to improve the low signal-to-noise ratio (SNR), one of the limitations of EEG signals, and to obtain ERP signals.
In these studies, several ERP components were treated as feature sets for classification [29, 30]. In animal studies, ERP features such as peak amplitude and latency are also used to discriminate ERP signals [31, 32].
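The grand averaging and ERP feature extraction described above can be illustrated with a small simulation. Everything here is hypothetical: the triangular response template, the noise level, the number of trials, and the function names are chosen only to show that averaging time-locked trials suppresses noise (roughly by the square root of the number of trials for independent noise) and that peak amplitude and latency can then be read from the averaged waveform.

```python
import random

def grand_average(trials):
    """Average time-locked trials sample by sample to estimate the ERP."""
    n_trials = len(trials)
    n_samples = len(trials[0])
    return [sum(trial[t] for trial in trials) / n_trials for t in range(n_samples)]

def peak_features(erp, sampling_rate_hz):
    """Return (peak amplitude, peak latency in ms) of the largest absolute deflection."""
    peak_index = max(range(len(erp)), key=lambda t: abs(erp[t]))
    return erp[peak_index], 1000.0 * peak_index / sampling_rate_hz

def rms_error(signal, reference):
    """Root-mean-square difference between a signal and a reference waveform."""
    return (sum((a - b) ** 2 for a, b in zip(signal, reference)) / len(reference)) ** 0.5

# Hypothetical simulation: a triangular "evoked response" buried in Gaussian noise.
random.seed(0)
sampling_rate_hz = 1000
template = [max(0.0, 1.0 - abs(t - 50) / 20.0) for t in range(200)]  # peak of 1.0 at 50 ms
trials = [[s + random.gauss(0.0, 1.0) for s in template] for _ in range(400)]

erp = grand_average(trials)
amplitude, latency_ms = peak_features(erp, sampling_rate_hz)

print("single-trial RMS error:", round(rms_error(trials[0], template), 3))
print("grand-average RMS error:", round(rms_error(erp, template), 3))
print("peak amplitude:", round(amplitude, 3), "latency (ms):", latency_ms)
```

The trade-off is that averaging discards trial-to-trial variability, which is the motivation for the single-trial classification discussed next.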
However, single-trial EEG-based classification has also received much attention, since it is known that EEG data at the single-trial level possess more functional and rich information than the ERP signals obtained through the traditional grand averaging method [33, 34].
Therefore, in subsequent studies, features extracted by various algorithms, such as wavelet-based algorithms [35], Gaussian mixture models [36], and spatial filtering [37], were used for classification with conventional machine learning methods [38, 39]. However, extracting the optimal hand-crafted features from the single-trial EEG is time-consuming and labor-intensive because additional processing steps must be executed. In this context, deep learning methods can alleviate this problem by allowing end-to-end learning.
The most prevalent deep learning architecture is a convolutional neural network (CNN), followed by a recurrent neural network (RNN). The CNN is a special type of deep learning architecture widely used for single-trial EEG-based classification [6]. The CNN inputs are derived from raw or preprocessed EEG data, primarily in the following form: number of channels × number of time points in a single trial.
Moreover, CNNs have demonstrated strong classification results and are known to perform best when spectrogram images are used as inputs [40–44]. In contrast to the CNN, the RNN is a highly preferred architecture for handling sequential data (as in natural language processing applications) because its recurrent connections allow information from previous time steps to be reused recursively alongside the current input [45].
Long short-term memory (LSTM) is a kind of RNN architecture proposed by Hochreiter and Schmidhuber to overcome the exploding and vanishing gradient problems of RNN [46]. Bidirectional LSTM (BiLSTM) is a further development of LSTM that combines the forward and backward hidden layers to access both the preceding and succeeding information.
Although the BiLSTM model is much more complex and might need additional computational power, it is expected to solve the sequential modeling and classification task better than LSTM [47].
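To make the forward and backward passes concrete, here is a minimal sketch of a single bidirectional LSTM layer in plain Python. It is not the network used in the study: the layer sizes, the random untrained weights, and all function names are assumptions for illustration, and a practical implementation would use a deep learning framework with trained parameters.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_lstm(input_size, hidden_size, rng):
    """Random (untrained) weights for the input, forget, output, and candidate gates."""
    def matrix():
        return [[rng.uniform(-0.5, 0.5) for _ in range(input_size + hidden_size)]
                for _ in range(hidden_size)]
    return {gate: {"W": matrix(), "b": [0.0] * hidden_size} for gate in "ifog"}

def affine(params, vector):
    """Compute W @ vector + b for one gate."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(params["W"], params["b"])]

def lstm_forward(params, sequence, hidden_size):
    """Run one LSTM layer over a sequence and return the hidden state at every step."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x in sequence:
        xh = list(x) + h  # current input concatenated with previous hidden state
        i = [sigmoid(a) for a in affine(params["i"], xh)]    # input gate
        f = [sigmoid(a) for a in affine(params["f"], xh)]    # forget gate
        o = [sigmoid(a) for a in affine(params["o"], xh)]    # output gate
        g = [math.tanh(a) for a in affine(params["g"], xh)]  # candidate cell state
        c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c, i, g)]
        h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
        outputs.append(h)
    return outputs

def bilstm_forward(forward_params, backward_params, sequence, hidden_size):
    """Concatenate forward-in-time and backward-in-time hidden states at each step."""
    fwd = lstm_forward(forward_params, sequence, hidden_size)
    # Process the reversed sequence, then flip the outputs back into time order.
    bwd = lstm_forward(backward_params, sequence[::-1], hidden_size)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]

# Hypothetical input: 6 time points of a 2-channel recording.
rng = random.Random(0)
n_channels, hidden_size, n_samples = 2, 3, 6
forward_params = init_lstm(n_channels, hidden_size, rng)
backward_params = init_lstm(n_channels, hidden_size, rng)
eeg = [[rng.gauss(0.0, 1.0) for _ in range(n_channels)] for _ in range(n_samples)]
features = bilstm_forward(forward_params, backward_params, eeg, hidden_size)
```

Each time step yields 2 × hidden_size features. The first half depends only on the current and earlier samples, while the second half depends on the current and later samples, which is how the bidirectional layer accesses both preceding and succeeding context.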
Previously, we tried to classify human EEG signals on a single-trial basis for three vowel sounds, /a/, /o/, and /u/, using machine learning techniques.
After the application of appropriate signal processing algorithms, including multivariate empirical mode decomposition (MEMD), the EEG responses were effectively classified according to each vowel sound using a linear discriminant analysis (LDA) classifier. From the time-frequency representation (TFR) of the EEG signals, it was also determined that the alpha band components were the neural responses most related to vowel sound perception [48].
However, due to the low SNR of human EEG signals, phoneme representation in the brain needs to be further assessed with a more invasive recording technique, allowing the acquisition of more reliable EEG signals.
In addition, it is necessary to conduct further studies on the classification performance of each machine learning algorithm in classifying EEG responses to different phonemes.
The primary purpose of this study was to determine specific EEG components that might be related to speech representation in the rat brain to further illuminate brain responses to speech sound recognition.
To acquire more accurate EEG signals, epidural EEG signals in response to auditory stimuli were recorded from the AAF, which is known to play an essential role in auditory perception and categorization [2]. In addition, this study tried to discriminate different brain responses for each speech sound on a single-trial basis using LSTM networks and other conventional machine learning techniques.
It was hypothesized that the BiLSTM network would be appropriate for classifying EEG responses to vowel stimuli and would outperform other classical classifiers because the network can perform robustly in modeling long-term dependencies of sequential data such as EEG. To the authors' knowledge, LSTM networks have not been applied to the classification of EEG responses to auditory stimuli, and this is the first study to use a deep learning algorithm to analyze epidural EEG signals from the AAF.

Moreover, using the deep learning algorithm, EEG responses were classified according to the auditory stimulus through end-to-end learning on minimally preprocessed EEG signals, with no additional feature extraction methods.