LSTM Attention Neural-Network-Based Signal Detection For Hybrid Modulated Faster-Than-Nyquist Optical Wireless Communications
Sep 08, 2023
Abstract:
To improve the accuracy of signal recovery after transmitting over an atmospheric turbulence channel, a deep-learning-based signal detection method is proposed for a faster-than-Nyquist (FTN) hybrid modulated optical wireless communication (OWC) system. It takes advantage of the long short-term memory (LSTM) network in the recurrent neural network (RNN) to alleviate the interdependence problem of adjacent symbols. Moreover, an LSTM attention decoder is constructed by employing the attention mechanism, which can alleviate the shortcomings of conventional LSTM. The simulation results show that the bit error rate (BER) performance of the proposed LSTM attention neural network is 1 dB better than that of the backpropagation (BP) neural network and outperforms by 2.5 dB when compared with the maximum likelihood sequence estimation (MLSE) detection method.
A recurrent neural network (RNN) is a type of neural network in which previous results influence the next output. Such networks are widely used in natural language processing, speech recognition, image processing, and the prediction of future events. The biggest advantage of RNNs is their ability to store and utilize past information, which improves accuracy and stability.
The recurrent neural network realizes the storage and utilization of information through memory. When information is passed from the input layer to the neural network, the neuron stores the information and passes it on to the next neuron. This cyclic process becomes a ring structure that determines new outputs. The design of this recurrent neural network considers not only new inputs to the network but also the traces left by all previous inputs.
The memory of the recurrent neural network also fully reflects the human way of thinking. The human brain is also able to make reasonable judgments through memory and speculation on historical information. This is also how recurrent neural networks operate. Historical information plays a large role in the human thinking process, allowing us to correctly recall events and solve problems. This is consistent with the design principle of recurrent neural networks.
Memory is therefore central to the operation of recurrent neural networks. They can learn and retain information from large amounts of data, helping to predict future trends and accomplish complex tasks.

Keywords:
Faster-than-Nyquist; neural network; hybrid modulation; attention mechanism; optical wireless communication.
1. Introduction
Compared with traditional radio frequency (RF) communication, OWC offers high system capacity, interference immunity, good security, flexible and rapid deployment, and low cost [1]. However, the transmission of optical wireless signals is affected by atmospheric turbulence, atmospheric absorption, scattering, and refraction. It is difficult for the receiver to recover an accurate signal because random variations in atmospheric temperature and pressure change the refractive-index structure constant. The multiband carrier-free amplitude phase (CAP) modulation technique was proposed in [2] to improve the system capacity and frequency band utilization of OWC under the Gamma–Gamma atmospheric turbulence channel. The experimental results demonstrate that the phase shift is well compensated and the inter-symbol interference (ISI) is effectively suppressed using the multi-modulus algorithm (MMA). However, to achieve ISI-free transmission in digital communication systems, the symbol rate must follow the Nyquist criterion. This limits further improvement in the spectral efficiency of the OWC system.
In 1975, Mazo proved that higher transmission rates could be achieved using FTN technology [3]. However, FTN deliberately introduces ISI to gain spectral efficiency, which makes signal detection more difficult. In recent years, studies on FTN have mainly focused on model-driven detection algorithms. Detection algorithms such as the linear detection algorithm based on minimum mean square error estimation (MMSE) [4], zero-forcing (ZF), the maximum a posteriori algorithm (MAP) [5], and nonlinear detection algorithms perform unsatisfactorily for FTN signals with a high acceleration factor, and their implementation complexity is extremely high. Interestingly, FTN signal detection based on deep learning (DL) outperforms the traditional model-driven algorithms [6]. In [7], DL is deployed at the transceiver of an FSO system for atmospheric turbulence compensation. This indicates that using DL in OWC can achieve superior performance with lower complexity.
To eliminate the high complexity of channel estimation caused by the lack of translation invariance of the covariance matrix, Neumann David et al. employed the MMSE and convolutional neural networks (CNNs) to compensate for this deficiency [8]. Yang Y et al. proposed a dual-selective channel fading estimation method based on deep neural networks (DNNs) for channel estimation [9]. It combines offline training and online learning to achieve high-precision super-resolution channel estimation by DNN. On the other hand, channel equalization technology is often utilized to eliminate the ISI to improve the quality of the signal [10,11]. From the perspective of DL, channel equalization can be regarded as the problem of how to recover the transmitted symbols as accurately as possible from the received symbols. The DL module can be regarded as a “black box”, which is decoded by the neural network at the receiver, thus realizing the demodulation of the transmitted signal. Liang S. et al. set up an experimental system of differential FTN precoding visible light communication using CAP modulation [12]. It optimizes the decoding algorithm by the DL method and verifies the practicability of DL in the FTN-FSO system.
As an improvement to RNN, the LSTM network was proposed by Hochreiter and Schmidhuber in 1997 and has been utilized in many fields [13]. A convolutional long short-term deep neural network (CLDNN) was introduced in [14], which exploits the complementary nature of CNN and LSTM to combine the architectures of CNN and LSTM into deep neural networks. The authors of [15] proposed an RNN called bi-directional long short-term memory (Bi-LSTM) to characterize the feature of ISI introduced in FTN signaling. Moreover, it describes a “mismatch SNR” strategy for building the training dataset that can effectively help prevent overfitting. On the other hand, various DL approaches have been used to address the problems in wireless communication and achieve valuable conclusions. Biao Gong has studied the demodulation technology of CNN in orbital angular momentum (OAM) atmospheric laser communications [16].
Siying Mao showed experimentally in [17] that RNN and LSTM have some advantages in decoding aliased signals. To further improve the network performance and select the most discriminative features, an attention mechanism is introduced into the network to explore the dependencies between features. The authors of [18] proposed a dual attention network (DANet) with a self-attention mechanism to enhance the discriminative ability of feature representations for scene segmentation, in which a position attention module is proposed to learn the spatial interdependencies of features and a channel attention module is designed to model channel interdependencies. It significantly improves the segmentation by modeling rich contextual dependencies over local features. At the 2022 IEEE ICAIT conference, we proposed a BP neural network to improve the BER performance of signal detection in an atmospheric channel [19]. A BP neural network has the advantages of high self-learning ability and self-adaptive ability. It can learn the mapping rules between input and output data during the training process and adaptively memorize such rules by using the network weights. Therefore, the neural network is less affected by the acceleration factor and roll-off factor, which ensures the spectrum utilization of the system at an FTN rate.
However, BP neural networks forget sequence information. Therefore, we employ the LSTM neural network and attention mechanism to overcome this problem. Nowadays, the attention mechanism has become a common data processing method in the DL field and is widely used in various DL tasks, such as natural language processing, image recognition, and speech recognition. Assembling features by assigning larger weights to some 'significant' features not only reduces the parameters of the network but also improves the discriminative power of the features. Therefore, the attention mechanism is introduced into the LSTM network to build an LSTM attention decoder for the signal detection of a pulse position modulation (PPM) and quadrature phase shift keying (QPSK) hybrid modulated FTN OWC system to improve the system performance while ensuring spectrum efficiency.

2. System Model
Traditionally, intensity modulation/direct detection based on an on–off keying (OOK) scheme is widely accepted in OWC owing to its easy implementation and lower cost [20]. Given the poor BER performance and spectrum efficiency of OOK, PPM has been considered for use in OWC. Compared with OOK, PPM greatly increases energy utilization. In addition, QPSK modulation offers high spectrum utilization and strong anti-interference capability [21]. Therefore, combining PPM and QPSK can improve the data transmission rate and the system reliability [22,23].
Figure 1 shows the schematic of the 4PPM and QPSK hybrid modulated OWC system with FTN technology. The user data after the Gray encoder are first mapped into 4PPM and QPSK, respectively. Thereafter, the QPSK signal is loaded into the time slot of the 4PPM signal to form the 4PPM–QPSK hybrid modulated signal. Afterward, the formatted 4PPM–QPSK signal is sent to the FTN shaping filter for FTN signal forming. Subsequently, after digital-to-analog conversion (DAC), the data are launched into the atmospheric channel. At the receiver end, the optical signal transmitted over the atmospheric channel is first detected by a photodiode (PD) and then sent for analog-to-digital conversion (ADC), matched filtering, and sampling. Thereafter, the signal is sent to the DL module for data recovery.
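The mapping step described above can be illustrated with a minimal sketch. The exact bit-to-slot and bit-to-phase assignment is not given in the text, so the following assumes that each group of four bits is split into two bits that select the active 4PPM slot and two bits that select the QPSK phase carried in that slot. The function name `map_4ppm_qpsk`, the Gray table, and the phase offset of π/4 are hypothetical choices made only for illustration.

```python
import numpy as np

# Gray mapping of a bit pair to an index 0..3 (assumed for illustration).
GRAY_INDEX = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}


def map_4ppm_qpsk(bits):
    """Map bits (length divisible by 4) to a stream of 4PPM-QPSK slots.

    Assumption: bits 0-1 choose the active slot of a 4-slot PPM frame,
    bits 2-3 choose the QPSK phase placed in that slot. Empty slots are 0.
    """
    groups = np.asarray(bits, dtype=int).reshape(-1, 4)
    frames = np.zeros((groups.shape[0], 4), dtype=complex)
    for row, (b0, b1, b2, b3) in enumerate(groups):
        slot = GRAY_INDEX[(int(b0), int(b1))]
        phase = np.pi / 4 + GRAY_INDEX[(int(b2), int(b3))] * np.pi / 2
        frames[row, slot] = np.exp(1j * phase)
    # Serialize the frames slot by slot before pulse shaping.
    return frames.reshape(-1)


symbols = map_4ppm_qpsk([0, 0, 0, 0, 1, 1, 1, 0])
```

Under this assumed mapping each frame carries four bits in four slots, with exactly one non-zero slot per frame. The serialized slot stream is what would then be passed to the FTN shaping filter.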


3. LSTM Attention Decoder
RNN is an important branch of DL that can be used for processing time series data and for modeling the temporal order of features. In particular, it is useful for processing sequence data in which earlier inputs affect later outputs [24]. However, a traditional RNN suffers from gradient disappearance and gradient explosion as the timeline is extended. The gradient disappearance occurs because of the Sigmoid function. The Sigmoid function is usually employed in the output layer, but the derivative of this function ranges from 0 to 0.25. When the BP algorithm is utilized to calculate the gradient, the gradient of each layer is reduced to at most 1/4 of the original. If there are many network layers, the gradient becomes very small. The initial network weights would need to be set larger than 1 to avoid this phenomenon, but this leads to gradient explosion [25]. So, the RNN has great limitations in the prediction of long time series data. Figure 2 shows the diagram of the RNN network unfolded along the timeline.
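The 1/4 bound quoted above can be checked numerically. The sketch below evaluates the Sigmoid derivative on a grid and computes the resulting upper bound on the gradient scaling after a number of layers. It ignores the weight matrices, which is a simplifying assumption, and the helper name `gradient_bound` is hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_grad(x):
    # Derivative of the Sigmoid: s(x) * (1 - s(x)), maximal at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)


def gradient_bound(n_layers, max_grad=0.25):
    # Upper bound on the product of n_layers Sigmoid derivatives
    # (weights are ignored here for simplicity).
    return max_grad ** n_layers


x = np.linspace(-10.0, 10.0, 2001)
max_grad = sigmoid_grad(x).max()
bound_10 = gradient_bound(10)
```

With ten layers the bound is already below 10^-6, which is why long unfolded sequences leave almost no gradient for the earliest time steps.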

The LSTM network was proposed to address the ubiquitous long-term dependence problem, and it has been proven effective in solving the gradient disappearance and gradient explosion problems of RNN [24]. The biggest difference between LSTM and RNN is that RNN has only one state inside a single recurrent structure, while each LSTM structure has four components: an input gate, a forget gate, an output gate, and a cell state. The diagram of the LSTM network is shown in Figure 3. ⊗ denotes the element-wise multiplication of vectors and ⊕ denotes the element-wise addition of vectors. Both the input and output gates open and propagate signals only when previous information is needed. In this way, the previous information can be saved selectively. The function of the forget gate is to receive the error from the memory unit and "forget" the value stored in the memory unit when needed, to achieve the control of the network weights.
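As a rough illustration of how the three gates and the cell state interact, the following sketch implements one time step of a standard LSTM cell in NumPy. It follows the textbook formulation rather than any implementation detail confirmed by the paper. The stacked weight layout `W` and the function name `lstm_step` are assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    W has shape (4*H, D+H) and b has shape (4*H,), stacking the
    input gate, forget gate, output gate, and candidate (assumed layout).
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    o = sigmoid(z[2 * H:3 * H])    # output gate
    g = np.tanh(z[3 * H:4 * H])    # candidate cell update
    c_t = f * c_prev + i * g       # element-wise multiply and add
    h_t = o * np.tanh(c_t)
    return h_t, c_t


rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(scale=0.1, size=(4 * H, D + H))
b = np.zeros(4 * H)
h_t, c_t = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)
```

The line computing `c_t` corresponds to the ⊗ and ⊕ operations in the diagram: the forget gate scales the old cell state and the input gate scales the new candidate before the two are added.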

Both RNN and LSTM networks are designed to handle long time series. However, because of the forget gate, the LSTM network also forgets useful earlier information when it deals with the gradient explosion problem. This deteriorates the effect of long-sequence training and the system performance. Fortunately, the attention mechanism is a great solution to this problem [26]. Variants of attention mechanisms include multi-head attention, hard attention, structured attention, and key–value pair attention [27]. Multi-head attention utilizes multiple queries to calculate in parallel to select multiple pieces of information from the input. Each head focuses on a different part of the input. Hard attention can be implemented in two ways. One is to select the input information with the highest probability. The other is to randomly sample from the attention distribution. Structured attention involves picking out task-relevant information from the input. Key–value pair attention employs a key–value pair format to represent input information, where "key" is utilized to calculate the attention distribution and "value" is utilized to generate the selected information. Considering its excellent performance, the key–value pair attention mechanism is employed in our proposal to build the LSTM attention decoder, which can be utilized to process the received FTN hybrid signals. The diagram of the LSTM attention decoder is shown in Figure 4.
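The key–value pair attention described above can be sketched in a few lines. The scoring function is not specified in the text, so this example assumes a scaled dot product between a query and the keys, followed by a softmax. In an LSTM attention decoder the keys and values would typically be derived from the LSTM hidden states, which is also an assumption here. The function names are hypothetical.

```python
import numpy as np


def softmax(scores):
    # Subtract the maximum for numerical stability.
    shifted = scores - scores.max()
    weights = np.exp(shifted)
    return weights / weights.sum()


def key_value_attention(query, keys, values):
    """Key-value attention with an assumed scaled dot-product score.

    query: (d,), keys: (T, d), values: (T, d_v).
    Keys produce the attention distribution, values produce the output.
    """
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = softmax(scores)
    context = weights @ values
    return context, weights


keys = 5.0 * np.eye(3)
values = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
context, weights = key_value_attention(keys[1], keys, values)
```

Here the query matches the second key most closely, so the second value dominates the context vector. Time steps that the forget gate would otherwise have attenuated can still contribute through their weights.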


4. Simulation Analysis
The size of the training or test dataset depends on the complexity of the system and the DL algorithm. Using a small dataset may cause poor detection performance because the model would be incapable of fully learning the diverse characteristics of the system. Further, using a large dataset may result in increased computational complexity [30]. Thus, several simulations are conducted to determine the suitable dataset size and the parameters that could offer the best BER performance. It should be noted that the accuracy of the neural network is affected by the DL algorithm itself, which plays an important role in solving some nonlinear problems. Therefore, its performance and robustness need to be evaluated. Without a loss of generality, some common parameters are taken into consideration. The parameters used in the simulation are listed in Table 1.

Table 2 shows the accuracy of the network under different learning rates. A well-behaved system needs an appropriate learning rate. If the learning rate is too large, the network cannot converge, while if it is too small, the network will converge very slowly or be unable to finish learning. Moreover, the network may change from underfitting to overfitting as the learning rate increases [31]. It is evident from the table that a learning rate of 0.002 gives the best performance.

The selection of hidden layers is another key point. A low or high number of hidden layers leads to the phenomena of underfitting or overfitting [31]. The relationship between the number of hidden layers and accuracy is shown in Table 4. The accuracy increases gradually with the number of hidden layers and declines when it reaches a certain value. In addition, studies in [32,33] found that the increasing number of hidden layers results in a significant increase in computational complexity and overfitting. The causes of overfitting can be divided into three categories [34]. The first is a small dataset of training samples that cannot reflect the overall possible situations.

This will lead to a less accurate prediction by the trained network. Therefore, the training dataset should cover all types of data as much as possible. The second is a network that cannot accurately estimate the relationship between input and output because of excessive interference in the training data. The third is the high complexity of the network. In this case, the network has many parameters, which enables it to fit every data point in the training dataset accurately. As a result, the trained network cannot generalize to the test dataset. Therefore, the appropriate number of hidden layers is crucial to the system's performance. The simulation results in Table 5 show that the system has the best detection performance when the number of hidden layers is 8.

The comparison of the accuracy of the LSTM network and the LSTM attention network is shown in Table 5. The accuracy of the LSTM attention network is significantly higher than that of the LSTM network.
It is well known that rain, snow, sleet, fog, haze, pollution, and so on are atmospheric factors that impact the laser beams. Their presence causes reflection, refraction, scattering, and attenuation of optical signals. It has been proven that atmospheric turbulence follows the Gamma–Gamma distribution, and weak, moderate, and strong turbulence intensity can be expressed by the refractive-index structure constant of $C_n^2 = 2 \times 10^{-18}$, $C_n^2 = 2 \times 10^{-15}$, and $C_n^2 = 2 \times 10^{-12}$, respectively [35]. The curves of different atmospheric turbulence intensities versus BER are shown in Figure 6, where the roll-off factor is 0.6, τ = 0.8, and the transmission distance is 500 m. It is evident from the figure that the BER performance gradually improves as the turbulence intensity decreases. When BER = $3.8 \times 10^{-3}$, the BER performance of weak turbulence is about 2 dB and 5 dB better than that of moderate and strong turbulence, respectively.
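For readers who want to reproduce a Gamma–Gamma fading channel, a common approach is to draw the irradiance as the product of two independent unit-mean Gamma variates, one for large-scale and one for small-scale eddies. The sketch below does this with NumPy. The shape parameters `alpha` and `beta` used here are placeholders and are not derived from the $C_n^2$ values or link distance of the paper. The function name is hypothetical.

```python
import numpy as np


def gamma_gamma_samples(alpha, beta, size, seed=0):
    """Draw unit-mean Gamma-Gamma irradiance samples.

    alpha and beta are the large-scale and small-scale shape parameters.
    Each factor is Gamma(shape=k, scale=1/k), so each has mean 1.
    """
    rng = np.random.default_rng(seed)
    large_scale = rng.gamma(shape=alpha, scale=1.0 / alpha, size=size)
    small_scale = rng.gamma(shape=beta, scale=1.0 / beta, size=size)
    return large_scale * small_scale


alpha, beta = 4.0, 2.0  # placeholder values, not taken from the paper
irradiance = gamma_gamma_samples(alpha, beta, size=200_000)
# Theoretical scintillation index of the Gamma-Gamma model.
scintillation_index = 1.0 / alpha + 1.0 / beta + 1.0 / (alpha * beta)
```

Smaller values of `alpha` and `beta` give a larger scintillation index, which corresponds to stronger fading of the received optical power.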

Figure 7 shows the influence of the roll-off factor of the FTN shaping filter on BER with different decoders, where the acceleration factor is 0.8. As shown in Figure 7a, when BER = $10^{-4}$, the LSTM attention decoder improves the BER performance by about 1 dB compared with the BP algorithm. Figure 7b shows that the LSTM attention decoder improves the BER performance by about 2.5 dB compared with the MLSE algorithm when BER = $10^{-4}$. Therefore, the LSTM attention decoder improves the BER performance compared with the traditional decoders.
Figure 8 shows the impact of the acceleration factor on the system BER performance. When the BER is $10^{-4}$ and the acceleration factor decreases from 1 to 0.9 and 0.8, the BER performance degrades by about 2 dB and 4 dB, respectively. When the BER is $10^{-3}$ and the acceleration factor decreases from 1 to 0.9 and 0.8, the BER performance degrades by about 1 dB and 4.5 dB, respectively. It can be concluded from the figure that the BER performance degrades rapidly as the acceleration factor decreases. However, under the premise of improving the spectrum efficiency, the system can still ensure good communication quality when the acceleration factor is 0.8. Thus, the proposal is beneficial to improve the performance of the system.
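The reason a smaller acceleration factor costs BER performance can be seen from the ISI taps after matched filtering. The sketch below assumes that the shaping filter and the matched filter together form a raised-cosine pulse, which is a common model but is not stated in the text. It samples that pulse at multiples of τT. The function names are hypothetical, and the roll-off is assumed to lie in (0, 1].

```python
import numpy as np


def raised_cosine(t, rolloff, T=1.0):
    """Raised-cosine pulse with h(0) = 1, for 0 < rolloff <= 1."""
    t = np.asarray(t, dtype=float)
    denom = 1.0 - (2.0 * rolloff * t / T) ** 2
    singular = np.isclose(denom, 0.0)
    safe = np.where(singular, 1.0, denom)
    h = np.sinc(t / T) * np.cos(np.pi * rolloff * t / T) / safe
    # Limit value at t = +/- T / (2 * rolloff).
    limit = (np.pi / 4.0) * np.sinc(1.0 / (2.0 * rolloff))
    return np.where(singular, limit, h)


def isi_taps(tau, rolloff, span=5):
    # Sample the overall pulse at the FTN symbol spacing tau * T.
    k = np.arange(-span, span + 1)
    return raised_cosine(k * tau, rolloff)


nyquist = isi_taps(tau=1.0, rolloff=0.6)
ftn = isi_taps(tau=0.8, rolloff=0.6)
center = len(ftn) // 2
worst_nyquist = np.abs(np.delete(nyquist, center)).max()
worst_ftn = np.abs(np.delete(ftn, center)).max()
```

At τ = 1 every off-center tap is numerically zero, so there is no ISI. At τ = 0.8 the neighboring taps are clearly non-zero, and this is the interference the detector has to resolve in exchange for sending 1/τ = 1.25 times as many symbols per unit time.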

To further illustrate the advantages of the proposed method, as shown in Table 6, the running time of the LSTM attention and BP network are compared.

The time complexity is tied to hardware execution and includes the number of operations needed, the number of elements to process, and the path length needed to complete an operation. The simulation experiments are implemented in MATLAB 2018a and PyCharm 2021.3.2. An NVIDIA GeForce RTX 3050 Laptop GPU is used as the test platform. In the training process, 50,000 data samples are randomly generated, of which 80% are used as the training dataset and the remaining 20% are used as the test dataset. The LSTM attention network outperforms the BP network in running time. This is because the convergence speed of the BP neural network is slow.
5. Conclusions
In this paper, an LSTM attention decoder is proposed for signal detection of the hybrid modulated 4PPM–QPSK–FTN OWC system. The LSTM attention network can alleviate the problems of gradient disappearance, gradient explosion, and interdependence between adjacent symbols. The simulation shows that our proposal has outstanding signal detection performance for hybrid-modulated FTN signals. The received signal can be accurately predicted and quickly and correctly decoded. Hence, the scheme can effectively improve the BER performance on the premise of ensuring spectrum efficiency.
Author Contributions:
Conceptualization, M.C., R.Y., J.X., and H.W.; Methodology, M.C., H.W., and R.Y.; Validation, M.C., R.Y., and J.X.; Investigation, J.X., and K.J.; Resources, R.Y., and J.X.; Data curation, J.X.; Writing—original draft preparation, J.X., and R.Y.; writing—review and editing, M.C., H.W., R.Y. and K.J.; Visualization, R.Y., and H.W.; Supervision, M.C., and H.W.; Project administration, M.C. All authors have read and agreed to the published version of the manuscript.
Funding:
This research was funded by the NSFC Program (62265010, 61875080, 62261033) and the Natural Science Foundation of Gansu Province, China (20JR5RA472).
Institutional Review Board Statement:
Not applicable.
Informed Consent Statement:
Not applicable.
Data Availability Statement:
The study did not report any data.

Acknowledgments:
We gratefully acknowledge the assistance of Rui Wang in preparing and debugging the program. We also thank Yan Qiu and Hongtao Zhou for their help with the English writing and details.
Conflicts of Interest:
The authors declare no conflict of interest.
References
1. Ke, X.Z.; Jing, Y.K. Far-field laser spot image detection for use under atmospheric turbulence. Opt. Eng. 2020, 59, 016103.
2. Wu, P.; Ke, X.; Li, M.; Zhang, Q. Performance and equilibrium experiment of a multiband CAP modulation system in wireless optical communication. Opt. Commun. 2019, 434, 128–135.
3. Mazo, J. Faster-than-nyquist signaling. Bell Syst. Tech. J. 1975, 54, 1451–1462.
4. Bahl, L.R.; Cocke, J.; Jelinek, F.; Raviv, J. Optimal decoding of linear codes for minimizing symbol error rate. IEEE Trans. Inf. Theory 1974, 20, 284–287.
5. Matar, M.O.; Jana, M.; Mitra, J.; Lampe, L.; Lis, M.; Soc, I.C. A Turbo Maximum-a-Posteriori Equalizer for Faster-than-Nyquist Applications. In Proceedings of the 28th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), Fayetteville, AR, USA, 3–6 May 2020; pp. 167–171.
6. Cococcioni, M.; Rossi, F.; Ruffaldi, E.; Saponara, S.; de Dinechin, B.D. Novel Arithmetics in Deep Neural Networks Signal Processing for Autonomous Driving: Challenges and Opportunities. IEEE Signal Process. Mag. 2021, 38, 97–110.
7. Amirabadi, M.A.; Kahaei, M.H.; Nezamalhosseni, S.A. Low complexity deep learning algorithms for compensating atmospheric turbulence in the free space optical communication system. IET Optoelectron. 2022, 16, 93–105.
8. Li, S.; Yuan, W.; Yuan, J.; Bai, B.; Ng, D.W.K.; Hanzo, L. Time-domain vs. frequency-domain equalization for FTN signaling. IEEE Trans. Veh. Technol. 2020, 69, 9174–9179.
9. Yang, Y.; Gao, F.; Ma, X.; Zhang, S. Deep learning-based channel estimation for doubly selective fading channels. IEEE Access 2019, 7, 36579–36589.
10. Huang, H.; Yang, J.; Huang, H.; Song, Y.; Gui, G. Deep learning for super-resolution channel estimation and DOA estimation based massive MIMO system. IEEE Trans. Veh. Technol. 2018, 67, 8549–8560.
11. Hu, F.; Holguin-Lerma, J.A.; Mao, Y.; Zou, P.; Shen, C.; Ng, T.K.; Ooi, B.S.; Chi, N. Demonstration of a low-complexity memory polynomial-aided neural network equalizer for CAP visible-light communication with superluminescent diode. Opto-Electron. Adv. 2020, 3, 4–14.
12. Liang, S.; Jiang, Z.; Qiao, L.; Lu, X.; Chi, N. Faster-than-Nyquist pre-coded CAP modulation visible light communication system based on nonlinear weighted look-up table pre-distortion. IEEE Photonics J. 2018, 10, 7900709.
13. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
14. Sainath, T.N.; Vinyals, O.; Senior, A.; Sak, H. Convolutional, long short-term memory, fully connected deep neural networks. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, Australia, 19–24 April 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 4580–4584.
15. Lai, S.; Li, M. Recurrent Neural Network Assisted Equalization for FTN Signaling. In Proceedings of the IEEE International Conference on Communications (IEEE ICC)/Workshop on NOMA for 5G and Beyond, Electr Network, Dublin, Ireland, 7–11 June 2020.