Aircraft Bleed Air System Fault Prediction based on Encoder-Decoder with Attention Mechanism

Highlights
▪ A novel fault prediction method for the aircraft bleed air system is proposed by combining the DSTP-ED prediction model and the EWMA control chart.
▪ The DSTP-ED model incorporates attention mechanisms and has more accurate prediction results compared to other models.
▪ The EWMA control chart can effectively identify impending bleed air system failures.
▪ The proposed method is validated with real airline QAR data.

Abstract
The engine bleed air system (BAS) is one of the important systems for civil aircraft, and fault prediction of BAS is necessary to improve aircraft safety and the operator's profit. A dual-stage two-phase attention-based encoder-decoder (DSTP-ED) prediction model is proposed for BAS normal state estimation. Unlike traditional ED networks, the DSTP-ED combines spatial and temporal attention to better capture the spatiotemporal relationships to achieve higher prediction accuracy. Five data-driven algorithms, autoregressive integrated moving average (ARIMA), support vector regression (SVR), long short-term memory (LSTM), ED, and DSTP-ED, are applied to build prediction models for BAS. The comparison experiments show that the DSTP-ED model outperforms the other four data-driven models. An exponentially weighted moving average (EWMA) control chart is used as the evaluation criterion for the BAS failure warning. An empirical study based on Quick Access Recorder (QAR) data from Airbus A320 series aircraft demonstrates that the proposed method can effectively predict failures.


Introduction
The engine bleed air system (BAS) is an aircraft air source system that provides compressed air at regulated pressure and temperature to user systems (engine starting, air conditioning, wing ice protection, hydraulic reservoir pressurization, and water pressurization). A BAS failure has a significant impact on the flight and may lead to abnormal cabin pressurization, degraded air conditioning performance, and similar problems. A BAS failure on both engines often forces the aircraft to turn back. Advanced failure prediction technology is urgently needed to reduce the aircraft operation and maintenance costs caused by performance degradation and system failures. Predictive maintenance with early identification of BAS malfunctions can reduce the number of aircraft groundings and lower airline operating costs.
BAS failure prediction can be achieved either by a modeling approach based on system principles or by a data-driven approach. However, building a sufficiently accurate analytical model from system principles is challenging, especially for nonlinear and complex systems. With the increasing availability of system monitoring data, data-based techniques have become an essential complement to model-based methods for fault prediction and diagnosis [10,22,40]. The data-driven approach derives the model directly from the collected operational data without requiring much knowledge of the system degradation mechanisms. Data-driven fault prediction methods are now widely used in industrial systems and are classified into statistical methods (e.g., autoregressive models [36], statistical process techniques [21,33], and mathematical morphology spectrum entropy [39]) and machine learning methods (e.g., neural networks [26,41], support vector machines [34,37], and fuzzy methods [8,35]). Aircraft accumulate a large amount of sensor monitoring data during operation, which can be classified by record type as FDR (Flight Data Recorder), QAR (Quick Access Recorder), and ACARS (Aircraft Communications Addressing and Reporting System) data [30]. These data reflect the operational status of aircraft systems and can be used to build models for condition monitoring, fault detection, etc. In commercial aviation, predictive analytics on the exponentially growing volume of operations and maintenance data generated by aircraft holds great promise [13,38,42].
However, published studies on BAS are limited. Shang et al. [27,28] developed a fault detection method for the temperature sensors and valve actuators of the BAS. Abdelrahman et al. [1] used backpropagation algorithms to model the faults of the most important bleed air system valves of the B-737 aircraft under desert conditions. Peltier et al. [23,24] conducted an experimental investigation of the performance of different BASs. These studies focused on design improvements and component-specific troubleshooting while ignoring BAS operational data. Su et al. [31] established a risk warning model for BAS based on QAR data, but the adopted method did not consider the time-series relationship between the data, and there was a risk of masking failures.
The autoregressive integrated moving average (ARIMA) model [2] is a well-known method for time series prediction, with the advantage of a simple model that requires only endogenous variables and no exogenous variables. However, ARIMA applies only to time series that are stationary or that become stationary after differencing, it can capture only linear relationships by nature, and the data used must be autocorrelated. Support vector regression (SVR) [18] is an application of support vector machines to regression problems. SVR is more concerned with the spatial correlation of data; it is effective for problems with high-dimensional features but ignores the temporal correlation between data. The recurrent neural network (RNN) [9] is a class of neural networks for processing time-series data, in which the current output of the series depends on the previous output. It is better suited to short-term memory tasks and cannot address long-term dependency problems, and the vanishing gradient problem becomes more severe as the network complexity increases. Long short-term memory (LSTM) alleviates the vanishing and exploding gradient problems during long sequence training and is widely used in machine translation, speech recognition, and image processing [3,11,15]. The encoder-decoder (ED) framework was first introduced by Cho et al. [6] in sequence-to-sequence recurrent neural networks and is popular in machine translation [32]. However, since the context vector C of the encoder-decoder network is fixed, the model still does not work well for longer sequences [5]. To address this problem, Bahdanau et al. [3] proposed the attention mechanism, which has since been applied to natural language processing, time series prediction, etc. [4,7,12]. Qing et al. [25] and Liang et al. [17] used a dual-stage attention-based RNN for time series forecasting, which solved the problem of long-term dependence but ignored the spatial association between the driving and target sequences.
Therefore, this paper adopts a dual-stage two-phase attention-based encoder-decoder model (DSTP-ED) for BAS fault prediction [14,19], which can better capture the spatiotemporal relationships between data and performs well on various datasets. "Dual-stage" refers to using spatial and temporal attention mechanisms to obtain the spatiotemporal correlation between the driving and target sequences. "Two-phase" means that the spatial attention mechanism is composed of two attention modules, which capture the spatial correlation among the driving sequences and between the driving and target sequences, respectively.
The main contributions of this paper are as follows:
(1) A data-driven failure prediction method is developed to monitor the condition of the BAS and identify its impending failures. The fault prediction method consists of a state prediction model based on DSTP-ED and an anomaly criterion based on the EWMA (Exponentially Weighted Moving Average) control chart. This combination of deep learning and statistical process control can effectively identify anomalies in BAS and detect faults in advance.
(2) The DSTP-ED model integrates spatial and temporal attention mechanisms that adaptively select the most relevant input features and better capture long-term dependencies, effectively improving the accuracy of BAS state prediction. DSTP-ED outperforms the four baseline algorithms (ARIMA, SVR, LSTM, and ED).
(3) Fault prediction results may be disturbed by changes in the operating environment and by sensor noise. The EWMA control chart incorporates historical information from previous observations, which suppresses these disturbances and reduces false alarms. Real aircraft operational data verify that EWMA detects BAS anomalies earlier and with fewer false alarms than other statistical methods.
The remaining sections of this paper are organized as follows: Section 2 presents the fault prediction theory in detail, including the DSTP-ED model, the training procedure, the fault diagnosis criteria, and the fault prediction process. Section 3 provides an application case of BAS fault prediction based on QAR data and a comparative analysis of the proposed method. Conclusions are given in the final section.

Fault prediction theory
This section describes the framework for fault prediction and the detailed algorithms, specifically the DSTP-ED prediction model and the fault monitoring methods. The fault prediction framework of BAS is shown in Fig. 1.

Prediction problem statements
To determine the abnormal state of the BAS accurately and in time, the state values of the BAS feature variables are estimated from normal historical operating data, and the following time series model is established:
$$ \hat{y}_{T+1} = F(y_1, \dots, y_T, x_1, \dots, x_T) \tag{1} $$
where $\hat{y}_{T+1} \in \mathbb{R}$ denotes the predicted value of the target variable for the next moment and $T$ is the size of the time window. The model takes the past values of the target variable Y and of the other related variables X as input to obtain the predicted value $\hat{y}_{T+1}$ of the target variable.
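The windowing implied by this model can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function name `make_windows` and the toy series are hypothetical, and the sketch assumes that each sample pairs the last T values of the driving variables and of the target with the next target value.

```python
def make_windows(driving, target, T):
    """Build (x_window, y_window, y_next) samples with window size T.

    driving: list of rows, one row of n driving variables per time step
    target:  list of target values, one per time step
    """
    if len(driving) != len(target):
        raise ValueError("driving and target series must have equal length")
    samples = []
    for start in range(len(target) - T):
        end = start + T
        x_window = driving[start:end]   # T rows of driving variables
        y_window = target[start:end]    # T past target values
        y_next = target[end]            # value to predict at step T+1
        samples.append((x_window, y_window, y_next))
    return samples


# Toy data (hypothetical): two driving variables and one target, six time steps.
driving = [[float(t), float(2 * t)] for t in range(6)]
target = [float(10 * t) for t in range(6)]
samples = make_windows(driving, target, T=3)
```

Each sample then supplies one input to F and one value to compare with its prediction.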

DSTP-ED model
The DSTP-ED model consists of an encoder-decoder and attention mechanisms, and both the encoder and decoder adopt LSTM. In the encoder, a two-phase attention mechanism is utilized to obtain the spatial association between the target and driving sequences. In the decoder, a temporal attention mechanism is employed to improve the response of the decoder to the long-term encoding vector.

Encoder with two-phase attention
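Let the input sequence be $X = (x_1, \dots, x_T) \in \mathbb{R}^{n \times T}$, where n is the number of driving sequences and T is the time window size.
LSTM units are used for encoding to extract the feature representation of the input sequence. Fig. 2 illustrates the structure of the encoder. In Fig. 2, the light green box is the LSTM neuron; an RNN or a gated recurrent unit [6] can also be used as the encoder neuron. Given the previous hidden state $h_{t-1}$, the previous cell state $s_{t-1}$, and the current input $x_t$, the LSTM unit is updated as:
$$ f_t = \sigma(W_f [h_{t-1}; x_t] + b_f) \tag{2} $$
$$ i_t = \sigma(W_i [h_{t-1}; x_t] + b_i) \tag{3} $$
$$ o_t = \sigma(W_o [h_{t-1}; x_t] + b_o) \tag{4} $$
$$ s_t = f_t \odot s_{t-1} + i_t \odot \tanh(W_c [h_{t-1}; x_t] + b_c) \tag{5} $$
$$ h_t = o_t \odot \tanh(s_t) \tag{6} $$
where $f_t$, $i_t$, and $o_t$ are the forget, input, and output gates, $\sigma$ is the logistic sigmoid function, $\odot$ denotes element-wise multiplication, and $W_f, W_i, W_o, W_c$ and $b_f, b_i, b_o, b_c$ are trainable parameters.
(1) First phase attention module
This attention module belongs to the spatial attention mechanism and is used to obtain the spatial correlation between the driving sequences. Given the input driving sequence $x^k = (x_1^k, \dots, x_T^k) \in \mathbb{R}^T$ of the kth variable, the softmax function is adopted to normalize the attention weights of all variables. The first phase attention mechanism is constructed as follows:
$$ e_t^k = v_e^\top \tanh(W_e [h_{t-1}^e; s_{t-1}^e] + U_e x^k + b_e) \tag{7} $$
$$ \alpha_t^k = \frac{\exp(e_t^k)}{\sum_{j=1}^{n} \exp(e_t^j)} \tag{8} $$
where $h_{t-1}^e \in \mathbb{R}^m$ and $s_{t-1}^e \in \mathbb{R}^m$ are the previous hidden state and cell state of the LSTM in the first phase attention mechanism of the encoder, respectively, and m is the size of the hidden state. $W_e \in \mathbb{R}^{T \times 2m}$, $U_e \in \mathbb{R}^{T \times T}$, and $v_e, b_e \in \mathbb{R}^T$ are the trainable network parameters. $e_t^k$ and $\alpha_t^k$ are the attention score and weight of the kth input driving sequence at time t, respectively. With the first phase attention weights, the input at time t is redefined as:
$$ \tilde{x}_t = (\alpha_t^1 x_t^1, \alpha_t^2 x_t^2, \dots, \alpha_t^n x_t^n)^\top \tag{9} $$
Then, the hidden state $h_t^e$ is updated to:
$$ h_t^e = f_e(h_{t-1}^e, \tilde{x}_t) \tag{10} $$
where $f_e$ is the LSTM unit; $h_t^e$ can be calculated according to Eq. (2)-(6), with $h_{t-1}$ replaced by $h_{t-1}^e$ and $x_t$ by $\tilde{x}_t$.
(2) Second phase attention module
This attention module also belongs to the spatial attention mechanism and is used to obtain the spatial correlation between the driving and the target sequences. The output $\tilde{x}^k$ of the first phase attention mechanism is concatenated with the target sequence at the same moments to obtain $z^k = [\tilde{x}^k; y] \in \mathbb{R}^{2 \times T}$, which is employed as the input to the second phase attention mechanism. The second phase attention weights are calculated as follows:
$$ g_t^k = v_s^\top \tanh(W_s [h_{t-1}^s; s_{t-1}^s] + U_s z^k + b_s) \tag{11} $$
$$ \beta_t^k = \frac{\exp(g_t^k)}{\sum_{j=1}^{n} \exp(g_t^j)} \tag{12} $$
where $h_{t-1}^s \in \mathbb{R}^p$ and $s_{t-1}^s \in \mathbb{R}^p$ are the previous hidden state and cell state of the LSTM in the second phase attention module of the encoder, respectively, and p is the size of the hidden state of this attention module. $g_t^k$ and $\beta_t^k$ are the attention score and weight of $z^k$ at time t, respectively. $W_s \in \mathbb{R}^{T \times 2p}$, $U_s \in \mathbb{R}^{T \times T}$, and $v_s, b_s \in \mathbb{R}^T$ are the trainable parameters. The output of the second phase attention mechanism is:
$$ \tilde{z}_t = (\beta_t^1 z_t^1, \beta_t^2 z_t^2, \dots, \beta_t^n z_t^n)^\top \tag{13} $$
Then, the hidden state $h_t^s$ is updated as:
$$ h_t^s = f_s(h_{t-1}^s, \tilde{z}_t) \tag{14} $$
where $f_s$ is the LSTM unit; $h_t^s$ can be calculated according to Eq. (2)-(6), with $h_{t-1}$ replaced by $h_{t-1}^s$ and $x_t$ by $\tilde{z}_t$.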
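The score-then-softmax pattern of the first phase attention module can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the toy parameter values, and the choice of zero LSTM states are all hypothetical, and the trainable parameters are simply fixed by hand.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def first_phase_weights(series, h_prev, s_prev, W, U, v, b):
    """Attention weights over the n driving sequences (score, then softmax)."""
    state_term = matvec(W, h_prev + s_prev)  # W [h; s], length T
    scores = []
    for x_k in series:  # one score per driving sequence
        pre = [a + c + d for a, c, d in zip(state_term, matvec(U, x_k), b)]
        scores.append(sum(v_i * math.tanh(p) for v_i, p in zip(v, pre)))
    return softmax(scores)


# Toy setting (hypothetical values): n = 2 driving sequences, T = 3, m = 2.
T, m = 3, 2
series = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
h_prev, s_prev = [0.0] * m, [0.0] * m
W = [[0.0] * (2 * m) for _ in range(T)]
U = [[1.0 if i == j else 0.0 for j in range(T)] for i in range(T)]
v, b = [1.0] * T, [0.0] * T

alpha = first_phase_weights(series, h_prev, s_prev, W, U, v, b)
t = 0
x_tilde_t = [alpha[k] * series[k][t] for k in range(len(series))]  # reweighted input
```

The second phase module follows the same pattern, with the concatenation of the reweighted driving sequence and the target sequence taking the place of the driving sequence.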

Decoder with temporal attention
The decoder also adopts LSTM neurons, which use the abstract features of the original input sequence obtained from the encoder to predict the output $\hat{y}_{T+1}$. A temporal attention mechanism is employed to improve the decoder's response to the long-time sequence encoding vectors by adaptively selecting the hidden states most relevant to the predicted target values. The temporal attention mechanism is calculated as follows:
$$ l_t^i = v_d^\top \tanh(W_d [d_{t-1}; s'_{t-1}] + U_d h_i^s + b_d) \tag{15} $$
$$ \gamma_t^i = \frac{\exp(l_t^i)}{\sum_{j=1}^{T} \exp(l_t^j)} \tag{16} $$
where $d_{t-1} \in \mathbb{R}^q$ and $s'_{t-1} \in \mathbb{R}^q$ are the previous hidden state and cell state of the LSTM in the decoder, respectively, and q is the size of the hidden state in the temporal attention module. $h_i^s$ is the ith encoder hidden state of the second phase attention module. $l_t^i$ and $\gamma_t^i$ are the attention score and weight of the hidden state $h_i^s$, respectively. The attention weight indicates the significance of the ith encoder hidden state for the prediction.
$v_d, b_d \in \mathbb{R}^q$, $W_d \in \mathbb{R}^{p \times 2q}$, and $U_d \in \mathbb{R}^{q \times q}$ are the trainable parameters. Summing over the weighted encoder hidden states produces:
$$ c_t = \sum_{i=1}^{T} \gamma_t^i h_i^s \tag{17} $$
where $\gamma_t^i h_i^s$ denotes the weighted hidden state and $c_t$ is the context vector that represents the fused information of the encoder hidden states. $c_t$ is different at each time step.
Combining the context vector with the target series Y gives:
$$ \tilde{y}_{t-1} = \tilde{w}^\top [y_{t-1}; c_{t-1}] + \tilde{b} \tag{18} $$
where $\tilde{w} \in \mathbb{R}^{p+1}$ and $\tilde{b} \in \mathbb{R}$ are the parameters that map the concatenation to the decoder input. $\tilde{y}_{t-1}$ is employed to update the hidden state of the decoder, as follows:
$$ d_t = f_d(d_{t-1}, \tilde{y}_{t-1}) \tag{19} $$
where $f_d$ is the LSTM cell; $d_t$ can be calculated with the LSTM update equations, with $h_{t-1}$ replaced by $d_{t-1}$ and $x_t$ by $\tilde{y}_{t-1}$.
Finally, the context vector is concatenated with the decoder hidden state as the new hidden state, and a linear function generates the prediction result:
$$ \hat{y}_{T+1} = v_y^\top (W_y [d_T; c_T] + b_w) + b'_v \tag{20} $$
The parameters $W_y \in \mathbb{R}^{q \times (q+p)}$ and $b_w \in \mathbb{R}^q$ map the concatenation to the size of the decoder hidden state, and the parameters $v_y \in \mathbb{R}^q$ and $b'_v \in \mathbb{R}$ are the weight and bias of the linear function. The complete structure of the DSTP-ED model is depicted in Fig. 4.
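How the temporal attention weights turn the encoder hidden states into a context vector can be sketched as below. This is a hedged illustration rather than the authors' code: the function name `temporal_context` and all numeric values are hypothetical, and the example takes p = q (as in the m = p = q setting reported in the experiments) so that the parameter shapes are mutually compatible.

```python
import math


def temporal_context(hidden_states, d_prev, s_prev, W, U, v, b):
    """Return (weights, context) for one decoder step.

    hidden_states: T encoder hidden states, each of length p
    d_prev, s_prev: previous decoder hidden and cell state, each of length q
    """
    state = d_prev + s_prev  # concatenation [d; s'], length 2q
    state_term = [sum(w * x for w, x in zip(row, state)) for row in W]
    scores = []
    for h_i in hidden_states:  # one score per encoder time step
        proj = [sum(u * x for u, x in zip(row, h_i)) for row in U]
        pre = [a + c + d for a, c, d in zip(state_term, proj, b)]
        scores.append(sum(v_j * math.tanh(p) for v_j, p in zip(v, pre)))
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    gamma = [e / total for e in exps]  # softmax over time steps
    size = len(hidden_states[0])
    context = [sum(g * h[j] for g, h in zip(gamma, hidden_states))
               for j in range(size)]  # weighted sum of hidden states
    return gamma, context


# Toy setting (hypothetical values): T = 3 encoder states, p = q = 2.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
d_prev, s_prev = [0.0, 0.0], [0.0, 0.0]
W = [[0.0] * 4 for _ in range(2)]
U = [[1.0, 0.0], [0.0, 1.0]]
v, b = [1.0, 1.0], [0.0, 0.0]

gamma, context = temporal_context(hidden_states, d_prev, s_prev, W, U, v, b)
```

In this toy case the third hidden state receives the largest weight because it yields the largest score.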

2.3. Training procedure
The proposed DSTP-ED model is smooth and differentiable, so the standard backpropagation algorithm is adopted to train the model. The learning rate is a significant hyperparameter to be tuned while training the model. A cyclic learning rate is used in the training process, allowing the learning rate to vary cyclically between reasonable boundary values instead of decreasing monotonically [29]. Training with a cyclic learning rate rather than a constant value improves training accuracy without tuning and with fewer iterations. The batch size is set to 128. Minibatch stochastic gradient descent (SGD) [16] is employed to minimize the mean squared error (MSE) between the true value $y$ of the target variable and the predicted value $\hat{y}$:
$$ O(y, \hat{y}) = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}^i - y^i)^2 \tag{21} $$
where N is the sample size.
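A cyclic learning rate can be sketched with a triangular schedule, as below. The triangular policy is one common form of cyclic schedule; whether it matches the exact policy used in training is an assumption, and the boundary values and step size in the example are purely illustrative.

```python
import math


def triangular_lr(iteration, base_lr, max_lr, step_size):
    """Triangular cyclic learning rate.

    The rate rises linearly from base_lr to max_lr over step_size iterations,
    then falls back to base_lr over the next step_size iterations, and repeats.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


# Illustrative (assumed) boundaries: 0.001 to 0.01, half-cycle of 100 iterations.
schedule = [triangular_lr(it, 0.001, 0.01, 100) for it in range(0, 401, 50)]
```

The schedule starts at the lower boundary, peaks at iteration 100, and returns to the lower boundary at iteration 200 before the next cycle begins.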

2.4. Residuals-based monitoring method
The statistic $z(t)$ of the EWMA control chart is expressed as:
$$ z(t) = \lambda \Delta y_t + (1 - \lambda) z(t-1) \tag{22} $$
where $\Delta y_t$ is the prediction residual of the DSTP-ED model at time t and $\lambda \in (0,1)$ is the smoothing parameter. The smaller λ is, the smaller the shift that can be detected; λ is generally set to 0.2-0.3 [20].
The initial value and standard deviation of the variable $z(t)$ are:
$$ z(0) = \mu_0 $$
$$ \sigma_{z(t)} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{\lambda}{2 - \lambda} \left[ 1 - (1 - \lambda)^{2t} \right]} $$
where $\mu_0$ and $\sigma$ are the mean and standard deviation of the normal data prediction residuals, respectively, and $n$ is the sample size.
When $z(t)$ exceeds the control limits, an anomaly is considered to occur. The central line and the upper/lower control limits (CL, UCL, LCL) of the EWMA control chart are:
$$ UCL = \mu_0 + L \sigma_{z(t)}, \quad CL = \mu_0, \quad LCL = \mu_0 - L \sigma_{z(t)} $$
where L determines the control limit range and is usually set to 3 [20]. The anomaly detection effectiveness of the EWMA control chart depends on the standard deviation σ, L, and λ. If the process mean does not drift too fast, EWMA with a suitable λ is a good one-step-ahead predictor.
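The EWMA chart described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `ewma_chart`, the default sample size n = 1, and the synthetic residual sequence are hypothetical and are not taken from the paper's data.

```python
import math


def ewma_chart(residuals, mu0, sigma, lam=0.2, L=3.0, n=1):
    """Return one (z, lcl, ucl, alarm) tuple per residual.

    mu0, sigma: mean and standard deviation of residuals from normal data
    lam: smoothing parameter in (0, 1); L: width of the control limits
    """
    z = mu0  # initial value z(0)
    rows = []
    for t, residual in enumerate(residuals, start=1):
        z = lam * residual + (1.0 - lam) * z
        sigma_z = (sigma / math.sqrt(n)) * math.sqrt(
            lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        )
        ucl = mu0 + L * sigma_z
        lcl = mu0 - L * sigma_z
        rows.append((z, lcl, ucl, z > ucl or z < lcl))
    return rows


# Synthetic residuals (hypothetical): in control for 5 steps, then a sustained shift.
residuals = [0.0] * 5 + [1.0] * 10
chart = ewma_chart(residuals, mu0=0.0, sigma=0.2)
alarms = [row[3] for row in chart]
first_alarm = alarms.index(True)  # position of the first out-of-limit point
```

Because the statistic accumulates past residuals, a sustained drift pushes it across the limit, while an isolated noisy residual is damped by the factor λ.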

2.5. Fault prediction process
The complete fault prediction process includes three modules: data preparation, prediction model training, and online warning, as shown in Fig. 5.
• Online warning: In the online warning stage, the test set data (new run data) are processed and input to the trained DSTP-ED model to obtain the prediction error of the target variables. The EWMA control chart is employed to determine whether the prediction error exceeds the control limits and to raise an alert for flights that exceed them. Thus, BAS abnormalities are detected in advance and fault prediction is realized.

3.1. Case study description
This study selects the BAS of the Airbus A320 series aircraft as an application case. A schematic of the BAS is depicted in Fig. 6.

3.2. Data description and preprocessing
The experimental data are real monitoring data (QAR data) generated during the operation of an Airbus A320 series aircraft. The maintenance records are listed in Table 1.
Missing values are filled by linear interpolation between the nearest valid points:
$$ L(t) = y_u + \frac{y_v - y_u}{v - u} (t - u) $$
where t is the point of missing data, u and v are the points without missing data before and after t, respectively; $y_u$ and $y_v$ are the values at points u and v; and L(t) is the result after interpolation.
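The interpolation of missing points can be sketched as below, assuming linear interpolation between the nearest valid neighbours u and v. The function name `fill_missing` and the use of `None` to mark missing samples are hypothetical conventions chosen for the example.

```python
def fill_missing(values):
    """Fill interior gaps (marked None) by linear interpolation.

    For a missing point t between valid points u and v:
        L(t) = y_u + (y_v - y_u) * (t - u) / (v - u)
    Leading and trailing gaps have no neighbour on one side and stay None.
    """
    filled = list(values)
    known = [i for i, value in enumerate(values) if value is not None]
    for u, v in zip(known, known[1:]):
        for t in range(u + 1, v):
            filled[t] = values[u] + (values[v] - values[u]) * (t - u) / (v - u)
    return filled


# Toy record (hypothetical) with two interior gaps.
record = [1.0, None, None, 4.0, None, 6.0]
repaired = fill_missing(record)
```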

3.3. Modeling variable selection
The main characteristic variables of BAS include bleed air pressure and bleed air temperature, and the influence brought by the external environment should also be considered. Based on the system operation mechanism and engineering experience, the following parameters are chosen as the feature variables of BAS, as shown in Table 2.

3.5. DSTP-ED model validation
Since differences in the order of magnitude of the original data introduce errors into the prediction model, easily saturating the neurons and reducing the expressive power of the network, the feature variables are normalized to a common range before modeling. The prediction results of BAT are best at T = 40, as illustrated in Fig. 9. The model performance is worse when the time window is too long or too short. With T fixed, BAP and BAT are best predicted at m = p = q = 16, as described in Fig. 10.
[Table 3 - Table 7]
The prediction results and prediction errors of the DSTP-ED and the four baseline methods are shown in Fig. 11 and the accompanying error figure, respectively.
The prediction accuracy is evaluated with error metrics that include the root mean squared error (RMSE) and the mean absolute error (MAE):
$$ RMSE = \sqrt{\frac{1}{N} \sum_{t=1}^{N} (y_t - \hat{y}_t)^2} $$
$$ MAE = \frac{1}{N} \sum_{t=1}^{N} |y_t - \hat{y}_t| $$
where $y_t$ is the true value of the target variable, $\hat{y}_t$ is the predicted value at time t, and N is the sample size. Table 8 indicates that the DSTP-ED outperforms the other models on the evaluation metrics, including RMSE and MAE. The comparative analysis shows that the DSTP-ED model is more effective than other classical methods for predicting BAS data.
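The two error metrics named above can be computed as in the following sketch. The function names and the toy values are hypothetical; the numbers are not results from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    squared = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted values."""
    absolute = [abs(a - b) for a, b in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


# Toy values (hypothetical): one large error among otherwise exact predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]
scores = {"RMSE": rmse(y_true, y_pred), "MAE": mae(y_true, y_pred)}
```

RMSE penalizes the single large error more heavily than MAE does, which is why the two metrics are usually reported together.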

3.6. Monitoring results
The EWMA control chart is used to detect anomalies of the BAP and BAT prediction errors, which are measured by the relative error (RE). The expression of $\Delta y_t$ in Eq. (22) is given by:
$$ \Delta y_t = \frac{y_t - \hat{y}_t}{y_t} $$
where $y_t$ is the true value and $\hat{y}_t$ is the predicted value at time t.
By comparing the RE of the DSTP-ED model for healthy BAS and faulty BAS, we find that the RE of the faulty BAS drifts before the failure occurs, as the accompanying RE figure illustrates. In that figure, the RE of BAP and BAT for the abnormal data gradually deviates from the normal range, reaches a peak, and then returns to the normal range after maintenance, so the RE of the DSTP-ED model can be used to identify potential BAS faults.
[Figure panels: (a) REs of BAP; (b) REs of BAT.]
From Table 1, it can be seen that maintenance was performed on July 8th and August 4th, when the TCT and PRV were replaced, respectively. The maintenance on July 8th corresponds to the end of the 88th flight, and the record on August 4th corresponds to the end of the 167th flight. Since operators can dispatch aircraft with a faulty BAS as specified in the Minimum Equipment List, the crew may have reported the corresponding failure before the maintenance was recorded.
We have reviewed the crew fault reports and the current system warnings and found that a BAS alert appeared in the warning system on flight 167 and a BAS anomaly was reported by the crew on flight 86. Therefore, combining the maintenance records, crew fault reports, and the current warning system, it can be concluded that the left BAS failures are all successfully identified a dozen flights (about two days) in advance.
Additionally, according to Fig. 14, it can also be concluded that the repair on July 8th was caused by a BAT anomaly and the August 4th repair was due to a BAP anomaly, which can help the maintenance personnel locate the defective parts more quickly.

3.7. Comparison and discussion
To validate the effectiveness of the EWMA control chart for BAS anomaly detection, two other statistical methods, the mean control chart (Xbar) and the box plot, are selected for comparison.
Xbar is the most common and basic control chart. It analyzes the trend of the process center by determining whether the process mean is in the required state of control. The box plot is a statistical method that shows the dispersion of data and is mainly used to reflect the characteristics of the original data distribution. It summarizes the data with five statistics (median, upper quartile, lower quartile, upper limit, and lower limit) and flags the observations that exceed the limits. The results of BAS anomaly detection for the Xbar control chart and the box plot are given in the two corresponding figures, respectively.
[Table 9]
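The two comparison methods can be sketched as below. This is a hedged illustration: the 3-sigma width for the Xbar limits and the 1.5 × IQR rule for the box plot limits are conventional choices assumed here, not values confirmed by the paper, and the function names and data are hypothetical.

```python
import statistics


def xbar_limits(reference, L=3.0, n=1):
    """Lower/upper control limits: mean +/- L * sigma / sqrt(n) (n = subgroup size)."""
    mu = statistics.fmean(reference)
    sigma = statistics.stdev(reference)
    half_width = L * sigma / n ** 0.5
    return mu - half_width, mu + half_width


def boxplot_limits(reference, k=1.5):
    """Lower/upper box plot limits: quartiles extended by k times the IQR."""
    q1, _, q3 = statistics.quantiles(reference, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outside(values, limits):
    """Mark each value that falls outside the (low, high) limits."""
    low, high = limits
    return [value < low or value > high for value in values]


# Reference data from normal operation and new observations (both hypothetical).
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
new_values = [3.0, 7.5, 9.0]
xbar_flags = flag_outside(new_values, xbar_limits(reference))
box_flags = flag_outside(new_values, boxplot_limits(reference))
```

Both methods judge each observation on its own, without the memory of past residuals that the EWMA statistic carries.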

Conclusion
This