Fault Diagnosis of Suspension System Based on Spectrogram Image and Vision Transformer

▪ This study proposes image-based vision for dynamic signal (vibration) pattern recognition.
▪ By using spectrogram images as input, the model captures both temporal and frequency components for precise fault identification.
▪ This study introduces a deep learning model for diagnosing multiple faults in automobile suspension systems, addressing a gap in suspension system fault diagnosis.

The suspension system in an automobile is essential for comfort and control. Implementing a monitoring system is crucial to ensure proper function, prevent accidents, maintain performance, and reduce both downtime and costs. Traditionally, diagnosing faults in suspension systems has relied on specialized setups and vibration analysis. The conventional approach typically involves either wavelet analysis or machine learning. While these methods are effective, they often demand specialized expertise and are time-consuming. Alternatively, using deep learning for suspension system fault diagnosis enables faster and more precise real-time fault detection. This study explores the use of vision transformers as an innovative approach to fault diagnosis in suspension systems, utilizing spectrogram images. The process involves extracting spectrogram images from vibration signals, which serve as inputs for the vision transformer model. The test results demonstrate that the proposed fault diagnosis system achieves an accuracy of 98.12% in identifying faults.


Introduction
Urbanization has transformed transportation into an integral aspect of human life, where people depend on diverse modes of commuting. Cars are the preferred choice among these modes owing to their safety, comfort, and convenience. In the modern era, conventional cars can reach or even exceed speeds of 180 km/h. However, when a vehicle is not adequately maintained, traveling at high speeds can result in severe consequences, as the majority of reported road accidents occur because of a lack of control over direction or braking. The suspension system plays a crucial role in maintaining lateral and longitudinal stability because it is interconnected with the steering system, which ensures that the vehicle maintains consistent contact and pressure on the road (1). Any failure in the components of the suspension system can directly affect the performance of the braking and steering systems, leading to a 12% increase in braking time and a 30% increase in braking distance. Consequently, these factors can contribute significantly to the occurrence of potentially fatal accidents. Additionally, when cars are in motion, they are subjected to various forces, such as acceleration, braking, road disturbances, and centrifugal force during cornering. These forces can cause discomfort to car occupants and reduce the overall maneuverability of a vehicle (2). A passive suspension system, consisting of springs and dampers, helps suppress these unwanted forces and dissipates their energy as heat. Furthermore, the extended usage of a vehicle can result in gradual deterioration of suspension components, ultimately leading to suspension failure. Therefore, it is of utmost importance to maintain the functionality of suspension systems to guarantee a safe and comfortable driving experience. Although semi-active and active suspension systems offer dependable real-time monitoring, their high cost and the need for additional control systems and actuators make them impractical for most automobiles (3)(4)(5). Consequently, passive suspension systems
• Limited research on the identification of multiple suspension system faults.
• Lack of studies focused on identifying faults in bushes and tie rods.
These spectrogram images were utilized as input for the Vision Transformer algorithm.
• Hyperparameter tuning was conducted to optimize the performance of the Vision Transformer networks.
This process involved adjusting various parameters to find the most suitable configuration for the classifier.
• Based on the outcomes of the hyperparameter tuning, appropriate parameters were recommended for the Vision Transformer classifier. These optimized parameters were then utilized in the fault diagnosis system specifically designed for detecting suspension faults.
The overall process of suspension fault diagnosis using the Vision Transformer is depicted in Figure 2.
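The signal-to-image step of this workflow can be illustrated with a minimal sketch. It assumes a Hann-windowed short-time FFT with a hypothetical frame length of 256 samples and a hop of 128 samples; the study does not state its spectrogram settings, and the test signal here is synthetic rather than measured. Only the sampling frequency (25 kHz) and the sampling length (10,000 samples) are taken from the Data Acquisition section.

```python
import numpy as np

def spectrogram(signal, fs, frame_len=256, hop=128):
    """Return (freqs, times, power_db) from a Hann-windowed short-time FFT.

    frame_len and hop are illustrative assumptions, not values from the study.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames (one row per frame).
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    power_db = 10.0 * np.log10(power + 1e-12)  # small offset avoids log(0)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs
    return freqs, times, power_db.T  # rows = frequency bins, columns = time frames

fs = 25_000          # sampling frequency reported in the study (Hz)
n_samples = 10_000   # sampling length reported in the study
t = np.arange(n_samples) / fs

# Synthetic stand-in for a vibration signal: a 1 kHz tone plus weak noise.
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 1000.0 * t) + 0.05 * rng.standard_normal(n_samples)

freqs, times, spec_db = spectrogram(signal, fs)
peak_hz = freqs[spec_db.mean(axis=1).argmax()]
print(spec_db.shape, peak_hz)
```

The resulting two-dimensional array can be rendered as an image (for example with a colour map) before being passed to an image classifier; the rendering step is omitted here.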

Experimental Studies
The following section provides detailed information on the experimental studies conducted in three categories:

Experimental Setup
This study used the suspension system of a commercially

Data Acquisition
The

Faults in Suspension System
A suspension system is crucial for ensuring the safety and comfort of vehicle occupants. It consists of various components, such as the strut (comprising a damper and coil spring), lower arm, tie rod, strut mount, and knuckle.
Throughout the operational lifespan of a suspension system, it is exposed to dynamic loading conditions. Factors such as prolonged usage, rough road conditions, gradual wear and tear of internal components, and the impact of moisture and corrosion can contribute to faults in individual components of the suspension system. The presence of faults in a suspension system can significantly affect its performance, reliability, and longevity. Figure 4

Vision Transformers
The

Image Generation and Image Processing
In

Result and Discussion
In this section, a thorough investigation is carried out to assess the performance of the Vision Transformer (ViT) model in

Effects of learning rate
In the current study, the learning rate is varied from 0.00001 to 0.1. The learning rate is a critical hyperparameter that determines how quickly the model's loss value converges to the minimum. A large learning rate produces large weight updates that can overshoot the minimum, causing the loss to oscillate or diverge and leading to poor model performance. Conversely, a low learning rate leads to slower convergence because the weights are updated only gradually. Hence, it is important to determine the optimum learning rate for the application at hand. From Table 3, it is evident that the ViT model achieves the highest classification accuracy of 99.39% when the learning rate is set to 0.0001. Once the optimal learning rate is identified, it is used to fine-tune the other hyperparameters. The performance of the ViT model across different learning rates is summarized in Table 3, while Figure 6
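The trade-off between large and small learning rates can be seen on a toy problem. The sketch below runs plain gradient descent on a one-dimensional quadratic loss with an arbitrarily chosen curvature; it is not the ViT training procedure, and the learning rate that works best here says nothing about the optimum for the ViT model. Only the swept range (0.00001 to 0.1) follows the text, and the decade spacing is an assumption.

```python
def gradient_descent(lr, curvature=25.0, w0=1.0, steps=50):
    """Minimise the toy loss L(w) = 0.5 * curvature * w**2; return the final loss.

    curvature, w0 and steps are arbitrary illustrative choices.
    """
    w = w0
    for _ in range(steps):
        grad = curvature * w   # dL/dw
        w -= lr * grad         # gradient-descent update
    return 0.5 * curvature * w * w

initial_loss = 0.5 * 25.0 * 1.0 ** 2

# Decade-spaced learning rates over the range swept in the study (spacing assumed).
learning_rates = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
final_loss = {lr: gradient_descent(lr) for lr in learning_rates}

for lr, loss in final_loss.items():
    print(f"lr={lr:g}  final loss={loss:.3e}")
```

In this toy setting the largest rate makes each step overshoot so that the loss grows, while the smallest rate barely reduces the loss within the step budget; an intermediate value converges fastest.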

Effect of transformer layers
The number of transformer layers in a Vision Transformer (ViT) has a significant impact on its performance. Generally, increasing the number of transformer layers improves the ability of the model to capture complex representations of the input image, thereby enhancing its performance. However, there is a point beyond which increasing the number of layers may lead to degradation in performance. In this study, the performance of the ViT model was evaluated by varying the number of transformer layers. The results showed that when the number of layers was set to 12, the ViT model achieved the maximum classification accuracy of 98.16%. This suggests that 12 layers provide an optimal balance between capturing complex image representations and avoiding potential performance degradation.

Conclusion
This study introduces an innovative approach to suspension
are extensively utilized due to their uncomplicated structure, reliable performance, and cost-effectiveness. The McPherson type suspension system is preferred over the double wishbone type due to its simplicity, light weight, and cost-effectiveness. It consists of components like struts, ball joints, tie rods, and lower control arms, which can wear out over time, especially when exposed to varying road conditions and loads. Factors like wear, lack of lubrication, misalignment, heavy loads, mishandling, improper installation, and corrosion can increase the chances of faults. Detecting these faults early is crucial to maintain suspension performance, minimize maintenance disruptions, and prevent potentially dangerous accidents. Therefore, fault diagnosis is essential for ensuring safety, reliability, and comfort in vehicle operation. Various techniques have been developed for fault diagnosis, including knowledge-based, data-driven (signal-based), analytical modelling (model-based), and hybrid techniques. Among these techniques, data-driven methods are widely employed owing to their capability to operate in real time. The data or signals acquired during the data-acquisition process display distinct signature patterns for particular fault conditions, enabling effective classification. Parameters such as vibration, pressure, load, and displacement provide valuable information regarding the state of the suspension system.
following

Firstly, faults often manifest distinct patterns in vibration signals, making vibration analysis an effective method for their detection. Moreover, vibration signal acquisition demonstrates remarkable sensitivity, enabling the identification of even minor deviations. Additionally, vibration signals exhibit a higher signal-to-noise ratio compared to acoustic emission, rendering them more valuable for fault diagnosis. Lastly, advancements in integrated circuit technology have enhanced the reliability and cost-effectiveness of accelerometer sensors, further bolstering the practicality of this approach.

• Many studies involving damper and ball joint fault diagnosis require a vibration platform for data collection.
• A shortage of studies applying machine learning and deep learning to suspension system fault diagnosis.
Eksploatacja i Niezawodność - Maintenance and Reliability Vol. 26, No. 1, 2024
The efficiency of fault-diagnosis techniques based on machine learning depends significantly on feature engineering, which involves the extraction and selection of features. Choosing the right feature extraction method requires in-depth domain knowledge and expertise to achieve accurate fault classification while minimizing computational requirements and time. Furthermore, these feature extraction techniques are sensitive to variations in environmental conditions and mechanical characteristics. Conventional manual feature extraction methods hinder the exploration of novel features owing to the influence of the existing features and evaluation criteria. Owing to this constraint, researchers have increasingly embraced deep-learning-based fault diagnosis as a viable solution. Although numerous studies have been conducted on machine-learning-based fault diagnosis, the manual intervention required for feature engineering diminishes the robustness of the algorithms and can yield unsatisfactory outcomes in certain applications. To address these challenges, deep learning can be employed to extract features and perform classification directly from images derived from the vibration signals. This approach can increase the accuracy of the fault diagnosis. Deep learning (DL) is a powerful tool for data processing; however, it requires substantial computational power. Fortunately, recent advancements in processing technology have made DL more accessible and applicable to various fields, including speech recognition, robotics, text classification, and object detection. Convolutional neural networks (CNNs) are a fundamental DL architecture, allowing the extraction and learning of intricate
features from image datasets. CNNs are particularly useful in speech recognition, pattern recognition, and object detection (23). Despite the extensive applications of DL, only a limited number of studies have explored its potential for the fault diagnosis of suspension systems. This represents a significant opportunity for further research and discovery in this field. In this study, a vision transformer (ViT), a neural network that incorporates the attention mechanism proposed by Vaswani et al., was utilized to address this gap and explore the potential of DL in the fault diagnosis of suspension systems (24). The encoder-decoder architecture of the transformer is utilized to transform one sequence of elements into another sequence. The attention mechanism plays a crucial role in capturing long-distance features in the time-series data. The transformer model has shown remarkable performance in the field of natural language processing (NLP), specifically in tasks such as machine translation and speech recognition. It outperformed recurrent neural networks and long short-term memory networks, which rely on iterative serial training (25). The transformer model facilitates parallel training and captures global information across the words of a sequence, leading to significant improvements in training accuracy. Building on the success of the transformer in NLP, this study proposes its application in fault diagnosis scenarios. To assess the effectiveness of the vision transformer in fault diagnosis, a case study was conducted using spectrogram images, derived from the vibration signals acquired from a suspension system under different fault conditions, as inputs to a vision transformer model. By utilizing spectrogram images, the model can effectively learn both the temporal and frequency components of signals, which are crucial for accurate fault classification. Moreover, the conversion from raw vibration signals to spectrogram images reduces the dimensional complexity and allows the representation of the frequency components necessary for capturing fault-specific vibration patterns. This conversion process enhances the robustness of the model by minimizing noise and reducing overall complexity.
The current study introduces several novel aspects:
o The utilization of spectrogram images as input enables the model to effectively capture both the temporal and frequency components of the signals, thereby facilitating accurate fault identification.
o By adopting a vision transformer instead of a conventional convolutional neural network (CNN), the model becomes capable of simultaneously learning the temporal and frequency components within the images. This approach enhances the accuracy of fault classification.
o The utilization of a pretrained vision transformer, initially trained on a larger dataset, allows for fine-tuning of the model on specific custom datasets. This process enhances the performance and adaptability of the model to the given fault diagnosis task.
To evaluate the performance of the Vision Transformer (ViT) model in diagnosing suspension faults, an experimental study was conducted. The study encompassed one good condition and seven fault conditions, namely, lower arm ball joint worn out, lower arm bush worn out, strut mount failure, worn-out strut, externally damaged strut, low tire pressure, and tie rod ball joint worn out. The experimental study followed the process outlined below:
• Vibration signals were collected from the sensor and subsequently converted into spectrogram images.

Fig. 2. Workflow of fault diagnosis of suspension system using vision transformer.
(a) development of the experimental setup, (b) considered faults in the suspension system, and (c) data acquisition process.
To simulate real-time McPherson suspension system operation in front-wheel drive vehicles, a quarter-car model was used as an experimental setup. Signals were collected with a vibration sensor (accelerometer) attached to the suspension system control arm using adhesive. Various faults were introduced by systematically replacing suspension components, resulting in unique vibration signals for each fault condition. Additionally, vibration signals from a healthy suspension system were obtained for comparison. The experimental setup was meticulously designed to accurately represent different suspension system faults, enabling thorough analysis and evaluation.

available Hyundai i10 model to establish an experimental setup. The resulting suspension setup, as shown in Figure 3, comprises components such as a strut, lower arm, tie rod, wheel, drive shaft, motor, idle roller, and loader. The primary objective of this setup was to evaluate the performance of the passive suspension system when the tire operated at a constant speed on a smooth surface. The setup was designed and fabricated to ensure accurate positioning of the suspension system, including the wheel (rim and tire), above the two idle rollers, enabling seamless rotation. To minimize the presence of undesirable vibrations, the torque generated by the motor is transmitted to the wheel through a constant-velocity (CV) joint and belt drives. The height of the idle rollers was adjustable based on the load requirements, which were determined using a pressure gauge and controlled through a hydraulic jack and a guided pillar assembly. This flexibility enabled the setup to accommodate various load conditions during experimental testing.

process of converting real-world phenomena into digital values, which can be stored, visualized, and analysed on a computer, is known as data acquisition (DAQ). In this study, fault diagnosis of the suspension system is carried out by acquiring vibration signals using an accelerometer. The accelerometer used in the study is a piezoelectric sensor with a sensitivity of 10.26 mV/g and is mounted on the lower arm of the suspension system using adhesive. To convert the analog vibration signals into a digital format, the NI 9234 DAQ module is utilized. This DAQ module is connected to a USB chassis. The data acquisition process is facilitated by the NI LabVIEW software, which supports the DAQ system. During the signal collection process, the following parameters are considered:
• Sampling length: 10,000 samples
• Sampling frequency: 25 kHz
• Number of instances for each condition: 100 signals
By adhering to these parameters, a sufficient amount of data is collected for each fault condition and used for further processing.
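A few quantities follow directly from these acquisition parameters. The short calculation below is a sketch that derives them; the total number of signals assumes that each of the eight conditions considered in the study (one good and seven faulty) contributes exactly 100 signals.

```python
SAMPLING_FREQUENCY_HZ = 25_000   # from the study
SAMPLES_PER_SIGNAL = 10_000      # from the study
SIGNALS_PER_CONDITION = 100      # from the study
NUM_CONDITIONS = 8               # one good + seven fault conditions (assumed all recorded equally)

# Length of one recorded signal in seconds.
duration_s = SAMPLES_PER_SIGNAL / SAMPLING_FREQUENCY_HZ

# Highest frequency representable without aliasing (Nyquist limit).
nyquist_hz = SAMPLING_FREQUENCY_HZ / 2

# Frequency spacing of a single FFT taken over the whole signal.
freq_resolution_hz = SAMPLING_FREQUENCY_HZ / SAMPLES_PER_SIGNAL

# Dataset size under the equal-count assumption above.
total_signals = SIGNALS_PER_CONDITION * NUM_CONDITIONS

print(duration_s, nyquist_hz, freq_resolution_hz, total_signals)
```

Each signal therefore spans 0.4 s and can represent vibration content up to 12.5 kHz.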
provides a visual representation of the different types of faults that can occur in suspension components, and the following subsection describes the various faults considered in the study along with their causes and symptoms. It is crucial to understand and diagnose these faults accurately to ensure effective maintenance and optimal functioning of the suspension system. The proper identification and timely rectification of these faults are essential for maintaining vehicle safety and enhancing the overall driving experience.

7. Low wheel pressure (LWP) may result from check valve failure or punctures. This can lead to stiff or hard steering, uneven tire wear, and related issues (28).
Vision Transformer (ViT) is a neural network architecture that overcomes certain limitations of Convolutional Neural Networks (CNNs) in image processing tasks. Unlike CNNs, which process input images using convolutional layers, the ViT model employs an attention mechanism to handle image patches. These patches are small, fixed-size crops of the input image, and they are treated as a sequence of vectors, similar to how transformer models process sequences of text. By using the attention mechanism, the Vision Transformer can effectively capture global patterns in the image, rather than being restricted to local regions. This allows the model to have a broader understanding of the image content and improves its ability to recognize complex visual patterns. Another advantage of the Vision Transformer is its ability to achieve good performance with smaller amounts of data. CNNs typically demand significant quantities of labelled data for effective training due to their reliance on learning hierarchical features through convolutional and pooling layers. In contrast, ViTs utilize self-attention mechanisms to capture global data relationships. This makes the Vision Transformer particularly beneficial in scenarios where data availability is limited or costly to obtain. Overall, the Vision Transformer introduces a new approach to image processing tasks, leveraging the power of attention mechanisms and enabling effective learning from smaller datasets.
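The two ideas described above, splitting an image into a sequence of patches and letting every patch attend to every other patch, can be sketched in a few lines of NumPy. This is an illustration only: the 48×48×3 image size, the embedding width of 16, and the random weights are hypothetical, and the sketch omits the class token, position embeddings, multiple heads, layer normalization, and MLP blocks of a full ViT. The patch size of 6 matches the value reported as optimal in this study.

```python
import numpy as np

def to_patches(image, patch):
    """Split an (H, W, C) image into a sequence of flattened patch x patch crops."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly into patches"
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    # Bring the two patch-grid axes together, then flatten each patch to a vector.
    return grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over all patches
    return weights @ v, weights

rng = np.random.default_rng(0)
image = rng.random((48, 48, 3))        # hypothetical spectrogram image size
tokens = to_patches(image, patch=6)    # 8 x 8 = 64 patches, each 6 * 6 * 3 = 108 values

dim = 16                               # hypothetical embedding width
w_embed = rng.standard_normal((tokens.shape[1], dim)) * 0.1
x = tokens @ w_embed                   # linear patch embedding
wq, wk, wv = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))

out, weights = self_attention(x, wq, wk, wv)
print(tokens.shape, out.shape, weights.shape)
```

Each row of `weights` spans all 64 patches, which is what allows a single attention layer to relate distant regions of the spectrogram, whereas a convolutional layer only sees a local neighbourhood.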

Fig. 5. Sample spectrogram images of the considered fault conditions.
diagnosing suspension system faults. The evaluation is conducted through five distinct experiments, which involve modifying key hyperparameters: the learning rate, patch size, batch size, number of heads, and number of MLP layers. The identification of optimal values for these hyperparameters is crucial in achieving the highest possible classification accuracy while effectively utilizing computational resources and minimizing processing time. As a result, the proposed ViT model emerges as an efficient solution for fault diagnosis in the suspension system. The insights obtained from this study hold significant value in optimizing the performance of the ViT model in real-world scenarios where efficient resource utilization is paramount. By understanding the impact of different hyperparameter settings on classification accuracy, researchers and practitioners can enhance the accuracy and effectiveness of fault diagnosis in suspension systems.
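One way to organize such a series of experiments is a sequential, one-hyperparameter-at-a-time search in which each winning value is frozen before the next hyperparameter is varied. The sketch below illustrates that pattern under stated assumptions: the candidate grids (other than the learning-rate range) are hypothetical, and `toy_evaluate` is a placeholder scorer, not a trained model. Its target is set to the configuration reported in the Conclusion purely so that the search has something to find.

```python
def sequential_search(grids, defaults, evaluate):
    """Tune one hyperparameter at a time, freezing each winner before moving on."""
    best = dict(defaults)
    for name, candidates in grids.items():
        # Score every candidate for this hyperparameter with the others held fixed.
        scores = {value: evaluate({**best, name: value}) for value in candidates}
        best[name] = max(scores, key=scores.get)
    return best

# Hypothetical candidate grids; only the learning-rate range follows the text.
grids = {
    "learning_rate": [1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
    "patch_size": [4, 6, 8],
    "batch_size": [8, 16, 32],
    "num_heads": [2, 4, 8],
    "num_layers": [4, 8, 12],
}
defaults = {name: values[0] for name, values in grids.items()}

# Configuration reported as best in the Conclusion, used only as the placeholder's target.
reported_best = {
    "learning_rate": 1e-4,
    "patch_size": 6,
    "batch_size": 8,
    "num_heads": 4,
    "num_layers": 12,
}

def toy_evaluate(config):
    """Placeholder score (NOT a classification accuracy): number of matching settings."""
    return sum(config[key] == value for key, value in reported_best.items())

best = sequential_search(grids, defaults, toy_evaluate)
print(best)
```

In a real experiment `toy_evaluate` would be replaced by a routine that trains the model with the given configuration and returns its validation accuracy. A sequential search needs far fewer training runs than an exhaustive grid, at the cost of ignoring interactions between hyperparameters.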

Fig. 9. Comparison curve for variation in number of heads (a) training loss, (b) training accuracy.

Fig. 10. Comparison curve for variation in number of transformer layers (a) training loss, (b) training accuracy.

Optimum hyperparameter values
Based on the computational results obtained in the previous sections, this study identified the optimal hyperparameters that significantly enhanced the performance of the ViT model. The best hyperparameters along with their

Fig. 12. Training loss plot of ViT model
system fault diagnosis, departing from traditional methods that directly analyse vibration signals. Instead, the proposed method uses the Vision Transformer (ViT) model, which incorporates a self-attention mechanism, to classify various faults, including bushings, ball joints, struts (damper and spring), and tires within the suspension system. This classification is based on spectrogram images derived from vibration signals. To optimize the ViT model's performance, key parameters, such as learning rate, patch size, batch size, heads, and transformer layers, were tuned. The best ViT model achieved 98.12% accuracy with a learning rate of 0.0001, patch size 6, batch size 8, 4 heads, and 12 transformer layers. Additionally, it balances performance and computational efficiency, enhancing system reliability by monitoring suspension system component faults. While the proposed model has shown strong performance on the authors' specific dataset, its suitability for other datasets remains unverified. Additionally, real-time implementation demands high-end computational resources, which could hinder practical deployment. Future research holds promise in several areas. Firstly, optimizing the sample length for spectrogram image generation could reduce computational demands. Secondly, enhancing model performance is possible through parameter tuning using techniques like grid search. Lastly, conducting comprehensive training with data under different conditions can improve the model's applicability and robustness in real-world scenarios.

Experimental comparison of state-of-the-art suspension fault diagnosis studies.
S. no. | Method | Approach | Component | Reference

From Table 1, one can ascertain that the vibration-based data-driven approach has garnered significant attention in fault diagnosis of automobile systems for compelling reasons.

Table 2 illustrates the research work carried out using vibration-based fault diagnosis in the field of automotive technology in recent times.

Table 2. Research works on automotive technology using vibration-based methods.

Table 3. Performance comparison of ViT model with different

Table 4. Performance comparison of ViT model with different

From Table 4, it can be observed that a patch size of 6×6 yields

Table 6. Performance comparison of ViT model with respect

Table 7. Performance comparison with respect to number of