Machine Fault Diagnosis Based on Causality Inspired Learning
基於因果啟發學習的機械故障診斷方法 (Chinese title; same as the English title above)
Student thesis: Doctoral Thesis
Author(s)
Related Research Unit(s)
Detail(s)
Awarding Institution |
Supervisors/Advisors |
Award date | 18 Oct 2023
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(e5761555-a02b-4b0b-9d5b-06c997e656fc).html
Abstract
Machines are ubiquitously employed in modern industry and in people's daily life. Malfunctions of machines such as airplanes can lead to unaffordable losses. Hence, it is of paramount importance to apply machine fault diagnosis methods that identify incipient faults in a timely manner, so that severely abnormal machine operation can be proactively prevented. Traditional fault diagnosis methods, such as signal processing techniques, often depend on experts to design feature extractors and recognize fault patterns. These methods are usually labor-intensive, expert-dependent, and incapable of handling big data. Data-driven fault diagnosis methods using machine learning and deep learning techniques can learn models from data and classify fault patterns automatically, overcoming the aforementioned downsides of traditional methods. However, data-driven approaches also have limitations in fault diagnosis, such as the challenge of learning from small data, the difficulty of generalizing outside the training distribution, and vulnerability to label noise. In this thesis, we show that these weaknesses of data-driven models can be alleviated by causality-inspired learning, a learning paradigm equipped with causal inspirations, methodologies, and theories. More specifically, we study and contribute the following.
First, we are inspired by the fact that faults of rotating machinery cause different signal modulation patterns. Based on this causal inspiration, we develop a novel method that integrates learnable variational kernels into a 1-dimensional convolutional neural network (1-D CNN) to extract important fault-related features and deliver decent performance even with limited data. In this method, the variational kernel is first derived by adapting the constraints and formulations of successive variational mode decomposition. Next, a gradient descent process based on a fault classification loss is developed to estimate the parameters of the variational kernel. Lastly, a CNN-based diagnosis model is constructed to perform machinery fault diagnosis with limited training data. Note that the variational-kernel-based 1-D CNN is trained by empirical risk minimization (ERM). Although ERM is a celebrated induction principle for developing data-driven models, there is controversy over its capability for domain generalization (DG).
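As a rough illustration of the idea, and not the thesis's exact formulation, the sketch below implements a parametric band-pass kernel in NumPy: a cosine at a center frequency under a Gaussian envelope. The names `center_freq` and `bandwidth` are illustrative stand-ins for the variational kernel's parameters, which in the method would be updated by gradient descent on the classification loss.

```python
import numpy as np

def variational_kernel(center_freq, bandwidth, length=64):
    """Illustrative parametric band-pass kernel: a cosine at
    `center_freq` (cycles/sample) under a Gaussian envelope whose
    width is set by `bandwidth`. Both parameters are the kind of
    quantity that gradient descent on a classification loss could tune."""
    t = np.arange(length) - length / 2
    envelope = np.exp(-(bandwidth * t) ** 2)
    return envelope * np.cos(2 * np.pi * center_freq * t)

def extract_feature(signal, kernel):
    """Convolve the signal with the kernel and return the response
    energy, a simple fault-related feature."""
    response = np.convolve(signal, kernel, mode="same")
    return float(np.mean(response ** 2))

# A toy vibration signal: a 0.2-cycles/sample fault tone buried in noise.
rng = np.random.default_rng(0)
n = np.arange(1024)
signal = np.sin(2 * np.pi * 0.2 * n) + 0.3 * rng.standard_normal(1024)

k_on = variational_kernel(center_freq=0.2, bandwidth=0.05)
k_off = variational_kernel(center_freq=0.45, bandwidth=0.05)

# The kernel tuned to the fault frequency responds far more strongly.
assert extract_feature(signal, k_on) > 10 * extract_feature(signal, k_off)
```

A learnable front-end of this kind acts as a data-adaptive filter bank, which is why it can extract discriminative features even when training data are limited.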
Second, to deepen the understanding of ERM, we offer an in-depth study, from a causal perspective, of the DG successes and failures of ERM under general machine learning settings. On the theoretical side, we first explore the properties of a causal metric termed information flow. Then we discuss the relationships between information flow and mutual information in the proposed causal graph. Next, we analyze the roles of the transformed causal feature and the transformed spurious feature in modeling performance. The analysis reveals that the interaction between the spurious influencer and the causal feature is the key to determining the failure or success of ERM on DG. This insight also strengthens the belief that learning invariance is a promising direction toward learning causality and solving the DG problem.
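The core phenomenon, that ERM fails on DG exactly when it can exploit a spurious influencer, can be illustrated with a toy simulation (a hypothetical setup, not taken from the thesis): a least-squares ERM model trained where a spurious feature tracks the label degrades sharply in an environment where that correlation flips.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_env(n, spurious_corr):
    """Toy environment: the label is driven by a causal feature, while a
    spurious feature tracks the label with an environment-dependent
    sign (`spurious_corr`), mimicking a spurious influencer."""
    x_causal = rng.standard_normal(n)
    y = x_causal + 0.1 * rng.standard_normal(n)
    x_spurious = spurious_corr * y + 0.1 * rng.standard_normal(n)
    return np.column_stack([x_causal, x_spurious]), y

# ERM fit by least squares on a training environment where the
# spurious feature is positively correlated with the label.
X_tr, y_tr = make_env(2000, spurious_corr=+1.0)
w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
mse_tr = float(np.mean((X_tr @ w - y_tr) ** 2))

# In a test environment the spurious correlation flips sign.
X_te, y_te = make_env(2000, spurious_corr=-1.0)
mse_te = float(np.mean((X_te @ w - y_te) ** 2))

# ERM leans on the spurious feature, so it fails out of distribution.
assert mse_te > 5 * mse_tr
```

An invariance-seeking learner would instead put its weight on the causal feature, whose relationship to the label is the same in both environments.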
Third, we propose a sparsity-constrained invariant risk minimization (SCIRM) framework, which develops machine learning models with better generalization capacity under environmental disturbances in machinery fault diagnosis. SCIRM is built by innovating on the optimization formulation of the recently proposed invariant risk minimization (IRM) and its variants through the integration of sparsity constraints. We prove that if a sparsity measure is differentiable, scale-invariant, and semistrictly quasi-convex, SCIRM is guaranteed to solve the domain generalization problem under a few predefined problem settings. We mathematically derive a family of such sparsity measures, and offer a practical procedure for implementing SCIRM in machinery fault diagnosis tasks.
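To make the ingredients concrete, here is a minimal NumPy sketch, illustrative only: the thesis's formulation, its derived family of sparsity measures, and its guarantees are more general. It combines a per-environment risk, an IRMv1-style invariance penalty, and the L1/L2 ratio, a classic scale-invariant sparsity measure, with the sparsity constraint relaxed into a penalty term.

```python
import numpy as np

def risk(w, X, y):
    """Squared-error risk of a linear predictor w on one environment."""
    return float(np.mean((X @ w - y) ** 2))

def irm_penalty(w, X, y, eps=1e-4):
    """IRMv1-style penalty: squared derivative of the risk w.r.t. a
    scalar dummy classifier s multiplying the predictor, i.e.
    (d/ds) risk(s * w) at s = 1, estimated by central differences."""
    g = (risk((1 + eps) * w, X, y) - risk((1 - eps) * w, X, y)) / (2 * eps)
    return g ** 2

def l1_over_l2(w, eps=1e-12):
    """L1/L2 ratio: a scale-invariant sparsity measure (smaller means
    sparser), standing in for the thesis's family of measures."""
    return float(np.sum(np.abs(w)) / (np.linalg.norm(w) + eps))

def scirm_objective(w, envs, lam=1.0, mu=0.1):
    """Illustrative SCIRM-style objective: pooled risk + per-environment
    invariance penalty + sparsity term (the thesis imposes sparsity as
    a constraint; a penalty is a common practical relaxation)."""
    return (sum(risk(w, X, y) for X, y in envs)
            + lam * sum(irm_penalty(w, X, y) for X, y in envs)
            + mu * l1_over_l2(w))

# Scale invariance: rescaling the weights leaves the measure unchanged,
# one of the three properties required by the SCIRM guarantee.
w = np.array([1.0, 0.0, 2.0])
assert abs(l1_over_l2(w) - l1_over_l2(5 * w)) < 1e-9
```

Scale invariance matters because the IRM dummy-classifier trick already fixes the predictor's scale; a sparsity measure that changed under rescaling would interact badly with that normalization.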
Last but not least, to combat the problem of DG with label noise in fault diagnosis, we propose to improve a causality-inspired DG method, namely IRM, to make it more tolerant of label noise. Specifically, we propose an extended IRM (EIRM) that shifts the base of the gradient penalty from the dummy classifier to the whole model in order to learn better features. We show that EIRM is closely related to finding a flat minimum, which recent studies indicate is crucial for label noise robustness and model generalization. To further boost performance, we propose another method that employs the mix-up mechanism to augment data, named mix-up EIRM (MEIRM). We provide efficient implementations of EIRM and MEIRM that circumvent the difficulty of Hessian computation, and build machine fault diagnosis methods upon them. Theories on function smoothness and algorithm convergence are also developed to enhance the understanding of the proposed methods.
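The two mechanisms can be sketched as follows, in a hypothetical NumPy rendition of the ideas: a gradient penalty taken over the whole parameter vector rather than a scalar dummy classifier, and mix-up augmentation of inputs and labels. The finite differences here are a naive stand-in for the efficient Hessian-free implementations developed in the thesis.

```python
import numpy as np

def risk(w, X, y):
    """Squared-error risk of a linear model on a batch."""
    return float(np.mean((X @ w - y) ** 2))

def whole_model_grad_penalty(w, X, y, eps=1e-5):
    """EIRM-style penalty (illustrative): squared norm of the risk
    gradient w.r.t. ALL model parameters, not just a scalar dummy
    classifier, estimated by central finite differences."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (risk(w + e, X, y) - risk(w - e, X, y)) / (2 * eps)
    return float(g @ g)

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mix-up augmentation: convex-combine two inputs and their
    (one-hot) labels with a weight drawn from Beta(alpha, alpha)."""
    if rng is None:
        rng = np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

# At a minimum of the risk, the whole-model gradient penalty vanishes,
# consistent with EIRM preferring flat, well-fit solutions.
w_star = np.array([1.0, -2.0])
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = X @ w_star
assert whole_model_grad_penalty(w_star, X, y) < 1e-8

# Mixed labels remain a valid probability vector.
x_mix, y_mix = mixup(np.zeros(4), np.array([1.0, 0.0]),
                     np.ones(4), np.array([0.0, 1.0]))
assert np.isclose(y_mix.sum(), 1.0)
```

Penalizing the whole-model gradient norm pushes training toward parameter regions where the loss is locally flat, which is the property linked to label-noise robustness in the paragraph above.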
Keywords: Causal Learning, Convolutional Neural Network, Domain Generalization, Label Noise, Machine Fault Diagnosis, Risk Minimization