Ensemble learning plays a significant role in supervised learning by improving the pattern recognition capability of multiple classifier systems. The core idea of ensemble methodology is to train several base classifiers and combine them with suitably assigned weights, so that the resulting ensemble usually outperforms each individual classifier. However, current ensemble methods still suffer from several problems, such as unstable performance in the presence of noisy data, ineffective fusion strategies for combining individual classifier outputs, and the lack of multi-objective evolutionary methods for breaking a multi-class problem into dichotomies. Moreover, in multi-objective fuzzy genetic systems, ensemble pruning and diversity techniques can readily be applied to select fewer fuzzy rule classifiers and to increase the diversity between each pair of individuals during the evolutionary process.
Motivated by these problems, this thesis investigates several methodologies for improving the robustness of ensemble learning classifiers: a noise-detection based approach for Adaptive Boosting (AdaBoost), a class-specific weighted fusion method for the Extreme Learning Machine (ELM), an indicator-based multi-objective evolutionary algorithm with preference for multi-class classification systems, and an algorithm that applies ensemble pruning and diversity techniques to a multi-objective hierarchical evolutionary algorithm.
Specifically, the main contributions of this thesis are outlined as follows:
1. A new boosting approach, named noise-detection based AdaBoost (ND-AdaBoost), is proposed to combine classifiers while emphasizing the training of misclassified noisy instances and correctly classified non-noisy instances. The algorithm is motivated by the fact that AdaBoost is prone to overfitting on noisy data sets, because it consistently assigns high weights to hard-to-learn instances (mislabeled instances or outliers). Concretely, a noise-detection based loss function is integrated into AdaBoost to adjust the weight distribution in each iteration, and evaluation criteria based on the k-nearest-neighbor (k-NN) rule and expectation maximization (EM) are constructed to detect noisy instances. Further, a regeneration condition is presented and analyzed to control the ensemble training error bound of the proposed algorithm, which gives the algorithm theoretical support.
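By way of illustration only, the following Python sketch shows one plausible form of a k-NN based noise-detection criterion: an instance is flagged as a noise candidate when most of its neighbours carry a different label. The function name, the neighbourhood size, and the agreement threshold are assumptions for exposition, not the thesis's exact formulation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_noise_flags(X, y, k=5, agreement=0.5):
    """Flag instance i as a noise candidate when fewer than `agreement`
    of its k nearest neighbours (self excluded) share its label."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    idx = nn.kneighbors(X, return_distance=False)[:, 1:]  # drop the point itself
    same_label_ratio = (y[idx] == y[:, None]).mean(axis=1)
    return same_label_ratio < agreement  # boolean mask of suspected noisy instances
```

In an ND-AdaBoost-style loop, such a mask would feed the noise-detection based loss function that modifies the per-iteration weight update; the exact reweighting rule is defined in the thesis and is not reproduced here.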
2. A class-specific weighted soft voting method is presented for the design of ELM ensembles (CSSV-ELM). Class-specific soft voting is a common approach for dealing with base learners that output class probabilities. The new algorithm incorporates two characteristics of ELM that are important for improving its reliability. First, individual ELM classifiers have unequal performance because the hidden-node learning parameters are randomly initialized. Second, as an algorithm based on solving a linear equation system, ELM may suffer from ill-conditioning. Accordingly, the classic weighted voting scheme and the matrix condition number are integrated into the CSSV-ELM algorithm. Additionally, in contrast to other neural networks, there has been no empirical research on weighted-voting based ELM ensembles; in this work, seven weighted voting methods are therefore compared and analyzed against the proposed method.
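As a rough sketch of the fusion idea only, class-specific soft voting assigns each classifier one weight per class when its probability outputs are combined. The weighting rule below, which scales class-wise validation accuracy by the condition number of each ELM's hidden-layer output matrix, is an assumed example and not the exact CSSV-ELM formulation.

```python
import numpy as np

def class_specific_soft_vote(probas, weights):
    """probas: list of (n_samples, n_classes) probability outputs, one per classifier.
    weights: (n_classifiers, n_classes) class-specific weights.
    Returns predicted labels after weighted soft voting."""
    stacked = np.stack(probas)                      # (n_clf, n_samples, n_classes)
    fused = (stacked * weights[:, None, :]).sum(0)  # weight each classifier per class
    return fused.argmax(axis=1)

def condition_scaled_weights(class_accuracies, hidden_matrices):
    """Hypothetical weighting: class-wise validation accuracy, down-weighted
    for ELMs whose hidden-layer output matrix H is badly conditioned."""
    cond = np.array([np.linalg.cond(H) for H in hidden_matrices])
    scale = 1.0 / np.log1p(cond)                    # smaller weight for ill-conditioned H
    w = class_accuracies * scale[:, None]
    return w / w.sum(axis=0, keepdims=True)         # normalise over classifiers per class
```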
3. One of the most difficult tasks in building a multi-class classification system is finding an appropriate Error-Correcting Output Codes (ECOC) matrix, which decomposes the multi-class problem into several binary classification problems. In this thesis, an indicator-based multi-objective evolutionary algorithm with preference is designed to search for high-quality ECOC matrices. Specifically, Harrington's one-sided desirability function is integrated into an indicator-based evolutionary algorithm (IBEA), with the aim of approximating the relevant regions of the Pareto front (PF) according to the preference of the decision maker.
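For reference, Harrington's one-sided desirability function has the standard Gompertz-type form d(y) = exp(-exp(-(b0 + b1*y))), mapping an objective value onto (0, 1). The short Python sketch below computes it; the parameter names b0 and b1 are the usual shape parameters chosen from the decision maker's preference, and how the resulting desirability values enter the IBEA fitness is specific to the thesis and not reproduced here.

```python
import numpy as np

def harrington_one_sided(y, b0=0.0, b1=1.0):
    """Harrington's one-sided desirability: rises smoothly towards 1
    as b0 + b1*y increases, and falls towards 0 as it decreases."""
    return np.exp(-np.exp(-(b0 + b1 * np.asarray(y, dtype=float))))
```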
4. The last contribution applies ensemble pruning and diversity techniques to a multi-objective hierarchical evolutionary algorithm (MOHEA), and it is two-fold. First, MOHEA is employed to obtain a non-dominated set of fuzzy rule classifiers with interpretability and diversity preservation. Second, a reduced-error based ensemble pruning method is used to decrease the size and enhance the accuracy of the combined fuzzy rule classifiers. In this algorithm, each chromosome represents a fuzzy rule classifier and consists of three types of genes: control, parameter and rule genes. In each evolutionary iteration, every pair of classifiers in the non-dominated solution set with the same multi-objective quality is examined in terms of its Q-statistic diversity value, and similar classifiers are then removed to preserve the diversity of the fuzzy system.
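For clarity, the pairwise Q statistic used in such diversity checks is Yule's Q computed from the two classifiers' correct/incorrect decisions on a common data set; a minimal Python sketch is given below (the zero-denominator fallback is an assumption). Pairs whose Q value is close to 1 make almost the same errors, so one member of such a pair is a natural candidate for removal.

```python
import numpy as np

def q_statistic(correct_a, correct_b):
    """Yule's Q between two classifiers, from boolean vectors marking which
    instances each classifier got right. Q near +1: very similar behaviour;
    Q near 0 or negative: diverse, complementary errors."""
    a = np.asarray(correct_a, dtype=bool)
    b = np.asarray(correct_b, dtype=bool)
    n11 = np.sum(a & b)    # both correct
    n00 = np.sum(~a & ~b)  # both wrong
    n10 = np.sum(a & ~b)   # only the first correct
    n01 = np.sum(~a & b)   # only the second correct
    denom = n11 * n00 + n01 * n10
    return (n11 * n00 - n01 * n10) / denom if denom else 0.0
```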
Date of Award: 15 Feb 2013
Original language: English
Awarding Institution: City University of Hong Kong
Supervisor: Tak Wu Sam KWONG (Supervisor)
Design methods for improving robustness of ensemble learning classifiers
CAO, J. (Author). 15 Feb 2013
Student thesis: Doctoral Thesis