Classification on Biometrics Traits with Applications to Face Recognition and Zebrafish Screening
計量生物學特徵上的分類研究及其在人臉識別與斑馬魚篩查中的應用
Student thesis: Doctoral Thesis
Author(s)
Related Research Unit(s)
Detail(s)
Awarding Institution | |
---|---|
Supervisors/Advisors | |
Award date | 8 Aug 2016 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(3848c700-60f2-49cf-8bb6-59620fcdbfe4).html |
---|---|
Other link(s) | Links |
Abstract
Biometrics aims at developing mathematical methods for identifying individuals by their biological traits. Biological traits can be categorized into anatomical and behavioral traits. In this thesis, anatomical and behavioral traits are considered by exploring two important yet challenging biometric problems: the classification of large-scale datasets with limited labels and high-throughput screening.
Although the size of real-life datasets increases exponentially with the amount of information available, the labeled samples for each subject are still limited, in some cases to a single labeled sample per subject. Therefore, I propose a novel algorithm that leverages unlabeled data to achieve state-of-the-art performance. In particular, I examine the challenges of face recognition, an example of classification based on anatomical traits. My proposed face recognition method belongs to the category of sparse representation based classification (SRC) methods. SRC imposes an l1-norm minimization constraint on the coefficient vector to select the smallest number of training images (i.e. the dictionary) necessary to represent the testing image with minimal residual. Owing to its strong robustness and representative power without hand-crafted features, SRC has attracted much attention in face recognition research. State-of-the-art SRC methods usually focus on learning a more precise dictionary, rather than using the training samples directly, in order to handle severe differences between the training and testing data, e.g. changes of expression and/or occlusions. However, how to learn a precise dictionary from a limited amount of labeled data while leveraging unlabeled data remains an open question.
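The SRC idea described above can be illustrated with a minimal Python sketch. It is not the implementation used in the thesis: the function names (`soft_threshold`, `l1_code`, `src_classify`), the penalty weight `lam`, and the choice of a plain iterative soft-thresholding (ISTA) solver are all assumptions made for illustration. The sketch codes a test sample over a dictionary whose columns are training samples, then assigns the class whose atoms reconstruct the sample with the smallest residual.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_code(D, y, lam=0.01, n_iter=500):
    """Approximately solve min_x 0.5 * ||y - D x||_2^2 + lam * ||x||_1 with ISTA.

    D: (d, n) dictionary with one training sample per column (hypothetical layout).
    y: (d,) test sample.
    """
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x

def src_classify(D, labels, y, lam=0.01):
    """Return the label whose atoms give the smallest reconstruction residual."""
    x = l1_code(D, y, lam)
    classes = np.unique(labels)
    residuals = [
        np.linalg.norm(y - D[:, labels == c] @ x[labels == c]) for c in classes
    ]
    return classes[int(np.argmin(residuals))]
```

In use, `D` would hold unit-normalized training images as columns and `labels` the subject identity of each column; `src_classify(D, labels, y)` then returns the predicted subject for a test image `y`.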
In this thesis, I propose a semi-supervised method of learning a gallery dictionary that leverages unlabeled data for better performance. The method follows the state-of-the-art Extended SRC (ESRC) framework, which separates the dictionary into a gallery sub-dictionary and a variation sub-dictionary. I first rectify the data by eliminating large variations using the ESRC model. A Gaussian Mixture Model (GMM) is then constructed and the semi-supervised Expectation-Maximization (EM) algorithm is used to estimate the gallery dictionary. Finally, the learned gallery dictionary is integrated into the ESRC framework for classification. The proposed algorithm is evaluated by its face recognition performance. The results demonstrate that the proposed method offers promising classification performance.
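The semi-supervised EM step can be sketched in Python as follows. This is a simplified illustration rather than the thesis algorithm: it omits the ESRC rectification step entirely, and it assumes equal class priors, a shared spherical covariance `sigma2 * I`, integer labels `0..n_classes-1`, and at least one labeled sample per class. The function name `semi_supervised_em` and all parameter names are hypothetical. Labeled samples keep fixed one-hot responsibilities, while unlabeled samples receive soft responsibilities in each E-step; the resulting class means play the role of gallery prototypes.

```python
import numpy as np

def semi_supervised_em(X_lab, y_lab, X_unl, n_classes, n_iter=50):
    """Estimate one mean per class from labeled and unlabeled rows.

    X_lab: (n_lab, d) labeled samples, y_lab: (n_lab,) integer labels,
    X_unl: (n_unl, d) unlabeled samples. Returns an (n_classes, d) array.
    """
    d = X_lab.shape[1]
    # Initialize each class mean from its labeled samples (possibly just one).
    means = np.stack([X_lab[y_lab == c].mean(axis=0) for c in range(n_classes)])
    R_lab = np.eye(n_classes)[y_lab]      # fixed one-hot responsibilities
    X_all = np.vstack([X_lab, X_unl])
    sigma2 = max(X_all.var(), 1e-8)       # crude initial shared variance

    for _ in range(n_iter):
        # E-step: soft assignments for the unlabeled samples only.
        sq = ((X_unl[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_r = -0.5 * sq / sigma2
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        R_unl = np.exp(log_r)
        R_unl /= R_unl.sum(axis=1, keepdims=True)

        # M-step: update the means and the shared variance from all samples.
        R = np.vstack([R_lab, R_unl])
        means = (R.T @ X_all) / R.sum(axis=0)[:, None]
        sq_all = ((X_all[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        sigma2 = max((R * sq_all).sum() / (X_all.shape[0] * d), 1e-8)
    return means
```

Under these assumptions, even a single labeled sample per class is enough to anchor each mixture component, and the unlabeled samples then pull each mean toward the center of its cluster.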
Behavioral traits are also widely used in biometric applications. However, behavioral data are usually stochastic and contain outliers. An example of behavioral trait biometrics is light-induced locomotor response (LLR) based zebrafish screening, which aims to classify individual zebrafish based on differences in their behavior. As a behavioral trait biometric application, LLR based zebrafish screening is a critical technology for screening drugs for human eye diseases, because the zebrafish retina is anatomically similar to the human retina. However, traditional screening relies on human effort, and there is a lack of feasible methods for automatic and high-throughput screening of individual zebrafish. In fact, previous studies have used the mean LLR activity of a particular class, but these methods neglect the within-class differences among individual zebrafish. A further prerequisite for a high-throughput zebrafish screening framework based on machine learning is that the feature set should be biologically meaningful, so that the results can be interpreted. To address these limitations, in this thesis I propose a novel machine learning framework with biologically meaningful features for automatic and high-throughput zebrafish screening. I incorporate several classification methods that are easy for our biologist users to implement. The experimental results demonstrate that the framework provides satisfactory performance. Specifically, using this framework, two zebrafish screening problems are explored, i.e. screening on different mutants and screening on different wild-type strains. The proposed method delivers desirable results in the experiments, e.g. the classification accuracies are over 80% in general, and up to 95% for screening on different mutants. Moreover, novel biological discoveries observed as a consequence of the proposed method are reported for both problems. For example, the LLRs differ not only between different mutants but also between different wild-type strains. The extent of these differences varies with age.
To summarize, I have studied two important yet challenging biometric problems using anatomical and behavioral traits: face recognition and LLR based zebrafish screening. A novel algorithm is proposed for face recognition, which delivers a significant improvement over the state-of-the-art results. In addition, a statistical machine learning framework is successfully applied to LLR based zebrafish screening, which provides satisfactory results and facilitates novel biological discoveries.