Clustering Methods Based on Hidden Markov Model


Student thesis: Doctoral Thesis





Supervisors
  • Antoni B. Chan (Supervisor)
  • Dan Yu (External person) (External Supervisor)

Award date: 22 Aug 2023


The hidden Markov model (HMM) is a broadly applied generative model for representing time series data, which assumes that each observation in a sequence is generated conditioned on a discrete state of a hidden Markov chain. HMMs have been widely applied in areas that require analyzing time series data, such as speech recognition and cognitive science. In this thesis, we propose: a novel HMM-based clustering method, the variational Bayesian hierarchical EM (VBHEM) algorithm; an extension of VBHEM, the VBHEM with co-clustering method; a novel tree-structured variational Bayesian method, the VB-CoLearn algorithm, which learns the individual HMMs and the group HMMs (i.e., cluster centers) simultaneously; and an extension of VB-CoLearn, the VB-CoLearn with co-clustering method. The main research results are as follows:
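To make the generative assumption concrete, the sketch below samples a sequence from a discrete HMM: a hidden state evolves via a transition matrix, and each observation is emitted conditioned on the current state. The parameter values are hypothetical toy numbers, not taken from the thesis.

```python
import random

def sample_hmm(pi, A, B, length, rng):
    """Sample a (states, observations) pair from a discrete HMM.

    pi: initial state distribution; A: state transition matrix;
    B: per-state emission distributions over a discrete alphabet.
    """
    def draw(dist):
        # Draw an index from a discrete distribution by inverting the CDF.
        r, acc = rng.random(), 0.0
        for i, p in enumerate(dist):
            acc += p
            if r < acc:
                return i
        return len(dist) - 1

    state = draw(pi)
    states, obs = [], []
    for _ in range(length):
        states.append(state)
        obs.append(draw(B[state]))   # observation depends only on current state
        state = draw(A[state])       # Markov transition to the next hidden state
    return states, obs

# Two hidden states, three observation symbols (illustrative parameters).
pi = [0.6, 0.4]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
states, obs = sample_hmm(pi, A, B, 20, random.Random(0))
```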

(1) Clustering HMMs has attracted increasing interest from machine learning researchers. However, the number of clusters (K) and the number of hidden states (S) for the cluster centers remain difficult to determine. In this thesis, we propose a novel HMM-based clustering algorithm, the variational Bayesian hierarchical EM algorithm, which clusters HMMs through their densities and priors, and simultaneously learns posteriors for the novel HMM cluster centers that compactly represent the structure of each cluster. Both K and S are determined automatically. In experiments, we demonstrate that our algorithm performs better than using model selection techniques with maximum likelihood estimation.
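Clustering HMMs "through their densities" relies on being able to evaluate the likelihood of observation sequences under each HMM. The standard tool for this is the forward algorithm, sketched below with per-step normalization for numerical stability; this is the generic building block, not the VBHEM algorithm itself.

```python
import math

def forward_loglik(pi, A, B, obs):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm and per-step normalization."""
    n = len(pi)
    # Initialization: joint probability of the first observation and each state.
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    z = sum(alpha)
    ll = math.log(z)
    alpha = [a / z for a in alpha]
    # Recursion: propagate through the transition matrix, weight by emission.
    for o in obs[1:]:
        alpha = [sum(alpha[u] * A[u][s] for u in range(n)) * B[s][o]
                 for s in range(n)]
        z = sum(alpha)
        ll += math.log(z)             # accumulate the normalizers -> log-likelihood
        alpha = [a / z for a in alpha]
    return ll
```

As a sanity check, a fully uniform two-state, two-symbol HMM assigns every length-T sequence probability 0.5^T.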

(2) To discover participant groups with consistent eye movement patterns across stimuli, for tasks involving stimuli with different feature layouts, we extend VBHEM to VBHEM with co-clustering. Applying this method to eye movements in scene perception, we discovered explorative (switching between the foreground and background information or different regions of interest) and focused (mainly looking at the foreground with less switching) eye movement patterns among Asian participants. Higher similarity to the explorative pattern predicted better foreground object recognition performance, whereas higher similarity to the focused pattern was associated with better feature integration in the flanker task. These results have important implications for using eye tracking as a window into individual differences in cognitive abilities and styles.

(3) Hierarchical learning of generative models is useful for representing and interpreting complex data. For instance, one application is to learn an HMM to represent an individual's eye fixations on a stimulus, and then cluster individuals' HMMs to discover common eye gaze strategies. However, learning the individual representation models from observations and clustering the individual models into group models are often treated as two separate tasks. We propose a novel tree-structured variational Bayesian method, called the VB-CoLearn algorithm, that learns the individual models and group models simultaneously by treating the group models as the parents of the individual models: each individual model is learned from observations and regularized by its parent, and conversely, each parent model is optimized to best represent its children. Due to this regularization, our method is advantageous when the number of training samples is small. Experimental results demonstrate the effectiveness of the proposed method.
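The parent-child regularization idea can be illustrated with a deliberately simplified toy: scalar Gaussian means in place of HMM parameters. Each child estimate is shrunk toward its group parent in proportion to a prior strength, and the parent is then re-estimated from its children. This is a hypothetical sketch of the alternating-update structure, not the thesis's variational derivation.

```python
def colearn_step(data_means, counts, parent, tau):
    """One alternating parent/child update on scalar toy parameters.

    data_means: per-individual sample means; counts: per-individual sample sizes;
    parent: current group-level parameter; tau: strength of the parent prior.
    """
    # Child update: precision-weighted blend of the data estimate and the parent.
    # More data (larger n) -> the child trusts its own observations more.
    children = [(n * m + tau * parent) / (n + tau)
                for m, n in zip(data_means, counts)]
    # Parent update: the parent moves to best represent its children.
    new_parent = sum(children) / len(children)
    return children, new_parent
```

With few samples the children stay close to the parent (strong regularization); as counts grow, the data dominate, matching the stated advantage of the method when training samples are scarce.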

(4) When multiple stimuli with different feature layouts are considered and the number of samples per stimulus is small, an individual's HMM may overfit, which degrades performance in both HMM learning and clustering. To overcome these drawbacks, we extend the VB-CoLearn algorithm with the co-clustering technique, clustering the individuals across all stimuli and sharing the cluster assignments between stimuli. Thus, the group HMM for one stimulus integrates the information brought by the other stimuli, and such group models provide a co-clustering regularization for each individual model as it is learned from the data. In experiments on synthetic and real data, we obtained clustering and model-selection results that are better than or comparable to those of other methods.
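The shared-assignment idea can be sketched as follows: because a participant belongs to the same cluster across all stimuli, the per-stimulus log-likelihoods are summed before the cluster responsibility is computed, so evidence from every stimulus influences one shared assignment. The function below is a hypothetical illustration of that pooling step, not the thesis's full co-clustering update.

```python
import math

def shared_assignment(loglik_per_stimulus, log_prior):
    """Posterior over clusters for one participant, pooled across stimuli.

    loglik_per_stimulus[m][k]: log-likelihood of the participant's data on
    stimulus m under cluster k's group model; log_prior[k]: log cluster weight.
    """
    K = len(log_prior)
    # Sum log-likelihoods over stimuli: one shared assignment per participant.
    total = [log_prior[k] + sum(ll[k] for ll in loglik_per_stimulus)
             for k in range(K)]
    # Normalize with the log-sum-exp trick for numerical stability.
    mx = max(total)
    w = [math.exp(t - mx) for t in total]
    z = sum(w)
    return [x / z for x in w]
```

Because the evidence is pooled, a cluster only slightly favored on each stimulus becomes strongly favored overall, which is how information from one stimulus supports the grouping on another.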