Sparse representation based online appearance models for robust visual tracking

  • Tianxiang BAI

    Student thesis: Doctoral Thesis

    Abstract

    Visual tracking is one of the essential tasks in intelligent video analysis because of its wide applications in video surveillance, human motion understanding, and human-computer interaction, among others. Accordingly, a large body of literature on tracking algorithms reports promising results under various scenarios. However, tracking the non-stationary appearance of objects undergoing significant pose variations, illumination changes, and occlusions remains a challenge. This thesis addresses these challenges by proposing robust online appearance models based on a sparse representation framework for visual tracking.

    First, a structured sparse representation appearance model is presented for tracking an object in a video sequence. This method models the appearance of the object as a sparse linear combination over a structured union of subspaces in an overcomplete dictionary, which consists of a learned Eigen template dictionary and a partitioned occlusion dictionary. The structured sparse representation framework suits the practical tracking problem because it accounts for the contiguous spatial distribution of occlusion. The Block Orthogonal Matching Pursuit (BOMP) algorithm is used to solve the structured sparse representation problem, yielding a sparse solution at reduced computational cost. To keep the Eigen templates current, an incremental Principal Component Analysis (PCA) learning scheme adapts them online to the varying appearance of the target, which directly improves robustness when tracking objects with non-stationary appearance.

    With this first appearance model, visual drift is still observed under extreme pose and illumination variations, because a single fixed subspace can only coarsely approximate the complex, nonlinear appearance manifold. A more flexible and discriminative appearance model based on the structured sparse representation framework is therefore developed. Instead of modeling the appearance manifold with a single subspace, the proposed model uses a number of low-dimensional linear subspaces to adapt to the non-stationary appearance. To enhance the discriminative power of the model, a clustered background dictionary is constructed and then updated during tracking. Using the BOMP algorithm, more robust and stable tracking results are obtained than with the prototype tracker based on the first structured sparse representation appearance model.
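    As an illustration of the block-structured sparse coding that both models rely on, the following is a minimal sketch in Python. It is not the thesis implementation: the dictionary contents, the block layout, the greedy BOMP variant, and the Eigen-part scoring rule are all simplifying assumptions made for exposition.

        import numpy as np

        def block_omp(D, y, blocks, max_blocks=3, tol=1e-6):
            # Greedy Block Orthogonal Matching Pursuit: select whole blocks of
            # atoms, then jointly re-fit all selected blocks (the orthogonal step).
            residual = y.astype(float).copy()
            chosen, coef = [], np.zeros(D.shape[1])
            for _ in range(max_blocks):
                # energy of each unselected block against the current residual
                energies = [-1.0 if i in chosen else np.linalg.norm(D[:, b].T @ residual)
                            for i, b in enumerate(blocks)]
                chosen.append(int(np.argmax(energies)))
                cols = np.concatenate([blocks[i] for i in chosen])
                x, *_ = np.linalg.lstsq(D[:, cols], y, rcond=None)
                residual = y - D[:, cols] @ x
                if np.linalg.norm(residual) < tol:
                    break
            coef[cols] = x
            return coef

        # Toy dictionary: 8 Eigen templates plus trivial per-pixel occlusion atoms
        # grouped into spatially contiguous blocks, for a 64-pixel patch.
        rng = np.random.default_rng(0)
        d, k = 64, 8
        eigen = rng.standard_normal((d, k))    # stand-in for learned Eigen templates
        D = np.hstack([eigen, np.eye(d)])      # occlusion part is the identity
        blocks = [np.arange(k)] + [k + np.arange(i, i + 16) for i in range(0, d, 16)]
        y = rng.standard_normal(d)             # stand-in for a candidate patch
        coef = block_omp(D, y, blocks, max_blocks=2)
        # A tracker would score the candidate by the Eigen-part reconstruction error:
        score = -np.linalg.norm(y - eigen @ coef[:k])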
    The computational cost of both structured sparse representation appearance models remains high, however, because the dictionaries they use are overcomplete. A novel sparse representation appearance model based on learned local appearances is therefore investigated. In this approach, visual appearance is represented by sparse coding, and an online dictionary learning strategy adapts the representation to appearance variations during tracking. Sparse coding and online dictionary learning are unified through a sparsity consistency constraint that strengthens both the generative and discriminative capabilities of the appearance model. An elastic-net constraint is enforced during the dictionary learning stage to capture characteristics of the local appearances that are insensitive to partial occlusion, so the target appearance can be effectively recovered from corruption using the sparse coefficients with respect to the learned bases containing local appearances. Because the dictionary in this method is undercomplete, the model can be implemented efficiently for tracking. The proposed local appearance model yields more robust tracking performance than other state-of-the-art approaches.
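    The following is a minimal sketch, in the same spirit, of elastic-net sparse coding coupled with an online dictionary update, the two ingredients this local appearance model builds on. The penalty weights, the step size, and the helper names (encode, update_dictionary) are illustrative assumptions; the sparsity consistency constraint that unifies the two stages in the thesis is not modeled here.

        import numpy as np
        from sklearn.linear_model import ElasticNet

        def encode(D, y, alpha=0.05, l1_ratio=0.7):
            # Elastic-net sparse coding of a patch y over dictionary D (d x k),
            # treating pixels as samples and atoms as features.
            model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                               fit_intercept=False, max_iter=5000)
            model.fit(D, y)
            return model.coef_

        def update_dictionary(D, y, x, lr=0.1):
            # One online gradient step on 0.5 * ||y - D x||^2, then renormalise
            # the atoms so they stay close to the unit sphere.
            D = D + lr * np.outer(y - D @ x, x)
            return D / np.maximum(np.linalg.norm(D, axis=0), 1e-12)

        # Toy online loop: the dictionary is undercomplete (k < d), so coding
        # and updating stay cheap enough for frame-rate tracking.
        rng = np.random.default_rng(0)
        d, k = 64, 16
        D = rng.standard_normal((d, k))
        D /= np.linalg.norm(D, axis=0)
        for _ in range(20):
            y = rng.standard_normal(d)   # stand-in for the tracked target patch
            x = encode(D, y)
            y_hat = D @ x                # appearance recovered from the sparse code
            D = update_dictionary(D, y, x)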
    Date of Award: 3 Oct 2012
    Original language: English
    Awarding Institution
    • City University of Hong Kong
    Supervisor: You Fu LI

    Keywords

    • Automatic tracking
    • Computer vision
    • Pattern recognition systems
