Towards Robust Animal Activity Recognition Using Deep Learning and Wearable Sensors


Student thesis: Doctoral Thesis




Award date: 6 Sept 2023


An automated animal activity recognition (AAR) system allows caretakers to continuously and remotely monitor variations in animal behavior, thereby providing rich insights into animal health and welfare and improving the efficiency of livestock management. Over the past decades, advancements in deep learning techniques and wearable sensors have driven the rapid development of automated and precise AAR systems. However, several technical challenges must be addressed before an AAR system based on deep learning and wearable sensors can be practically implemented in commercial animal farming. This thesis focuses on four practical challenges of building an AAR system: multi-modal fusion, class imbalance, data privacy, and energy efficiency. Concretely, (1) Multi-modal fusion. Typically, multiple sensors of different types are attached to an animal’s body, or sensors of the same type are attached to different locations on an animal’s body, to record multi-modal data and obtain rich information. However, integrating multi-modal data poses a challenge for the development of deep learning-based recognition models, as a model may struggle to generalize across the different modalities of sensor data. Conflicting correlations between modalities can easily interfere with multi-modal fusion, resulting in limited recognition performance. (2) Class imbalance. The frequency and duration of different animal behaviors tend to be inconsistent, owing to animals’ specific physiologies, leading to disproportionate sample counts across behavioral classes, i.e., class imbalance. Deep learning methods trained on imbalanced datasets tend to be biased towards majority classes and away from minority classes, which often causes poor model generalizability and high classification error rates for rare categories. (3) Data privacy.
Deep learning has come to dominate AAR tasks owing to the high performance achievable with large-scale training datasets. In practice, however, constructing a large corpus of centralized data across different sources (e.g., farms) raises data ownership and privacy concerns, and poses a significant risk of commercial information leakage for producers and stockholders. Compared with this traditional centralized approach, a distributed learning paradigm that does not exchange private data offers a promising foundation for future privacy-preserving AAR systems. (4) Energy efficiency. Animal activities are generally monitored over long periods (e.g., a few weeks or several months), which requires sensing devices to continuously collect and transmit data. As most embedded sensing devices are battery-powered, factors affecting their energy consumption and battery life must be carefully considered. The literature has shown that higher sampling rates come at a cost in real-world deployments that rely on long-term operation. For practical reasons, existing works often lower the sensor sampling rate to reduce energy costs. However, when the sampling rate falls below a threshold, AAR performance degrades rapidly because much of the relevant signal is missed.

In this thesis, I investigate solutions to the above-mentioned challenges, aiming to enhance the robustness of an automated AAR system. First, to improve AAR based on imbalanced multi-modal data, I develop a cross-modality interaction network (CMI-Net) for multi-modal fusion and adopt the class-balanced (CB) focal loss to alleviate the class imbalance problem. Specifically, the CMI-Net consists of a dual CNN trunk architecture that extracts modality-specific features and a cross-modality interaction module (CMIM) that achieves deep inter-modality interaction. In particular, the CMIM, based on an attention mechanism, adaptively recalibrates each modality’s temporal- and axis-wise features by leveraging multi-modal information. This enables the CMI-Net to effectively capture complementary information and suppress unrelated information from multiple modalities. In addition, the CB focal loss is employed to supervise network training, forcing the network to pay more attention both to samples of minority classes, preventing them from being overwhelmed during optimization, and to samples that are hard to distinguish.
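The CB focal loss combines two known ingredients: re-weighting each class by the inverse of its "effective number" of samples, and the focal term that down-weights easy examples. A minimal NumPy sketch is given below; the parameter defaults (`beta`, `gamma`) are illustrative placeholders, not the values used in the thesis.

```python
import numpy as np

def cb_focal_loss(probs, labels, samples_per_class, beta=0.999, gamma=2.0):
    """Class-balanced focal loss (illustrative sketch).

    probs: (N, C) softmax probabilities
    labels: (N,) integer class labels
    samples_per_class: (C,) sample count n_y per class
    """
    # Effective number of samples per class: E_n = (1 - beta^n) / (1 - beta)
    effective_num = (1.0 - np.power(beta, samples_per_class)) / (1.0 - beta)
    weights = 1.0 / effective_num
    weights = weights / weights.sum() * len(samples_per_class)  # normalize to sum to C

    p_t = probs[np.arange(len(labels)), labels]       # probability of the true class
    focal = (1.0 - p_t) ** gamma * -np.log(p_t)       # focal term: easy samples shrink
    return float(np.mean(weights[labels] * focal))
```

With this weighting, a minority-class sample incurs a larger loss than a majority-class sample at the same predicted probability, which is exactly the behavior described above.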

Second, I introduce a distributed learning strategy, federated learning (FL), to achieve automated AAR based on decentralized data across different farms while protecting data privacy and ownership. I consider two challenges (i.e., client drift during local training and local gradient conflicts during global aggregation) arising from data heterogeneity between farms when FL is directly applied to AAR tasks. To tackle these two challenges, I propose a novel FL framework called FedAAR, which comprises a prototype-guided local update (PLU) module for local optimization and a gradient-refinement-based aggregation (GRA) module for global aggregation. Specifically, the PLU module encourages all clients to learn consistent feature knowledge by imposing a global prototype guidance constraint on local optimization, reducing the divergence between client updates. The GRA module eliminates conflicting components between local gradients during global aggregation, guaranteeing that all refined local gradients point in directions that improve agreement among clients.
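The exact operations of the GRA module are not detailed in this abstract; the sketch below illustrates the general idea of eliminating gradient conflicts under one common assumption, namely pairwise projection (as in PCGrad-style methods): whenever two clients' flattened gradients have a negative inner product, the opposing component is projected out, so the refined gradients no longer pull against each other.

```python
import numpy as np

def refine_gradients(local_grads):
    """Remove pairwise conflicting components between client gradients.

    local_grads: list of 1-D arrays, one flattened gradient per client.
    Returns refined gradients whose inner products with the original
    gradients of other clients are non-negative.
    """
    refined = [g.copy() for g in local_grads]
    for i, g_i in enumerate(refined):
        for j, g_j in enumerate(local_grads):
            if i == j:
                continue
            dot = g_i @ g_j
            if dot < 0:  # conflict: project out the component opposing client j
                g_i -= dot / (g_j @ g_j) * g_j
    return refined
```

After refinement, the server can average the refined gradients as usual; each one now points in a direction that does not degrade any other client's objective (to first order).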

Third, I present a novel approach, dubbed teacher-to-student information recovery (T2S-IR), to achieve energy-efficient AAR at low sampling rates while maintaining desirable performance. T2S-IR leverages the knowledge obtained from high-sampling-rate data to help recover the information missing from features extracted by a classification network trained on low-sampling-rate data. Specifically, I first use high-sampling-rate data to sequentially train teacher classification and reconstruction networks. I then train a student classification network on low-sampling-rate data, boosting its performance by exploiting the knowledge learned by the trained teacher networks via two novel modules, namely the reconstruction-based information recovery (RIR) module and the correlation-distillation-based information recovery (CDIR) module. In particular, the RIR module exploits the pre-trained teacher reconstruction network to compel the student classification network to learn complete and descriptive features. The CDIR module enforces the feature maps of the student network to mimic the internal correlations within the feature maps of the pre-trained teacher classification network along the temporal and sensor axes. The enhanced student network can be directly applied to infer different animal activities in practical low-sampling-rate scenarios.
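To illustrate correlation distillation along one axis, the sketch below (hypothetical function names; the thesis' CDIR also covers the sensor-axis direction, and its exact normalization may differ) builds a temporal self-correlation matrix for teacher and student feature maps and penalizes the discrepancy between them:

```python
import numpy as np

def correlation_matrix(feat):
    """Temporal self-correlation of a (T, C) feature map.

    Each time step's feature vector is L2-normalized, so the (T, T)
    result holds cosine similarities between all pairs of time steps.
    """
    f = feat / (np.linalg.norm(feat, axis=1, keepdims=True) + 1e-8)
    return f @ f.T

def correlation_distillation_loss(student_feat, teacher_feat):
    """MSE between the correlation structures of student and teacher features."""
    c_s = correlation_matrix(student_feat)
    c_t = correlation_matrix(teacher_feat)
    return float(np.mean((c_s - c_t) ** 2))
```

Because the loss compares correlation *structure* rather than raw feature values, the student is free to use its own feature space as long as the relationships between time steps match those the teacher learned from high-sampling-rate data.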

In conclusion, this thesis outlines four practical challenges associated with the development of AAR systems based on deep learning and wearable sensors: multi-modal fusion, class imbalance, data privacy, and energy efficiency. Correspondingly, I have presented a series of strategies to address these challenges, including the CMI-Net combined with the CB focal loss to achieve multi-modal fusion and mitigate class imbalance, FedAAR to perform automated AAR by uniting decentralized data while preserving data privacy across farms, and T2S-IR to maintain favorable AAR performance at low sampling rates. Extensive experiments conducted on public datasets acquired from horses and/or goats using tri-axial accelerometers and tri-axial gyroscopes have verified the effectiveness of the proposed methods, which outperform state-of-the-art algorithms on various tasks.