Abstract
Wireless-based human motion sensing has experienced rapid advancements in recent years, driven by the proliferation of wireless communication technologies and advanced deep learning models. This technology enables real-time tracking and analysis of human movements, facilitating innovative solutions in many areas such as healthcare, sports performance monitoring, virtual reality, and smart environments. However, a series of issues make it challenging to implement these applications in real-world scenarios. This dissertation focuses on addressing these challenging issues to enhance the usability of wireless-based human motion sensing applications.

First, we find that most millimeter-wave (mmWave) human motion sensing systems face a significant challenge due to the lack of training datasets. To address this issue, we propose SynMotion, which utilizes existing vision-based datasets to synthesize mmWave signals that mimic human motion. By leveraging these datasets, SynMotion generates high-quality labeled synthesized data that includes skeletal coordinates and activity names. This enables two key applications with commercial radars: zero-shot activity recognition and few-shot body skeleton tracking.
Second, wireless-based human activity recognition (WHAR) suffers degraded performance due to its sensitivity to changing sensing conditions. Recent research uses fine-tuning to solve this problem. However, this process complicates implementation, requiring expertise in deep learning and making widespread adoption difficult. To tackle this issue, we propose leveraging the principle that data from the same activity class is more similar under the same sensing conditions, a property that remains consistent across various conditions. Our key insight is that meta-learning can reframe WHAR as a problem that operates under similar conditions, effectively decoupling it from the specific sensing environment. This approach allows for the development of a general and accurate WHAR system without the need for fine-tuning. We implement this concept through two innovative designs in a system called RoMF.
Finally, the rapid advancements in large language models (LLMs) present a promising solution for the challenge of multimodal fusion sensing with wireless signals. However, the target deployment scenarios for wireless sensing may be resource-constrained, making it impractical to train a separate encoder for each modality within large model architectures. Therefore, we aim to achieve this fusion under the premise of a unified structure, where all modalities share a single encoder. In the concluding section of this dissertation, we explore the feasibility of utilizing language modalities as a medium for multimodal fusion within a unified model structure.
In summary, this dissertation introduces a series of innovative methodologies, ranging from SynMotion's data synthesis to RoMF's fine-tuning-free approach, and finally a study of the feasibility of a unified framework for multimodal sensing with LLMs. Each of these contributions aims to enhance wireless-based human motion sensing.
| Date of Award | 28 Feb 2025 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Jin Zhang (External Supervisor) & Zhenjiang LI (Supervisor) |