Abstract
In one-versus-one close-range air combat between Unmanned Combat Aerial Vehicles (UCAVs), existing Reinforcement Learning (RL) methods exhibit a significant trade-off between generalization performance and combat efficiency. Strategies trained against specific opponents achieve higher kill rates but generalize poorly. Conversely, models that approximate a Nash Equilibrium by training against diversified opponent strategies generalize robustly but are less efficient because of training complexity and conservative decision-making. To address this challenge, this paper proposes a robust maneuver decision-making approach based on Opponent Modeling and Reinforcement Learning (OMRL). Grounded in the perspective of the Information Horizon, the approach categorizes opponent strategies into Long-Sighted Strategies, Short-Sighted Strategies, and Fixed Strategies. OMRL employs a Long Short-Term Memory (LSTM) network to accurately classify opponent trajectories and the Proximal Policy Optimization (PPO) algorithm to train targeted solution strategies, thereby constructing an efficient adversarial framework. OMRL dynamically identifies the opponent's strategy type in real time and invokes the corresponding solution strategy, effectively balancing generalization performance and combat efficiency. Experimental results demonstrate that OMRL achieves an average win rate of 0.64 in testing, surpassing other state-of-the-art RL methods. This study is the first to introduce Information Horizon-based classification: it systematically analyzes the characteristics of the various strategies, trains an LSTM classifier on trajectory data, and develops the OMRL framework. Adversarial experiments and ablation studies validate the superiority and scalability of OMRL, providing an innovative theoretical and practical foundation for efficient collaborative combat involving UCAVs.
© Systems Engineering Society of China and Springer-Verlag GmbH Germany 2025.
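The classify-then-dispatch idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `StrategyDispatcher` class, the `window` length, the fallback to a default strategy before enough trajectory has been observed, the action names, and above all `toy_classifier`, which is a trivial placeholder standing in for the trained LSTM classifier. The policies are likewise stubs standing in for PPO-trained solution strategies.

```python
from collections import deque
from enum import Enum
from typing import Callable, Deque, Dict, List, Tuple

State = Tuple[float, ...]


class OpponentType(Enum):
    """The three opponent categories named in the abstract."""
    LONG_SIGHTED = "long-sighted"
    SHORT_SIGHTED = "short-sighted"
    FIXED = "fixed"


Classifier = Callable[[List[State]], OpponentType]
Policy = Callable[[State], str]


class StrategyDispatcher:
    """Hypothetical dispatcher: classify the opponent, then pick a policy."""

    def __init__(
        self,
        classifier: Classifier,
        policies: Dict[OpponentType, Policy],
        window: int = 20,  # assumed trajectory length fed to the classifier
        default: OpponentType = OpponentType.LONG_SIGHTED,  # assumed fallback
    ) -> None:
        missing = [t for t in OpponentType if t not in policies]
        if missing:
            raise ValueError(f"no policy registered for: {missing}")
        self.classifier = classifier
        self.policies = policies
        self.default = default
        self.trajectory: Deque[State] = deque(maxlen=window)

    def act(self, own_state: State, opponent_state: State) -> Tuple[OpponentType, str]:
        """Record the opponent state, classify once the window is full, act."""
        self.trajectory.append(opponent_state)
        if len(self.trajectory) == self.trajectory.maxlen:
            label = self.classifier(list(self.trajectory))
        else:
            label = self.default  # not enough observations yet
        return label, self.policies[label](own_state)


def toy_classifier(trajectory: List[State]) -> OpponentType:
    """Placeholder rule, NOT the paper's LSTM: a constant state means FIXED."""
    if all(state == trajectory[0] for state in trajectory):
        return OpponentType.FIXED
    return OpponentType.SHORT_SIGHTED


# Stub policies with made-up action names; real ones would be trained with PPO.
policies: Dict[OpponentType, Policy] = {
    OpponentType.LONG_SIGHTED: lambda own: "hold_energy",
    OpponentType.SHORT_SIGHTED: lambda own: "bait_and_turn",
    OpponentType.FIXED: lambda own: "pursue",
}

dispatcher = StrategyDispatcher(toy_classifier, policies, window=3)
```

Calling `dispatcher.act(own_state, opponent_state)` once per simulation step returns the current opponent label together with the action chosen by the matching policy. Keeping the classifier and the policies behind plain callables means either part could be swapped for a trained model without changing the dispatch loop.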
| Original language | English |
|---|---|
| Number of pages | 34 |
| Journal | Journal of Systems Science and Systems Engineering |
| Online published | 31 Oct 2025 |
| Publication status | Online published - 31 Oct 2025 |
Funding
This work was partially supported by GRF: CityU 11306821 of Hong Kong SAR Government. The authors would like to thank the anonymous reviewers and editor for their valuable comments and suggestions throughout the review process.
Research Keywords
- Air combat
- opponent modeling
- reinforcement learning
- self-play
- Nash equilibrium
RGC Funding Information
- RGC-funded
Fingerprint
Dive into the research topics of 'Robust Close-Range Air Combat Maneuver Decision-Making Method Based on Opponent Modeling and Reinforcement Learning'. Together they form a unique fingerprint.
Projects
- 1 Active
- GRF: Differentiable Path-Following Methods with Compact Formulations to Compute Extended and Perfect d-Extended Proper Equilibria in Robust Games
  DANG, C. (Principal Investigator / Project Coordinator)
  1/01/22 → …
  Project: Research