TY - JOUR
T1 - Dynamic Regret of Quantized Distributed Online Bandit Optimization in Zero-Sum Games
AU - Liao, Lan
AU - Ho, Daniel W. C.
AU - Yuan, Deming
AU - Yu, Zhan
AU - Zhang, Baoyong
AU - Xu, Shengyuan
PY - 2025/9/16
Y1 - 2025/9/16
N2 - This article investigates the distributed online optimization problem in a zero-sum game between two distinct time-varying multiagent networks. At each iteration, the agents not only communicate with their neighbors but also gather information about agents in the opposing network through a time-varying network, assigning weights accordingly. Moreover, we consider quantized communication and bandit feedback mechanisms, with agents transmitting quantized information and adopting one-point estimators. At each iteration, agents make and submit decisions and then receive the cost function values near their decision points rather than the full cost function information. To guarantee the payoff of each network, we design an algorithm named quantized distributed online bandit optimization in two-network (QDOBO-TN). We use dynamic Nash equilibrium regret to measure the positive payoff discrepancy between the decision sequence produced by Algorithm QDOBO-TN and the Nash equilibrium sequence. Furthermore, we propose a multiepoch version of Algorithm QDOBO-TN. The regret bounds for both algorithms are sublinear with respect to the iteration count T. Finally, we conduct a series of simulation experiments that further validate the effectiveness of the algorithms. © 2025 IEEE.
AB - This article investigates the distributed online optimization problem in a zero-sum game between two distinct time-varying multiagent networks. At each iteration, the agents not only communicate with their neighbors but also gather information about agents in the opposing network through a time-varying network, assigning weights accordingly. Moreover, we consider quantized communication and bandit feedback mechanisms, with agents transmitting quantized information and adopting one-point estimators. At each iteration, agents make and submit decisions and then receive the cost function values near their decision points rather than the full cost function information. To guarantee the payoff of each network, we design an algorithm named quantized distributed online bandit optimization in two-network (QDOBO-TN). We use dynamic Nash equilibrium regret to measure the positive payoff discrepancy between the decision sequence produced by Algorithm QDOBO-TN and the Nash equilibrium sequence. Furthermore, we propose a multiepoch version of Algorithm QDOBO-TN. The regret bounds for both algorithms are sublinear with respect to the iteration count T. Finally, we conduct a series of simulation experiments that further validate the effectiveness of the algorithms. © 2025 IEEE.
KW - Games
KW - Heuristic algorithms
KW - Nash equilibrium
KW - Cost function
KW - Vectors
KW - Quantization (signal)
KW - Mirrors
KW - Cybernetics
KW - Topology
KW - Symbols
KW - Bandit feedback
KW - distributed online optimization
KW - random quantization
KW - zero-sum game
UR - https://www.webofscience.com/wos/woscc/full-record/WOS:001575978200001
UR - http://www.scopus.com/inward/record.url?scp=105016754783&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-105016754783&origin=recordpage
U2 - 10.1109/TCYB.2025.3604774
DO - 10.1109/TCYB.2025.3604774
M3 - RGC 21 - Publication in refereed journal
SN - 2168-2267
JO - IEEE Transactions on Cybernetics
JF - IEEE Transactions on Cybernetics
ER -