Skip to main navigation Skip to search Skip to main content

Distributed Online Bandit Learning in Dynamic Environments over Unbalanced Digraphs

Jueyou Li, Chaojie Li*, Wenwu Yu, Xiaomei Zhu, Xinghuo Yu

*Corresponding author for this work

Research output: Journal Publications and ReviewsRGC 21 - Publication in refereed journalpeer-review

Abstract

This work is concerned with distributed online bandit learning over the multi-agent network, where a group of agents aims to seek the minimizer of a time-changing global loss function cooperatively. At each epoch, the global loss function can be represented as the sum of local loss functions known privately by an individual agent over the network. Furthermore, local functions are sequentially accessible to agents, and all the agents have no knowledge of future loss functions. Thus, agents of the network must interchange messages to pursue an online estimation of the global loss function. In this paper, we are interested in a bandit setup, where only values of local loss functions at sampling points are disclosed to agents. Meanwhile, we consider a more general network with unbalanced digraphs that the corresponding weight matrix is allowed to be only row stochastic. By extending the celebrated mirror descent algorithm, we first design a distributed bandit online leaning method for the online distributed convex problem. We then establish the sub-linear expected dynamic regret attained by the algorithm for convex and strongly convex function, respectively, when the accumulative deviation of the minimizer sequence increases sub-linearly. Moreover, the expected dynamic regret bound is analysed for strongly convex loss functions. In addition, the expected static regret bound with the order of O(√T) is obtained in the bandit setting while the corresponding static regret bound with the order of O(ln T) is also provided for the strongly convex case. Finally, numerical examples are provided to illustrate the efficiency of the method and to verify the theoretical findings.
Original languageEnglish
Pages (from-to)3034-3047
JournalIEEE Transactions on Network Science and Engineering
Volume8
Issue number4
Online published30 Jun 2021
DOIs
Publication statusPublished - Oct 2021
Externally publishedYes

Research Keywords

  • Communication networks
  • Convex functions
  • Distributed optimization
  • Heuristic algorithms
  • Mirror descent
  • Mirrors
  • Multi-agent network
  • Online learning
  • Optimization
  • Protocols
  • Signal processing algorithms
  • Unbalanced digraph

Fingerprint

Dive into the research topics of 'Distributed Online Bandit Learning in Dynamic Environments over Unbalanced Digraphs'. Together they form a unique fingerprint.

Cite this