TY - JOUR
T1 - Privacy-Preserving Communication-Efficient Federated Multi-Armed Bandits
AU - Li, Tan
AU - Song, Linqi
PY - 2022/3
Y1 - 2022/3
N2 - Communication bottleneck and data privacy are two critical concerns in federated multi-armed bandit (MAB) problems, such as situations in decision-making and recommendations of connected vehicles via wireless. In this paper, we design the privacy-preserving communication-efficient algorithm in such problems and study the interactions among privacy, communication and learning performance in terms of the regret. To be specific, we design privacy-preserving learning algorithms and communication protocols and derive the learning regret when networked private agents are performing online bandit learning in a master-worker, a decentralized and a hybrid structure. Our bandit learning algorithms are based on epoch-wise sub-optimal arm eliminations at each agent and agents exchange learning knowledge with the server/each other at the end of each epoch. Furthermore, we adopt the differential privacy (DP) approach to protect the data privacy at each agent when exchanging information; and we curtail communication costs by making less frequent communications with fewer agents participation. By analyzing the regret of our proposed algorithmic framework in the master-worker, decentralized and hybrid structures, we theoretically show trade-offs between regret and communication costs/privacy. Finally, we empirically show these trade-offs which are consistent with our theoretical analysis.
AB - Communication bottleneck and data privacy are two critical concerns in federated multi-armed bandit (MAB) problems, such as situations in decision-making and recommendations of connected vehicles via wireless. In this paper, we design the privacy-preserving communication-efficient algorithm in such problems and study the interactions among privacy, communication and learning performance in terms of the regret. To be specific, we design privacy-preserving learning algorithms and communication protocols and derive the learning regret when networked private agents are performing online bandit learning in a master-worker, a decentralized and a hybrid structure. Our bandit learning algorithms are based on epoch-wise sub-optimal arm eliminations at each agent and agents exchange learning knowledge with the server/each other at the end of each epoch. Furthermore, we adopt the differential privacy (DP) approach to protect the data privacy at each agent when exchanging information; and we curtail communication costs by making less frequent communications with fewer agents participation. By analyzing the regret of our proposed algorithmic framework in the master-worker, decentralized and hybrid structures, we theoretically show trade-offs between regret and communication costs/privacy. Finally, we empirically show these trade-offs which are consistent with our theoretical analysis.
KW - communication efficient learning
KW - Costs
KW - Data models
KW - differential privacy
KW - Federated learning
KW - Knowledge engineering
KW - multi-armed bandit
KW - Privacy
KW - Protocols
KW - Servers
KW - Wireless communication
UR - http://www.scopus.com/inward/record.url?scp=85122888972&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85122888972&origin=recordpage
U2 - 10.1109/JSAC.2022.3142374
DO - 10.1109/JSAC.2022.3142374
M3 - RGC 21 - Publication in refereed journal
SN - 0733-8716
VL - 40
SP - 773
EP - 787
JO - IEEE Journal on Selected Areas in Communications
JF - IEEE Journal on Selected Areas in Communications
IS - 3
ER -