
Adaptive Compression in Federated Learning via Side Information

Pase F.;Zorzi M.
2024

Abstract

The high communication cost of sending model updates from the clients to the server is a significant bottleneck for scalable federated learning (FL). Among existing approaches, state-of-the-art bitrate-accuracy tradeoffs have been achieved using stochastic compression methods -- in which client n sends a sample from a client-only probability distribution qϕ(n), and the server estimates the mean of the clients' distributions using these samples. However, such methods do not take full advantage of the FL setup, where the server, throughout the training process, has side information in the form of a global distribution pθ that is close to the client distribution qϕ(n) in Kullback-Leibler (KL) divergence. In this work, we exploit this closeness between the client distributions qϕ(n) and the side information pθ at the server, and propose a framework that requires approximately DKL(qϕ(n)||pθ) bits of communication. We show that our method can be integrated into many existing stochastic compression frameworks to attain the same (and often higher) test accuracy with up to 82 times smaller bitrate than the prior work -- corresponding to 2,650 times overall compression.
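The mechanism the abstract describes -- a client transmitting a sample from its local distribution qϕ(n) while the server already holds a nearby prior pθ -- can be illustrated with an importance-sampling scheme in the spirit of minimal random coding. The sketch below is an assumption-laden illustration, not the paper's actual algorithm: the one-dimensional Gaussian setting, the function names, and the choice of K candidates are all hypothetical. Client and server share common randomness (here, a seed), the server-side prior generates K candidate samples, and the client only transmits an index, costing about log2(K) bits; roughly speaking, K can be taken on the order of exp(DKL(q||p)) for the selected sample to be approximately distributed as q.

```python
import numpy as np

def gauss_logpdf(x, mean, std):
    """Log-density of a Gaussian N(mean, std^2), used for importance weights."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi)

def client_encode(seed, q_mean, q_std, p_mean, p_std, K):
    """Client side: draw K candidates from the shared prior p, then select one
    index with probability proportional to the importance weight q/p.
    Only the index is communicated, costing about log2(K) bits."""
    rng = np.random.default_rng(seed)          # common randomness shared with the server
    cand = rng.normal(p_mean, p_std, size=K)   # candidates drawn from the prior p
    log_w = gauss_logpdf(cand, q_mean, q_std) - gauss_logpdf(cand, p_mean, p_std)
    probs = np.exp(log_w - log_w.max())        # normalized importance weights
    probs /= probs.sum()
    return int(rng.choice(K, p=probs))

def server_decode(seed, p_mean, p_std, K, idx):
    """Server side: regenerate the same K candidates from the shared seed and
    recover the client's sample from the transmitted index alone."""
    rng = np.random.default_rng(seed)
    cand = rng.normal(p_mean, p_std, size=K)
    return float(cand[idx])
```

When q and p are close in KL divergence, a small K (and hence a small index, i.e., few bits) suffices, which is the intuition behind the approximately DKL(qϕ(n)||pθ)-bit communication cost claimed in the abstract.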
Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Files for this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3531993