
Joint Communication and Inference User Allocation in LLM Native Networks

Buratto A.; Badia L.
2025

Abstract

Large language models (LLMs) are game changers for next-generation networks, unlocking new opportunities for disruptive and interactive services and applications. Edge computing enables the deployment of LLMs closer to the users, allowing for the implementation of highly responsive intelligent systems. This paper proposes a matching theory-based algorithm to optimize the user-LLM association, considering both communication and inference delays in the presence of capacity-constrained edge nodes. The objective is to minimize the end-to-end user delay, i.e., the time elapsed between when a user submits a request and when the response is sent back. To this end, a matching game is formulated between the users and the LLMs, assuming heterogeneous LLMs specialized in different types of learning tasks. The scenario is modeled as a matching game with externalities and incomplete lists, which terminates in a stable configuration by leveraging a monotonic user preference metric throughout the algorithm execution. A comparative performance evaluation against different state-of-the-art techniques confirms the advantages of adopting a joint communication- and inference-aware approach to orchestrate the user-LLM assignments.
2025
2025 IEEE International Conference on Machine Learning for Communication and Networking, ICMLCN 2025
2nd IEEE International Conference on Machine Learning for Communication and Networking, ICMLCN 2025
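
As an illustration only (the full paper is not included in this record), the following is a minimal Python sketch of the kind of user-LLM association the abstract describes: a capacity-constrained, deferred-acceptance-style matching in which each user ranks the edge-hosted LLMs by an estimated end-to-end delay (communication plus inference). The delay model, the parameter names, and the simplification that preferences are static (ignoring the externalities mentioned in the abstract) are assumptions for this sketch, not the authors' algorithm.

# Illustrative sketch, not the authors' method: capacity-constrained
# deferred-acceptance matching of users to edge-hosted LLMs, with users
# ranking LLMs by estimated end-to-end delay. All parameters are hypothetical.

def total_delay(user, llm):
    """Estimated end-to-end delay: data transfer plus inference time (assumed model)."""
    comm = user["request_bits"] / llm["link_rate_bps"]   # communication delay
    inf = user["tokens"] / llm["tokens_per_sec"]          # inference delay
    return comm + inf

def match_users_to_llms(users, llms):
    """Many-to-one deferred acceptance with per-LLM capacity constraints."""
    # Each user builds a preference list over LLMs, lowest estimated delay first.
    prefs = {u["id"]: sorted(llms, key=lambda m: total_delay(u, m)) for u in users}
    next_choice = {u["id"]: 0 for u in users}   # index of the next LLM to propose to
    assigned = {m["id"]: [] for m in llms}      # tentative assignments per LLM
    unmatched = list(users)

    while unmatched:
        user = unmatched.pop(0)
        choices = prefs[user["id"]]
        if next_choice[user["id"]] >= len(choices):
            continue                             # incomplete list exhausted: user stays unmatched
        llm = choices[next_choice[user["id"]]]
        next_choice[user["id"]] += 1
        assigned[llm["id"]].append(user)
        if len(assigned[llm["id"]]) > llm["capacity"]:
            # Over capacity: keep the users served fastest, reject the worst one,
            # who will propose to its next-preferred LLM in a later iteration.
            assigned[llm["id"]].sort(key=lambda u: total_delay(u, llm))
            rejected = assigned[llm["id"]].pop()
            unmatched.append(rejected)
    return assigned

# Example usage with hypothetical edge nodes and users.
llms = [
    {"id": "llm-A", "link_rate_bps": 50e6, "tokens_per_sec": 40, "capacity": 2},
    {"id": "llm-B", "link_rate_bps": 20e6, "tokens_per_sec": 80, "capacity": 1},
]
users = [{"id": f"u{i}", "request_bits": 8e5, "tokens": 200 + 50 * i} for i in range(3)]
print(match_users_to_llms(users, llms))

In the paper's setting, preferences would additionally depend on the current load of each edge node (externalities), which this static sketch deliberately omits for brevity.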


Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3562684