Joint Communication and Inference User Allocation in LLM Native Networks
Buratto A.; Badia L.
2025
Abstract
Large language models (LLMs) are game changers for next-generation networks, unlocking new opportunities for disruptive and interactive services and applications. Edge computing enables the deployment of LLMs closer to the users, allowing for the implementation of highly responsive intelligent systems. This paper proposes a matching-theory-based algorithm to optimize the user-LLM association, considering both communication and inference delays in the presence of capacity-constrained edge nodes. The objective is to minimize the end-to-end user delay, that is, the time elapsed between when a user submits a request and when the response is sent back. To this end, a matching game is formulated between the users and the LLMs, assuming heterogeneous LLMs specialized in different types of learning tasks. The scenario is modeled as a matching game with externalities and incomplete lists, which terminates in a stable configuration by exploiting a monotonic user preference metric during the algorithm execution. A comparative performance evaluation against different state-of-the-art techniques confirms the advantages of adopting a joint communication- and inference-aware approach to orchestrate the user-LLM assignments.
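To illustrate the kind of assignment the abstract describes, the following is a minimal sketch of a delay-aware user-LLM matching with externalities: each user's inference delay grows with the load on its edge-hosted LLM, preference lists are incomplete (LLMs at capacity are skipped), and users iteratively switch to the LLM that minimizes their end-to-end delay until no one can improve. All names, the linear load model, and the iteration scheme are illustrative assumptions, not the paper's actual algorithm; the paper's stability guarantee relies on a monotonic preference metric, while this sketch simply stops when no user deviates (bounded by `max_iters`).

```python
def match_users_to_llms(users, llms, comm_delay, inf_time, max_iters=100):
    """Greedy delay-aware matching sketch (hypothetical model).

    users: list of user ids.
    llms: dict mapping llm id -> capacity (max co-located users).
    comm_delay[u][l]: communication delay of user u toward LLM l.
    inf_time[u][l]: base inference time of u's task on l; assumed to
        scale linearly with the LLM's load (the externality).
    Returns a dict user -> assigned llm (or None if none feasible).
    """
    assign = {u: None for u in users}
    load = {l: 0 for l in llms}

    def delay(u, l, extra):
        # End-to-end delay = communication + load-dependent inference.
        return comm_delay[u][l] + inf_time[u][l] * (load[l] + extra)

    for _ in range(max_iters):
        changed = False
        for u in users:
            cur = assign[u]
            best = cur
            best_d = delay(u, cur, 0) if cur is not None else float("inf")
            for l in llms:
                if l == cur or load[l] >= llms[l]:
                    continue  # incomplete list: full LLMs are not ranked
                d = delay(u, l, 1)  # delay if u joins l
                if d < best_d:
                    best, best_d = l, d
            if best != cur:
                if cur is not None:
                    load[cur] -= 1
                load[best] += 1
                assign[u] = best
                changed = True
        if not changed:
            break  # stable: no user can unilaterally reduce its delay
    return assign
```

A small usage example with two LLMs, one of capacity 2 and one of capacity 1, shows users spreading across nodes once the faster LLM saturates:

```python
users = ["u1", "u2", "u3"]
llms = {"llm_a": 2, "llm_b": 1}
comm_delay = {"u1": {"llm_a": 1, "llm_b": 5},
              "u2": {"llm_a": 2, "llm_b": 1},
              "u3": {"llm_a": 3, "llm_b": 3}}
inf_time = {"u1": {"llm_a": 2, "llm_b": 2},
            "u2": {"llm_a": 2, "llm_b": 4},
            "u3": {"llm_a": 1, "llm_b": 1}}
matching = match_users_to_llms(users, llms, comm_delay, inf_time)
```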