Solving the Goal-Oriented Communication Problem through Policy-Level Constrained Programming
Chiariotti F.; Mason F.; Zanella A.
2025
Abstract
A prototypal goal-oriented communication system consists of a transmitter agent that observes the state of a process and communicates it to a receiver agent that takes actions based on the information received to affect the behavior of the process itself. If the process is Markovian, this system can be modeled as a remote Partially Observable Markov Decision Process (POMDP). Despite the simplicity of the scenario, the complexity of finding optimal transmission and control policies grows exponentially with the size of the state space, calling for heuristic solutions. This is due to the interplay of the agents' strategies, which need to consider the implicit information acquired by the receiver when the transmitter stays silent. In this paper, we propose a new way to optimize remote POMDPs, posing them as single-agent problems by introducing policy-level constraints. The proposed formulation allows us to adopt standard optimization tools for quadratically-constrained linear programming, significantly reducing the computational complexity of the task. In particular, we show that a Disciplined Convex-Concave Programming (DCCP) approach can perform near optimally while maintaining a manageable computational burden in a scenario in which policy iteration-based heuristics have a wide optimality gap.
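
The implicit information carried by silence can be illustrated with a small receiver-side belief update. The sketch below is not the paper's formulation: it assumes a hypothetical 3-state Markov chain with transition matrix `P`, a transmitter that stays silent exactly when the current state belongs to a fixed set `silent_states`, and no influence of the receiver's actions on the transitions. Under those assumptions, a silent slot tells the receiver that the state lies in `silent_states`, so the predicted belief is conditioned on that event.

```python
def predict(belief, P):
    """One-step prediction: b'[j] = sum_i b[i] * P[i][j]."""
    n = len(P)
    return [sum(belief[i] * P[i][j] for i in range(n)) for j in range(n)]


def update_on_silence(belief, P, silent_states):
    """Receiver belief after a slot in which nothing was transmitted.

    Assumption: the transmitter is silent if and only if the new state
    is in `silent_states`, so silence rules out every other state.
    """
    predicted = predict(belief, P)
    mass = sum(predicted[s] for s in silent_states)
    if mass == 0:
        raise ValueError("silence has zero probability under this policy")
    return [
        predicted[s] / mass if s in silent_states else 0.0
        for s in range(len(P))
    ]


# Hypothetical 3-state chain; the values are illustrative only.
P = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
belief = [1.0, 0.0, 0.0]   # receiver knew the state was 0
silent_states = {0, 1}     # assumed policy: transmit only in state 2

posterior = update_on_silence(belief, P, silent_states)
print(posterior)
```

Because the posterior depends on which states the transmitter chooses to report, the receiver's best action depends on the transmission policy and vice versa, which is the coupling the abstract refers to.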