Reinforcement Learning Through the Lens of Koopman Operators - The Infinite-Dimensional Framework / Zanini, Francesco. - (2023 Mar 06).
Reinforcement Learning Through the Lens of Koopman Operators - The Infinite-Dimensional Framework
ZANINI, FRANCESCO
2023
Abstract
Reinforcement Learning is nowadays one of the most active research areas in Artificial Intelligence. This is due both to its practical successes and to the theoretical challenges it poses, lying at the intersection of Control Theory and Machine Learning. Although the paradigm is very simple, consisting of an agent that interacts with an environment to collect reward, it has paved the way for very diverse approaches. These fall into two main categories: model-based methods, which rely on predictions given by a model of the environment, and model-free methods, which tackle the policy search problem directly; each has its own merits and drawbacks. Current research is trying to merge these perspectives, seeking algorithms that inherit the benefits of both. In parallel, and until now largely independently, recent years have seen a progressive growth of the literature on the Koopman operator framework. The latter allows for a linear description of any nonlinear dynamical system, at the price of working in a space of functions, which is infinite-dimensional. The advantages of the linear description have led to a wide diffusion of this perspective: first for the analysis of nonlinear flows, and very recently also for control. Clearly, the benefits of the linear characterisation are balanced by the drawback of having to handle an infinite-dimensional description of the system. Since the latter is not feasible in practice, different finite-dimensional approximation techniques have emerged from the literature, also to learn the Koopman operator from data. However, these are based on the a priori definition of a finite-dimensional space in which the operator is learnt, so they are unsuccessful without significant prior knowledge. In this Dissertation a new approach to estimating the Koopman operator is presented, relying on the theory of Reproducing Kernel Hilbert Spaces. The latter are indeed Hilbert spaces, hence infinite-dimensional; nevertheless, they yield a finite-dimensional solution of the reconstruction problem, thanks to the Representer theorem. The connections between the Koopman operator framework and Reproducing Kernel Hilbert Spaces are discussed at length from the perspective of operator theory. In particular, by framing the learning problem in an RKHS, it is possible to relax the assumption of a fixed finite-dimensional space for the estimation: the only prior knowledge embedded in the problem enters through the kernel, which shapes the functions yielding the estimate. This approach, resulting from the combination of the Koopman operator framework and RKHSs, turns out to be particularly suitable for the Reinforcement Learning problem. In this work it is shown how a natural formulation of the value function, the key quantity of the Reinforcement Learning setting, can be given in terms of successive iterations of the Koopman operator. As the reward is all that matters, propagating this particular function through the Koopman operator gives a concise way of addressing the Reinforcement Learning problem. Indeed, this yields a model that is not based on the propagation of the state but relies only on the reward: it can therefore be regarded as an approach lying midway between a model-free and a model-based perspective.
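As an illustration of the construction summarised in the abstract, consider deterministic closed-loop dynamics under a fixed policy; the notation below is a standard sketch, not taken from the thesis itself:
\[
x_{t+1} = F(x_t), \qquad (\mathcal{K} f)(x) := f(F(x)),
\]
\[
V(x) \;=\; \sum_{t=0}^{\infty} \gamma^{t}\, r(x_t) \;=\; \sum_{t=0}^{\infty} \gamma^{t}\, (\mathcal{K}^{t} r)(x),
\]
so that knowing how the Koopman operator acts on the reward function suffices to evaluate the value function, without propagating the state explicitly. When the operator is estimated from samples \((x_i, F(x_i))\), \(i = 1, \dots, N\), in a Reproducing Kernel Hilbert Space with kernel \(k\), the Representer theorem guarantees that the estimate of \(\mathcal{K} r\) can be written as a finite combination
\[
\widehat{\mathcal{K} r}(\cdot) \;=\; \sum_{i=1}^{N} \alpha_i\, k(x_i, \cdot),
\]
which is the finite-dimensional solution alluded to above. For stochastic dynamics the same construction holds with \((\mathcal{K} f)(x) = \mathbb{E}\left[ f(x_{t+1}) \mid x_t = x \right]\).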
File: tesi_Francesco_Zanini.pdf
Description: tesi_Francesco_Zanini
Type: Doctoral thesis
License: Other
Size: 5.87 MB
Format: Adobe PDF
Open Access from 05/09/2024