Learning interpretable representations for classification, anomaly detection, human gesture and action recognition

Terzi, Matteo

The goal of this thesis is to provide algorithms and models for classification, gesture recognition and anomaly detection with a partial focus on human activity. In applications where humans are involved, it is of paramount importance to provide robust and understandable algorithms and models. A way to accomplish this requirement is to use relatively simple and robust approaches, especially when devices are resource-constrained. The second approach, when a large amount of data is present, is to adopt complex algorithms and models and make them robust and interpretable from a human-like point of view. This motivates our thesis that is divided in two parts. The first part of this thesis is devoted to the development of parsimonious algorithms for action/gesture recognition in human-centric applications such as sports and anomaly detection for artificial pancreas. The data sources employed for the validation of our approaches consist of a collection of time-series data coming from sensors, such as accelerometers or glycemic. The main challenge in this context is to discard (i.e. being invariant to) many nuisance factors that make the recognition task difficult, especially where many different users are involved. Moreover, in some cases, data cannot be easily labelled, making supervised approaches not viable. Thus, we present the mathematical tools and the background with a focus to the recognition problems and then we derive novel methods for: (i) gesture/action recognition using sparse representations for a sport application; (ii) gesture/action recognition using a symbolic representations and its extension to the multivariate case; (iii) model-free and unsupervised anomaly detection for detecting faults on artificial pancreas. These algorithms are well-suited to be deployed in resource constrained devices, such as wearables. In the second part, we investigate the feasibility of deep learning frameworks where human interpretation is crucial. Standard deep learning models are not robust and, unfortunately, literature approaches that ensure robustness are typically detrimental to accuracy in general. However, in general, real-world applications often require a minimum amount of accuracy to be employed. In view of this, after reviewing some results present in the recent literature, we formulate a new algorithm being able to semantically trade-off between accuracy and robustness, where a cost-sensitive classification problem is provided and a given threshold of accuracy is required. In addition, we provide a link between robustness to input perturbations and interpretability guided by a physical minimum energy principle: in fact, leveraging optimal transport tools, we show that robust training is connected to the optimal transport problem. Thanks to these theoretical insights we develop a new algorithm that provides robust, interpretable and more transferable representations.

Learning interpretable representations for classification, anomaly detection, human gesture and action recognition / Terzi, Matteo. - (2019 Dec 01).