PLS for classification
Stocchero M.; Scarpa B.
2021
Abstract
Partial Least Squares regression (PLS) is a multivariate technique developed to perform regression with multivariate responses when multicollinearity, redundancy and noise affect the predictors. Although several efforts have been made to extend PLS to classification problems, this is still an active field of research. In the present study, a new technique called PLS for classification is introduced to solve the general G-class problem. It is developed within a self-consistent framework based on linear algebra and on the theory of compositional data. After the introduction of the notion of probability-data vector, the space of the predictors and that of the conditional probabilities are linked, and a well-defined least squares problem, whose solution specifies the relationship between probabilities and predictors, is solved by a suitable reformulation of PLS2. The method directly estimates the conditional probability of class membership given the predictors. The score vectors are introduced only in a second step to improve model interpretation. The main properties of PLS for classification and its relationships with PLS-DA are discussed. One simulated and one real data set are investigated to show how the method works in practice.
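For orientation, the sketch below illustrates the standard PLS-DA baseline that the abstract contrasts with PLS for classification: class labels are expanded into a dummy response matrix, a PLS2 regression is fitted, and samples are assigned by the largest predicted response. It is a minimal illustration using scikit-learn and synthetic data (the dataset, the choice of two latent components, and all variable names are assumptions for the example), not an implementation of the authors' probability-based method built on compositional data.

```python
# Minimal PLS-DA sketch (the baseline related to the paper's method), not the
# authors' "PLS for classification" algorithm, which models conditional class
# probabilities via compositional data theory.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)

# Toy data: 3 classes, 10 noisy and partly collinear predictors (illustrative only).
n, p, G = 150, 10, 3
y = rng.integers(0, G, size=n)
centers = rng.normal(size=(G, p))
X = centers[y] + 0.5 * rng.normal(size=(n, p))

# Dummy (one-hot) coding of the G classes as the multivariate response.
Y = np.eye(G)[y]

# PLS2 regression of the dummy matrix on the predictors, with few latent components.
pls = PLSRegression(n_components=2, scale=True)
pls.fit(X, Y)

# PLS-DA rule: predicted responses are not probabilities; classify by argmax.
Y_hat = pls.predict(X)
y_pred = Y_hat.argmax(axis=1)
print("training accuracy:", (y_pred == y).mean())
```

As the abstract notes, the predicted responses of PLS-DA are not genuine probabilities; the paper's method instead estimates the conditional probability of class membership directly.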
| File | Size | Format |
|---|---|---|
| PLS_for_classification.pdf (restricted access; Published, Publisher's Version of Record; license: private access, not public) | 1.59 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.