A new permutation and Lasso-based interval selection technique
Rosa Arboretti;Riccardo Ceccato;Luca Pegoraro;Luigi Salmaso
2020
Abstract
Near-infrared (NIR) spectroscopy is an analytical technique used to determine chemical and physical features of a sample. The sample is illuminated with near-infrared light and its properties, such as absorbance or reflectance, are measured at different wavelengths within the near-infrared region of the electromagnetic spectrum. A calibration model is then adopted to predict the chemical or physical feature of interest from the resulting spectral data. Given that hundreds of wavelengths are commonly taken into consideration, it is essential to distinguish between informative wavelengths and those providing only irrelevant or redundant information. Each wavelength corresponds to an independent variable to be included in the calibration model, so we are interested in identifying an appropriate feature selection approach. Rather than considering the commonly used filter, wrapper or embedded methods, such as the Chi-squared test, Lasso regression or stepwise selection, in this paper we focus on a different family of feature selection techniques, namely interval selection methods. These methods are often used in the field of NIR spectroscopy to select groups of consecutive wavelengths, because spectral data are continuous in nature. As such, it makes more sense for practitioners to select small informative regions of spectral points rather than single points. In this paper, we propose a new interval selection technique called Permutation and Lasso-based Interval Selection (PLIS), based on the adoption of Lasso regression and permutation tests. The performance of this solution is then evaluated by means of a simulation study and a toy example coming from a real industrial problem.
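The abstract names the two ingredients of PLIS, Lasso regression and permutation tests, but does not describe the procedure itself. The sketch below is therefore not the authors' algorithm. It only illustrates, under assumptions of our own, one way the two ingredients could be combined to flag intervals of consecutive wavelengths: fit a Lasso model, score each interval by the sum of its absolute coefficients, and compare that score with scores obtained after permuting the response. The fixed interval width, the penalty `lam`, the score, the 0.05 threshold, the synthetic data and all function names are hypothetical choices made for illustration.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    return z - t if z > t else z + t if z < -t else 0.0

def lasso_cd(X, y, lam, n_iter=30):
    """Lasso by cyclic coordinate descent, minimising
    (1/2n)*||y - X b||^2 + lam*||b||_1 (no intercept: data are assumed centred)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals for beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def interval_scores(beta, intervals):
    """Assumed interval statistic: sum of absolute Lasso coefficients."""
    return [sum(abs(b) for b in beta[lo:hi]) for lo, hi in intervals]

def permutation_pvalues(X, y, intervals, lam, n_perm, rng):
    """Permutation p-value per interval, obtained by shuffling the response."""
    observed = interval_scores(lasso_cd(X, y, lam), intervals)
    exceed = [0] * len(intervals)
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        null = interval_scores(lasso_cd(X, y_perm, lam), intervals)
        for k, score in enumerate(null):
            if score >= observed[k]:
                exceed[k] += 1
    return [(1 + e) / (1 + n_perm) for e in exceed]

# Synthetic "spectra": 40 samples, 12 wavelengths, split into 3 intervals.
# Only wavelengths 4..7 (the middle interval) drive the response.
rng = random.Random(0)
n, p, width = 40, 12, 4
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(row[4:8]) + rng.gauss(0, 0.1) for row in X]

# Centre the columns and the response so that no intercept is needed.
col_means = [sum(row[j] for row in X) / n for j in range(p)]
X = [[row[j] - col_means[j] for j in range(p)] for row in X]
y_mean = sum(y) / n
y = [v - y_mean for v in y]

intervals = [(start, start + width) for start in range(0, p, width)]
p_values = permutation_pvalues(X, y, intervals, lam=0.1, n_perm=49, rng=rng)
selected = [iv for iv, pv in zip(intervals, p_values) if pv <= 0.05]
print(p_values)
print(selected)
```

Intervals are written as half-open index ranges, so `(4, 8)` covers wavelengths 4 to 7. Real spectra have strongly correlated neighbouring wavelengths, which the independent columns used here do not reproduce, so the sketch shows the mechanics only and says nothing about how the method behaves in practice.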