Technical note: Improving the accuracy of mid-infrared prediction models by selecting the most informative wavelengths
GOTTARDO, PAOLO;DE MARCHI, MASSIMO;CASSANDRO, MARTINO;PENASA, MAURO
2015
Abstract
Mid-infrared spectroscopy (MIRS) is widely used to collect milk phenotypes at the population level. The aim of this study was to test the ability of the uninformative variable elimination (UVE) method to select and remove uninformative wavelength variables before partial least squares (PLS) analysis. Milk titratable acidity (TA) and Ca content were used as examples to illustrate the procedure. Reference values and MIRS spectra (n = 208) of TA and Ca were retrieved from an existing database. The data set was randomly divided into calibration (70% of data) and validation (30% of data) sets, and PLS analysis was carried out before and after the UVE procedure. The UVE procedure selected 244 and 113 informative wavelengths for TA and Ca, respectively, from a total of 1,060. Eliminating uninformative variables before PLS regression increased the accuracy of MIRS prediction models and substantially reduced computation time. Dealing with fewer variables is expected to enhance the efficiency of MIRS models for predicting phenotypes at the population level.
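The abstract does not describe how UVE works internally. The following is a minimal sketch, assuming the commonly described variant in which random noise variables are appended to the spectra, PLS coefficients are collected by leave-one-out resampling, and a variable is kept only if its mean-to-standard-deviation ratio exceeds that of every noise variable. The data are synthetic, and the function names, number of components, noise scale, and max-based cutoff are illustrative assumptions rather than the authors' settings.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """Return PLS1 (NIPALS) regression coefficients for mean-centered X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))  # weight vectors
    P = np.zeros((n_features, n_components))  # X loadings
    q = np.zeros(n_components)                # y loadings
    for a in range(n_components):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)
        t = Xc @ w                            # scores
        tt = t @ t
        p = Xc.T @ t / tt
        q[a] = yc @ t / tt
        Xc = Xc - np.outer(t, p)              # deflate X
        yc = yc - q[a] * t                    # deflate y
        W[:, a], P[:, a] = w, p
    return W @ np.linalg.solve(P.T @ W, q)


def uve_select(X, y, n_components=3, noise_scale=1e-10, seed=0):
    """Return sorted indices of variables judged informative (assumed UVE variant)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    # Append one tiny-amplitude random variable per real variable.
    noise = rng.uniform(size=(n_samples, n_features)) * noise_scale
    X_aug = np.hstack([X, noise])
    # One coefficient vector per left-out sample.
    coefs = np.array([
        pls1_coefficients(np.delete(X_aug, i, axis=0), np.delete(y, i), n_components)
        for i in range(n_samples)
    ])
    stability = coefs.mean(axis=0) / coefs.std(axis=0, ddof=1)
    # Cutoff: largest absolute stability reached by any noise variable.
    cutoff = np.abs(stability[n_features:]).max()
    return np.flatnonzero(np.abs(stability[:n_features]) > cutoff)


def rmse_validation(X_cal, y_cal, X_val, y_val, n_components=3):
    """Fit on the calibration set and return RMSE on the validation set."""
    b = pls1_coefficients(X_cal, y_cal, n_components)
    y_hat = (X_val - X_cal.mean(axis=0)) @ b + y_cal.mean()
    return float(np.sqrt(np.mean((y_val - y_hat) ** 2)))


# Synthetic stand-in for spectra: 208 samples, 60 variables, 5 of them informative.
rng = np.random.default_rng(42)
X = rng.normal(size=(208, 60))
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=208)

# Random 70/30 calibration/validation split, mirroring the abstract.
order = rng.permutation(len(y))
n_cal = int(0.7 * len(y))
cal, val = order[:n_cal], order[n_cal:]

selected = uve_select(X[cal], y[cal])
print("selected variables:", selected)
print("RMSE, all variables:", rmse_validation(X[cal], y[cal], X[val], y[val]))
if selected.size:
    k = min(3, selected.size)  # cannot extract more components than variables
    print("RMSE, selected only:",
          rmse_validation(X[cal][:, selected], y[cal], X[val][:, selected], y[val], k))
```

Selection is run on the calibration set only, so the validation RMSE stays an independent check of the reduced model. With real spectra, the number of PLS components and the cutoff rule would normally be tuned, for example by cross-validation.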