An attempt to identify milk protein fraction genotypes using unsupervised and supervised near-infrared spectroscopy methods

Navarro, N. S.; Albanell, E.; De Marchi, M.; Manuelian, C. L.

doi:10.1080/1828051X.2024.2314157

The aim was to evaluate the potential of near-infrared spectroscopy (NIRS) to discriminate among β-casein (CN), κ-CN and β-lactoglobulin (LG) genotypes for use as an authentication method. A total of 168 milk samples with known genetic information for β-CN, κ-CN and β-LG were collected at the same farm and paired with their NIRS spectra. Spectra were evaluated with an unsupervised method (principal component analysis, PCA) and a supervised method (partial least squares-discriminant analysis, PLS-DA). For the PLS-DA, data were split into a training set (75%) and a test set (25%), and the variable importance in projection (VIP) > 1 criterion was applied to select informative wavelengths. The results confirmed that milk quality was similar among genetic variants. For the PCA, the first two principal components explained 94% of the observed variance, but samples did not cluster by their genotypes of β-CN (i.e. A1A2, A2A2), κ-CN (i.e. AA, AB, AE, BB, BE) or β-LG (i.e. AA, AB, BB). Among the PLS-DA models, the best accuracy was reached for β-CN (train and test set, 64%), followed by β-LG (train set, 56%; test set, 52%) and κ-CN (train set, 41%; test set, 36%). In conclusion, the PCA on milk spectra was not able to cluster β-CN, κ-CN and β-LG genotypes, but the PLS-DA models revealed promising results for β-CN and β-LG. Before ruling out NIRS as an authentication method, it would be worthwhile to increase the number of samples so that the genetic variants are balanced and to apply a sample selection method.