On the selection of the weighting parameter value in optimizing Eucalyptus globulus pulp yield models based on NIR spectra


Prediction of the pulp yield of Eucalyptus globulus wood samples by partial least squares (PLS) regression can be optimized by using specific near infrared (NIR) wavelengths. A critical feature of this approach is the weighting of the constraint conditions. Equal weighting balances the optimization between calibration and prediction; however, little is known about the prediction performance of wood property models when different weight factors are used. In this study, pulp yield models were developed using two E. globulus data sets characterized by narrow (5%) and extreme (22.6%) yield ranges and represented by untreated and second derivative NIR spectra. The global optimization solver pySOT was used to optimize the performance of a PLS regression model in terms of the wavelengths selected and the number of latent variables. The objective function was a linear combination of the coefficients of determination for the calibration ($R_c^2$) and prediction ($R_p^2$) sets, with the aim of maximizing $\alpha R_c^2 + (1-\alpha)R_p^2$ for values of $\alpha$ between 0 (maximizing $R_p^2$ without concern for $R_c^2$) and 1 (maximizing only $R_c^2$). Values of $\alpha \leq 0.8$ provided good predictive performance, whereas $\alpha \geq 0.9$ tended to overfit the calibration data, indicating that the models are robust for $\alpha$ from 0 to 0.8. Representative wavelengths for each data set were identified and assigned to the corresponding wood components through a band assignment process. For $\alpha \leq 0.8$, the identified wavelengths agreed strongly with the assigned wood component bands; for $\alpha \geq 0.9$, however, they generally occurred in regions unrelated to vibrations arising from specific wood components.
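The weighted objective $\alpha R_c^2 + (1-\alpha)R_p^2$ can be illustrated with a minimal Python sketch. It is not the authors' implementation: it omits the PLS fit and the pySOT search entirely, the function names `r_squared` and `weighted_objective` are hypothetical, and the yield values are made up for illustration. It only shows how the weight $\alpha$ changes which candidate model scores higher.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def weighted_objective(alpha: float, r2_cal: float, r2_pred: float) -> float:
    """Linear objective alpha * R_c^2 + (1 - alpha) * R_p^2 (to be maximized)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * r2_cal + (1.0 - alpha) * r2_pred


# Hypothetical pulp yields (%), invented for illustration only.
y_cal = [50.0, 52.0, 54.0, 56.0]
y_cal_hat = [50.0, 52.0, 54.0, 56.0]  # candidate A fits calibration exactly
y_pred = [51.0, 53.0, 55.0]
y_pred_hat = [53.0, 53.0, 53.0]       # ...but only predicts the mean

# Candidate A: perfect calibration fit, no predictive skill.
r2_cal_a = r_squared(y_cal, y_cal_hat)
r2_pred_a = r_squared(y_pred, y_pred_hat)

# Candidate B (assumed values): moderate but balanced performance.
r2_cal_b, r2_pred_b = 0.8, 0.8

for alpha in (0.0, 0.5, 0.9, 1.0):
    score_a = weighted_objective(alpha, r2_cal_a, r2_pred_a)
    score_b = weighted_objective(alpha, r2_cal_b, r2_pred_b)
    print(f"alpha={alpha:.1f}  A={score_a:.3f}  B={score_b:.3f}")
```

In this toy setting, candidate A scores higher only when $\alpha$ is close to 1, which mirrors the mechanism by which a large $\alpha$ can favor models that overfit the calibration set.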

Wood Science and Technology