Evaluating accuracy is essential to ensure the reliability of ecological models. The accuracy of above-ground biomass (AGB) predictions obtained from remote sensing is usually assessed with the mean difference (MD), the root mean squared difference (RMSD) and the coefficient of determination (R²) between observed and predicted values. In this article we propose a more thorough analysis of accuracy, which adds a hypothesis test of the agreement between observed and predicted values and an assessment of the degree of overfitting to the sample used for model training. Using the estimation of forest AGB from LIDAR and spectral sensors as a case study, we compared alternative prediction and variable selection methods using several statistical measures of accuracy. We show that hypothesis tests provide an objective way to infer the statistical significance of agreement. We also find that overfitting can be assessed from the inflation of the residual sum of squares under cross-validation relative to that of the fitted model, and our results suggest that this may be more effective than analysing the deflation in R². We demonstrate that overfitting must be addressed explicitly, because, judged by MD, RMSD and R² alone, predictions can appear reliable even in clearly unrealistic circumstances, for instance when too many predictor variables are included. Moreover, Theil's partial inequality coefficients, which partition the total error into the proportions due to bias, slope and unexplained variance, may help detect the averaging effects (overestimation of low values and underestimation of high values) common in remote sensing predictions of AGB. We conclude that statistical measures of accuracy, precision and agreement are necessary but not sufficient for model evaluation. We therefore advocate incorporating evaluation measures specifically devoted to testing the observed-versus-predicted fit and to assessing the degree of overfitting.
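The accuracy and agreement measures named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. MD is taken as mean(observed - predicted); R² as the squared Pearson correlation (equivalently, the R² of an OLS regression of observed on predicted); Theil's coefficients follow the decomposition MSD = (mean_P - mean_O)² + (S_P - r S_O)² + (1 - r²) S_O² into bias (U_M), slope (U_R) and unexplained-variance (U_D) shares. The agreement test is assumed to be the joint F-test of intercept = 0 and slope = 1 in the regression of observed on predicted, one common choice that the abstract does not specify. Because that test has 2 numerator degrees of freedom, its p-value has the exact closed form (1 + 2F/(n - 2))^(-(n - 2)/2). Function names and AGB values are hypothetical.

```python
import math


def accuracy_measures(obs, pred):
    """Return (MD, RMSD, R^2) for paired observed and predicted values.

    Assumed conventions: MD = mean(obs - pred); R^2 is the squared Pearson
    correlation, i.e. the R^2 of an OLS regression of obs on pred.
    """
    n = len(obs)
    diffs = [o - p for o, p in zip(obs, pred)]
    md = sum(diffs) / n
    rmsd = math.sqrt(sum(d * d for d in diffs) / n)
    mo, mp = sum(obs) / n, sum(pred) / n
    s_op = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    s_oo = sum((o - mo) ** 2 for o in obs)
    s_pp = sum((p - mp) ** 2 for p in pred)
    return md, rmsd, s_op * s_op / (s_oo * s_pp)


def theil_partial_coefficients(obs, pred):
    """Split the mean squared difference into (U_M, U_R, U_D).

    U_M: share due to bias; U_R: share due to the slope (zero when regressing
    obs on pred gives slope 1); U_D: share due to unexplained variance.
    The three shares sum to 1.
    """
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)  # population SDs
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred) / n)
    r = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / (n * so * sp)
    msd = sum((o - p) ** 2 for o, p in zip(obs, pred)) / n
    u_m = (mp - mo) ** 2 / msd
    u_r = (sp - r * so) ** 2 / msd
    u_d = (1 - r * r) * so ** 2 / msd
    return u_m, u_r, u_d


def agreement_f_test(obs, pred):
    """Joint F-test of H0: a = 0 and b = 1 in the OLS fit obs = a + b * pred.

    Returns (a, b, F, p). Needs n > 2 and a fit that is not exact.
    """
    n = len(obs)
    mx, my = sum(pred) / n, sum(obs) / n
    b = (sum((p - mx) * (o - my) for p, o in zip(pred, obs))
         / sum((p - mx) ** 2 for p in pred))
    a = my - b * mx
    s2 = sum((o - a - b * p) ** 2 for o, p in zip(obs, pred)) / (n - 2)
    # (beta_hat - beta_0)' X'X (beta_hat - beta_0) = sum of (a + (b - 1) * x_i)^2
    num = sum((a + (b - 1) * p) ** 2 for p in pred)
    f = num / (2 * s2)
    df2 = n - 2
    # Exact upper tail of F(2, df2); valid only because numerator df = 2
    p_value = (1 + 2 * f / df2) ** (-df2 / 2)
    return a, b, f, p_value


obs = [110, 190, 310, 390, 500]            # made-up field AGB, Mg/ha
pred_unbiased = [100, 200, 300, 400, 500]   # made-up predictions
pred_biased = [200, 300, 400, 500, 600]     # same predictions shifted by +100

for name, pred in [("unbiased", pred_unbiased), ("biased", pred_biased)]:
    md, rmsd, r2 = accuracy_measures(obs, pred)
    u_m, u_r, u_d = theil_partial_coefficients(obs, pred)
    p_value = agreement_f_test(obs, pred)[3]
    print(f"{name}: MD={md:.1f} RMSD={rmsd:.1f} R2={r2:.3f} "
          f"U_M={u_m:.3f} U_R={u_r:.3f} U_D={u_d:.3f} p={p_value:.4f}")
```

For the shifted predictions R² is still about 0.996, yet MD, U_M (about 0.99) and the F-test (p well below 0.01) all expose the systematic offset, which is the kind of case where R² alone looks reassuring.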
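The overfitting diagnostic described above can be sketched with leave-one-out cross-validation of an ordinary least squares model. The following are assumptions, not details taken from the article: the model is a multiple linear regression, the cross-validation is leave-one-out, the RSS inflation is expressed as the ratio RSS_cv / RSS_fit, and the R² deflation as R²_fit - R²_cv with R²_cv = 1 - RSS_cv / TSS. The data are synthetic: one informative predictor, plus 15 pure-noise columns standing in for "too many predictor variables". Function names are hypothetical.

```python
import random


def ols_fit(X, y):
    """OLS coefficients [intercept, b1, ..., bk] from the normal equations.

    Solved by Gauss-Jordan elimination with partial pivoting; adequate for this
    small illustration, though a QR-based solver is numerically preferable.
    """
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(beta, x):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def overfitting_diagnostics(X, y):
    """Return (RSS inflation ratio, R^2 deflation) under leave-one-out CV."""
    n = len(y)
    beta = ols_fit(X, y)
    rss_fit = sum((yi - predict(beta, xi)) ** 2 for xi, yi in zip(X, y))
    rss_cv = 0.0
    for i in range(n):
        # Refit without observation i, then predict the held-out value
        beta_i = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        rss_cv += (y[i] - predict(beta_i, X[i])) ** 2
    mean_y = sum(y) / n
    tss = sum((yi - mean_y) ** 2 for yi in y)
    r2_fit = 1 - rss_fit / tss
    r2_cv = 1 - rss_cv / tss
    return rss_cv / rss_fit, r2_fit - r2_cv


# Synthetic data: y depends on x1 only; the 15 extra columns are pure noise.
rng = random.Random(1)
n = 40
x1 = [rng.uniform(0, 10) for _ in range(n)]
y = [2 + 3 * v + rng.gauss(0, 2) for v in x1]
noise = [[rng.uniform(0, 1) for _ in range(15)] for _ in range(n)]
X_one = [[v] for v in x1]
X_many = [[v] + extra for v, extra in zip(x1, noise)]

infl_one, defl_one = overfitting_diagnostics(X_one, y)
infl_many, defl_many = overfitting_diagnostics(X_many, y)
print(f"1 predictor:   RSS inflation {infl_one:.2f}, R2 deflation {defl_one:.3f}")
print(f"16 predictors: RSS inflation {infl_many:.2f}, R2 deflation {defl_many:.3f}")
```

With these definitions the R² deflation equals (RSS_cv - RSS_fit) / TSS, so it expresses the same absolute inflation relative to the total sum of squares rather than to the fitted residual sum of squares. For OLS each leave-one-out residual is the fitted residual divided by (1 - h_ii), so the inflation ratio cannot fall below 1.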