ISSN 2070-7401 (Print), ISSN 2411-0280 (Online)
Sovremennye problemy distantsionnogo zondirovaniya Zemli iz kosmosa
CURRENT PROBLEMS IN REMOTE SENSING OF THE EARTH FROM SPACE

  

Sovremennye problemy distantsionnogo zondirovaniya Zemli iz kosmosa, 2025, V. 22, No. 4, pp. 64-75

Selection of important variables and best machine learning method for estimation of forest productivity using remote sensing data

S.A. Khvostikov 1, S.A. Bartalev 1, V.A. Egorov 1, E.A. Stytsenko 1
1 Space Research Institute RAS, Moscow, Russia
Accepted: 05.06.2025
DOI: 10.21046/2070-7401-2025-22-4-64-75
Recent advances in machine learning have produced an abundance of methods for estimating vegetation parameters from remote sensing data. The objective of this research was to find the best machine learning method and the most informative group of variables for estimating forest productivity from MODIS data. The large training sample (28 million pixels) and the 669 candidate variables derived from remote sensing data made it necessary to select a subgroup of the most informative variables. A comparison of several variable importance evaluation methods showed that methods based on data permutations performed best on the available data, and they were used to select a subgroup of 100 informative remote sensing indicators with negligible loss of accuracy. This reduced subgroup of indicators made it feasible to compare multiple machine learning methods. Of all the methods tested, gradient boosting implemented in LightGBM showed the best performance. LightGBM combined with the selected subgroup of remote sensing indicators was used to create a productivity map of Russian forests. Accuracy assessment on a control sample that was not used to train the LightGBM model gave a coefficient of determination of 0.87 and an RMSE of 0.5 productivity classes. Comparison with field data showed a good match with the remote sensing productivity estimates in more than 80 % of cases.
Keywords: forest, forest productivity, remote sensing, variable importance, gradient boosting
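The permutation-based variable selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic data, the toy `predict` function standing in for a trained model such as LightGBM, and the helper names `permutation_importance` and `select_top_k` are all assumptions made for illustration. The idea is to shuffle one variable at a time on a held-out sample and record how much the coefficient of determination drops; variables with the largest drop are kept.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in R^2 when each column of X is shuffled in turn."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between variable j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = r2_score(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


def select_top_k(importances, k):
    """Indices of the k variables with the largest importance."""
    ranked = sorted(range(len(importances)),
                    key=lambda j: importances[j], reverse=True)
    return ranked[:k]


# Synthetic held-out sample (assumption): y depends strongly on x0,
# weakly on x1, and not at all on x2.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]
y = [3.0 * r[0] + 1.0 * r[1] + 0.1 * data_rng.gauss(0, 1) for r in X]


def predict(row):
    """Toy stand-in for a trained regression model (assumption)."""
    return 3.0 * row[0] + 1.0 * row[1]


importances = permutation_importance(predict, X, y)
selected = select_top_k(importances, k=2)
print("importances:", importances)
print("selected variable indices:", selected)
```

In a real workflow the `predict` function would be replaced by a model fitted on the training sample, and `k` would be chosen by checking how accuracy on the control sample changes as variables are dropped.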
