Imputation of incomplete large‐scale monitoring count data via penalized estimation

Dakki, M., Robin, G., Suet, M., Qninba, A., El Agbani, M.A., Ouassou, A., El Hamoumi, R., Azafzaf, H., Rebah, S., Feltrup‐Azafzaf, C., Hamouda, N., Ibrahim, W.A.L., Asran, H.H., Elhady, A.A., Ibrahim, H., Etayeb, K., Bouras, E., Saied, A., Glidan, A., Habib, B.M., et al. (2021). Imputation of incomplete large‐scale monitoring count data via penalized estimation. Methods in Ecology and Evolution 12, 1031-1039. doi:10.1111/2041-210X.13594.

Manuscript_reviewed.pdf - Accepted Version
Available under License Creative Commons Attribution Non-commercial.

Abstract

In biodiversity monitoring, large datasets are becoming ever more widely available and are increasingly used globally to estimate species trends and conservation status. These large-scale datasets challenge existing statistical analysis methods, many of which are not adapted to their size, incompleteness and heterogeneity. The development of scalable methods to impute missing data in incomplete large-scale monitoring datasets is crucial to balance sampling in time or space and thus better inform conservation policies. We developed a new method based on penalized Poisson models to impute and analyse incomplete monitoring data in a large-scale framework. The method allows parameterization of (a) space and time factors, (b) the main effects of predictor covariates, as well as (c) space–time interactions. It also benefits from robust statistical and computational capability in large-scale settings. The method was tested extensively on both simulated and real-life waterbird data, and it outperformed six existing methods in terms of missing-data imputation error. Applying the method to 16 waterbird species, we estimated their long-term trends for the first time at the entire North African scale, a region where monitoring data suffer from many gaps in space and time series. This new approach opens promising perspectives to increase the accuracy of species-abundance trend estimations. We made it freely available in the R package ‘lori’ (https://CRAN.R-project.org/package=lori) and recommend its use for large-scale count data, particularly in citizen science monitoring programmes.
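
To make the idea of a penalized Poisson model for count imputation concrete, the following is a minimal sketch in Python with NumPy. It is not the 'lori' implementation and does not use its API. It assumes a sites-by-years count matrix with missing cells coded as NaN, a log-linear mean with a site effect, a year effect and a site-by-year interaction, and a nuclear-norm penalty on the interaction term; the abstract states only that the model is penalized, so that particular penalty is an assumption made for illustration. Covariate main effects are left out for brevity, and all function names and parameters (fit_poisson_lowrank, impute, lam, n_iter) are hypothetical.

```python
import numpy as np


def svd_soft_threshold(M, tau):
    """Proximal operator of tau * nuclear norm: shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def neg_loglik(X, Y0, mask):
    """Poisson negative log-likelihood (up to a constant) over observed cells."""
    with np.errstate(over="ignore"):  # overshooting trial steps may overflow to inf
        return np.sum(np.exp(X[mask]) - Y0[mask] * X[mask])


def fit_poisson_lowrank(Y, lam=10.0, n_iter=500):
    """Return fitted Poisson means for every cell of Y (NaN = missing count).

    Assumed model: log E[Y_ij] = a_i + b_j + Theta_ij, with the penalty
    lam * ||Theta||_* on the site-by-year interaction matrix Theta.
    Fitted by proximal gradient descent with backtracking.
    """
    mask = ~np.isnan(Y)
    Y0 = np.where(mask, Y, 0.0)
    n, p = Y.shape
    a = np.full(n, np.log(Y0[mask].mean() + 1e-8))  # start at the overall log-mean
    b = np.zeros(p)
    Theta = np.zeros((n, p))
    t = 1.0  # step size; backtracking only ever shrinks it
    for _ in range(n_iter):
        X = a[:, None] + b[None, :] + Theta
        f = neg_loglik(X, Y0, mask)
        G = np.where(mask, np.exp(X) - Y0, 0.0)  # gradient w.r.t. X, observed cells only
        ga, gb = G.sum(axis=1), G.sum(axis=0)    # gradients w.r.t. site and year effects
        while True:
            a_new, b_new = a - t * ga, b - t * gb
            Th_new = svd_soft_threshold(Theta - t * G, t * lam)
            da, db, dT = a_new - a, b_new - b, Th_new - Theta
            f_new = neg_loglik(a_new[:, None] + b_new[None, :] + Th_new, Y0, mask)
            # Standard sufficient-decrease test against the quadratic upper bound
            bound = (f + ga @ da + gb @ db + np.sum(G * dT)
                     + (da @ da + db @ db + np.sum(dT ** 2)) / (2 * t))
            if f_new <= bound + 1e-9 or t < 1e-10:
                break
            t *= 0.5
        a, b, Theta = a_new, b_new, Th_new
    return np.exp(a[:, None] + b[None, :] + Theta)


def impute(Y, **kwargs):
    """Fill missing counts with fitted means; observed counts are kept as-is."""
    return np.where(np.isnan(Y), fit_poisson_lowrank(Y, **kwargs), Y)
```

Calling impute(Y) on such an array returns a completed matrix from which, for instance, yearly totals can be computed. In practice the penalty weight lam would need to be tuned, for example by cross-validation on held-out observed cells.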

Item Type: Article
Uncontrolled Keywords: biodiversity monitoring; high-dimensional statistics; incomplete count data; missing data imputation; penalized estimation; waterbird trends in North Africa
Research Programs: Biodiversity and Natural Resources (BNR)
Biodiversity and Natural Resources (BNR) > Biodiversity, Ecology, and Conservation (BEC)
Depositing User: Luke Kirwan
Date Deposited: 12 Apr 2021 07:12
Last Modified: 22 Mar 2022 03:00
URI: https://pure.iiasa.ac.at/17162
