A pantropical assessment of deforestation caused by industrial mining

Significance
Driven by rapidly increasing demand for mineral resources, both industrial and artisanal mining are intensifying across the tropical biome. A number of regional studies have analyzed mining-induced deforestation, but its scope and patterns across all tropical countries have not yet been investigated. Focusing on industrial mining, we use geospatial data to quantify direct forest loss within mining sites in 26 countries. We also perform a statistical assessment to test whether industrial mining drives indirect deforestation in the mine surroundings. We show that direct deforestation is concentrated in only a few countries, while industrial mining causes indirect deforestation in two-thirds of tropical countries. To preserve tropical forests, the direct and indirect deforestation impacts of mining projects should be fully considered.

was also calculated. The protected areas vector data from UNEP-WCMC (16) was converted into a 30 arcsec binary grid, from which we calculated the distances from each cell to the closest protected cell. We did not distinguish between different types of protected areas.
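The distance-to-nearest-protected-cell layer can be illustrated with a small brute-force sketch. It assumes a planar grid of square cells of uniform size (real 30 arcsec cells narrow in east-west extent away from the equator) and is only practical for toy grids; the function name `distance_to_nearest` is hypothetical, and a full-size workflow would more likely rely on a Euclidean distance transform such as `scipy.ndimage.distance_transform_edt`.

```python
import numpy as np


def distance_to_nearest(mask: np.ndarray, cell_km: float = 1.0) -> np.ndarray:
    """Distance (km) from each cell centre to the centre of the nearest True cell.

    Brute-force sketch: assumes a planar grid of square cells of size `cell_km`
    and at least one True cell in `mask`. Memory grows with cells x targets.
    """
    rows, cols = np.indices(mask.shape)
    centres = np.column_stack([rows.ravel(), cols.ravel()]).astype(float)
    targets = centres[mask.ravel()]
    # Pairwise offsets between every cell centre and every target centre
    diff = centres[:, None, :] - targets[None, :, :]
    nearest = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    return nearest.reshape(mask.shape) * cell_km


# Toy 4 x 5 grid with two protected cells in opposite corners
protected = np.zeros((4, 5), dtype=bool)
protected[0, 0] = True
protected[3, 4] = True
dist_protected = distance_to_nearest(protected)
```

Protected cells themselves receive a distance of zero, so the same function also covers cells lying inside a protected area.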
Other land-cover variables for the year 2000 were resampled from the 300 m Climate Change Initiative Land Cover maps (17) to our grid using the majority class present in each cell. From this layer we also calculated the distance to agriculture as the distance from the centroid of each cell to the closest agriculture cell. Note that we were not able to include a variable on "distance to logging" due to a lack of data across all investigated countries.
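Majority-class resampling of a categorical land-cover grid can be sketched as block aggregation. This sketch assumes an integer aggregation factor (the actual 300 m to 30 arcsec ratio is not an integer, so a real workflow would need a proper resampling step) and uses hypothetical class codes; `majority_resample` is an illustrative name, not part of the authors' repositories.

```python
import numpy as np


def majority_resample(classes: np.ndarray, factor: int) -> np.ndarray:
    """Aggregate a categorical grid into factor x factor blocks, keeping the most frequent class.

    Ties are resolved towards the smallest class code, because np.unique returns
    sorted values and np.argmax picks the first maximum.
    """
    h, w = classes.shape
    if h % factor or w % factor:
        raise ValueError("grid shape must be divisible by factor")
    # Rearrange to (block_row, block_col, cells_in_block)
    blocks = classes.reshape(h // factor, factor, w // factor, factor).swapaxes(1, 2)
    blocks = blocks.reshape(h // factor, w // factor, factor * factor)
    out = np.empty((h // factor, w // factor), dtype=classes.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            values, counts = np.unique(blocks[i, j], return_counts=True)
            out[i, j] = values[np.argmax(counts)]
    return out


# Hypothetical class codes: 10 = cropland, 50 = forest
fine = np.array([
    [10, 10, 50, 50, 50, 50],
    [10, 50, 50, 50, 50, 50],
    [10, 10, 10, 50, 50, 10],
])
coarse = majority_resample(fine, 3)
is_agriculture = coarse == 10
```

The resulting `is_agriculture` mask is the kind of binary layer from which a distance-to-agriculture variable would then be derived.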
All code to process the geospatial data and perform the statistical assessment is available on GitHub. The data preparation repository can be found at https://github.com/fineprint-global/mining_deforestation-data-preparation and the repository containing the statistical modelling can be found at https://github.com/fineprint-global/mining_deforestation-stat.

Statistical framework
We investigated indirect effects of mining on forest loss via regression analysis. Our grid cell data were pruned using matching to limit model dependence and emulate a fully blocked experimental setting (23). The model specification aims to control for the confounding effects discussed above in order to capture causal effects of mining on forest loss along our hypothesised pathway (24).
Identifying the drivers of deforestation. To integrate all relevant effects into our statistical model, we first developed a conceptual framework of potential causal pathways leading to tropical forest loss (Supplementary Figure 2). Drivers of deforestation were derived from studies and meta-studies that identified the key determinants of tropical forest loss (25-28). Note that our framework also considers factors that reduce deforestation, such as steep slopes, high elevations and protected areas. For our statistical model, we selected 'proximity to mine' as the treatment variable, 'forest cover loss' as the dependent variable and a set of additional control variables, all available globally at a resolution of approximately 1 by 1 km (30 arc seconds; see Supplementary Table 1 for a summary). Our central interest is indirect forest loss driven by mineral extraction, modelled through the proximity of each grid cell to the nearest mine. To identify causal effects of mining on deforestation while taking into account a range of other factors, we selected eleven control variables. As production data per mining polygon were not available, the direct land use of mining, i.e. the areal extent of all mining polygons within a grid cell, was used as a proxy for mining intensity, assuming that deforestation depends not only on the distance to a mine but also on the size of the mining project. Areal extent is a suitable proxy because the production and area of mines have been shown to correlate strongly (29). We included three variables controlling for initial conditions in the year 2000: forest cover, to control for the distribution of forests; land use, to reflect the initial distribution of land-use types (cropland, forest, etc.); and population density, as a proxy for urban development and economic activity.
We controlled for agricultural activity by including a variable on the distance of each grid cell to the nearest agricultural area. Further, we considered the effects of access infrastructure and transport modes by including one variable for proximity to waterways (i.e. rivers) and another for proximity to major roads such as highways. We assume major roads to be exogenous to the location of mining sites, while we regard smaller roads as one of the mediators between mining activities and deforestation. We also included a control variable for proximity to protected areas as an environmental policy factor. Finally, we considered three biophysical characteristics of each grid cell: soil type, slope and elevation.
Coarsened exact matching. For our analysis, we divided the global data set into single-country subsets instead of pooling them. Countries were thus addressed individually, allowing all relevant factors to have country-specific impacts. This is important because dynamics may differ considerably across countries, and national circumstances may shape the effects of, e.g., agriculture or protected areas on forest loss. Within a given country, only a subset of locations was relevant to our analysis of forest loss and mining, and covariates were imbalanced between locations near to and far from mines. To limit the degree of imbalance and model dependence, we used a coarsened exact matching (CEM) approach (23,30). CEM emulates a fully blocked randomised experiment, rather than the less efficient and imbalance-inducing fully randomised experiment approximated by propensity score matching (31), which has been used in previous mining-related studies (32). In addition, CEM does not rely on dimension reduction and explicitly finds matches across all considered covariates. Taking Ghana as an example, there were 280,193 observations of 30 arcsec grid cells for the entire country. A closer look reveals considerable natural and structural differences within the country: the northern half of the country is not covered by forest, and its geological conditions are of little use for mining. We sought to balance observations in mining areas with observations outside them, using a cutoff at 50 kilometres (i.e. all observations in the control group were located at least 50 km from mines and all treated observations within this threshold; no further matching limitations were specified). This cutoff was chosen based on earlier studies finding that indirect impacts of mining on deforestation in the Brazilian Amazon occur within a 50 km range around mines (32). In accordance with the literature (33), we took a cautious approach: matching on the most important variables and pruning conservatively.
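The pruning step of CEM can be sketched as follows: each covariate is coarsened into bins, every distinct combination of bins forms a stratum, and only strata containing both treated cells (within 50 km of a mine) and control cells (at least 50 km away) are retained. This is a simplified illustration, not the authors' implementation: the bin edges, toy covariates and the name `cem_keep` are hypothetical, and the stratum weights that full CEM implementations compute are omitted.

```python
import numpy as np


def cem_keep(X: np.ndarray, treated: np.ndarray, bins: list) -> np.ndarray:
    """Boolean mask of observations retained by a simple coarsened exact matching.

    X: (n, k) covariates; treated: (n,) booleans; bins: k arrays of bin edges
    used to coarsen each covariate. Stratum weights are not computed.
    """
    # Coarsen: replace each covariate value by the index of its bin
    coarse = np.column_stack([np.digitize(X[:, j], bins[j]) for j in range(X.shape[1])])
    # Every distinct combination of bin indices defines one stratum
    _, stratum = np.unique(coarse, axis=0, return_inverse=True)
    stratum = stratum.ravel()
    keep = np.zeros(len(treated), dtype=bool)
    for s in np.unique(stratum):
        members = stratum == s
        # Retain a stratum only if it contains both treated and control cells
        if treated[members].any() and (~treated[members]).any():
            keep |= members
    return keep


# Toy cells: treated if within 50 km of a mine, controls at least 50 km away
dist_mine_km = np.array([2.0, 10.0, 40.0, 60.0, 80.0, 120.0])
treated = dist_mine_km < 50
X = np.column_stack([
    [100, 900, 450, 120, 880, 2000],  # elevation (m)
    [2, 10, 5, 3, 12, 30],            # slope (degrees)
])
bins = [np.array([500, 1000]), np.array([5, 15])]  # hypothetical coarsening
keep = cem_keep(X, treated, bins)
```

Coarser bins retain more observations but leave more imbalance within strata; finer bins prune more aggressively, which is the trade-off behind matching on the most important variables and pruning conservatively.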
For the example of Ghana, our approach yielded 36,982 observations, i.e. approximately 13 percent of the unadjusted sample. The procedure successfully balanced observations close to and far from mines with regard to all considered characteristics. More detail on matching performance is provided in Supplementary Figure 3.
Linear regression. With the resulting matched data, we considered a linear regression model of the form

$$y_i = x_i \delta_i + Z_i \beta_i + e_i, \qquad (1)$$

where the subscript $i \in [1, I]$ indexes country subsets, each including $N_i$ observations. The dependent variable $y_i \in \mathbb{R}^{N_i}$ is log-transformed total forest loss in square meters. The treatment variable $x_i \in \mathbb{R}^{N_i}$ contains log-transformed distances to the nearest mine, with country-specific coefficient $\delta_i$. The matrix $Z_i \in \mathbb{R}^{N_i \times K}$ contains a set of $K$ control variables (see Section on spatial grid above) and functions thereof, with coefficient vector $\beta_i \in \mathbb{R}^K$. Lastly, $e_i$ is an error term that is iid Gaussian with mean zero and constant variance $\sigma^2$.
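A minimal sketch of estimating the regression for a single country subset by ordinary least squares, assuming the model takes the form y_i = x_i δ_i + Z_i β_i + e_i with an intercept supplied as a column of ones in Z_i. The data below are synthetic with known coefficients, and `fit_country` is a hypothetical helper; the authors' actual estimation code is in the statistical modelling repository cited above.

```python
import numpy as np


def fit_country(y: np.ndarray, x: np.ndarray, Z: np.ndarray):
    """OLS estimate of y = x * delta + Z @ beta + e for one country subset.

    Z is assumed to include a column of ones for the intercept.
    Returns (delta, beta).
    """
    design = np.column_stack([x, Z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]


# Synthetic country subset with a known elasticity of -0.8
rng = np.random.default_rng(0)
n = 2000
dist_km = rng.uniform(0.5, 100.0, n)
x = np.log(dist_km)                      # log distance to nearest mine
elev = rng.uniform(0.0, 1500.0, n)       # one illustrative control (m)
Z = np.column_stack([np.ones(n), elev / 1000.0])
y = -0.8 * x + Z @ np.array([5.0, -0.4]) + rng.normal(0.0, 0.1, n)
delta, beta = fit_country(y, x, Z)
```

Fitting each country separately, rather than pooling, is what makes $\delta_i$ and $\beta_i$ country-specific.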
The linear regression model has some major advantages: it allows us to connect the theoretical setup to the data and yields interpretable outputs (34). One required assumption is linearity in parameters. This does not, however, prevent us from capturing a variety of non-linear effects. Most importantly, forest loss and mine distance are not related linearly (32), i.e. we do not expect the same effect from increasing the distance from one to two kilometres as from 99 to 100 kilometres. Instead, we considered the relationship in relative terms, i.e. as an elasticity. Because both variables are log-transformed, the coefficient $\delta_i$ gives the percentage change in forest loss resulting from a one percent change in distance, all else being equal. We applied the same log-transformation to the other distance variables, given that their impacts fade out with increasing distance. For these variables, we additionally allowed for discontinuous effects by considering thresholds at 5, 10, 25 and 50 kilometres. To check robustness, we tested our results against a number of alternative specifications (Supplementary Tables 5 and 6).
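The log-distance and threshold terms can be sketched as a feature-construction step, together with a check of the elasticity interpretation. Representing the thresholds as 0/1 indicator columns is an assumption for illustration (the text does not specify their exact functional form), `distance_features` and `THRESHOLDS_KM` are hypothetical names, and the elasticity value is made up.

```python
import numpy as np

THRESHOLDS_KM = (5, 10, 25, 50)


def distance_features(dist_km) -> np.ndarray:
    """Log distance plus one 0/1 column per threshold (cell lies within t km).

    Assumes strictly positive distances, since log(0) is undefined.
    """
    dist_km = np.asarray(dist_km, dtype=float)
    cols = [np.log(dist_km)]
    cols += [(dist_km <= t).astype(float) for t in THRESHOLDS_KM]
    return np.column_stack(cols)


features = distance_features([3.0, 20.0, 60.0])

# In a log-log model, a 1% increase in distance multiplies forest loss
# by 1.01 ** delta, i.e. approximately a delta-percent change.
delta = -0.8  # hypothetical elasticity
pct_change_loss = (1.01 ** delta - 1) * 100
```

Here `pct_change_loss` is about -0.79, close to the coefficient value of -0.8, which is why $\delta_i$ can be read directly as a percentage change for small relative changes in distance.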
Another important limitation of the linear model stems from the nature of our dependent variable. Forest loss has lower and upper bounds, with a considerable number of observations at the lower bound of zero. The linear model does not account for these bounds, so other approaches may be warranted (34). We therefore also estimated two non-linear variants: (1) a logit model with a logistic link function to estimate the share of forest loss per area, and (2) a Tobit model to address the censored dependent variable. Both variants handle the limited dependent variable appropriately and yield more efficient estimates. This improved fit comes at the cost of interpretability: coefficients can no longer be interpreted as partial derivatives but must be read in the context of the non-linear link transformation. To avoid this source of confusion and misinterpretation, we treated these models as robustness checks for the simpler linear model. Results from both approaches largely mirror those of the linear model, supporting our modelling choice. Estimates from both alternative models are available in Supplementary Table 7.
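To show how a Tobit model treats the mass of zeros differently from the linear model, here is a sketch of the standard Tobit log-likelihood for a dependent variable left-censored at zero. It only evaluates the likelihood (estimation would maximise it with a numerical optimiser), ignores the upper bound, and `tobit_loglik` is a hypothetical name; it does not reproduce the authors' exact Tobit specification.

```python
import math

import numpy as np


def tobit_loglik(beta, sigma: float, y, X) -> float:
    """Log-likelihood of a Tobit model left-censored at zero.

    Latent y* = X @ beta + e with e ~ N(0, sigma^2); observed y = max(0, y*).
    """
    mu = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    ll = 0.0
    for yi, mi in zip(np.asarray(y, dtype=float), mu):
        if yi <= 0:
            # Censored cell: log P(y* <= 0) = log Phi(-mu / sigma)
            ll += math.log(0.5 * math.erfc(mi / (sigma * math.sqrt(2))))
        else:
            # Uncensored cell: log of the normal density of y at mean mu
            z = (yi - mi) / sigma
            ll += -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))
    return ll


ll = tobit_loglik(beta=[0.0], sigma=1.0, y=[0.0, 1.0], X=[[1.0], [1.0]])
```

A zero observation contributes the probability of the latent value falling at or below zero, rather than a residual as in the linear model; this is also why Tobit coefficients describe the latent variable and are not partial derivatives of observed forest loss.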
Hypothetical expansion scenario. To illustrate the coefficient estimates, we computed the indirect deforestation effect of an expansion of all mines in each country. We assumed a 100 m spread of mining polygons, i.e. we created a counterfactual data set in which we reduced the distances to a mine by 100 m, and computed $\Delta_{ri}$, the relative change in distance to a mine under such a scenario, for each grid cell $r \in [0, R]$ in country $i$. Observations closer than 100 m to a mining polygon before expansion were excluded, as they would have to be considered direct deforestation. The indirect deforestation effect was then calculated as the accumulated difference between reported forest loss, $L_{ri}$, and the estimated forest loss including indirect deforestation due to mine expansion, $L^*_{ri}$. The estimates for $L^*_{ri}$ were computed in accordance with the model in Equation 1 as

$$L^*_{ri} = \exp\left(y_{ri} + \delta_i \ln(1 + \Delta_{ri})\right),$$

with $\Delta_{ri}$ denoting the relative change in distance to mine, $\delta_i$ the country-specific coefficient and $y_{ri}$ the log-transformed total forest loss in square meters.
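The scenario arithmetic can be sketched for one country as below. This rests on assumptions not stated in the text: that $y_{ri} = \ln L_{ri}$, so the counterfactual loss under the log-log model is $L^*_{ri} = L_{ri}(1 + \Delta_{ri})^{\delta_i}$ with $\Delta_{ri} = -100/d_{ri}$ for a cell at distance $d_{ri}$ (in metres), and that cells at exactly 100 m are also dropped because they would fall inside the expanded mine. The function name and the toy values are hypothetical.

```python
import numpy as np


def indirect_expansion_effect(dist_m, loss_m2, delta: float, spread_m: float = 100.0) -> float:
    """Accumulated change in forest loss (m^2) if all mines spread by `spread_m`.

    Assumes L* = L * (1 + rel_change) ** delta, i.e. the log-log model with
    y = ln(L). Cells within `spread_m` of a mine are excluded as direct loss.
    """
    dist_m = np.asarray(dist_m, dtype=float)
    loss_m2 = np.asarray(loss_m2, dtype=float)
    keep = dist_m > spread_m  # cells at or within the spread become part of the mine
    d, loss = dist_m[keep], loss_m2[keep]
    rel_change = (d - spread_m) / d - 1.0  # Delta_ri = -spread / d
    loss_star = loss * (1.0 + rel_change) ** delta
    return float(np.sum(loss_star - loss))


effect = indirect_expansion_effect(
    dist_m=[50.0, 200.0, 1000.0],
    loss_m2=[500.0, 1000.0, 2000.0],
    delta=-0.5,  # hypothetical negative elasticity: loss falls with distance
)
```

With a negative elasticity and shrinking distances, every retained cell gains forest loss, and the gain is largest for cells closest to the mine, where the 100 m shift is the largest relative change.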