Berdejo-Espinola, V., Hajas, Á., Cornford, R. (ORCID: https://orcid.org/0000-0002-9963-3603), Ye, N., & Amano, T. (2025). Spanish-language text classification for environmental evidence synthesis using multilingual pre-trained models. Environmental Evidence 14 (1) e21. DOI: 10.1186/s13750-025-00370-9.
Full text: s13750-025-00370-9.pdf (Published Version, 2MB), available under a Creative Commons Attribution license.
Abstract
Artificial intelligence (AI) is increasingly being explored as a tool to optimize and accelerate various stages of evidence synthesis. A persistent challenge in environmental evidence synthesis is that syntheses remain predominantly monolingual (English), which biases their results and can misinform policy decisions across scales. AI offers a promising way to incorporate non-English-language evidence into the screening stage of evidence syntheses and to move beyond their current monolingual focus. Using a corpus of Spanish-language peer-reviewed papers on biodiversity conservation interventions, we developed and evaluated text classifiers based on supervised machine learning models. Our best-performing model achieved 100% recall, meaning that none of the relevant papers (n = 9) were missed, while filtering out over 70% (n = 867) of negative documents using only the title and abstract of each paper. The text was encoded with a pre-trained multilingual model, and class weights were used to handle a highly imbalanced dataset (0.79% positive class). This research therefore offers an approach to reducing the manual, time-intensive effort of document screening in evidence syntheses, with minimal risk of missing relevant studies. It highlights the potential of multilingual large language models and class weights for training a lightweight non-English-language classifier that can effectively filter out irrelevant texts using only a small labelled non-English corpus. Future work could build on this approach to develop a multilingual classifier that enables the inclusion of any non-English scientific literature in evidence syntheses.
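The abstract relies on two quantitative ideas: class weights to counter a dataset in which relevant papers are rare, and two screening metrics (recall on relevant papers, and the share of irrelevant papers filtered out). The sketch below illustrates both in plain Python. It assumes the common "balanced" weighting heuristic, n_samples / (n_classes × n_c), which is what scikit-learn's `class_weight="balanced"` computes; the paper's exact weighting scheme and loss may differ. All function names are hypothetical, and the toy labels are illustrative only, not the study's data.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: weight each class by n_samples / (n_classes * n_c).

    Assumption: the 'balanced' heuristic. Rare classes (relevant papers)
    get larger weights, so misclassifying one of them costs more in training.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


def weighted_bce(y_true, probs, weights):
    """Mean binary cross-entropy with each sample scaled by its class weight."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, 1e-12), 1 - 1e-12)  # clip to avoid log(0)
        loss = -(t * math.log(p) + (1 - t) * math.log(1 - p))
        total += weights[t] * loss
    return total / len(y_true)


def screening_metrics(y_true, y_pred):
    """Return (recall, share of negatives filtered out).

    Labels: 1 = relevant (kept for manual screening), 0 = irrelevant.
    Recall is the fraction of relevant papers the classifier keeps; the
    filtered share is the fraction of irrelevant papers it excludes,
    i.e. the manual screening effort saved.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    filtered = tn / (tn + fp) if tn + fp else 0.0
    return recall, filtered


if __name__ == "__main__":
    # Toy example: 1 relevant paper among 10 (hypothetical data).
    y_true = [1] + [0] * 9
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    print(balanced_class_weights(y_true))
    print(screening_metrics(y_true, y_pred))
```

In this toy run the single relevant paper is kept (recall 1.0) and 7 of the 9 irrelevant papers are filtered out. This mirrors the trade-off reported above: keep every relevant study, and accept that some irrelevant ones still reach manual screening.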
| Field | Value |
|---|---|
| Item Type | Article |
| Research Programs | Biodiversity and Natural Resources (BNR); Biodiversity and Natural Resources (BNR) > Biodiversity, Ecology, and Conservation (BEC) |
| Depositing User | Luke Kirwan |
| Date Deposited | 17 Nov 2025 09:04 |
| Last Modified | 17 Nov 2025 09:04 |
| URI | https://pure.iiasa.ac.at/20990 |