How many people need to classify the same image? A method for optimizing volunteer contributions in binary geographical classifications

Salk, C., Moltchanova, E., See, L. ORCID: https://orcid.org/0000-0002-2665-7065, Sturn, T., McCallum, I. ORCID: https://orcid.org/0000-0002-5812-9988, & Fritz, S. ORCID: https://orcid.org/0000-0003-0420-8549 (2022). How many people need to classify the same image? A method for optimizing volunteer contributions in binary geographical classifications. PLOS ONE 17 (5) e0267114. 10.1371/journal.pone.0267114.

journal.pone.0267114.pdf - Published Version
Available under License Creative Commons Attribution.

journal.pone.0267114.s001.R - Supplemental Material
Available under License Creative Commons Attribution.


Abstract

Involving members of the public in image classification tasks that are difficult to automate is increasingly recognized as a way both to complete large volumes of such tasks and to promote public involvement in science. Although this labor is usually provided free of charge, it is still limited, so researchers need to use volunteer contributions as efficiently as possible. Efficient use of volunteer labor becomes complicated when each task is assigned to multiple volunteers to increase confidence that the correct classification has been reached. In this paper, we develop a system to decide when enough information has accumulated to confidently declare an image classified and remove it from circulation. We use a Bayesian approach to estimate the posterior distribution of the mean rating in a binary image classification task, and tasks are removed from circulation when user-defined certainty thresholds are reached. We demonstrate this process using over 4.5 million unique classifications, made by 2783 volunteers, of more than 190,000 images assessed for the presence or absence of cropland. Had the system outlined here been implemented in the original data collection campaign, it would have eliminated the need for 59.4% of volunteer ratings. Had the saved effort been applied to new tasks, an estimated 2.46 times as many images could have been classified with the same amount of labor, demonstrating the power of this method to make more efficient use of limited volunteer contributions. To simplify implementation of this method by other investigators, we provide cutoff value combinations for one set of confidence levels.
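The retirement idea described in the abstract can be illustrated with a minimal Python sketch. It assumes, purely for illustration, a uniform Beta(1, 1) prior on the probability p that a volunteer rates an image as containing cropland, and a hypothetical rule that retires an image once the posterior probability that p lies above (or below) 0.5 reaches a chosen confidence level, after a minimum number of ratings. The function names, the 0.95 confidence level and the three-rating minimum are all hypothetical; the paper's own model, thresholds and cutoff tables, and the supplemental R file listed above, may differ.

```python
from math import comb


def prob_majority_yes(yes: int, no: int) -> float:
    """Posterior P(p > 0.5) under an assumed Beta(1, 1) prior.

    After `yes` positive and `no` negative ratings the posterior is
    Beta(a, b) with a = yes + 1, b = no + 1. For integer a, b the Beta CDF
    at 0.5 equals a binomial tail, so P(p > 0.5) = P(Bin(n, 0.5) <= a - 1)
    with n = a + b - 1 = yes + no + 1.
    """
    a = yes + 1
    n = yes + no + 1
    return sum(comb(n, j) for j in range(a)) / 2 ** n


def retire(yes: int, no: int, confidence: float = 0.95, min_ratings: int = 3):
    """Hypothetical retirement rule: return "yes", "no", or None (keep rating)."""
    if yes + no < min_ratings:
        return None
    p_yes = prob_majority_yes(yes, no)
    if p_yes >= confidence:
        return "yes"
    if 1.0 - p_yes >= confidence:
        return "no"
    return None


def ratings_needed(stream, confidence: float = 0.95, min_ratings: int = 3):
    """Feed 0/1 ratings one at a time; stop as soon as the image is retired."""
    yes = no = 0
    for count, rating in enumerate(stream, start=1):
        if rating:
            yes += 1
        else:
            no += 1
        label = retire(yes, no, confidence, min_ratings)
        if label is not None:
            return label, count
    return None, yes + no  # never retired: every available rating was used


def capacity_multiplier(fraction_saved: float) -> float:
    """How many times as many images the same labor covers if this share of ratings is saved."""
    return 1.0 / (1.0 - fraction_saved)


# Usage: a clearly "cropland" image is retired after 4 of its 8 ratings.
print(ratings_needed([1, 1, 1, 1, 1, 0, 1, 1]))  # ('yes', 4)
print(round(capacity_multiplier(0.594), 2))  # 2.46
```

The last line checks the abstract's arithmetic: eliminating 59.4% of ratings leaves 40.6% of the original labor per image, and 1 / 0.406 ≈ 2.46, which matches the reported multiplier under the assumption that new images need, on average, the same share of ratings as the original ones. Raising `confidence` or `min_ratings` in this sketch trades fewer misclassified images for more ratings per image.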

Item Type: Article
Research Programs: Advancing Systems Analysis (ASA)
Advancing Systems Analysis (ASA) > Novel Data Ecosystems for Sustainability (NODES)
Strategic Initiatives (SI)
Depositing User: Michaela Rossini
Date Deposited: 20 May 2022 08:35
Last Modified: 19 Oct 2022 05:01
URI: https://pure.iiasa.ac.at/18017
