Hassani, H., Entezarian, M.R., Zaeimzadeh, S., Marvian, L., & Komendantova, N. ORCID: https://orcid.org/0000-0003-2568-6179
(2025).
An oversampling-undersampling strategy for large-scale data linkage.
Frontiers in Big Data 8 10.3389/fdata.2025.1542483.
fdata-1-1542483.pdf - Published Version. Available under License Creative Commons Attribution.
Abstract
Effective record linkage in big data, particularly in imbalanced datasets, is a critical yet highly challenging task. This article uses an oversampling-undersampling strategy to address linkage imbalances, enabling more accurate and efficient record linkage within large-scale datasets. The strategy increases the number of minority-class instances and reduces the dominance of the majority classes, aiming for a more balanced dataset that can be used for training and testing. Sensitivity testing was carried out by varying the training-test ratio and the degree of imbalance.
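
The abstract does not give the details of the resampling procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes plain random resampling: classes larger than a chosen target size are undersampled without replacement, and smaller classes are oversampled by drawing extra copies with replacement. The function name `rebalance`, the parameter `target_per_class`, and the toy candidate pairs are all hypothetical.

```python
import random
from collections import Counter


def rebalance(pairs, labels, target_per_class, seed=0):
    """Resample so that every class ends up with target_per_class examples.

    Classes with at least target_per_class examples are undersampled
    without replacement; smaller classes keep all their examples and are
    topped up with copies drawn with replacement (random oversampling).
    """
    rng = random.Random(seed)

    # Group the candidate record pairs by their label.
    by_class = {}
    for pair, label in zip(pairs, labels):
        by_class.setdefault(label, []).append(pair)

    out_pairs, out_labels = [], []
    for label, items in by_class.items():
        if len(items) >= target_per_class:
            # Majority class: keep a random subset.
            chosen = rng.sample(items, target_per_class)
        else:
            # Minority class: keep everything, then add duplicates.
            extra = rng.choices(items, k=target_per_class - len(items))
            chosen = items + extra
        out_pairs.extend(chosen)
        out_labels.extend([label] * target_per_class)
    return out_pairs, out_labels


# Hypothetical toy data: 100 candidate pairs, of which only 5 are true matches.
pairs = [(i, i + 1000) for i in range(100)]
labels = [1 if i < 5 else 0 for i in range(100)]

balanced_pairs, balanced_labels = rebalance(pairs, labels, target_per_class=20)
print(Counter(balanced_labels))
```

Varying `target_per_class` relative to the original class sizes is one simple way to explore different degrees of imbalance, in the spirit of the sensitivity testing the abstract mentions.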
Item Type: | Article |
---|---|
Research Programs: | Advancing Systems Analysis (ASA); Advancing Systems Analysis (ASA) > Cooperation and Transformative Governance (CAT) |
Depositing User: | Luke Kirwan |
Date Deposited: | 12 May 2025 07:41 |
Last Modified: | 12 May 2025 07:41 |
URI: | https://pure.iiasa.ac.at/20573 |