Hassani, H., Entezarian, M.R., Zaeimzadeh, S., Marvian, L., & Komendantova, N. ORCID: https://orcid.org/0000-0003-2568-6179 (2025). An oversampling-undersampling strategy for large-scale data linkage. Frontiers in Big Data 8, DOI: 10.3389/fdata.2025.1542483.
Full text: fdata-1-1542483.pdf (Published Version, 2MB). Available under License Creative Commons Attribution.
Abstract
Effective record linkage in big data, particularly in imbalanced datasets, is a critical yet highly challenging task owing to its inherent complexity. This article applies an oversampling-undersampling strategy to address linkage imbalance, enabling more accurate and efficient record linkage within large-scale datasets. The strategy increases the number of minority-class instances and reduces the dominance of the majority classes, aiming for a more balanced dataset that can be used for training and testing. Sensitivity testing was carried out by varying the training-test ratio and the degree of imbalance.
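
The general idea of combining oversampling and undersampling can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain random resampling (duplicating minority items with replacement, dropping majority items without replacement), and the function name `rebalance`, the parameter `target_per_class`, and the toy candidate pairs are all hypothetical names chosen for illustration.

```python
import random
from collections import Counter


def rebalance(pairs, labels, target_per_class, seed=0):
    """Randomly resample every class to exactly `target_per_class` items.

    Assumption: simple random over/undersampling, used here only as an
    illustrative stand-in for the strategy described in the abstract.
    """
    rng = random.Random(seed)

    # Group candidate record pairs by their class label.
    by_class = {}
    for pair, label in zip(pairs, labels):
        by_class.setdefault(label, []).append(pair)

    out_pairs, out_labels = [], []
    for label, items in by_class.items():
        if len(items) >= target_per_class:
            # Undersample: draw a subset without replacement.
            chosen = rng.sample(items, target_per_class)
        else:
            # Oversample: keep all originals, then add random duplicates.
            extra = rng.choices(items, k=target_per_class - len(items))
            chosen = items + extra
        out_pairs.extend(chosen)
        out_labels.extend([label] * target_per_class)
    return out_pairs, out_labels


# Toy data (hypothetical): 5 true matches among 100 candidate pairs.
pairs = [("a%d" % i, "b%d" % i) for i in range(100)]
labels = ["match" if i < 5 else "non-match" for i in range(100)]

bal_pairs, bal_labels = rebalance(pairs, labels, target_per_class=20)
counts = Counter(bal_labels)
print(counts)
```

In a sketch like this, resampling would normally be applied to the training split only, so that duplicated minority pairs do not leak into the test data.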
| Item Type: | Article | 
|---|---|
| Research Programs: | Advancing Systems Analysis (ASA); Advancing Systems Analysis (ASA) > Cooperation and Transformative Governance (CAT) | 
| Depositing User: | Luke Kirwan | 
| Date Deposited: | 12 May 2025 07:41 | 
| Last Modified: | 12 May 2025 07:41 | 
| URI: | https://pure.iiasa.ac.at/20573 | 