Inference of the distribution of fitness effects of mutations is affected by single nucleotide polymorphism filtering methods, sample size and population structure

Andersson, B.A., Zhao, W., Haller, B.C., Brännström, Å., & Wang, X.‐R. (2023). Inference of the distribution of fitness effects of mutations is affected by single nucleotide polymorphism filtering methods, sample size and population structure. Molecular Ecology Resources 23, 1589-1603. doi:10.1111/1755-0998.13825.

Molecular Ecology Resources - 2023 - Andersson - Inference of the distribution of fitness effects of mutations is affected.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial.

Abstract

The distribution of fitness effects (DFE) of new mutations has been of interest to evolutionary biologists since the concept of mutations arose. Modern population genomic data enable us to quantify the DFE empirically, but few studies have examined how data processing, sample size and cryptic population structure might affect the accuracy of DFE inference. We used simulated and empirical data (from Arabidopsis lyrata) to show the effects of missing-data filtering, sample size, number of single nucleotide polymorphisms (SNPs) and population structure on the accuracy and variance of DFE estimates. Our analyses focus on three filtering methods (downsampling, imputation and subsampling) with sample sizes of 4 to 100 individuals. We show that (1) the choice of missing-data treatment directly affects the estimated DFE, with downsampling performing better than imputation and subsampling; (2) the estimated DFE is less reliable in small samples (<8 individuals), and becomes unpredictable with too few SNPs (<5000, counting 0-fold and 4-fold SNPs together); and (3) population structure may skew the inferred DFE towards more strongly deleterious mutations. We suggest that future studies should consider downsampling for small data sets, and use samples larger than 4 (ideally larger than 8) individuals, with more than 5000 SNPs, in order to improve the robustness of DFE inference and enable comparative analyses.
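
The abstract does not describe how downsampling is carried out. As a rough illustration only, the sketch below assumes it means hypergeometric projection of each SNP's allele counts to a common, smaller number of chromosomes, which is one common way to build a site frequency spectrum (SFS) from data with missing genotypes. The function names, the input layout (derived-allele count and number of called chromosomes per site) and the choice to drop sites with too few calls are all assumptions made for this example, not details taken from the paper.

```python
from math import comb


def project_site(derived, total, target):
    """Hypergeometric projection of one site (assumed procedure).

    Returns a list p of length target + 1, where p[j] is the probability
    of seeing j derived alleles when `target` chromosomes are drawn
    without replacement from `total` called chromosomes, `derived` of
    which carry the derived allele.
    """
    if not 0 <= derived <= total:
        raise ValueError("derived count must lie between 0 and total")
    if target > total:
        raise ValueError("cannot project to more chromosomes than were called")
    denom = comb(total, target)
    return [
        comb(derived, j) * comb(total - derived, target - j) / denom
        for j in range(target + 1)
    ]


def downsample_sfs(sites, target):
    """Build an unfolded SFS of size target + 1 from per-site counts.

    `sites` is an iterable of (derived_count, called_chromosomes) pairs.
    Sites with fewer than `target` called chromosomes are skipped; this
    is an illustrative choice, not one stated in the abstract.
    """
    sfs = [0.0] * (target + 1)
    for derived, total in sites:
        if total < target:
            continue  # too much missing data at this site
        for j, p in enumerate(project_site(derived, total, target)):
            sfs[j] += p
    return sfs


if __name__ == "__main__":
    # Hypothetical toy input: three SNPs with differing amounts of missing data.
    toy_sites = [(1, 4), (2, 4), (3, 3)]
    print(downsample_sfs(toy_sites, target=2))
```

Under these assumptions every retained site contributes a total weight of one to the spectrum, so the entries sum to the number of sites that had at least `target` called chromosomes. For diploid samples, the number of chromosomes is twice the number of individuals.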

Item Type: Article
Uncontrolled Keywords: DFE; SLiM simulation; missing-data treatment; population structure; sample size
Research Programs: Advancing Systems Analysis (ASA)
Advancing Systems Analysis (ASA) > Cooperation and Transformative Governance (CAT)
Advancing Systems Analysis (ASA) > Exploratory Modeling of Human-natural Systems (EM)
Depositing User: Luke Kirwan
Date Deposited: 21 Jun 2023 11:41
Last Modified: 08 Jan 2024 07:32
URI: https://pure.iiasa.ac.at/18862
