Data from: Species delimitation with ABC and other coalescent-based methods: a test of accuracy with simulations and an empirical example with lizards of the Liolaemus darwinii complex (Squamata: Liolaemidae)

Metadata

Published
2012-01-01
DOI
  • 10.5061/dryad.4409k652
Publisher
Dryad
Creator Name (e-Rad)
  • Camargo, Arley
  • Morando, Mariana
  • Avila, Luciano Javier
  • Sites Jr., Jack W.

Description

Alignment of cyt bText file in nexus format containing the alignment of cytochrome b mtDNA data of the Liolaemus darwinii complex (77 taxa, 713 bp)cytb.nexAlignment of A1DText file in nexus format containing the alignment of the anonymous nuclear locus A1D of the Liolaemus darwinii complex (80 taxa, 700 bp)lda1d.nexAlignment of A9CText file in nexus format containing the alignment of the anonymous nuclear locus A9C of the Liolaemus darwinii complex (80 taxa, 481 bp)lda9c.nexAlignment of B6BText file in nexus format containing the alignment of the anonymous nuclear locus B6B of the Liolaemus darwinii complex (80 taxa, 415 bp)ldb6b.nexSimulation code for evaluation of species delimitation methodsCompressed file containing a Perl script (speciation.pl), associated files, and executables for simulating sequence data and summary statistics used as input of species delimitation methodsScripts.zipInput and output of SpeDeSTEM analysesCompressed file containing the input and output files of SpeDeSTEM analyses based on the empirical data from the Liolaemus darwinii complexSpedestem_files.tgzInput and output of BPP analysesCompressed file containing input and output files of BPP analyses based on empirical data from the Liolaemus darwinii complexBPP_files.tgzInput and output of ABC analysesCompressed file containing simulation code for generating summary statistics used in ABC analyses, observed summary statistics for species delimitation (obs12sust_SD) and parameter estimation (obs44sust_PE) based on empirical data from the Liolaemus darwinii complex, and output of ABC analyses (R_output_SD and R_output_PE) in RABC_files.tgzTable S1Summary statistics (SuSt) of the priors, the observed data, and the posterior of ABC analyses for species delimitation. Observed SuSt were obtained from the sequence data of four loci for the L. darwinii complex. Mean and standard deviation (StDv) of SuSt were calculated across simulated or observed loci. Abbreviations: π = mean of pairwise differences, S = number of segregating sites, MFS = mutation frequency spectrum. Details about the computation of these SuSt are given in user guide of the program popABC.TableS1.docTable S2Summary statistics (SuSt) of the prior, the observed data, and the posterior of ABC analyses for parameter estimation. Observed SuSt were obtained from the sequence data of four loci for the L. darwinii complex. Mean and standard deviation (StDv) of SuSt were calculated across simulated or observed loci. Abbreviations: π = mean of pairwise differences, MFS = mutation frequency spectrum, NmS = Nm-like based on the number of segregating sites, privS = number of private segregating sites, 25%Q = 25% quartile, 75%Q = 75% quartile.TableS2.docTable S3Model selection with AIC criteria of alternative demographic models estimated with IMa2. Models are nested within the full migration model and are ranked based on AIC values. Best model is bolded. AIC = Akaike information criterion, k = number of parameters, Δi = AIC - AICmin, ωi = model probability. Subscripts of M (migration) parameters represent the directionality of gene flow between populations going backwards in time: 0 = L. darwinii-N, 1= L. darwinii-S, 2 = L. laurenti, and 3 = L. grosseorum. Parameters values constrained in nested models are enclosed in squared brackets.TableS3.docFigure S1Species tree of the L. darwinii complex based on a Bayesian analysis in *BEAST. Numbers above branches are posterior probabilities of clades. The scale bar represents substitutions per site.FigureS1.pdfAppendix 1List of specimens sequenced for this study. Code represents the numbers used to identify sampled localities in Fig. 1. GenBank accession numbers are given for each locus; italicized numbers indicate sequences used in species-tree analysis. Acronyms: BYU = Bean Life Science Museum, Brigham Young University (USA); FML = Fundación Miguel Lillo (Argentina); IMCN = Instituto y Museo de Ciencias Naturales, Universidad Nacional de San Juan (Argentina); LJAMM-CNP = herpetological collection of the Centro Nacional Patagónico (CENPAT-CONICET, Argentina); and MLP S. = herpetological collection of the Museo de La Plata (Argentina).Appendix1.docxAppendix S1Figure 1. Number of references published between 1981 and 2010 retrieved from the ISI Web of Science that contained the keyword "species delimitation". Figure 2. Number of references published between 1992 and 2010 retrieved from the ISI Web of Science that contained the keyword "approximate bayesian computation".AppendixS1.pdf

Species delimitation is a major research focus in evolutionary biology because accurate species boundaries are a prerequisite for the study of speciation. New species delimitation methods (SDMs) can accommodate non-monophyletic species and gene tree discordance as a result of incomplete lineage sorting via the coalescent model, but do not explicitly accommodate gene flow after divergence. Approximate Bayesian Computation (ABC) can incorporate gene flow and estimate other relevant parameters of the speciation process while testing alternative species delimitation hypotheses. We evaluated the accuracy of BPP, SpeDeSTEM, and ABC for delimiting species using simulated data and applied these methods to empirical data from lizards of the Liolaemus darwinii complex. Overall, BPP was the most accurate, ABC showed an intermediate accuracy, and SpeDeSTEM was the least accurate under most simulated conditions. All three SDMs showed lower accuracy when speciation occurred despite gene flow, as found in previous studies, but ABC was the method with the smallest decrease in accuracy. All three SDMs consistently supported the distinctness of southern and northern lineages within L. darwinii. These SDMs based on genetic data should be complemented with novel SDMs based on morphological and ecological data to achieve truly integrative and statistically robust approaches to species discovery.

Related Articles

See more

Details 詳細情報について

Back to top