Progressive Cactus alignment of 298 drosophilid species
- Creator: Kim, Bernard
Metadata
- Publication date: 2023-12-01
- DOI: 10.5061/dryad.x0k6djhrd
- Publisher: Dryad
- Data creator (e-Rad): Kim, Bernard
Description
The following table lists the set of genomes incorporated into the alignment. Note that accessions will be updated once genomes are processed through NCBI GenBank.

| Name | Genome source | Genome accession |
| --- | --- | --- |
| Stegana nigrithorax | NCBI GenBank | PENDING |
| Leucophenga maculata | NCBI GenBank | PENDING |
| Leucophenga montana | NCBI GenBank | PENDING |
| Cacoxenus indagator | NCBI GenBank | PENDING |
| Amiota minor | NCBI GenBank | PENDING |
| Amiota communis | NCBI GenBank | PENDING |
| Amiota mariae | NCBI GenBank | PENDING |
| Chymomyza caudatula | NCBI GenBank | PENDING |
| Chymomyza procnemis | NCBI GenBank | PENDING |
| Chymomyza amoena | NCBI GenBank | PENDING |
| Chymomyza costata | NCBI GenBank | GCA_018150985.1 |
| Chymomyza fuscimana | NCBI GenBank | GCA_949987675.1 |
| Scaptodrosophila lebanonensis | NCBI RefSeq | GCF_003285725.1 |
| Scaptodrosophila latifasciaeformis | NCBI GenBank | PENDING |
| Hirtodrosophila duncani | NCBI GenBank | PENDING |
| Lordiphosa mommai | NCBI GenBank | GCA_018904225.1 |
| Lordiphosa collinella | NCBI GenBank | PENDING |
| Lordiphosa andalusiaca | NCBI GenBank | PENDING |
| Lordiphosa fenestrarum | NCBI GenBank | PENDING |
| Lordiphosa magnipectinata | NCBI GenBank | PENDING |
| Lordiphosa clarofinis | NCBI GenBank | GCA_018904275.1 |
| Drosophila sturtevanti | NCBI GenBank | GCA_018150375.1 |
| Drosophila emarginata | NCBI GenBank | PENDING |
| Drosophila neocordata | NCBI GenBank | GCA_018903615.1 |
| Drosophila saltans | NCBI GenBank | GCA_018903575.1 |
| Drosophila austrosaltans | NCBI GenBank | PENDING |
| Drosophila prosaltans | NCBI GenBank | GCA_018151275.1 |
| Drosophila sucinea | NCBI GenBank | GCA_018150745.1 |
| Drosophila nebulosa | NCBI GenBank | GCA_024703675.1 |
| Drosophila insularis | NCBI GenBank | GCA_018903935.1 |
| Drosophila tropicalis | NCBI GenBank | GCA_018151085.1 |
| Drosophila willistoni | NCBI RefSeq | GCF_018902025.1 |
| Drosophila paulistorum | NCBI GenBank | GCA_018152135.1 |
| Drosophila equinoxialis | NCBI GenBank | GCA_018150345.1 |
| Drosophila guanche | NCBI RefSeq | GCF_900245975.1 |
| Drosophila subobscura | NCBI RefSeq | GCF_008121235.1 |
| Drosophila bifasciata | NCBI GenBank | GCA_009664405.1 |
| Drosophila subsilvestris | NCBI GenBank | PENDING |
| Drosophila obscura | NCBI RefSeq | GCF_018151105.1 |
| Drosophila ambigua | NCBI GenBank | GCA_018150905.1 |
| Drosophila tristis | NCBI GenBank | GCA_018150885.1 |
| Drosophila lowei | NCBI GenBank | GCA_008121275.1 |
| Drosophila miranda | NCBI RefSeq | NCBI GenBank |
| Drosophila pseudoobscura | NCBI RefSeq | GCF_009870125.1 |
| Drosophila persimilis | NCBI RefSeq | GCF_003286085.1 |
| Drosophila helvetica | NCBI RefSeq | PENDING |
| Drosophila azteca | NCBI GenBank | GCA_005876895.1 |
| Drosophila affinis | NCBI GenBank | PENDING |
| Drosophila algonquin | NCBI GenBank | PENDING |
| Drosophila athabasca | NCBI GenBank | GCA_008121215.1 |
| Drosophila setifemur | NCBI GenBank | GCA_021224005.1 |
| Drosophila ironensis | NCBI GenBank | GCA_021223825.1 |
| Drosophila varians | NCBI GenBank | GCA_018150405.1 |
| Drosophila vallismaia | NCBI GenBank | PENDING |
| Drosophila merina | NCBI GenBank | PENDING |
| Drosophila ercepeae | NCBI GenBank | GCA_018150545.1 |
| Drosophila pseudoananassae | NCBI GenBank | GCA_018153035.1 |
| Drosophila malerkotliana | NCBI GenBank | GCA_018153235.1 |
| Drosophila bipectinata | NCBI RefSeq | GCF_018153845.1 |
| Drosophila parabipectinata | NCBI GenBank | GCA_018153455.1 |
| Drosophila atripex | NCBI GenBank | PENDING |
| Drosophila monieri | NCBI GenBank | PENDING |
| Drosophila pandora | NCBI GenBank | GCA_021223865.1 |
| Drosophila anomalata | NCBI GenBank | PENDING |
| Drosophila ananassae | NCBI RefSeq | GCF_017639315.1 |
| Drosophila pallidosa | NCBI GenBank | PENDING |
| Drosophila oshimai | NCBI GenBank | GCA_018150695.1 |
| Drosophila ficusphila | NCBI RefSeq | GCF_018152265.1 |
| Drosophila gunungcola | NCBI RefSeq | GCF_025200985.1 |
| Drosophila elegans | NCBI RefSeq | GCF_018152505.1 |
| Drosophila fuyamai | NCBI GenBank | GCA_018153365.1 |
| Drosophila kurseongensis | NCBI GenBank | GCA_018153305.1 |
| Drosophila rhopaloa | NCBI RefSeq | GCF_018152115.1 |
| Drosophila carrolli | NCBI GenBank | GCA_018152295.1 |
| Drosophila biarmipes | NCBI RefSeq | GCF_025231255.1 |
| Drosophila subpulchrella | NCBI RefSeq | GCF_014743375.2 |
| Drosophila suzukii | NCBI RefSeq | GCF_013340165.1 |
| Drosophila mimetica | NCBI GenBank | PENDING |
| Drosophila takahashii | NCBI RefSeq | GCF_018152695.1 |
| Drosophila lutescens | NCBI GenBank | PENDING |
| Drosophila pseudotakahashii | NCBI GenBank | PENDING |
| Drosophila prostipennis | NCBI GenBank | PENDING |
| Drosophila eugracilis | NCBI RefSeq | GCF_018153835.1 |
| Drosophila melanogaster | NCBI RefSeq | GCF_000001215.4 |
| Drosophila simulans | NCBI RefSeq | GCF_016746395.2 |
| Drosophila mauritiana | NCBI RefSeq | GCF_004382145.1 |
| Drosophila sechellia | NCBI RefSeq | GCF_004382195.2 |
| Drosophila orena | NCBI GenBank | GCA_005876975.1 |
| Drosophila erecta | NCBI RefSeq | GCF_003286155.1 |
| Drosophila teissieri | NCBI RefSeq | GCF_016746235.2 |
| Drosophila santomea | NCBI RefSeq | GCF_016746245.2 |
| Drosophila yakuba | NCBI RefSeq | GCF_016746365.2 |
| Drosophila pectinifera | NCBI GenBank | GCA_008042775.1 |
| Drosophila triauraria | NCBI GenBank | GCA_014170255.2 |
| Drosophila auraria | NCBI GenBank | GCA_008042615.1 |
| Drosophila tani | NCBI GenBank | GCA_008042535.1 |
| Drosophila rufa | NCBI GenBank | GCA_018153105.1 |
| Drosophila asahinai | NCBI GenBank | GCA_008042795.1 |
| Drosophila lacteicornis | NCBI GenBank | GCA_008044355.1 |
| Drosophila kanapiae | NCBI GenBank | GCA_008042475.1 |
| Drosophila o ... | | |
Long-read sequencing is driving rapid progress in genome assembly across all major groups of life, including species of the family Drosophilidae, a longtime model system for genetics, genomics, and evolution. Whole-genome sequence alignments link evolution at the nucleotide level across species and are a critical but computationally intensive step for downstream genomic analyses. Progressive Cactus is a reference-free, whole-genome alignment tool designed to scale to alignments of thousands of species. In the study associated with this dataset, we conducted Oxford Nanopore long-read sequencing of both inbred lines and single wild flies obtained either directly from the field or from ethanol-preserved specimens in museum collections. We selected a set of 298 suitably high-quality drosophilid genomes from this study, from publicly available genomes assembled previously by us, and genomes assembled by other studies. Repeats were identified and soft-masked in each genome with RepeatModeler2 and RepeatMasker. A guide tree was constructed from 1,000 single-copy orthologs annotated by BUSCO v5 in all genomes. Individual gene trees were inferred with IQTREE2 and a species tree was estimated from the gene trees with ASTRAL-MP. The tree was scaled by the substitution rate at 4-fold degenerate sites and provided to Progressive Cactus as the guide tree for the alignment. Detailed methods are provided in the study. The alignment is released as an open resource and as a tool for studying evolution at the scale of an entire insect family.
# Progressive Cactus alignment of 298 drosophilid species

A reference-free, whole-genome Progressive Cactus alignment was built from 298 drosophilid genomes, as listed in our preprint.

The Cactus alignment was broken up with GNU split to facilitate upload and download. Please download all the parts and run:

`cat drosophila.hal.* > drosophila.hal`

The Progressive Cactus tools must be installed to use the alignment.

One representative genome is used for each species. The genome's name in the alignment is the first letter of the genus name, an underscore, and the full species name, all in capital letters. For example, *Drosophila melanogaster* is `D_MELANOGASTER`.

A summary of the data in the alignment can be viewed with:

`halStats drosophila.hal`
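
The naming rule described above can be sketched in Python for looking up genomes programmatically. This is a minimal illustration, not part of the dataset: the function name `alignment_name` is hypothetical, and it assumes every entry is a plain two-word binomial that follows the stated rule. The authoritative list of genome names is whatever `halStats` reports for the alignment.

```python
def alignment_name(species: str) -> str:
    """Map a binomial such as 'Drosophila melanogaster' to its alignment label.

    Follows the README rule: first letter of the genus, an underscore,
    then the full species name, all upper case. (Hypothetical helper;
    assumes a two-word 'Genus species' input.)
    """
    parts = species.split()
    if len(parts) != 2:
        raise ValueError(f"expected 'Genus species', got {species!r}")
    genus, epithet = parts
    return f"{genus[0]}_{epithet}".upper()


if __name__ == "__main__":
    # Species names taken from the genome table in this record.
    for name in ["Drosophila melanogaster", "Chymomyza costata"]:
        print(f"{name} -> {alignment_name(name)}")
```

Because only the first letter of the genus is kept, genera that share an initial (for example *Leucophenga* and *Lordiphosa*) map to the same prefix, so it is worth checking generated labels against the names actually present in the HAL file.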