Homoeologous gene expression and co-expression network analyses and evolutionary inference in allopolyploids
Description
<jats:title>ABSTRACT</jats:title><jats:p>Polyploidy is a widespread phenomenon throughout eukaryotes. Because duplicated genomes coexist in the same nucleus, polyploids pose unique challenges for estimating gene expression levels, a task that is essential for understanding the extensive and varied transcriptomic responses that accompany polyploidy. Although previous studies have explored the bioinformatics of polyploid transcriptomic profiling, the causes and consequences of inaccurate quantification of transcripts from duplicated gene copies have not been addressed. Using transcriptomic data from the cotton genus (<jats:italic>Gossypium</jats:italic>) as an example, we present an analytical workflow to evaluate a variety of bioinformatic method choices at different stages of RNA-seq analysis, from homoeolog expression quantification to the downstream analyses used to infer key phenomena of polyploid expression evolution. In general, GSNAP-PolyCat outperforms the other quantification pipelines tested, and its derived expression dataset best represents the expected homoeolog expression and co-expression divergence. The performance of co-expression network analysis was affected less by homoeolog quantification than by the choice of network construction method, with weighted networks outperforming binary networks. By examining the extent and consequences of homoeolog read ambiguity, we illuminate potential artifacts that may affect our understanding of duplicate gene expression, including overestimation of homoeolog co-regulation and incorrect inference of subgenome asymmetry in network topology. Taken together, our work points to a set of reasonable practices that we hope are broadly applicable to the evolutionary exploration of polyploids.</jats:p>