
Comparative genomics, domestication, and genome size evolution

Comparative evolutionary genomics and domestication genomics are among the most active areas of research in the lab. Please check back here often for updates on our massive genomic resequencing effort, our comparative molecular evolutionary analyses, and our work on domestication genomics.

A remarkable feature of the cotton genus is its extraordinary genome size evolution. Some of this is captured in the figure below, which shows the three-fold variation among the diploids alone, from a low in the New World D-genome cottons to a high in the Australian K-genome species. This raises some obvious questions! For example, how does this happen? What parts of the genomes are growing and shrinking, and by what mechanisms? Also, notice that the allopolyploids, which contain two genomes (A and D), have a combined genome size that is smaller than the sum of their progenitor genomes. How does this genome downsizing occur, and what are the consequences for the organisms?

Comparative evolutionary genomics. We are addressing these questions using a number of comparative genome sequencing technologies, in collaboration with colleagues in the US and abroad. This entails comparing high-quality de novo genome assemblies among diploids that differ greatly in genome size, combined with characterization of the sequences that make up each genome, so that we can better understand the patterns and evolutionary dynamics that generate such remarkable genome size variation.

Gossypium phylogeny

In addition to genome size variation between species, we are also studying genome size variation within the domesticated species. Graduate student Emma Dostal recently showed that there is a surprising amount of infraspecific (within-species) genome size variation in G. hirsutum, G. barbadense, and G. herbaceum. These discoveries are giving us insight into the many external and internal factors that might collectively shape genome size in plants:

Mechanisms of growth and reduction

We have also extended this work to the closest relatives of cotton, a small clade of two genera with a fantastically disjunct geographic distribution. Kokia is a genus of just a few species endemic to Hawaii, and its closest relative, Gossypioides, is from Madagascar (figure from Grover et al., 2017):

Map of the geographic distribution of Kokia and Gossypioides

As seen in the figure, the genomes of these two species are even smaller than the smallest in Gossypium, and by a lot! We have sequenced and described these two "outgroup" genomes and documented an extraordinary case of gene loss accompanying genome downsizing. At present, we do not understand how these two lineages managed to prune so many genes from their genomes and yet survive to the present day.

Genetic diversity and domestication genomics of the cultivated cotton species

The evolutionary history of the cultivated allopolyploid cottons (G. hirsutum, Upland cotton, and G. barbadense, Pima cotton) includes both polyploidization and independent domestication. In both species, the domesticated forms that gave rise to modern cultivars were derived from wild ancestors by ancient civilizations, in the Yucatan Peninsula (G. hirsutum) and in western Peru/Ecuador (G. barbadense). Moreover, in the Old World, two diploid species of cotton, G. arboreum and G. herbaceum, were also independently domesticated. This parallel domestication, which resulted in convergent plant architecture and convergent long, strong, white fibers, offers us a marvelous opportunity to understand the genetic targets of selection (in this case, strong directional selection practiced by humans beginning more than 5,000 years ago) and, more generally, how evolution creates, shapes, and molds plant phenotypes. Several examples of this morphological transformation are shown below.

Parallel domestication of cotton

One can readily appreciate the differences between the wild forms of cotton and the modern varieties.  Here is a wild plant of G. hirsutum; notice how large and rangy it is, not at all like an annualized crop plant:

G. hirsutum race yucatanense in flower, 2017, from Georges Ano

And here is what the capsules and seeds with fiber look like:

Wild cotton fruit and seeds


And this is what humans have created from it!

Cotton blooms

Here is another example, showing the transition from wild (left) to modern (right) Pima cotton (G. barbadense):

Wild and modern G. barbadense

A close-up picture of the fibers shows how selection has altered this single-celled epidermal trichome in three of the domesticated species (the fourth is not shown):

Wild vs. domesticated cotton

We have been using comparative genome sequencing and resequencing data to address fundamental questions about polyploid plant genomes. For example, how is genetic diversity partitioned between the two subgenomes of an allopolyploid? How have humans modified these two subgenomes, and are the effects of directional selection the same in both? Have genes in the two subgenomes contributed equally to advanced, modern fiber phenotypes? In addition, we have been using the data to define both which portion and what proportion of the wild diversity was captured during domestication. This will help us understand the present gene pool of modern cultivars and how it has been shaped by selection and by interspecific gene flow between the two cultivated species.

ons were collected from elsewhere. In total, 1,432 tetraploid accessions were analyzed with our bioinformatics pipeline. Of these, 1,024 samples were retained after quality filtering: 795 G. hirsutum, 201 G. barbadense, and 28 other tetraploid samples (641 from this project and 383 from public data).
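One common way to ask how genetic diversity is partitioned between the A and D subgenomes is to compare nucleotide diversity (π, the average proportion of sites that differ between two sequences) in each subgenome across the same set of accessions. The following is a minimal sketch of that idea, assuming already-aligned sequences of equal length with no missing data. It is not our bioinformatics pipeline: the helper `nucleotide_diversity` and the toy sequences are hypothetical and made up purely for illustration, and real analyses work from quality-filtered variant calls and must handle missing genotypes.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Return pi: mean proportion of differing sites over all pairs of aligned sequences.

    Assumes every sequence is aligned to the same length and has no missing data
    (a simplification; real resequencing data need missing-data handling).
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatched sites for every pair of sequences, then sum over all pairs.
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return diffs / (len(pairs) * length)


# Hypothetical toy data: the same short region in four accessions, per subgenome.
toy_subgenomes = {
    "A": ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGTACGTTC"],
    "D": ["TTGCAAGCTA", "TTGCAAGCTA", "TTGCAAGCTA", "TTGCATGCTA"],
}

pi = {name: nucleotide_diversity(seqs) for name, seqs in toy_subgenomes.items()}
print(pi)
```

With these toy sequences the A-subgenome region carries two segregating sites and the D-subgenome region one, so π is higher for A. Comparing π per subgenome (and between wild and domesticated accessions) is one simple, hedged way to frame the partitioning and bottleneck questions raised above.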