Ongoing research projects in the lab:

The Biodiversity Consequences of Autopolyploidy

Whole genome duplication, or autopolyploidy, has occurred repeatedly during the evolution of land plants and likely acts as a major driver of evolutionary change. When genome duplications first occur within a species, they can immediately isolate the new autopolyploids reproductively from the rest of the population. If autopolyploid lineages are considered “good species,” they may be a source of hidden biodiversity.

Our project seeks to test whether shifts in ploidy are phylogenetically structured within a complex of cryptic moss species, the Physcomitrium pyriforme complex, which is widespread in North America and Europe. The complex harbors seven karyotypes worldwide and exhibits much morphological variation, as reflected by the 29 synonyms. These annual, bisexual and selfing mosses are easily grown, and genome doubling is readily induced in vitro from sporophytic tissue, enabling tests of reproductive isolation among wild and artificial autopolyploids.

Our project addresses four inter-related objectives:

  1. Reconstruct the phylogenomic relationships of 400 populations of the P. pyriforme complex using targeted sequencing of 800 low-copy nuclear genes.
  2. Characterize the karyotype and genome size of 400 populations of the P. pyriforme complex across Europe, and infer frequencies of ploidal shifts within a phylogenomic hypothesis.
  3. Identify morphological signatures of artificial genome duplication and, through comparison with wild populations, test whether these erode through time.
  4. Complement these inferences with experiments testing for reproductive isolation among wild and artificial polyploids and thereby for the evolutionary significance of autopolyploidy.
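
As a rough illustration of the second objective, the minimum number of ploidal shifts implied by a tree can be counted with Fitch parsimony. The sketch below is only one simple way to do this and is not the project's actual method; the tree, population names, and ploidy labels are made up for the example.

```python
def fitch_changes(tree, states):
    """Return (possible_states, min_changes) for a rooted binary tree.

    tree   -- a tip name (str) or a 2-tuple of subtrees
    states -- dict mapping each tip name to its observed ploidy label
    """
    if isinstance(tree, str):
        # A tip: its state is known and no change has been counted yet.
        return {states[tree]}, 0
    left_set, left_changes = fitch_changes(tree[0], states)
    right_set, right_changes = fitch_changes(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The two children can agree on a state, so no extra change is needed.
        return shared, left_changes + right_changes
    # The children disagree: keep both options and count one ploidal shift.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical five-population tree with invented ploidy labels.
toy_tree = ((("popA", "popB"), "popC"), ("popD", "popE"))
toy_ploidy = {"popA": "1x", "popB": "2x", "popC": "1x", "popD": "2x", "popE": "2x"}

_, min_shifts = fitch_changes(toy_tree, toy_ploidy)
print(min_shifts)
```

Parsimony gives only a lower bound on the number of shifts; model-based ancestral state reconstruction would be needed to estimate rates.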

Plant Phylogenomics

The original definition of Phylogenomics by Eisen suggested that we use the phylogeny to understand the evolution of gene function. In this sense, phylogenomics is a kind of applied phylogenetics, using species relationships to reconstruct the evolution of genomic traits.

In plants, whole genome duplication (WGD) is common: all flowering plants descend from an ancestor that experienced one or more WGD events. Following these WGD events, plant lineages quickly return to functioning as diploid organisms, but multiple copies of some genes are retained.

In this context we are interested in using genomic techniques to ask the following questions:

  • Are gene duplication events that cluster on the phylogeny the result of a whole genome duplication or of several small-scale duplications?
  • Is there a functional bias in which genes retain multiple copies following duplication?
  • Are gene duplications associated with a relaxation in purifying selection?

Targeted Sequencing

Building a phylogeny from one or a few genes is likely to be misleading about the relationships among species. A more accurate reconstruction will use many genes from the nucleus, but this is not cost-effective with traditional PCR and Sanger-sequencing based methods. Similarly, sequencing full genomes using high-throughput sequencing is not feasible for systematics in non-model organisms (yet).

One compromise is to bias high-throughput sequencing libraries to contain a reduced representation of the genome. This technique, known as targeted sequencing, is a cost-effective way to sample hundreds of loci from dozens of samples simultaneously. There are several targeted-sequencing techniques, including Anchored Phylogenetics (aka Ultra-Conserved Elements) and RAD-cap (the capture of restriction-digest associated elements). For plant phylogenetics, HybSeq, the targeting of exons and flanking intron regions, has proven highly effective. In our lab we focus on three aspects of HybSeq targeted sequencing: probe design, the use of herbarium specimens, and data analysis pipelines.

Data Analysis

After sequencing hundreds of genes from dozens of individuals, the challenge is to turn raw sequencing reads into data files ready for phylogenetic analysis. We designed HybPiper to efficiently process reads in three stages: read sorting, contig assembly, and exon extraction. We also designed scripts for extracting intron sequences, detecting paralogous sequences, calculating efficiency statistics, and visualizing data.
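
To give a feel for the read-sorting stage, here is a toy sketch that bins reads by target gene using shared k-mers and reports the fraction of reads on target. This is not HybPiper's code or interface: the k-mer matching stands in for alignment-based sorting, and the function names, sequences, and value of k are all invented for the example.

```python
from collections import defaultdict


def build_kmer_index(targets, k=8):
    """Map every k-mer found in a target sequence to the genes containing it."""
    index = defaultdict(set)
    for gene, seq in targets.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(gene)
    return index


def sort_reads(reads, index, k=8):
    """Bin each read under every gene it shares at least one k-mer with.

    Returns (bins, n_on_target), where n_on_target counts the reads
    that matched at least one gene.
    """
    bins = defaultdict(list)
    n_on_target = 0
    for read in reads:
        hits = set()
        for i in range(len(read) - k + 1):
            hits |= index.get(read[i:i + k], set())
        if hits:
            n_on_target += 1
        for gene in hits:
            bins[gene].append(read)
    return dict(bins), n_on_target


# Made-up target genes and reads, for illustration only.
targets = {
    "geneA": "ATGGCGTACGTTAGCCTAGGA",
    "geneB": "ATGCCCTTTAAAGGGCCCTTT",
}
reads = ["GCGTACGTTAGC", "CCTTTAAAGGGC", "TTTTTTTTTTTT"]

bins, n_on_target = sort_reads(reads, build_kmer_index(targets))
fraction_on_target = n_on_target / len(reads)
print(bins, fraction_on_target)
```

Each per-gene bin would then be passed to an assembler, so that contig assembly works on a small set of reads per gene rather than on the whole library.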

Our future directions include incorporating allelic information into phylogenetic analysis, correcting errors in contig assembly, and improving the accuracy of assembly from herbarium specimens.

Probe Design for Sequence Capture

One of the most cost-efficient ways to design probes for targeted sequencing is to use existing transcriptome data in combination with a relatively closely related genome. Homology between transcriptomes and genomes is detected using BLAST searches or, if several sources are available, using orthology-inference software such as OrthoFinder.
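
The BLAST step can be sketched as a filter over BLAST's standard 12-column tabular output (`-outfmt 6`) that keeps the best-scoring genomic hit for each transcript. This is only a minimal illustration: the identity and length cutoffs, sequence names, and example rows are assumptions, not values from our probe designs.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle, min_identity=70.0, min_length=100):
    """Return {query: (subject, bitscore)} for the top-scoring hit per query.

    The cutoffs are illustrative placeholders, not recommended values.
    """
    best = {}
    reader = csv.DictReader(handle, fieldnames=BLAST6_COLUMNS, delimiter="\t")
    for row in reader:
        if float(row["pident"]) < min_identity or int(row["length"]) < min_length:
            continue  # skip weak or short alignments
        score = float(row["bitscore"])
        query = row["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (row["sseqid"], score)
    return best


# Invented example rows: two hits for tx1 and one low-identity hit for tx2.
example = io.StringIO(
    "tx1\tscaffold_3\t92.5\t450\t30\t2\t1\t450\t1000\t1449\t1e-150\t520\n"
    "tx1\tscaffold_9\t81.0\t300\t50\t4\t20\t319\t500\t799\t1e-60\t210\n"
    "tx2\tscaffold_5\t65.0\t200\t70\t0\t1\t200\t10\t209\t1e-10\t60\n"
)
hits = best_hits(example)
print(hits)
```

Transcripts with exactly one strong genomic hit are the usual candidates for low-copy probe targets; those with several comparable hits may be paralogs and need closer inspection.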

We have collaborated with several groups working on a variety of organisms, including angiosperms, mosses, and even birds. Probes can be designed to fit a variety of taxonomic depths (from species complexes to entire phyla), can include genes that are functionally relevant in addition to phylogenetically informative genes, and can incorporate sequence divergence by using multiple source species.

In collaboration with the Plant and Fungal Tree of Life (PAFTOL) team at the Royal Botanic Gardens, Kew, we helped develop a probe kit that promises to reliably recover up to 350 nuclear coding regions in any angiosperm species. You can read more about the probe design here.

Herbarium Specimens in Phylogenetics

A herbarium is a collection of dried plant specimens, much like a museum for plants. Herbarium specimens are both a scientific resource and an artistic depiction of plant diversity. In the E.L. Reed Herbarium at Texas Tech, over 20,000 recorded (and many yet-unrecorded) specimens show the diversity of plants in West Texas. Click here for more information about the Herbarium.

Herbaria are underutilized resources for plant research, especially in phylogenetics. Collections made from locations that are now difficult to access, or from threatened or extinct species, would be valuable for phylogenetic analysis. However, it can be difficult to extract high-quality DNA from dried plant material.

Enter targeted sequencing.

The ability to reliably use dried plant material for DNA extraction is one of the most important advantages of targeted sequencing. Because the DNA is used in shotgun sequencing library preparation, degradation of DNA to small fragment sizes does not prohibit sequence recovery! In some instances, we have been able to recover 400+ nuclear genes from a 100-year-old herbarium specimen, even though PCR-based methods failed for the same DNA extraction.

In our lab, we have teamed up with the PAFTOL project at the Royal Botanic Gardens, Kew, to produce a targeted sequencing kit that will work with any angiosperm! We aim to use more than 350 nuclear protein-coding genes to assess:

  1. How do age and method of preservation affect sequence recovery?
  2. What is the level of sequence variation within genera at these markers?
  3. Can the markers be used as a new method of “barcoding” useful for unidentified specimens?

At Texas Tech, we also aim to characterize the genetic diversity of species native to West Texas and surrounding areas using local collections from the early 20th century.

For more information about the Angiosperms353 kit, see our paper in Systematic Biology. The kit is available for purchase from Arbor Biosciences.

Image: Specimen of Portulaca pilosa (Portulacaceae) collected on the campus of what was then Texas Technological College in 1925!