Ongoing research projects in the lab:

New Funding from FDA-CFSAN

In October 2022, we began a new collaboration between the Johnson Lab at Texas Tech University and the US Food and Drug Administration Center for Food Safety and Applied Nutrition (FDA-CFSAN). Our proposal, “Improving detection of plant contaminants in mixed samples with targeted sequencing of 353 nuclear protein coding genes,” aims to extend Angiosperms353 and HybPiper for use in mixed samples, such as potentially adulterated nutritional supplements. The project funds a post-doctoral researcher, sequencing costs, and travel for two years. The abstract of this Broad Agency Announcement (BAA) proposal is below.

The ability to identify unknown plant materials via their DNA is of regulatory interest for detecting adulterated products on the market. Unfortunately, the use of DNA for identification of plants has lagged behind applications in other organisms, often due to a lack of appropriate DNA markers variable enough to distinguish closely related species. Targeted sequencing of conserved regions has emerged as a cost-effective way to sequence hundreds of loci in a wide range of organisms. Angiosperms353 is a set of loci recently shown to be present in single copy in most flowering plants and to be variable within species. Thus, there is an emerging opportunity to use the Angiosperms353 loci to identify plants and to detect contaminants in mixed plant tissues.

However, there are two major barriers to the use of Angiosperms353 as an identification tool: computational tools to analyze mixed samples, and a comprehensive DNA sequence database of relevant species. While some software is available to analyze single genes in mixed samples (e.g. QIIME2) and other software (e.g. HybPiper) can analyze hundreds of loci from single samples, there is currently no software available for the analysis of hundreds of genes in mixed samples. Further, while some databases of Angiosperms353 loci are in development, their focus is not on botanical products and related species.

Our project has two aims:

Aim 1: Comprehensive Database. Extend an existing database of Angiosperms353 sequences to include plants commonly used in foods and dietary supplements, potential contaminant species, and close relatives of these species.

Aim 2: Proposed Workflow. Develop a comprehensive bioinformatics workflow to positively identify plant species represented in a mixed sample using targeted DNA sequencing of 353 nuclear protein coding genes.

The Comprehensive Database will be used to verify the precision and specificity of the Proposed Workflow. Sensitivity will be tested using mock adulterated samples containing DNA from botanicals of interest and common potential contaminants. This proposal will result in two deliverables: the Comprehensive Database of DNA sequences, which will be released into public archives; and the Bioinformatics Workflow for extracting sequences from mixed samples, which will be developed open-source and made available for free. Upon successful development of the Proposed Workflow, downstream data analysis of DNA sequences could be used in future work to identify specific tests to detect adulterants for regulatory purposes.
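As a rough illustration of how species calls from a mock adulterated sample might be scored, the sketch below compares the species a workflow reports against the known composition of the mixture. The function name, the species lists, and the candidate panel are all hypothetical. The sketch uses the standard definitions of sensitivity, precision, and specificity and is not the project's actual evaluation protocol.

```python
def score_detection(expected, reported, candidates):
    """Score species calls for one mock sample.

    expected   -- species actually present in the mock mixture
    reported   -- species the workflow reported
    candidates -- every species in the reference database (assumed to
                  contain everything in `expected` and `reported`)
    """
    expected, reported, candidates = set(expected), set(reported), set(candidates)
    tp = len(expected & reported)               # present and reported
    fp = len(reported - expected)               # reported but absent
    fn = len(expected - reported)               # present but missed
    tn = len(candidates - expected - reported)  # absent and not reported
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "precision": tp / (tp + fp) if tp + fp else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
    }


# Hypothetical mock sample: a botanical spiked with one contaminant.
candidates = ["Echinacea purpurea", "Panax ginseng", "Oryza sativa",
              "Glycine max", "Triticum aestivum"]
expected = ["Panax ginseng", "Oryza sativa"]
reported = ["Panax ginseng", "Glycine max"]
scores = score_detection(expected, reported, candidates)
```

In this made-up case the workflow finds one of the two species present and makes one false call, so sensitivity and precision are both 0.5 and specificity is 2/3.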

Potential Graduate Student Projects

If you are interested in joining the lab as a Ph.D. or Master’s student, please look over the following projects. These are projects for which funding or data currently exists, and are also a way to help focus your research interest statement when applying to graduate school. These projects are not set in stone - I will work with all students to develop research questions matching their interests.

Use the Google form here to express your interest in joining the lab. I will review potential graduate student applications in October, and will invite the best candidates to apply to work in my lab before December 1, for fall admission the following year. You can find more information about applying at the TTU Graduate School and the TTU Department of Biological Sciences websites.

Project 1: Functional Genomics of Bryum argenteum

Bryum argenteum is a moss found on all seven continents, including Antarctica! However, it is a bit of a nuisance on golf courses in North America, where it is referred to as “silvery thread moss.” We have a collaboration with a bryophyte physiology lab at the University of Nevada Las Vegas (Lloyd Stark) and the Davey Tree Company (Zane Raudenbush) to help control moss infestations on putting greens. Interestingly, the mosses on putting greens have a distinct growth pattern and are always female. We have samples collected from across North America, and a promising graduate student project could be to identify local adaptation and/or differential gene expression associated with mosses on putting greens.

Work on this project would involve bioinformatics, phylogenomics, RNA and DNA sequencing, and managing moss tissue cultures.

Project 2: Cryptic Speciation in Physcomitrium pyriforme

The moss Physcomitrium pyriforme is a widespread species, common in eastern North America. Our recent results from targeted DNA sequencing suggest that there may be cryptic taxa within P. pyriforme resulting from autopolyploidy, allopolyploidy, and subtle morphological variation. A graduate student could lead a project testing whether individuals from phylogenetically distinct populations are reproductively isolated. We will conduct experimental crosses and test for hybrids using PCR and DNA fingerprinting. Further assessment of introgression between the distinct populations will use targeted DNA sequencing and hybridization analysis.

Work on this project would involve bioinformatics, targeted DNA sequencing, and managing moss tissue cultures.

Project 3: Phylogenetic Systematics in Flowering Plants

The phylogenomic toolkit Angiosperms353 has made phylogeny inference from hundreds of genes tractable for flowering plant groups with few genomic resources. Targeted DNA sequencing with Angiosperms353 is especially well suited for herbarium specimens, as degraded DNA can be recovered, reducing the need for expensive field work. The E.L. Reed Herbarium is well suited to host graduate students interested in phylogenetic systematics of flowering plants of the southwest United States, especially Texas and New Mexico. Examples of possible taxa to work on include Haplopappus, Cryptantha, Eragrostis, and Bouteloua. Students would work with me to develop phylogenetic systematics projects.

Work on this project would involve herbarium curation, bioinformatics, targeted DNA sequencing, and taxonomic revisions using phylogenetic and morphological data. It may also involve field work and/or travel to other herbaria to collect new specimens.

Project 4: Phylogenomic Methods Development

Our lab has worked on the development of new bioinformatics workflows in non-model plants, including HybPiper, Homologizer, a variant call pipeline for target sequence data, and helpful visualizations for bipartition analysis. We have also made progress in laboratory methods, especially in reducing per-sample costs, enabling studies with dense sampling and wider access to genome-scale approaches.

Students interested in methods development could help move the field forward through innovations in bioinformatics and/or lab procedures. For example, a student could lead development of targeted single-molecule approaches (e.g. Oxford Nanopore) for phylogenetics in plants. Alternatively, a student could contribute to the development of HybPiper, for example by improving its ability to detect and use paralogs in phylogenetic analysis.

The Biodiversity Consequences of Autopolyploidy

Whole genome duplication or autopolyploidy occurred repeatedly during the evolution of land plants and likely acts as a major driver of evolutionary change. When genome duplications first occur within species they potentially result in immediate reproductive isolation of autopolyploids within populations. If autopolyploid lineages are considered “good species,” they may be a source of hidden biodiversity.

Our project seeks to test whether shifts in ploidy are phylogenetically structured within a complex of cryptic moss species, the Physcomitrium pyriforme complex, which is widespread in North America and Europe. The complex harbors seven karyotypes worldwide and exhibits much morphological variation, as reflected by the 29 synonyms. These annual, bisexual and selfing mosses are easily grown, and genome doubling is readily induced in vitro from sporophytic tissue, enabling tests of reproductive isolation among wild and artificial autopolyploids.

Our project addresses four inter-related objectives:

Reconstruct the phylogenomic relationships of 400 populations of the P. pyriforme complex using targeted sequencing of 800 low-copy nuclear genes.
Characterize the karyotype and genome size of 400 populations of the P. pyriforme complex across Europe, and infer frequencies of ploidal shifts within a phylogenomic hypothesis.
Identify morphological signatures of artificial genome duplication and, through comparison with wild populations, test whether these erode through time.
Complement these inferences with experiments testing for reproductive isolation among wild and artificial polyploids, and thereby for the evolutionary significance of autopolyploidy.

Plant Phylogenomics

The original definition of Phylogenomics by Eisen suggested that we use the phylogeny to understand the evolution of gene function. In this sense, phylogenomics is a kind of applied phylogenetics, using species relationships to reconstruct the evolution of genomic traits.

In plants, whole genome duplication (WGD) is common: all flowering plants descend from an ancestor that experienced one or more WGD events. Following these WGD events, plant lineages quickly return to functioning as diploid organisms, but multiple copies of some genes are retained.

In this context we are interested in using genomic techniques to ask the following questions:

Are gene duplication events clustered on the phylogeny a result of whole genome duplication or several small-scale duplications?
Is there a functional bias to which genes retain multiple copies following WGD?
Are gene duplications associated with a relaxation in purifying selection?

Targeted Sequencing

Building a phylogeny from one or a few genes is likely to be misleading about the relationships among species. A more accurate reconstruction will use many genes from the nucleus, but this is not cost-effective with traditional PCR and Sanger-sequencing based methods. Similarly, sequencing full genomes using high-throughput sequencing is not feasible for systematics in non-model organisms (yet).

One compromise is to bias high-throughput sequencing libraries to contain a reduced representation of the genome. This technique, known as targeted sequencing, is a cost-effective way to sample hundreds of loci from dozens of samples simultaneously. There are several targeted-sequencing techniques, including Anchored Phylogenetics (aka Ultra-Conserved Elements) and RADcap (the capture of restriction-digest associated elements). For plant phylogenetics, HybSeq, the targeting of exons and flanking intron regions, has proven highly effective. In our lab we focus on three aspects of HybSeq targeted sequencing: probe design, the use of herbarium specimens, and data analysis pipelines.

Data Analysis

After sequencing hundreds of genes from dozens of individuals, the challenge is to turn sequencing reads into data files ready for phylogenetic analysis. We designed HybPiper to efficiently process reads in three stages: read sorting, contig assembly, and exon extraction. We also designed scripts for extracting intron sequences, detecting paralogous sequences, calculating efficiency statistics, and visualizing data.
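To make the first stage, read sorting, more concrete, here is a toy sketch that assigns each read to the target gene with which it shares the most k-mers. This is an illustration only and is not how HybPiper is implemented: HybPiper relies on dedicated alignment software for this step. The function names, the parameters `k` and `min_shared`, and the sequences are all hypothetical, and the sketch ignores reverse-complement reads for brevity.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sort_reads(reads, targets, k=8, min_shared=2):
    """Bin reads by the target gene sharing the most k-mers with them.

    reads   -- dict of read name -> sequence
    targets -- dict of gene name -> reference sequence
    Reads sharing fewer than `min_shared` k-mers with every gene are
    treated as off-target and dropped.
    """
    # Index every target k-mer by the gene(s) it occurs in.
    index = defaultdict(set)
    for gene, seq in targets.items():
        for km in kmers(seq, k):
            index[km].add(gene)

    bins = defaultdict(list)
    for name, read in reads.items():
        counts = defaultdict(int)
        for km in kmers(read, k):
            for gene in index.get(km, ()):
                counts[gene] += 1
        if counts:
            best = max(counts, key=counts.get)
            if counts[best] >= min_shared:
                bins[best].append(name)
    return dict(bins)


# Hypothetical targets and reads (two on-target reads, one off-target).
targets = {
    "geneA": "ATGGCGTACGTTAGCCTAGGATCCGATTACG",
    "geneB": "ATGCCCTTTGGGAAACCCGGGTTTAAACCCG",
}
reads = {
    "r1": targets["geneA"][3:23],
    "r2": targets["geneB"][5:25],
    "r3": "T" * 20,
}
sorted_reads = sort_reads(reads, targets)
```

Once reads are binned per gene, each bin can be assembled independently, which is what keeps the later assembly and extraction stages tractable for hundreds of loci.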

Our future directions include incorporating allelic information into phylogenetic analysis, correcting errors in contig assembly, and improving the accuracy of assembly from herbarium specimens.

Probe Design for Sequence Capture

One of the most cost-efficient ways to design probes for targeted sequencing is to use existing transcriptome data in combination with a relatively closely related genome. Homology between transcriptomes and genomes is detected using BLAST searches or, if several sources are available, using orthology-searching software such as OrthoFinder.
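One common heuristic for pairing transcripts with genome-annotated genes from BLAST results is to keep reciprocal best hits. The sketch below shows the idea under stated assumptions: the input is BLAST tabular output with the default 12 columns (`-outfmt 6`), the function names and the E-value cutoff are hypothetical, and the hit table is invented for illustration. It is not necessarily the procedure used in our probe designs.

```python
import csv
import io


def best_hits(blast_tab, max_evalue=1e-10):
    """Return {query: (subject, bitscore)} for the top-scoring hit per query.

    Assumes default BLAST tabular columns: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # weak hit, ignore
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def reciprocal_best_hits(forward, reverse):
    """Pairs (q, s) where s is q's best hit and q is s's best hit."""
    return sorted(
        (q, s) for q, (s, _) in forward.items()
        if reverse.get(s, (None,))[0] == q
    )


def row(query, subject, evalue, bitscore):
    """Build one made-up tabular hit line with placeholder coordinates."""
    fields = [query, subject, "95.0", "300", "15", "0",
              "1", "300", "1", "300", evalue, bitscore]
    return "\t".join(fields)


# Invented hits: transcripts vs. genes, then genes vs. transcripts.
fwd_text = "\n".join([
    row("t1", "g1", "1e-80", "200"),
    row("t1", "g2", "1e-50", "150"),
    row("t2", "g2", "1e-70", "180"),
    row("t3", "g2", "1e-3", "40"),   # fails the E-value cutoff
])
rev_text = "\n".join([
    row("g1", "t1", "1e-80", "200"),
    row("g2", "t2", "1e-70", "180"),
])
pairs = reciprocal_best_hits(best_hits(io.StringIO(fwd_text)),
                             best_hits(io.StringIO(rev_text)))
```

Here `t1`/`g1` and `t2`/`g2` are kept as candidate homolog pairs, while `t3` is dropped because its only hit is too weak.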

We have collaborated with several groups working on a variety of organisms, including angiosperms, mosses, and even birds. Probes can be designed to fit a variety of taxonomic depths (from species complexes to entire phyla), can include genes that are functionally relevant in addition to phylogenetically informative genes, and can accommodate sequence divergence by using multiple source species.

In collaboration with the Plant and Fungal Tree of Life (PAFTOL) team at Royal Botanic Gardens, Kew, we helped develop a probe kit that promises to reliably enrich up to 353 nuclear coding regions in any angiosperm species. You can read more about the probe design here.

Herbarium Specimens in Phylogenetics

A herbarium is a collection of dried plant specimens, much like a museum for plants. Herbarium specimens are both a scientific resource and an artistic depiction of plant diversity. In the E.L. Reed Herbarium at Texas Tech, over 20,000 recorded (and many yet-unrecorded) specimens show the diversity of plants in West Texas. Click here for more information about the Herbarium.

Herbaria are underutilized resources for plant research, especially in phylogenetics. Collections made from locations that are now difficult to access, or from threatened or extinct species, would be valuable for phylogenetic analysis. However, it can be difficult to extract high-quality DNA from dried plant material.

Enter targeted sequencing.

The ability to reliably use dried plant material for DNA extraction is one of the most important advantages of targeted sequencing. Because the DNA is used in shotgun sequencing library preparation, degradation of DNA to small fragment sizes does not prohibit sequence recovery! In some instances, we have been able to recover 400+ nuclear genes from a 100-year-old herbarium specimen, even though PCR-based methods failed for the same DNA extraction.

In our lab, we have teamed up with the PAFTOL project at Royal Botanic Gardens, Kew to produce a targeted sequencing kit that will work with any angiosperm! We aim to use more than 350 nuclear protein-coding genes to assess:

How do age and method of preservation affect sequence recovery?
What is the level of sequence variation within genera at these markers?
Can the markers be used as a new method of “barcoding” useful for unidentified specimens?

At Texas Tech, we also aim to characterize the genetic diversity of species native to West Texas and surrounding areas using local collections from the early 20th century.

For more information about the Angiosperms353 kit, see our paper in Systematic Biology. The kit is available for purchase from Arbor Biosciences.

Image: Specimen of Portulaca pilosa (Portulacaceae) collected on the campus of what was then Texas Technological College in 1925!