PICRUSt is a bioinformatics software package. The name is an abbreviation for Phylogenetic Investigation of Communities by Reconstruction of Unobserved States.
The tool is used in metagenomic analysis, where it infers the functional profile of a microbial community from a marker gene survey of one or more samples. In essence, PICRUSt takes a user-supplied operational taxonomic unit table (typically referred to as an OTU table) listing the marker gene sequences (most commonly 16S rRNA gene sequences grouped by cluster analysis) together with their abundance in each sample. The output of PICRUSt is a sample-by-gene-family count matrix, giving the predicted count of each functional gene in each of the samples surveyed. PICRUSt's ability to estimate the functional gene profile of a sample relies on a set of reference genomes that have already been sequenced. It can also be thought of as an automated alternative to manually researching the gene families likely to be present in organisms whose sequences are found in a 16S ribosomal RNA amplicon library. The description below corresponds to the original version of PICRUSt; a major update to this tool is currently being developed.
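The core arithmetic behind the sample-by-gene-family matrix can be sketched as follows: each OTU's abundance in a sample is multiplied by the predicted copy number of each gene family in that OTU's genome, and the products are summed per sample. This is a minimal sketch, assuming abundances have already been normalized by 16S rRNA copy number; the OTU names and KEGG-style gene identifiers are hypothetical placeholders, and the function name `predict_metagenome` is illustrative rather than part of PICRUSt.

```python
def predict_metagenome(otu_table, gene_content):
    """Return {sample: {gene_family: predicted count}}.

    otu_table:    {sample: {otu_id: abundance}}  (hypothetical input layout)
    gene_content: {otu_id: {gene_family: predicted copies per genome}}
    """
    result = {}
    for sample, otus in otu_table.items():
        counts = {}
        for otu_id, abundance in otus.items():
            # OTUs with no predicted genome contribute nothing in this sketch
            for gene, copies in gene_content.get(otu_id, {}).items():
                counts[gene] = counts.get(gene, 0) + abundance * copies
        result[sample] = counts
    return result


# Hypothetical toy data
otu_table = {
    "sample_A": {"OTU_1": 10, "OTU_2": 5},
    "sample_B": {"OTU_2": 8},
}
gene_content = {
    "OTU_1": {"K00001": 2, "K00002": 1},
    "OTU_2": {"K00002": 3},
}
print(predict_metagenome(otu_table, gene_content))
# {'sample_A': {'K00001': 20, 'K00002': 25}, 'sample_B': {'K00002': 24}}
```

Viewed as matrices, this is simply the product of the sample-by-OTU abundance table and the OTU-by-gene-family content table.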
Notably, while PICRUSt's phylogenetic trait prediction is typically used to predict gene copy numbers in bacteria, it could in principle be used to predict any other continuous trait, given trait data for diverse organisms and a reference phylogeny.
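To make the idea of predicting a continuous trait from a reference phylogeny concrete, the sketch below uses inverse-distance weighting: an unsequenced organism's trait value is estimated as the average of known trait values of reference genomes, weighted by 1/d, where d is the phylogenetic (branch-length) distance. This is a deliberately simpler stand-in and not PICRUSt's ancestral state reconstruction; the function name, distances, and trait values are all hypothetical.

```python
def predict_trait(distances, known_traits, power=1.0):
    """Inverse-distance-weighted estimate of a continuous trait.

    distances:    {reference_id: phylogenetic distance from the query tip}
    known_traits: {reference_id: observed trait value, e.g. gene copy number}
    power:        hypothetical tuning parameter; larger values favour close relatives
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for ref, d in distances.items():
        if ref not in known_traits:
            continue  # skip references without trait data
        if d == 0:
            # Same position on the tree: use the observed value directly
            return float(known_traits[ref])
        w = 1.0 / d ** power
        weighted_sum += w * known_traits[ref]
        weight_total += w
    if weight_total == 0:
        raise ValueError("no reference genome with a known trait value")
    return weighted_sum / weight_total


# Hypothetical distances from an unsequenced OTU to three reference genomes
distances = {"genome_1": 0.02, "genome_2": 0.05, "genome_3": 0.20}
copy_numbers = {"genome_1": 4, "genome_2": 2, "genome_3": 7}
print(round(predict_trait(distances, copy_numbers), 3))
```

Because the weights are 50, 20, and 5, the closest genome dominates the estimate, which is the intuition shared by phylogeny-based trait predictors in general.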
Langille et al. (2013) tested the accuracy of this genome prediction step using leave-one-out cross-validation on the input set of sequenced genomes. Additional tests examined its sensitivity to errors in phylogenetic inference and to a lack of genomic data, as well as the accuracy of the confidence intervals on predicted gene content.
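Leave-one-out cross-validation can be illustrated with a toy set of genomes: each genome is hidden in turn, its trait is predicted from the remaining genomes, and the error is recorded. This is a sketch under stated assumptions: the predictor is a simple inverse-distance-weighted mean (not PICRUSt's method), mean absolute error is used only for illustration rather than as the metric reported by Langille et al., and all genome names and distances are hypothetical. Distances between distinct genomes are assumed to be strictly positive.

```python
def idw_predict(query, others, dist, traits):
    # Inverse-distance-weighted mean of the trait over the other genomes
    # (assumes every distance is > 0)
    weights = {g: 1.0 / dist[query][g] for g in others}
    total = sum(weights.values())
    return sum(w * traits[g] for g, w in weights.items()) / total


def leave_one_out(dist, traits):
    """Hide each genome in turn, predict it from the rest, return absolute errors."""
    errors = {}
    for held_out in traits:
        others = [g for g in traits if g != held_out]
        predicted = idw_predict(held_out, others, dist, traits)
        errors[held_out] = abs(predicted - traits[held_out])
    return errors


# Hypothetical symmetric distance matrix and gene copy numbers
dist = {
    "g1": {"g2": 0.1, "g3": 0.4},
    "g2": {"g1": 0.1, "g3": 0.3},
    "g3": {"g1": 0.4, "g2": 0.3},
}
traits = {"g1": 2, "g2": 2, "g3": 5}
errors = leave_one_out(dist, traits)
print(errors)
print(sum(errors.values()) / len(errors))  # mean absolute error
```

In this toy case g3 is poorly predicted because its only relatives carry a different copy number, which mirrors why sparse or distant reference genomes reduce accuracy.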
A similar step predicts the copy number of 16S rRNA genes in each organism, which is used to normalize OTU abundances before metagenome prediction.
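Copy-number normalization amounts to dividing each OTU's read count by the predicted number of 16S rRNA gene copies in its genome, so that organisms with many copies are not over-counted. A minimal sketch with hypothetical OTU names and copy numbers; the function name is illustrative.

```python
def normalize_by_copy_number(otu_counts, copy_numbers):
    """Divide each OTU's read count by its predicted 16S rRNA gene copy number."""
    return {otu: count / copy_numbers[otu] for otu, count in otu_counts.items()}


raw = {"OTU_1": 120, "OTU_2": 30}             # hypothetical amplicon read counts
predicted_16s_copies = {"OTU_1": 6, "OTU_2": 1}  # hypothetical predictions
normalized = normalize_by_copy_number(raw, predicted_16s_copies)
print(normalized)  # {'OTU_1': 20.0, 'OTU_2': 30.0}
```

Before correction OTU_1 appears four times as abundant as OTU_2; after correction OTU_2 is estimated to be the more abundant organism.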
Langille et al. (2013) also tested the accuracy of the metagenome prediction step using previously reported datasets in which the same biological sample was subjected to both 16S rRNA gene amplicon sequencing and shotgun metagenomics. In these cases, the shotgun metagenomic results were taken to represent the 'true' community, and the 16S rRNA gene amplicon libraries were fed into PICRUSt to predict them. Test datasets included human microbiome samples from the Human Microbiome Project, soil samples, diverse mammalian samples, and samples from the Guerrero Negro microbial mats.
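One common way to score agreement between predicted and shotgun-observed gene-family abundances is a rank correlation such as Spearman's ρ, computed here as the Pearson correlation of average ranks (ties share the mean of their positions). This is an illustrative sketch; the counts are hypothetical, and it is not a claim about the exact statistics used in the original evaluation. The result is undefined if either input is constant.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


predicted = [25, 24, 3, 0, 10]  # hypothetical predicted counts per gene family
observed = [30, 18, 5, 1, 12]   # hypothetical shotgun counts, same gene order
print(spearman(predicted, observed))
```

A rank-based score is a reasonable choice here because predicted and observed counts are on different scales (per-genome copy numbers versus sequencing reads), while their ordering is directly comparable.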
CopyRighter, like PICRUSt, uses evolutionary modeling and phylogenetic trait prediction to estimate the 16S rRNA gene copy number of each bacterial and archaeal taxon in a sample, and then uses these estimates to correct the inferred community composition.
PanFP presented a similar method, but one based on genome predictions for each taxonomic group. Benchmarking showed performance highly similar to PICRUSt's when the two were compared on the same datasets. One advantage is that all OTUs can be used, not just those in a reference phylogeny; one disadvantage is that confidence intervals and evolutionary models are not constructed.
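A per-taxonomic-group gene profile can be illustrated by averaging the gene-family copy numbers of all reference genomes that share a lineage; an OTU assigned to that lineage then inherits the profile, whether or not it appears in a reference phylogeny. This is a simplified sketch: the averaging rule is an assumption and not PanFP's exact procedure, and the lineage strings and gene identifiers are hypothetical.

```python
from collections import defaultdict


def lineage_profiles(genomes):
    """Average gene-family copy numbers over all genomes sharing a lineage.

    genomes: list of (lineage_string, {gene_family: copies}) pairs.
    Returns {lineage: {gene_family: mean copies per genome}}.
    A genome lacking a gene family counts as 0 copies in the mean.
    """
    totals = defaultdict(lambda: defaultdict(float))
    n_genomes = defaultdict(int)
    for lineage, genes in genomes:
        n_genomes[lineage] += 1
        lineage_totals = totals[lineage]  # ensures every lineage gets an entry
        for gene, copies in genes.items():
            lineage_totals[gene] += copies
    return {
        lineage: {gene: s / n_genomes[lineage] for gene, s in genes.items()}
        for lineage, genes in totals.items()
    }


# Hypothetical reference genomes labelled with a genus-level lineage
genomes = [
    ("Bacteria;Firmicutes;Bacillus", {"K00001": 2, "K00002": 1}),
    ("Bacteria;Firmicutes;Bacillus", {"K00001": 1}),
    ("Bacteria;Proteobacteria;Escherichia", {"K00002": 4}),
]
profiles = lineage_profiles(genomes)
print(profiles["Bacteria;Firmicutes;Bacillus"])  # {'K00001': 1.5, 'K00002': 0.5}
```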
PAPRICA is a metagenome prediction tool that places input 16S rRNA gene sequences into a phylogenetic tree built from reference genomes. Its main prediction output is a set of Enzyme Commission (EC) numbers.
Piphillin is a tool developed by the company Second Genome that generates metagenome predictions based on nearest-neighbour clustering of input 16S rRNA gene sequences with 16S rRNA gene sequences from reference genomes. A web portal for running the tool is available on the Second Genome website. The tool is under continual development and validation, as summarized in a 2020 publication.
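The nearest-neighbour idea can be sketched by assigning each input sequence to the reference genome whose 16S rRNA sequence is most similar, provided the similarity clears a cutoff; the assigned genome's gene content then stands in for the OTU. Assumptions: sequences are already aligned to equal length (real tools perform proper sequence alignment), identity is the fraction of matching positions, and both the cutoff and the sequences below are hypothetical rather than Piphillin's settings or data.

```python
def identity(a, b):
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def nearest_reference(query, references, min_identity):
    """Return (genome_id, identity) of the best hit, or None if below the cutoff."""
    best_id, best_score = None, -1.0
    for genome_id, ref_seq in references.items():
        score = identity(query, ref_seq)
        if score > best_score:
            best_id, best_score = genome_id, score
    if best_score < min_identity:
        return None  # no sufficiently close reference genome
    return best_id, best_score


# Hypothetical pre-aligned reference 16S fragments
references = {"genome_1": "ACGTACGTAC", "genome_2": "ACGTTCGTTA"}
print(nearest_reference("ACGTACGTAA", references, min_identity=0.9))
print(nearest_reference("TTTTTTTTTT", references, min_identity=0.9))
```

The first query matches genome_1 at 90% identity; the second matches nothing above the cutoff and would be left without a functional prediction.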
Tax4Fun is a similar tool that links the 16S rRNA genes of all KEGG organisms to 16S rRNA gene sequences in the SILVA ribosomal RNA database. Originally, the tool was restricted to 16S rRNA gene sequences found within the SILVA database; the latest version, Tax4Fun2, can be used with OTUs or amplicon sequence variants from any clustering pipeline.