Institute for Genome Sciences

Bioinformatics - Pipelines

IGS Prokaryotic Annotation Pipeline

IGS has developed a comprehensive automated annotation pipeline for bacterial and archaeal genomes (Galens et al., PMID:21677861). The pipeline predicts protein-coding genes as well as non-coding RNAs. Similarity evidence is collected for the predicted proteins using a variety of methods, including pairwise alignments, HMM searches, and multiple motif prediction tools. A hierarchical rule-based system assigns annotation to each protein based on the highest-quality evidence available. Results are loaded into a relational database and can be viewed using the Manatee annotation visualization and curation tool. Results are also available in multiple standard flat file formats.
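The idea of a hierarchical rule-based assignment can be illustrated with a minimal Python sketch. The evidence tiers, their ordering, the Evidence class, and the fallback name below are all hypothetical; the actual rules used by the IGS pipeline are described in the cited publication, not here.

```python
from dataclasses import dataclass

# Hypothetical evidence tiers, ordered from most (0) to least trusted.
# This ordering is an assumption for illustration only.
EVIDENCE_RANK = {"hmm": 0, "pairwise_alignment": 1, "motif": 2}


@dataclass(frozen=True)
class Evidence:
    kind: str       # one of the keys in EVIDENCE_RANK
    product: str    # product name suggested by this piece of evidence
    score: float    # higher is better within the same tier


def assign_annotation(evidence):
    """Return the product name backed by the highest-ranked evidence."""
    usable = [e for e in evidence if e.kind in EVIDENCE_RANK]
    if not usable:
        # No usable evidence: fall back to a generic name.
        return "hypothetical protein"
    # Prefer the best tier first, then the highest score within that tier.
    best = min(usable, key=lambda e: (EVIDENCE_RANK[e.kind], -e.score))
    return best.product


hits = [
    Evidence("motif", "signal peptide protein", 9.0),
    Evidence("pairwise_alignment", "DNA gyrase subunit A", 250.0),
]
print(assign_annotation(hits))
```

In this sketch the tier always outranks the score, so a weak hit in a trusted tier still wins over a strong hit in a lower tier; a real rule set would likely add per-tier cutoffs.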

Transcriptome Analysis Pipeline

This pipeline includes alignment of reads to a reference genome, RPKM analysis, differential expression analysis, isoform analysis, and differential isoform analysis. De novo transcriptome assembly is also available. Results are output as spreadsheets containing statistics, differentially expressed genes, isoforms, and differentially expressed isoforms, as well as PDF plots and figures. Results can be explored with visualization tools such as the Integrative Genomics Viewer (IGV).
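For reference, RPKM (reads per kilobase of transcript per million mapped reads) normalizes a raw read count by gene length and sequencing depth. The short Python sketch below shows the standard formula; the function name and the example numbers are illustrative and are not taken from the pipeline itself.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Illustrative numbers: 500 reads on a 2,000 bp gene in a library
# with 10 million mapped reads.
value = rpkm(500, 2000, 10_000_000)
print(value)  # 25.0
```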

Comparative genomics using protein clusters

This pipeline uses Jaccard-filtered bidirectional best BLAST matches to produce ortholog clusters (Crabtree et al., PMID:18314579). It has been used to compare 100 or more genomes at one time. The web-based visualization tool Sybil is used to search and view ortholog clusters, genomic context, synteny, and more.
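The two building blocks named above, the Jaccard coefficient and bidirectional best matches, can be sketched in a few lines of Python. This is a simplified illustration under assumed inputs (dictionaries of BLAST bit scores keyed by query and subject); it does not reproduce how the pipeline combines the two steps, which is described in the cited publication.

```python
def jaccard(a, b):
    """Jaccard coefficient: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_hits(scores):
    """Map each query to its top-scoring subject.

    scores: dict mapping (query, subject) -> bit score (assumed input format).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy scores for two genomes, A and B (made-up values).
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b2"): 90}
b_vs_a = {("b1", "a1"): 190, ("b2", "a1"): 60, ("b2", "a2"): 40}
pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)  # [('a1', 'b1')]
```

Here a2 and b2 are not paired because b2's best match in genome A is a1, so the match is not reciprocal.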

Comparative genomics using Mugsy

This method employs the Mugsy whole genome alignment algorithm (Angiuoli et al., PMID:21148543). Mugsy is a reference-independent tool for whole genome multiple alignment. Protein ortholog groups are built from these alignments and from synteny, which helps to differentiate between paralogs and orthologs. This method is optimized for comparing closely related organisms. The web-based visualization tool Sybil is used to search and view ortholog clusters, genomic context, synteny, and more.