Institute for Genome Sciences

Systems Biology - Tools

The CHAIN Program

Many important principles underlying how protein molecular machines work lie beyond the reach of direct experimental observation. CHAIN analysis is a statistical approach to this problem that follows Mendel’s example: just as Mendel inferred principles underlying genetic mechanisms from observed patterns of inherited traits, CHAIN analysis seeks to infer principles underlying molecular mechanisms from observed patterns in protein sequences, the cell’s own language for encoding those mechanisms. Sequence patterns that have been conserved for a billion years or more reflect strong selective pressures maintaining mechanistic similarities. Divergent patterns that are conserved in descendant proteins maintaining a particular divergent function likewise reflect mechanistic differences. Thus, non-random patterns of sequence conservation and divergence correspond to conservation and divergence of underlying mechanisms, which we define broadly to include all atomic properties (structural as well as strictly mechanistic) that are required for the function of a protein.

The CHAIN program was released in conjunction with a Trends in Biochemical Sciences article describing it (TiBS 32:487-493, 2007).

For more information: http://www.chain.umaryland.edu/

MAPGAPS

MAPGAPS identifies and accurately aligns up to a million or more sequences, taking as input a database of FASTA-formatted protein sequences and, as the query, a hierarchical multiple sequence alignment (hiMSA), such as those available from the NCBI (see below). MAPGAPS-generated multiple sequence alignments (MSAs) are used as input by BPPS, SPARC and DARC.
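For readers unfamiliar with the input format, the following is a minimal sketch of reading FASTA-formatted protein sequences in Python. It is not part of MAPGAPS and does not reflect how MAPGAPS parses its input; the function name `read_fasta` and the two example records are hypothetical and serve only as illustration.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {identifier: sequence} mapping."""
    records: Dict[str, str] = {}
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].strip()
            # Identifier = first whitespace-delimited token of the header.
            name = header.split()[0] if header else "unnamed"
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence lines may wrap; join them later
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Hypothetical toy records, for illustration only.
example = """>seq1 hypothetical protein one
MKTAYIAKQR
QISFVKSHFS
>seq2 hypothetical protein two
MTEYKLVVVG
"""
sequences = read_fasta(example.splitlines())
```

Each header line begins with ">" and the sequence may wrap over several lines, so the sketch joins wrapped lines into one string per record.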

http://www.igs.umaryland.edu/labs/neuwald/software/mapgaps/

http://mapgaps.igs.umaryland.edu/

SPARC: Search Procedure for Analysis of Residue Correlations

SPARC computes direct coupling analysis (DCA) scores from a typically large multiple sequence alignment and ranks each protein of known structure using STARC S-scores as a measure of biological relevance.
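To give a feel for what a residue correlation score measures, here is a minimal sketch that ranks pairs of alignment columns by mutual information. This is a deliberately simpler stand-in and is not DCA: DCA fits a global statistical model in order to separate direct couplings from indirect correlations, which plain mutual information cannot do. It is also not SPARC's implementation; the function names and the toy alignment are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def column_mutual_information(msa: List[str], i: int, j: int) -> float:
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    left_counts = Counter(seq[i] for seq in msa)
    right_counts = Counter(seq[j] for seq in msa)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts to avoid extra divisions.
        ratio = p_ab * n * n / (left_counts[a] * right_counts[b])
        mi += p_ab * math.log2(ratio)
    return mi


def rank_column_pairs(msa: List[str]) -> List[Tuple[float, int, int]]:
    """Score every column pair and sort from most to least correlated."""
    width = len(msa[0])
    scores = [
        (column_mutual_information(msa, i, j), i, j)
        for i, j in combinations(range(width), 2)
    ]
    return sorted(scores, reverse=True)


# Hypothetical toy alignment: columns 1 and 3 vary together (K with E, R with D).
toy_msa = ["AKLE", "AKLE", "ARLD", "ARLD"]
ranked = rank_column_pairs(toy_msa)
top_score, top_i, top_j = ranked[0]
```

In the toy alignment the invariant columns carry no information about any other column, so only the pair (1, 3) receives a non-zero score.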

http://www.igs.umaryland.edu/labs/neuwald/software/sparc/

DARC: Deep Analysis of Residue Correlations

DARC performs concurrent identification of residue direct couplings, of protein subgroup-specific patterns, and of correlations between subgroup patterns and structure, between subgroup patterns and direct couplings, and between direct couplings and structure. To assess biological relevance, DARC provides measures of statistical significance, and visualizes correlated features within sequence alignments and within 3D structures (via PyMOL scripts).

http://www.igs.umaryland.edu/labs/neuwald/software/darc/

STARC: Statistical Tool for Analysis of Residue Couplings

http://evaldca.igs.umaryland.edu/

SIPRIS: Structurally Interacting Pattern Residues’ Inferred Significance

To identify residues responsible for allostery, cooperativity, and other subtle but functionally important interactions, SIPRIS employs statistical inference based on the assumption that residues distinguishing a protein subgroup from evolutionarily divergent subgroups often constitute an interacting functional network. SIPRIS identifies such networks with the aid of two measures of statistical significance. One measure aids identification of divergent subgroups based on distinguishing residue patterns. For each subgroup, a second measure identifies structural interactions involving pattern residues.
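The idea of pattern residues forming a structurally interacting network can be illustrated with a minimal sketch that lists which pattern residues lie close together in a 3D structure. This is only a distance-cutoff illustration and does not reproduce either of the SIPRIS significance measures; the function name, the 8.0 angstrom cutoff, the residue positions and the coordinates are all hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]


def contacts_among(
    residues: Iterable[int],
    coords: Dict[int, Point],
    cutoff: float = 8.0,  # assumed cutoff in angstroms, for illustration only
) -> List[Tuple[int, int]]:
    """Return pairs of residue positions whose coordinates lie within cutoff."""
    pairs = []
    for a, b in combinations(sorted(residues), 2):
        if math.dist(coords[a], coords[b]) <= cutoff:
            pairs.append((a, b))
    return pairs


# Hypothetical representative-atom coordinates keyed by residue position.
toy_coords = {
    10: (0.0, 0.0, 0.0),
    42: (3.0, 4.0, 0.0),
    77: (30.0, 0.0, 0.0),
    95: (3.0, 4.0, 5.0),
}
pattern_positions = {10, 42, 77}  # hypothetical subgroup pattern residues
network = contacts_among(pattern_positions, toy_coords)
```

Only residues 10 and 42 fall within the cutoff of each other here; residue 95 is ignored because it is not among the pattern positions.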

Reference: Neuwald, A.F., Aravind, L. & Altschul, S.F. 2018. Inferring Joint Sequence-Structure Determinants of Protein Functional Specificity. eLife 7: e29880. doi: 10.7554/eLife.29880.

http://www.igs.umaryland.edu/labs/neuwald/software/sipris/

BPPS: Bayesian Partitioning with Pattern Selection

Protein superfamilies often diverge into subgroups, each adapting the superfamily’s structural core to fill a functional niche. Often a subgroup G diverges further into smaller subgroups, each conserving residues constrained by G’s function, as well as other residues constrained by more specialized functions. Repeated rounds of such divergence have led to hierarchically arranged subgroups, each of which conserves distinctive residues at particular positions. BPPS identifies and characterizes these subgroups by partitioning a multiple sequence alignment (MSA) into a hierarchically nested series of sub-MSAs based on correlated residue patterns that are distinctive of each subgroup.
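As a rough illustration of what a residue pattern distinctive of a subgroup means, the sketch below finds alignment columns where a subgroup conserves a residue that the remaining sequences rarely share. It is not the Bayesian partitioning procedure that BPPS uses and it takes the subgroup as given rather than inferring it; the function name, the frequency thresholds and the toy sequences are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def distinguishing_pattern(
    subgroup: List[str],
    background: List[str],
    min_fg: float = 0.9,  # assumed: residue must be this frequent in the subgroup
    max_bg: float = 0.2,  # assumed: and at most this frequent in the background
) -> Dict[int, str]:
    """Map column index -> residue conserved in subgroup but rare in background."""
    pattern: Dict[int, str] = {}
    for col in range(len(subgroup[0])):
        residue, count = Counter(seq[col] for seq in subgroup).most_common(1)[0]
        fg_freq = count / len(subgroup)
        bg_freq = sum(seq[col] == residue for seq in background) / len(background)
        if fg_freq >= min_fg and bg_freq <= max_bg:
            pattern[col] = residue
    return pattern


# Hypothetical aligned toy sequences (equal length, no gaps).
subgroup = ["GKSTW", "GKSTW", "GKSAW"]
background = ["GASTF", "GQSTY", "GESLF", "GASTH"]
pattern = distinguishing_pattern(subgroup, background)
```

Columns 0 and 2 are conserved across the whole toy alignment and so do not distinguish the subgroup, whereas columns 1 and 4 are conserved only within it.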

http://www.igs.umaryland.edu/labs/neuwald/software/bpps/

GISMO: Gibbs Sampler for Multi-Alignment Optimization

http://www.igs.umaryland.edu/labs/neuwald/software/gismo/