Rosetta is a computer program for de novo protein structure prediction, where de novo implies modeling in the absence of detectable sequence similarity to a previously determined three-dimensional protein structure. Rosetta uses small sequence similarities from the Protein Data Bank [http://www.rcsb.org/pdb/] to estimate possible conformations for local sequence segments (three and nine residue segments). These segments are called fragments of local structure. It then assembles these pre-computed structure fragments by minimizing a global scoring function that favors hydrophobic burial and packing, strand pairing, compactness and energetically favorable residue pairings. Results from the fourth and fifth critical assessment of structure prediction (CASP4, CASP5) [http://predictioncenter.llnl.gov/] have shown that Rosetta is currently one of the best methods for de novo protein structure prediction and distant fold recognition.
Using Rosetta generated structure predictions we were previously able to recapitulate or predict many functional insights not detectable from primary sequence. Rosetta was also recently used to generate both fold and function predictions for Pfam protein families that had no link to a known structure, resulting in many high confidence fold predictions. In spite of these successes, Rosetta has a significant error rate, as do all methods for distant fold recognition and de novo structure prediction. We thus calculate not just the structure but also the probability that the predicted structure is correct using the Rosetta confidence function. The Rosetta confidence function partially mitigates this error rate by assessing the accuracy of predicted folds.
Another unavoidable source of uncertainty, with respect to function prediction from structure, is the error associated with distilling function from fold matches. Sometime fold carry out more than one function. The predictions generated by de novo structure prediction are thus best used in combination with other sources of putative or general functional information such as proximity in protein association or gene regulatory networks. Thus, making the predictions resulting from this project available to the public in a easily accessible way is a critical final step in this project.
For a quicktime movie showing a protein (Ubiquitin) being folded by Rosetta click here [1d3z.mov OR 1d3z.mpg ]
Rosetta uses a scoring function to judge different conformations (shapes/packings of amino acids within the protein). The simulation consists of making moves (changing the bond angles of a bunch of amino acids) and then scoring the new conformation. The rosetta score is a weighted sum of component scores, where each component score is judging a different thing. The environment score is judging how well the hydrophobic (oily) residues are packing together to form a core, while the pair-score is judging how compatible touching residues are with each other one pair at a time.
Environment score: The formation of a hydrophobic core, or the hydrophobic effect, is for most proteins the central driving force for protein folding. The Rosetta environment score rewards burial of hydrophobic residues in a compact hydrophobic core and penalizes solvation of these oily groups. I’ve represented hydrophobic residues as orange stars. The left conformation is good (all the hydrophobics together) while the rightmost conformation is bad (with the hydrophobic amino acids not touching).
Pair-score: Two conformations of a polypeptide are shown, one (top) where the chain is folded back on itself bringing two cysteins together (yellow + yellow = possible disulphide bond) and forming a salt-bridge (blue+red = opposites attract). The conformation at bottom does not make these pairings and the pair-score would, thus, favor the top conformation.