UCSC programming prowess gets kudos for protein structure prediction

December 11, 2006

Every other summer, professor of biomolecular engineering Kevin Karplus and his laboratory of bioinformatics researchers forgo the beach so that they can come out ahead in the biggest bioinformatics competition in the world.

The UCSC CASP 7 team is made up of Grant Thiltgen, George Shackelford, Kevin Karplus, Firas Khatib, and Zack Sanborn (not shown: Pinal Kanabar, Chris Wong, Martin Madera, NavyaSwetha Davuluri, Sylvia Do, Crissan Harris, and Cynthia Hsu)
Photo by Branwyn Wagman

This year was the seventh such competition: the Critical Assessment of Techniques for Protein Structure Prediction (CASP 7).

The teams that do the best in the various categories receive a coveted chance to present their findings at the CASP conference, held this year in November at the Asilomar Conference Center in Pacific Grove. As in every competition since CASP 2, the UCSC team was invited to send a speaker. This year it was doctoral student George Shackelford, who had the best results in one category of protein structure prediction.

Protein molecules are long linear chains of amino acids that fold into complex three-dimensional shapes. They carry out an enormous variety of functions in all forms of life. Because of the central role of proteins in biology, predicting protein structure is critically important to biologists, biomedical researchers, and the biotechnology and pharmaceutical industries.

The ultimate goal of CASP, according to Shackelford, is to develop a superior method of solving protein structures. "Given the millions of unknown structures, automating the process will be key to solving them," he said.

In the CASP competition, labs are given limited time to predict the three-dimensional structure of a protein based on its unique amino acid sequence. The sequence comes from a protein whose structure has been solved using laboratory techniques, but not yet published. The predictions can then readily be compared to the actual structure. Recognition is given for overall structure prediction and for several other categories.

Shackelford's results came out the best in the category of "residue-residue contact prediction," which refers to contacts between amino acids that may be widely separated in the sequence but are brought together during protein folding.
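As a rough illustration of what a "contact" means here, the sketch below marks two residues as in contact when their representative atoms lie within a distance cutoff and they sit far enough apart in the sequence, then scores a list of predicted contacts by the fraction that turn out to be real. The 8-angstrom cutoff, the minimum sequence separation of 6, the function names, and the toy coordinates are all assumptions chosen for illustration; none of them comes from the UCSC team or from the CASP scoring rules described in this story.

```python
import math

def contact_pairs(coords, threshold=8.0, min_separation=6):
    """Return the set of residue index pairs (i, j), i < j, that are in contact.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    threshold: distance cutoff in angstroms (assumed value).
    min_separation: minimum distance apart in the sequence (assumed value).
    """
    pairs = set()
    n = len(coords)
    for i in range(n):
        # Only consider partners at least min_separation positions downstream.
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < threshold:
                pairs.add((i, j))
    return pairs

def precision(predicted, actual):
    """Fraction of predicted contact pairs that appear in the actual set."""
    if not predicted:
        return 0.0
    return len(set(predicted) & actual) / len(predicted)

# Invented toy chain: ten residues laid out as a flat hairpin, so residues
# near the two ends of the sequence end up close together in space.
toy_coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (15.2, 0.0, 0.0), (15.2, 5.0, 0.0), (11.4, 5.0, 0.0), (7.6, 5.0, 0.0),
    (3.8, 5.0, 0.0), (0.0, 5.0, 0.0),
]

actual = contact_pairs(toy_coords)
# A hypothetical prediction with two correct pairs and one wrong pair.
score = precision([(0, 9), (1, 8), (3, 9)], actual)
```

In the toy chain, residues 0 and 9 are nine positions apart in the sequence but only 5 angstroms apart in space, which is the kind of long-range pairing the category is about.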

Groups have 48 hours to submit predictions made entirely by computer programs running on automated servers. They have three weeks to submit predictions refined through human interaction with the servers, in which scientists adjust the programs based on their knowledge of what structures are likely in the protein world. The UCSC team employs a server called SAM-T06, which the team has programmed to predict how proteins will fold into three-dimensional structures.

"My predictions would not have been as good if it weren't for the strength of the protein sequence alignments provided by the SAM server," Shackelford said.

The SAM server was developed under the leadership of Karplus and Richard Hughey, professor and chair of computer engineering, along with biomedical engineer Rachel Karchin, a former UCSC doctoral student who is now an assistant professor at Johns Hopkins University. The SAM-T06 server is available online for all to use and, in fact, other teams employed the UCSC server in their predictions.

For the past two years, Shackelford has been building the contact predictor, a neural network capable of learning from the examples it is given. He trains the neural network by feeding it a large set of data about known sequences and structures, teaching it to distinguish side chains of amino acids that will be in contact as a protein folds. The training prepares it to predict contacts when shown a protein sequence having an unknown structure. Before CASP, Karplus incorporated Shackelford's contact predictor into the SAM server.
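The idea of learning from labeled examples can be sketched with a single logistic unit, the simplest building block of a neural network. This is not Shackelford's predictor, whose architecture and inputs are not described in this story; the two input features, the training data, and every name below are invented for illustration only.

```python
import math

def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    """Estimated probability that a residue pair is in contact."""
    return sigmoid(sum(w * v for w, v in zip(weights, features)) + bias)

def train(examples, labels, epochs=2000, rate=0.5):
    """Fit one logistic unit by stochastic gradient descent.

    examples: list of feature vectors, one per residue pair (hypothetical).
    labels: 1 if the pair is in contact in the known structure, else 0.
    Returns (weights, bias).
    """
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(examples, labels):
            error = predict(weights, bias, features) - label
            # Nudge each weight against the gradient of the log loss.
            weights = [w - rate * error * v for w, v in zip(weights, features)]
            bias -= rate * error
    return weights, bias

# Invented training set: each pair is described by two made-up scores in [0, 1].
train_x = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
train_y = [1, 1, 1, 0, 0, 0]

weights, bias = train(train_x, train_y)

# Pairs from a sequence with "unknown" structure, never seen in training.
likely_contact = predict(weights, bias, [0.85, 0.75])
unlikely_contact = predict(weights, bias, [0.15, 0.2])
```

After training, the unit assigns a probability above 0.5 to the first unseen pair and below 0.5 to the second, mirroring in miniature the train-then-predict workflow described above.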

"All the effort went into making this a good predictor. The actual contact predictions took less than a half hour," Shackelford said.

Shackelford submitted his server prediction and soon realized that the program contained a bug that compromised the results. He fixed the bug and resubmitted his results before the three-week deadline.

"My revised prediction took first place. The original prediction by the server alone, even with some early bad predictions, came in second," he said.

Even with his success at CASP 7, Shackelford is circumspect. "We don't really know if contact prediction will end up being useful for solving protein structures, but I intend to find out," he said.

This year, 207 human expert groups and 98 prediction servers from throughout the world registered for the competition, and 17 groups submitted contact predictions. Why do so many groups participate in this time-consuming competition?

"CASP is a bias-free environment," Shackelford said. "Since labs don't choose the targets that will ultimately be used in scoring the results, they cannot focus on the easy ones that would make them look successful. Whoever comes out on top can validly be recognized as the best in the field."

In addition to Karplus and Shackelford, the UCSC team included postdoctoral researcher Martin Madera; graduate students Grant Thiltgen, Firas Khatib, Pinal Kanabar, Zack Sanborn, and Chris Wong; and undergraduates NavyaSwetha Davuluri, Sylvia Do, Crissan Harris, and Cynthia Hsu. The entire group worked on three-dimensional models, while only Shackelford worked on contact prediction.

"The CASP experiment has driven considerable innovation in protein structure prediction, particularly the creation of automatic servers on the web," Karplus said. "It has also been widely imitated in other fields, as the need for blind testing has been recognized."

When the conference comes to the local area, die-hards who want more have traditionally stayed on for an informal two-day bioinformatics conference hosted by Karplus at UCSC. This year's follow-up meeting, attended by 40 scientists, was supported by the California Institute for Quantitative Biomedical Research (QB3).