April 21, 2003
Bioinformatics experts gain ground in protein
By Tim Stephens
Proteins, with their extraordinary diversity of structure
pose some of the toughest problems in the field of
rise to a growing arsenal of computational tools for
Most proteins have highly
shapes. This is an image of human growth hormone.
Swiss Institute of Bioinformatics
An array of computer-based strategies is now available to
biologists who have found an unknown protein, determined
of amino acid subunits, and want to know its
and biological function.
Computational techniques alone may not provide all the answers, but
they are powerful enough to have earned a place in the
for protein research. The Sequence Alignment and Modeling
introduced in the early 1990s by UCSC researchers, has become one of
the most popular software packages for the analysis of
SAM now faces stiff competition, but UCSC researchers keep improving
the software and are working on other software programs to complement
it. Both academic researchers and commercial companies are among the
users of the SAM software.
"We have licensed the SAM software to more than 200
groups and about 20 commercial companies. We also have a web server
that sees over 1,000 uses per week for protein structure
said Richard Hughey, professor and chair of computer engineering at
The list of companies that have licensed the SAM software from UCSC
reads like a Who's Who of the biotechnology industry:
Genentech, Novartis, Pfizer, and Pharmacia, among others.
companies must pay a fee to use the software (as much as $125,000),
academic licenses are free, Hughey said.
Proteins carry out most of the crucial functions of living
are typically large molecules with very complex shapes.
and functional diversity surpasses that of any other kind
Enzymes, antibodies, hormones, muscle, tendons, cartilage, hair, and
feathers are all made of proteins.
At the simplest level, proteins are long chains of subunits called
amino acids. There are 20 different amino acids, and their sequence
in the linear chain of a protein molecule ultimately determines its
structure. Sections of the molecule may twist into coils or fold into
sheets, and the entire protein folds into a precise and often highly
complex three-dimensional structure.
Software programs such as SAM take advantage of the
of related proteins and the existence of large databases of
on known proteins. Proteins that share a common ancestor
have many similarities in their amino acid sequences. These
similarities make it possible to
create statistical models of families of related proteins. A software
program can compare an unknown protein's sequence with such
models and may be able to predict the protein's structure
based on its similarity to known proteins.
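
The idea of a statistical model for a protein family can be illustrated with a minimal sketch. It is a simplified position-by-position profile, not the method used by SAM, and it assumes the family sequences are already aligned and of equal length. The function names, the toy sequences, the pseudocount, and the uniform background frequency are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 amino acids


def build_profile(aligned_seqs, pseudocount=1.0):
    """Estimate per-column amino acid probabilities from aligned sequences.

    The pseudocount keeps residues never seen in a column from getting
    probability zero.
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    total = len(aligned_seqs) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned_seqs)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def log_odds_score(profile, query, background=1.0 / 20):
    """Sum log2(P(residue | column) / background) over the query sequence.

    A positive score means the query fits the family model better than
    a uniform random background does.
    """
    if len(query) != len(profile):
        raise ValueError("query length must match the profile length")
    return sum(
        math.log2(column[aa] / background)
        for column, aa in zip(profile, query)
    )


# Made-up toy "family"; real families are far longer and more varied.
family = ["MKVLA", "MKILA", "MRVLG", "MKVLS"]
profile = build_profile(family)

related = log_odds_score(profile, "MKVLA")    # resembles the family
unrelated = log_odds_score(profile, "WWPDC")  # shares no residues with it
print(f"related: {related:.2f}, unrelated: {unrelated:.2f}")
```

A sequence that resembles the family scores above zero and an unrelated one scores below zero. Practical tools also have to handle insertions and deletions, which this sketch ignores.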
SAM uses a statistical technique known as Hidden Markov
first introduced to the field of bioinformatics by David
of the UC Presidential Chair in computer science and
director of UCSC's
Center for Biomolecular Science and Engineering. The SAM software was
initially developed by Haussler, postdoctoral researcher
now at the University of Copenhagen, and others. Haussler
on DNA sequence analysis, and further development of the SAM software
was taken over by Hughey and Kevin Karplus, professor of
SAM has a history of success in an unusual series of group
performed every two years to establish the state of the art
structure prediction. The Fifth
Community Wide Experiment on the Critical Assessment of
Protein Structure Prediction (CASP5) concluded in December 2002.
The top performers in one category of the CASP5 experiment
that combined several different servers, including SAM, and
agreement between different methods, Karplus said. "The success
of the metaservers was somewhat unexpected--these automatic methods
outperformed most human predictors," he said.
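
Combining the output of several methods and relying on their agreement can be sketched as a simple vote. The article does not describe how the CASP5 metaservers worked internally, so the example below is only an assumption-laden illustration: it takes hypothetical per-residue labels (H for helix, E for strand, C for coil) from three made-up servers and picks the majority label at each position.

```python
from collections import Counter


def consensus_prediction(predictions):
    """Majority vote per residue over equal-length prediction strings.

    Returns the consensus string and, for each position, the fraction of
    predictors that agreed with the winning label. Ties go to the label
    that appears first among the predictions.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    consensus = []
    agreement = []
    for column in zip(*predictions):
        label, votes = Counter(column).most_common(1)[0]
        consensus.append(label)
        agreement.append(votes / len(predictions))
    return "".join(consensus), agreement


# Hypothetical outputs from three servers for a nine-residue target.
server_outputs = {
    "server_a": "HHHHCCEEE",
    "server_b": "HHHCCCEEE",
    "server_c": "HHHHCCCEE",
}
consensus, agreement = consensus_prediction(list(server_outputs.values()))
print(consensus)
print([round(a, 2) for a in agreement])
```

Positions where the agreement fraction is low are the ones where the individual methods disagree, and a combined predictor could treat them as less reliable.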
UCSC entered two versions of SAM (SAM-T99 and a newer
in CASP5, as well as a new program Karplus is developing
Undertaker is designed to predict protein folding based on
for parts of a protein molecule that are
be buried inside the structure where they won't come in contact with
"The burial of hydrophobic residues is one of the main driving
forces in protein folding, and Undertaker is an attempt to use that
to predict new folds," Karplus said.
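
The tendency of hydrophobic residues to be buried can be explored with a small sketch. The article does not describe Undertaker's scoring, so this is a separate, much simpler illustration: it uses the published Kyte-Doolittle hydropathy scale and a sliding window to find the most hydrophobic stretch of a sequence, which is a rough hint at a segment likely to be buried. The function name, window size, and toy sequence are assumptions.

```python
# Kyte-Doolittle hydropathy values; higher means more hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def windowed_hydropathy(sequence, window=5):
    """Average hydropathy of every full window along the sequence."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Made-up sequence with a hydrophobic core flanked by charged residues.
toy_sequence = "DKEELLIVVAGSKDE"
scores = windowed_hydropathy(toy_sequence, window=5)

# Start index of the window with the highest average hydropathy.
most_hydrophobic_start = max(range(len(scores)), key=scores.__getitem__)
print(most_hydrophobic_start,
      toy_sequence[most_hydrophobic_start:most_hydrophobic_start + 5])
```

A windowed average like this only flags candidate segments. Predicting an actual fold requires reasoning about the three-dimensional structure, which is far beyond this sketch.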
He found that the combination of Undertaker with SAM did not perform
as well as SAM alone on the easier targets, where there was a good
alignment of the unknown sequence with a known template. The combined
programs did surprisingly well, however, on some of the hardest
"Where our methods had started failing in the past,
we started succeeding," he said. "We still have a lot of work
to do, but I think we can improve even more over the next