An artist's rendering of the DNA double helix: Genome researchers have
described the complete sequence of chemical subunits of DNA that spells
out the genetic code in human chromosomes. Image:
National Human Genome Research Institute
October 25, 2004
Genome researchers publish analysis of finished
human genome sequence, plan next steps to figure out what it
By Tim Stephens
A pair of papers published last week in the two leading scientific
journals marks the completion of the Human Genome Project and
the start of a new project to find all of the functional elements
in human DNA. Researchers at UCSC are involved in both projects.
In the October 21 issue of the journal Nature, the International
Human Genome Sequencing Consortium published its scientific
description of the finished human genome sequence, reducing
the estimated number of human protein-coding genes from 35,000
to only 20,000 to 25,000, a surprisingly low number for our
species. In the paper, researchers describe the final product
of the Human Genome Project, the 13-year effort to read the
information encoded in the human chromosomes that reached its
culmination in 2003.
The Nature publication provides rigorous scientific evidence
that the genome sequence produced by the Human Genome Project
has both the high coverage and the accuracy needed to perform
sensitive analyses, such as those focusing on the number of
genes, segmental duplications involved in disease, and the "birth"
and "death" of genes over the course of evolution.
"Obtaining the sequence recording our complete genetic
heritage has been a huge step for humanity. There is no doubt
that this will ultimately transform medicine," said David
Haussler, professor of biomolecular engineering and a Howard
Hughes Medical Institute investigator, who led UCSC's participation
in the Human Genome Project.
The other major paper, published in the October 22 issue of
the journal Science, outlines the plans of a research
consortium organized by the National Human Genome Research Institute
(NHGRI) to produce a comprehensive catalog of all parts of the
human genome crucial to biological function. The ENCyclopedia
Of DNA Elements (ENCODE) consortium has the ambitious goal of
building a "parts list" of all sequence-based functional
elements in the human DNA sequence.
"To really use the human genome sequence for medicine,
we need to understand how it works--that is, what all the As,
Cs, Gs, and Ts are actually doing in the cells in our bodies.
This is much harder than reading the DNA sequence," Haussler
said. "Through the ENCODE consortium, the same kind of
team approach used in the Human Genome Project is being applied
to address this much more difficult challenge."
The list of functional elements compiled by the ENCODE project
will include: protein-coding genes; non-protein-coding genes;
regulatory elements involved in the control of gene transcription;
and DNA sequences that mediate chromosomal structure and dynamics.
The ENCODE researchers also anticipate they may uncover additional
functional elements that have yet to be recognized.
"Creating this monumental reference work will help us
mine and fully utilize the human genome sequence. Such knowledge
will lead to a far deeper understanding of human biology and
stimulate the development of new strategies for improving human
health," said NHGRI director Francis S. Collins.
UCSC researchers have been involved in the analysis of the
human genome since late 1999. James Kent, then a graduate student
in molecular, cell, and developmental biology working with Haussler,
assembled the first working draft of the human genome in 2000
and created the UCSC Genome Browser, a widely used web-based
tool for genomic research. Kent, now a research scientist in
UCSC's Center for Biomolecular Science and Engineering (CBSE),
which Haussler directs, is a coauthor on the Science
and Nature papers, along with Haussler and other CBSE
scientists and graduate students.
The UCSC researchers helped assemble the finished human genome
sequence and made it publicly available to researchers worldwide
through the UCSC Genome Browser. They also performed a key analysis
of the coverage and accuracy of the finished sequence. The browser
displays the finished genome in alignment with dozens of annotation
tracks contributed by researchers at UCSC and collaborators.
One of the central goals of the effort to analyze the human
genome is the identification of all genes, which are generally
defined as stretches of DNA that code for particular proteins.
According to the new findings, researchers have confirmed the
existence of 19,599 protein-coding genes in the human genome
and identified another 2,188 DNA segments that are predicted
to be protein-coding genes.
"The analysis found that some of the earlier gene models
were erroneous due to defects in the unfinished, draft sequence
of the human genome," said Jane Rogers, head of sequencing
at the Wellcome Trust Sanger Institute in Hinxton, England.
"The task of identifying genes remains challenging, but
has been greatly assisted by the finished human genome sequence."
The Nature paper also provides the scientific community
with a peer-reviewed description of the finishing process, and
an assessment of the quality of the finished human genome sequence,
which was deposited into public databases in April 2003. The
assessment confirms that the finished sequence now covers more
than 99 percent of the euchromatic (or gene-containing) portion
of the human genome and was sequenced to an accuracy of 99.999
percent, which translates to an error rate of only 1 base per
100,000 base pairs--10 times more accurate than the original
The contiguity of the sequence is also massively improved. The
average DNA letter now sits on a stretch of 38.5 million base
pairs of uninterrupted, high-quality sequence--about 475 times
longer than the 81,500 base-pair stretch that was available
in the working draft. Access to uninterrupted stretches of sequenced
DNA can greatly assist researchers hunting for genes and the
neighboring DNA sequences that may regulate their activity,
dramatically cutting the effort and expense required to find
regions of the human genome that may contain small and often
rare variants involved in disease.
In addition to reducing the count of human genes, scientists
reported that the improved quality of the finished human genome
sequence, compared with earlier drafts, provides a much clearer
picture of certain phenomena such as duplication of DNA segments
and the "birth" and "death" of genes.
Segmental duplications are large, almost identical copies of
DNA, which are present in at least two locations in the human
genome. A number of human diseases are known to be associated
with mutations in segmentally duplicated regions. Segmental
duplications also provide a window into understanding how our
genome evolved and is still changing.
The accuracy of the finished human genome sequence produced
by the Human Genome Project has also given scientists some initial
insights into the birth and death of genes in the human genome.
Scientists have identified more than 1,000 new genes that arose
in the human genome after our divergence from rodents some 75
million years ago. Most of these arose through recent gene duplications
and are involved with immune, olfactory, and reproductive functions.
Additionally, researchers used the finished human genome to
identify and characterize 33 nearly intact genes that have recently
acquired one or more mutations, causing them to stop functioning.
More than 2,800 researchers who took part in the International
Human Genome Sequencing Consortium share authorship on the Nature
paper, which expands upon the group's initial analysis published
in February 2001. In addition to Haussler and Kent, coauthors
on the Nature paper who are affiliated with UCSC include
Robert Baertsch, Hiram Clawson, Mark Diekhans, Terrence Furey,
Angela Hinrichs, Fan Hsu, Yontao Lu, Kate Rosenbloom, Krishna
Roskin, Adam Siepel, Charles Sugnet, Daryl Thomas, Heather Trumbower,
and Ryan Weber.
The coauthors of the Science paper on the ENCODE project
include Haussler, Kent, Daryl Thomas, Kate Rosenbloom, Hiram
Clawson, and Adam Siepel.
Additional information about the National Human Genome Research
Institute is available on the web at www.genome.gov.