July 25, 2005

Landmark study highlights the importance of 'junk' DNA in higher organisms

By Maria Smit

A team led by researchers at UCSC's Center for Biomolecular Science and Engineering (CBSE) has published a landmark study comparing the genomes of a range of different organisms. The study, published in the journal Genome Research, describes the most comprehensive comparison to date of conserved DNA sequences in the genomes of vertebrates, insects, worms, and yeast.

One of the study's major findings is that as organism complexity increases, so too does the proportion of conserved bases in the non-protein-coding (or "junk") DNA sequences. This underscores the importance of gene regulation in more complex species.

The paper also reports exciting biological findings regarding highly conserved DNA elements and the development of a new computational tool for comparing several whole-genome sequences. The first author on the paper is Adam Siepel, a graduate student working with David Haussler, professor of biomolecular engineering and director of the CBSE. The coauthors include investigators from Pennsylvania State University, Washington University School of Medicine, Baylor College of Medicine, and UCSC.

One of the most powerful approaches for pinpointing biologically relevant elements in genomic DNA is to identify sequences that are similar across multiple species. Such approaches are particularly useful for analyzing non-protein-coding sequences--sometimes called "junk" DNA. Although "junk" DNA is poorly understood, the increasing availability of whole-genome sequences is rapidly enhancing the ability of scientists to ascertain the biological significance of these non-protein-coding regions.

"Looking for functional elements in mammalian and other vertebrate genomes is like looking for needles in a haystack," explained Siepel. "By focusing on conserved elements, you get a much smaller haystack. It's not guaranteed to have every needle in it, and not everything in it is a needle, but you're much more likely to find a needle if you look in this smaller haystack than if you look in the big one."

Siepel's team aligned whole-genome sequences for four groups of eukaryotic species (vertebrates, insects, worms, and yeast). The vertebrates included human, mouse, rat, chicken, and puffer fish, and the insects included three species of fruit fly and one species of mosquito. Two worm species and seven yeast species rounded out the set.

To help ease the gargantuan task of identifying conserved elements in multiple alignments of whole-genome sequences, the researchers developed a new computational tool called phastCons. In contrast to traditional tools that compute conservation levels based on sequence similarity at each nucleotide position, phastCons allows for multiple substitutions per site, accounts for unequal rates of substitutions for different nucleotides, and considers the phylogenetic relationships of the species involved.
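To illustrate why allowing for multiple substitutions per site matters, here is a minimal Python sketch. It is not phastCons, and it does not use the model described in the paper. It swaps in the classic Jukes-Cantor correction, which assumes equal substitution rates for all nucleotides and compares only two sequences, simply to show how a raw count of mismatches understates the amount of change. The function names and example sequences are hypothetical.

```python
import math


def observed_difference(seq_a, seq_b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped aligned positions")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(p):
    """Estimated substitutions per site under the Jukes-Cantor model.

    Assumption for illustration only: every nucleotide changes to every
    other nucleotide at the same rate. The correction accounts for sites
    that changed more than once and so look less diverged than they are.
    """
    if p >= 0.75:
        raise ValueError("p must be below 0.75 for the correction to be defined")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


if __name__ == "__main__":
    # Hypothetical aligned sequences; '-' marks an alignment gap.
    p = observed_difference("ACGTACGTAC", "ACGAACGT-C")
    print(p, jukes_cantor_distance(p))
```

For an observed difference of 0.3, the corrected estimate is roughly 0.38 substitutions per site, and the gap widens as sequences diverge further.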

After applying phastCons to multiple alignments of each of the four groups of eukaryotic species, the researchers estimated that only 3 to 8 percent of the human genome was conserved in the other vertebrate species. On the other hand, the more compact genomes of insects were more highly conserved (37 to 53 percent), as were those of worms (18 to 37 percent) and yeast (47 to 68 percent).
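A percentage like these can be thought of as the share of a genome's bases covered by predicted conserved elements. The Python sketch below shows one simple way to compute such a share from a list of element coordinates. The coordinates, the half-open interval convention, and the function names are assumptions made for illustration and are not taken from the study.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval rather than double-counting bases.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def conserved_fraction(elements, genome_length):
    """Share of the genome's bases covered by at least one element."""
    covered = sum(end - start for start, end in merge_intervals(elements))
    return covered / genome_length


if __name__ == "__main__":
    # Hypothetical elements on a toy 100-base genome.
    elements = [(0, 10), (5, 20), (50, 60)]
    print(conserved_fraction(elements, 100))
```

Merging first ensures that a base covered by two overlapping elements is counted only once.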

The scientists also observed that the proportion of conserved sequences located outside of protein-coding regions tended to increase with genome length and with the species' general biological complexity.

Most strikingly, the researchers discovered that two-thirds or more of the conserved DNA sequences in vertebrate and insect species were located outside the exons of protein-coding genes, while non-protein-coding sequences accounted for only about 40 percent and 15 percent of the conserved elements in the genomes of worms and yeast, respectively.
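As a rough illustration of how such a noncoding share might be calculated, the Python sketch below counts how many conserved bases fall outside coding exons. It assumes both inputs are sorted, non-overlapping lists of half-open (start, end) coordinates on one chromosome, and it counts bases. The study may have measured the share differently, and the coordinates here are hypothetical.

```python
def overlap_length(a, b):
    """Total bases shared by two sorted, non-overlapping interval lists."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            total += end - start
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def noncoding_share(conserved, coding_exons):
    """Fraction of conserved bases that lie outside coding exons."""
    conserved_total = sum(end - start for start, end in conserved)
    if conserved_total == 0:
        raise ValueError("no conserved bases")
    coding = overlap_length(conserved, coding_exons)
    return (conserved_total - coding) / conserved_total


if __name__ == "__main__":
    # Hypothetical coordinates: 30 conserved bases, 10 of them inside an exon.
    conserved = [(0, 10), (20, 40)]
    coding_exons = [(5, 25)]
    print(noncoding_share(conserved, coding_exons))
```

In this toy example, 20 of the 30 conserved bases lie outside the exon, giving a noncoding share of two-thirds.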

"The conserved noncoding story seems to be fairly similar in vertebrates and insects, but looks quite different in worms and yeast," explained Siepel. "These findings support the hypothesis that increased biological complexity in vertebrates and insects derives more from elaborate forms of regulation than from a larger number of protein-coding genes."

He noted that the results for the worm group should be interpreted cautiously because the analysis was based on the genomes of only two quite divergent worm species.

"We still understand remarkably little about the function and evolutionary origin of these elements," Haussler added.

But the locations of the conserved elements will provide the scientists with some key clues to the potential functions of these sequences.

For genomic scientists, the current study is a major contribution to the field. Not only will the new bioinformatics tool phastCons help researchers identify evolutionarily conserved DNA elements, but the conserved elements reported in the study are also available as conservation tracks in the widely used UCSC Genome Browser.

"With phastCons and with the conservation tracks in the browser," said Siepel, "we're trying to make it as easy as possible for researchers to home in on functionally important DNA sequences."

In addition to Siepel and Haussler, other UCSC researchers who contributed to the paper include Gill Bejerano, Angie Hinrichs, Kate Rosenbloom, Hiram Clawson, and James Kent. The paper appeared online on July 15 and will be published in the August print issue of Genome Research.
