The human genome is packed with at least four million gene switches that reside in bits of DNA that once were dismissed as “junk” but that turn out to play critical roles in controlling how cells, organs and other tissues behave. The discovery, considered a major medical and scientific breakthrough, has enormous implications for human health because many complex diseases appear to be caused by tiny changes in hundreds of gene switches.
The findings, which are the fruit of an immense federal project involving 440 scientists from 32 laboratories around the world, will have immediate applications for understanding how alterations in the non-gene parts of DNA contribute to human diseases, which may in turn lead to new drugs. They can also help explain how the environment can affect disease risk. In the case of identical twins, small changes in environmental exposure can slightly alter gene switches, with the result that one twin gets a disease and the other does not.
As scientists delved into the “junk” — parts of the DNA that are not actual genes containing instructions for proteins — they discovered a complex system that controls genes. At least 80 percent of this DNA is active and needed. The result of the work is an annotated road map of much of this DNA, noting what it is doing and how. It includes the system of switches that, acting like dimmer switches for lights, control which genes are used in a cell and when they are used, and determine, for instance, whether a cell becomes a liver cell or a neuron.
In prostate cancer, for example, his group found mutations in important genes that are not readily attacked by drugs. But Encode, by showing which regions of the dark matter control those genes, gives another way to attack them: target those controlling switches.
Dr. Rubin, who also used the Google Maps analogy, explained: “Now you can follow the roads and see the traffic circulation. That’s exactly the same way we will use these data in cancer research.” Encode provides a road map with traffic patterns for alternate ways to go after cancer genes, he said.
Dr. Bernstein said, “This is a resource, like the human genome, that will drive science forward.”
The system, though, is stunningly complex, with many redundancies. Just the idea of so many switches was almost incomprehensible, Dr. Bernstein said.
There also is a sort of DNA wiring system that is almost inconceivably intricate.
“It is like opening a wiring closet and seeing a hairball of wires,” said Mark Gerstein, an Encode researcher from Yale. “We tried to unravel this hairball and make it interpretable.”
There is another sort of hairball as well: the complex three-dimensional structure of DNA. Human DNA is such a long strand — about 10 feet of DNA stuffed into a microscopic nucleus of a cell — that it fits only because it is tightly wound and coiled around itself. When they looked at the three-dimensional structure — the hairball — Encode researchers discovered that small segments of dark-matter DNA are often quite close to genes they control. In the past, when they analyzed only the uncoiled length of DNA, those controlling regions appeared to be far from the genes they affect.
The project began in 2003, as researchers began to appreciate how little they knew about human DNA. In recent years, some began to find switches in the 99 percent of human DNA that is not genes, but they could not fully characterize or explain what a vast majority of it was doing.
The thought before the start of the project, said Thomas Gingeras, an Encode researcher from Cold Spring Harbor Laboratory, was that only 5 to 10 percent of the DNA in a human being was actually being used.
The big surprise was not only that almost all of the DNA is used but also that a large proportion of it is gene switches. Before Encode, said Dr. John Stamatoyannopoulos, a University of Washington scientist who was part of the project, “if you had said half of the genome and probably more has instructions for turning genes on and off, I don’t think people would have believed you.”
By the time the National Human Genome Research Institute, part of the National Institutes of Health, embarked on Encode, major advances in DNA sequencing and computational biology had made it conceivable to try to understand the dark matter of human DNA. Even so, the analysis was daunting — the researchers generated 15 trillion bytes of raw data. Analyzing the data required the equivalent of more than 300 years of computer time.