Unplugged: the code of life

May 1, 2011

Edited by Carren Bersch

Curious to know what is happening with high-throughput DNA sequencing, MLO asked the Cleveland Clinic if it could fill readers in on the topic. Luckily, Gary W. Procop, MD, MS, chair of the department of molecular pathology, section head of clinical and molecular microbiology, and director of mycology and parasitology at Cleveland Clinic, volunteered to answer our solitary question and respond to our commentary.

MLO: In what ways does/will high-throughput DNA sequencing affect the medical laboratory now/in the future? Our rudimentary understanding is that high-throughput DNA sequencing is the way in which scientists will find the genes that cause all types of diseases, thus making it possible to more readily find cures for those diseases, some of which are rare. In other words, this would be part of “personalizing” testing and medicine.

Gary W. Procop, MD, MS, discusses the past, present, and future of DNA sequencing.

Gary W. Procop, MD, MS: There are a variety of methods of high-throughput, or deep, sequencing, each with different advantages and limitations. We will have to wait to see which of these will remain viable through the process of market maturation and become commonly used in routine molecular-diagnostic laboratories. Whichever method persists, the effect on the laboratory of the future and, moreover, on medicine in general will be substantial.

Currently, these assays are most commonly used in research projects that require the attributes that these technologies offer. This, however, is often how molecular diagnostics make their way into the clinical laboratory. Techniques and methods are first used by research scientists to address specific questions. It is usually not long thereafter that translational-research projects are performed, and the tests or methods are verified for routine diagnostic use in both sophisticated and research laboratories. High-throughput sequencing appears to be following this same path.

Sanger sequencing revealed

Frederick Sanger, the English biochemist, clearly led the way with the description of sequencing by chain termination, which won him the Nobel Prize in Chemistry.* Although this remains a useful technology and is used in virtually every sequencing core, it is, in fact, expensive and labor intensive. Additionally, the error rate of DNA polymerase in Sanger sequencing is generally discounted because it is difficult to determine.

Another limitation of Sanger sequencing is the inability to differentiate individual sequences when mixtures of amplified DNA products exist. For example, Sanger sequencing has been used as the standard for sequence-based identification of microorganisms that are slow-growing or difficult to identify in culture (e.g., Mycobacterium spp.). This works well when a single organism is present in culture and a single amplified product is produced by broad-range polymerase chain reaction (PCR). In this scenario, a single sequence is generated, and the microorganism can be identified by comparing this sequence with a sequence database.
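The database comparison described above can be illustrated with a minimal Python sketch. It assumes the query and every reference sequence are already aligned to the same length, and it scores each reference by simple percent identity. The function names, the organism labels, and the short sequences are hypothetical, made up for illustration; real identification relies on curated reference databases and proper alignment tools.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_match(query, database):
    """Return (name, percent identity) of the closest reference sequence."""
    scored = ((name, percent_identity(query, seq)) for name, seq in database.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical toy reference database (not real organism sequences).
database = {
    "organism_A": "ACGTACGTAC",
    "organism_B": "ACGTTCGTAC",
    "organism_C": "TTGTACGAAC",
}

# A single clean sequence, as produced from a pure culture.
query = "ACGTTCGTAA"
name, identity = best_match(query, database)
print(f"Closest reference: {name} ({identity:.1f}% identity)")
```

A single clean query gives one clear best match; a mixed sequence from several organisms would not, which is the limitation discussed next.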

This technology is not useful, however, in clinical specimens that contain a number of microorganisms, since broad-range amplification produces a mixture of amplified products and, therefore, a mixed (and uninterpretable) sequence. Similarly, there are situations where a mixture of highly related variants (i.e., quasi-species) exists in the same infected individual; this is common with RNA viruses, such as HIV. In this situation, the broad-range reverse-transcription PCR (RT-PCR) product preferentially represents the predominant quasi-species, as does the resultant DNA sequence. The rest of the quasi-species, or sub-populations, remain hidden.

There are a number of advantages to high-throughput sequencing, which vary depending on the specific technology discussed. When the amplified product of a single PCR reaction, which consists of numerous identical or nearly identical molecules, is sequenced simultaneously in a high-throughput format, the error contribution of DNA polymerase at any one site becomes negligible, since it is statistically overwhelmed by the more numerous correct incorporations at that site. This multifold coverage of the sequence produces more reliable results.
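The statistical argument above can be made concrete with a small Python sketch. It assumes the reads are already aligned and of equal length, and it takes the most common base at each position as the consensus. The function name and the example reads are hypothetical; this is an illustration of majority voting under those assumptions, not any particular platform's base-calling method.

```python
from collections import Counter


def consensus(reads):
    """Return the per-position majority base across pre-aligned, equal-length reads."""
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to the same length")
    # zip(*reads) yields one column of bases per sequence position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


# Hypothetical data: 20 reads of one amplicon, two of which carry
# an isolated incorporation error at different positions.
true_sequence = "ACGTACGT"
reads = [true_sequence] * 18 + ["ACGAACGT", "ACGTACCT"]

print(consensus(reads))
```

With 20-fold coverage, each isolated error is outvoted 19 to 1 at its position, so the consensus matches the true sequence.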

Generating sequence data

Another advantage is the ability to generate sequence information from a mixture of amplified products, which with Sanger sequencing would simply yield a mixed sequence. With high-throughput sequencing, the differentiation of populations will be possible, regardless of whether the populations consist of tumor cells or microorganisms. In addition, the determination of relative quantities of the individual constituents within the population is feasible. These attributes of this technology, without doubt, will change the way that medicine is practiced.
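As a rough illustration of how relative quantities might be read from a mixture, the Python sketch below counts how often each distinct sequence appears among the reads and reports it as a fraction of the total. It assumes that every read covers the same region and that sequencing errors have already been dealt with; the function name and the read counts are hypothetical.

```python
from collections import Counter


def relative_abundance(reads):
    """Map each distinct read sequence to its fraction of all reads, most common first."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {sequence: n / total for sequence, n in counts.most_common()}


# Hypothetical mixture: one predominant population and two minor ones.
reads = ["ACGT"] * 90 + ["ACGA"] * 8 + ["TCGT"] * 2

for sequence, fraction in relative_abundance(reads).items():
    print(f"{sequence}: {fraction:.0%}")
```

In this toy mixture, the two minor populations at 8% and 2% remain visible alongside the predominant one, rather than being masked by it.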

The complex mixtures of microorganisms that occur in particular disease states (e.g., a gastrointestinal abscess caused by a mixture of gastrointestinal microbiota) can be characterized, along with the quantity of each pathogen. Similarly, all the different types of microorganisms that cause a similar clinical syndrome (e.g., all respiratory viruses) may be assessed simultaneously in a single assay rather than in numerous individual assays.

Finally, all of the quasi-species of viruses (e.g., HIV) — some of which may harbor resistance-generating mutations — will be able to be assessed simultaneously, rather than just assessing the most prevalent of the quasi-species, which limits the information available for therapeutic selection.

This ability to determine all the components of a mixture will also have great use in oncology. For the first time ever, all of the malignant cells present in a biopsy may be assessed for a clinically relevant genetic marker, which will inform clinicians of the possibility of minor but medically important clones.

High-throughput DNA sequencing will be particularly important for simultaneously assessing multiple different genetic markers. By either covering vast distances of DNA or sequencing multiple amplicons following a multiplex reaction, one can envision getting all of the results necessary to characterize a particular tumor type in one reaction, rather than needing to perform multiple independent assays.

This approach would also hold true for the assessment of multiple constitutional abnormalities. This really is a personalized genetic assessment.

Searching for genetic abnormalities

Laboratorians will most likely begin to use this technology by searching for genetic abnormalities that are known to be associated with particular diseases. High-throughput DNA sequencing, however, will also generate large stretches of sequence that are not associated with the disease of interest. The fact that this genetic information will be available on large groups of patients makes possible future searches for associations and causes that have never before been possible. When sufficient sequencing data has been generated and archived on a particular patient population, one could examine the sequences for unexpected elements that may influence the prognosis or the therapeutic outcome of treatment.

The future is bright for high-throughput DNA sequencing. There are challenges with regard to cost, platform optimization, and data interpretation and storage. These, however, are challenges that are readily embraced, given the advances in medicine that will be possible with this new tool to examine the code of life.


Gary W. Procop, MD, MS, is chair of the department of molecular pathology, section head of clinical and molecular microbiology, and director of mycology and parasitology at Cleveland Clinic, and serves on numerous committees within several professional organizations. He has given more than 375 scientific presentations, and has 124 published manuscripts, 25 chapters, and one book to his credit. His primary interests are the practical applications of molecular diagnostic methods for the diagnosis and treatment of infections; infectious disease pathology; and mycology and parasitology.



Sanger and others’ impact on chemistry

*According to general information gleaned by MLO from a number of websites, Frederick Sanger, OM, CH, CBE, FRS, turns 93 in August. This English biochemist is the fourth and only living person ever to have been awarded two Nobel Prizes, either individually or in tandem with another person. He received the Nobel Prize in Chemistry in 1958 “for his work on the structure of proteins, especially that of insulin,” and in 1980, he and Walter Gilbert shared the Nobel Prize in Chemistry “for their contributions concerning the determination of base sequences in nucleic acids” with Paul Berg who won “for his fundamental studies of the biochemistry of nucleic acids, with particular regard to recombinant-DNA.”

Harvard researcher on messenger RNA

After his 1980 win, Walter Gilbert, now 79, spent much of his time on the road speaking at conferences and visiting other laboratories. With only a brief interruption of his research work at Harvard University, Gilbert has spent most of his career at that institution. By 1961, Gilbert had published his first paper on messenger RNA in Nature; and in 1964, not only had he published the last of his papers in theoretical physics, he became a tenured biophysicist at Harvard.

Stanford’s molecular biologist at Human Genome Project

In 1972, Paul Berg, a molecular biologist at Stanford University beginning in 1959, created the first recombinant DNA molecules and, thus, the field of genetic engineering. In 1985, Berg became director of the new Beckman Center for Molecular and Genetic Medicine. In 1991, Berg accepted a position as head of the National Institutes of Health’s Scientific Advisory Committee of the Human Genome Project. Berg will be 85 in June.