Most of the informational content carried in nucleic acids such as human chromosomes is in the form of the linear order of bases, that is, the sequence of the DNA (or RNA). Determining this order, known as sequencing, can be a powerful diagnostic method in the molecular diagnostics (MDx) lab toolkit for certain applications. Much attention is now directed to high-throughput, massively parallel “next-generation” sequencing (NGS) techniques, which offer the capacity to examine large data sets (large portions of a single sample’s genome at multiple coverage, or smaller areas of interest from multiple samples simultaneously in a batch). While these continue to improve in cost, ease of use, and throughput, and can be expected to open up new approaches to personalized medicine, it’s worth pausing to consider the “last generation” technology of capillary-based Sanger sequencing. Many MDx facilities retain and even continue to invest in infrastructure to support this approach. In this month’s article, we’ll review what this method is and why it’s still relevant, with some examples of applications where it is still the method of choice and unlikely to be displaced any time soon.
How Sanger sequencing works
First, let’s review how the approach works. Similar to polymerase chain reaction (PCR), Sanger sequencing requires that there be a short area of known sequence immediately adjacent to (within roughly one kilobase, or 1,000 base pairs of) the area of interest. This is for exactly the same reason it’s needed for PCR; a synthetic DNA primer of complementary sequence can be made to hybridize to this known starting point, and direct the in vitro activity of DNA polymerase in extending a new single DNA strand off the primer’s 3’ hydroxyl group. Note that since we know the sequence of this priming site, we could choose to make our synthetic primer complementary to either strand and thus face either way out from the known site. Regardless of which strand we design the primer to anneal to, the DNA polymerase’s activity in extending a new strand will be directed by the template strand it proceeds along—thus placing an A across from every template T, and a C across from every template G and vice versa.
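The choice between the two possible primer orientations can be sketched in a few lines of Python. This is only an illustration of the base-pairing rules described above; the `known_site` sequence is a made-up example, and real primer design also weighs factors such as melting temperature and secondary structure that are not modeled here.

```python
# Watson-Crick pairing rules: A pairs with T, and G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the strand that anneals to `seq`, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Hypothetical known priming site, written 5'->3' on the "top" strand.
known_site = "GATTACAGGCTTAACG"

# A primer with the same sequence as the top strand anneals to the bottom
# strand and is extended in one direction out from the known site.
forward_primer = known_site

# A primer that is the reverse complement anneals to the top strand and is
# extended in the opposite direction.
reverse_primer = reverse_complement(known_site)

print("forward primer:", forward_primer)
print("reverse primer:", reverse_primer)
```

Either primer is a valid starting point; which one we order simply depends on which side of the known site our area of interest lies.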
The bases are added as deoxynucleoside triphosphates (dNTPs), selectively incorporated from a large pool of all four present in the reaction buffer, with the polymerase cleaving off two of each incoming dNTP’s three phosphate groups as the energy source driving the reaction; each newly incorporated nucleotide in turn presents its 3’ hydroxyl, allowing the polymerase to move forward one base and repeat the process. The key in a Sanger sequencing reaction is that alongside each of the four dNTPs (dATP, dTTP, dGTP, and dCTP) in the buffer there is a very small proportion of an unnatural molecule, its dideoxy analog (ddATP, ddTTP, ddGTP, or ddCTP). The DNA polymerases used for sequencing discriminate very little between dNTPs and ddNTPs, and will thus incorporate the ddNTP forms at a stochastic rate determined largely by their abundance relative to the native dNTP forms. The “dideoxy” name refers to the fact that these molecules lack the 3’ hydroxyl required for continued DNA strand growth; they are referred to as chain terminators because, once one is incorporated, the polymerase can proceed no further, and the newly produced DNA strand extends exactly from the beginning of its primer to the chain-terminating ddNTP.
In a Sanger sequencing reaction, then, we end up with a mixture of polymerase extension products, all derived from the same template and the same starting location as defined by the primer, in which a small fraction of the new DNA strands has terminated at each base position along the template. The terminating ddNTP at each position identifies its template partner by the base-pairing rules mentioned above.
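To make the idea of random chain termination concrete, here is a toy simulation. It is a sketch under simplifying assumptions rather than a model of real reaction chemistry: the template sequence, the 5 percent ddNTP fraction, and the number of strands are all arbitrary illustrative values, and every ddNTP is assumed to be incorporated with the same probability.

```python
import random

# Base-pairing rules: the new strand carries the partner of each template base.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def simulate_reaction(template, dd_fraction=0.05, n_strands=2000, seed=0):
    """Simulate many primer extensions along `template`.

    `template` is written in the order the polymerase encounters it.
    At each position the incorporated nucleotide is a ddNTP with probability
    `dd_fraction`, which ends that strand. Returns a dict mapping each
    terminated fragment length to the base of its terminating ddNTP.
    """
    rng = random.Random(seed)
    ladder = {}
    for _ in range(n_strands):
        for length, template_base in enumerate(template, start=1):
            if rng.random() < dd_fraction:
                ladder[length] = PAIR[template_base]
                break  # chain terminated; this strand grows no further
        # Strands that never draw a ddNTP simply run to full length.
    return ladder


def read_ladder(ladder):
    """Read the terminator bases from the shortest fragment to the longest."""
    return "".join(ladder[length] for length in sorted(ladder))


template = "ATGCCGTAAGCT"  # hypothetical template, in polymerase reading order
ladder = simulate_reaction(template)
print(read_ladder(ladder))
```

With these settings each position along the short template collects terminated fragments, so reading the terminators from shortest to longest gives the sequence of the new strand, which is the base-paired partner of the template.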
In a modern capillary-based Sanger sequencing system, we make use of this by labelling each of the four ddNTP types with a different, distinguishable fluorescent marker. Imagine, for instance, that all ddATP are “red,” all ddTTP are “green,” all ddGTP are “black,” and all ddCTP are “blue.” Following conclusion of our reaction, we take this mixture of full-length new DNA product strands and their shorter, color-coded siblings, and apply each reaction to one end of a very fine, transparent capillary filled with an electrically conductive gel or polymer.
Application of an electrical field across the length of the capillary, with the positive pole at the end opposite where samples are loaded, causes electrophoresis, or the movement of the intrinsically negatively charged DNA strands toward the positive side. The conductive gel provides a frictional resistance against which the DNA strands migrate, with the shortest strands suffering the least resistance (thus moving the fastest, and reaching the far capillary end first) and the longest molecules encountering the most resistance (moving the slowest, and reaching the far capillary end last). By using the proper combination of a very thin capillary, an appropriate choice of gel or polymer fill, and suitable electric field parameters, it is possible to reliably separate new DNA strands differing in length by a single base, up to around 1,000 base pairs total length. Hence our earlier observation that the initiating primer should be within about 1,000 base pairs of our area of interest.
The last step of the mechanism is the easiest to explain. An optical system is used to illuminate and record fluorescence signals (the “red,” “green,” “black,” and “blue”) in order as they reach the end of the capillary—and by doing so, it reads out the base identity of the ddNTP terminators at each base position one after another. An observation, for instance, of green, green, red, green, black, blue, green would correspond to “TTATGCT,” with the shortest (closest to primer) end read first.
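The read-out step maps naturally onto a small lookup table. The sketch below uses the illustrative color scheme from this article (real instruments use their own dye sets), and the `fragments` list is a hypothetical set of terminated strands given as (length, color) pairs in no particular order.

```python
# Illustrative dye assignments from the example in the text.
DYE_TO_BASE = {"red": "A", "green": "T", "black": "G", "blue": "C"}


def detection_order(fragments):
    """Return dye colors in the order fragments reach the detector.

    Shorter fragments migrate faster, so they are detected first.
    """
    return [color for _, color in sorted(fragments, key=lambda f: f[0])]


def call_bases(signals):
    """Translate a series of detected colors into a base sequence."""
    return "".join(DYE_TO_BASE[color] for color in signals)


# Hypothetical terminated fragments: (length past the primer, dye color).
fragments = [(3, "red"), (1, "green"), (7, "green"), (2, "green"),
             (5, "black"), (4, "green"), (6, "blue")]

signals = detection_order(fragments)
print(signals)
print(call_bases(signals))  # TTATGCT
```

Sorting by length stands in for the electrophoresis step; the lookup table stands in for the optical system and base-calling software.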
Pluses and minuses
So what are the advantages and disadvantages of Sanger capillary sequencing as opposed to next-gen sequencing? Economics, for a start. It’s true that on a per-base basis NGS is much cheaper than Sanger sequencing, but that’s only because of massive economies of scale. If you wish to look only at the DNA sequence of a single small area in a small number of samples, then the low per-reaction cost of Sanger sequencing is a clear winner over the very high per-run cost of an NGS approach that cannot be amortized over a large number of samples in the run. The argument around labor is identical, with NGS methods making sense only when the effort is spread across a very large amount of data. Next, the economics of instrumentation come into play: capillary sequencers are much cheaper from a capital-cost perspective than NGS equipment. Finally, operating a capillary sequencer is very simple compared to the workflows for next-gen systems.
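The amortization argument can be written as a small break-even calculation. The numbers below are purely illustrative placeholders in arbitrary cost units, not real prices for any instrument or reagent; a lab would substitute its own figures, and the sketch ignores labor and capital costs.

```python
import math


def sanger_cost_per_sample(reactions_per_sample, cost_per_reaction):
    """Sanger cost scales directly with the number of reactions run."""
    return reactions_per_sample * cost_per_reaction


def ngs_cost_per_sample(run_cost, samples_in_run):
    """An NGS run has a largely fixed cost that is shared by the batch."""
    return run_cost / samples_in_run


def break_even_samples(run_cost, reactions_per_sample, cost_per_reaction):
    """Smallest batch size at which NGS costs no more per sample than Sanger."""
    per_sample = sanger_cost_per_sample(reactions_per_sample, cost_per_reaction)
    return math.ceil(run_cost / per_sample)


# Placeholder values in arbitrary units (assumptions, not quoted prices).
RUN_COST = 1000          # one NGS run
COST_PER_REACTION = 5    # one Sanger reaction
REACTIONS_PER_SAMPLE = 2 # e.g. one read on each strand

print(break_even_samples(RUN_COST, REACTIONS_PER_SAMPLE, COST_PER_REACTION))
print(ngs_cost_per_sample(RUN_COST, 4))
print(sanger_cost_per_sample(REACTIONS_PER_SAMPLE, COST_PER_REACTION))
```

Whatever the real figures are, the shape of the comparison is the same: below the break-even batch size, the fixed cost of the NGS run dominates and Sanger is cheaper per sample.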
Let’s now consider an example of a situation where use of Sanger sequencing makes sense. Imagine a case presenting with symptoms close to, or perhaps even identical with, a known metabolic disorder. Imagine further that this is a disorder known to arise from a particular mutation in a specific gene. This is an obvious candidate for a mutation-specific PCR assay, and chances are that for any “disease of choice” matching this description, there’s such an assay out there that is cheap, fast, and accurate. Unfortunately, when you run it on this case, it comes up negative (wild type, normal allele, non-mutant). This highlights a shortcoming of PCR (or similar amplification-based) assays: they can only look for a known problem; they can’t find ones you don’t already know of.
Now in this case, while you could suspect that perhaps there’s a problem in another gene product nearby on the same biochemical pathway, it’s also possible that there’s a less common, or even currently novel, alternate mutation in the usual suspect gene. This is a classic example of a case in which Sanger sequencing can be a very powerful tool, allowing the lab to examine a small region of gene sequence around one known element in a single sample. Unlike a mutation-specific PCR, Sanger sequencing of the gene will turn up any other, possibly unknown, mutations in the region when the result is compared against the “normal” reference sequence. If the unexpected mutation is something like a frameshift or nonsense mutation, accepting this as the cause behind the symptoms may be relatively straightforward; if it’s a missense mutation where one amino acid is substituted for another, interpretation may be more complex and require evaluation of the encountered mutation in a model system to ascertain if it’s causal.
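The comparison against a reference sequence can be sketched as a simple position-by-position check. This is a deliberately minimal illustration: it assumes the read has already been aligned to the reference and is the same length, so it only reports single-base substitutions. Real analysis software also handles insertions, deletions, and heterozygous positions, none of which are modeled here. Both sequences are made-up examples.

```python
def find_substitutions(reference, read, offset=0):
    """List single-base differences between an aligned read and a reference.

    Returns (position, reference_base, observed_base) tuples, with positions
    counted from 1 (plus an optional `offset` into a larger region).
    """
    if len(reference) != len(read):
        raise ValueError("this sketch only compares equal-length sequences")
    return [
        (offset + i + 1, ref_base, obs_base)
        for i, (ref_base, obs_base) in enumerate(zip(reference, read))
        if ref_base != obs_base
    ]


# Hypothetical reference and patient read, shown as codons for readability.
reference = "ATG" "GCC" "AAG" "TTT"
patient = "ATG" "GCC" "TAG" "TTT"  # third codon AAG (lysine) -> TAG (stop)

print(find_substitutions(reference, patient))  # [(7, 'A', 'T')]
```

In this made-up case the single change turns a lysine codon into a stop codon, the kind of nonsense mutation that a mutation-specific PCR aimed elsewhere in the gene would never have reported.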
An extension of this example occurs in the case of a condition which is known to arise (perhaps in slightly differing presentation) from a large number of possible mutations spread across a relatively small region of a single gene; a good example is cystic fibrosis and the CFTR gene. While the entire genomic sequence of the CFTR locus is more than 250,000 base pairs including exons and introns, the coding area (exons only) is much shorter, about 4,500 base pairs. And while the large majority of cystic fibrosis cases, by some estimates around 90 percent, involve at least one copy of a single three-base pair deletion, more than one thousand mutations in the gene are known from studies. Running a thousand different specific PCR assays is much less appealing than running even a few Sanger reactions to cover the coding region of the gene. In a case like this, even if only a search for known mutations is being considered, sequencing is much easier, faster, and cheaper than performing specific PCRs.
A further benefit of Sanger sequencing-based methods is hidden within this last example. That is, from an assay validation standpoint, Sanger sequencing is very simple; the approach is validated on one or a few sequences of interest, and doesn’t require extensive revalidation to be directed to a new target. Consider the work involved in developing, validating, documenting, training, and keeping reagents on hand for our hypothetical one thousand specific PCRs above; contrast that with the same work involved in setting up ~10 Sanger reactions (overlapping, and covering both strands for sake of completeness) to examine the entire CFTR coding region.
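The figure of roughly 10 reactions can be checked with a little arithmetic. The sketch below assumes about 1,000 usable bases per read (the upper limit mentioned earlier) and a 100-base overlap between neighboring reads; the overlap is an illustrative assumption. It also treats the roughly 4,500-base coding region as one contiguous stretch, which is a simplification.

```python
import math


def reads_to_cover(region_length, read_length=1000, overlap=100, both_strands=True):
    """Estimate how many Sanger reads are needed to tile a region.

    Each read after the first adds (read_length - overlap) new bases.
    """
    if overlap >= read_length:
        raise ValueError("overlap must be smaller than read length")
    if region_length <= read_length:
        per_strand = 1
    else:
        new_bases_per_read = read_length - overlap
        per_strand = 1 + math.ceil((region_length - read_length) / new_bases_per_read)
    return per_strand * (2 if both_strands else 1)


# About 4,500 coding bases, covered on both strands.
print(reads_to_cover(4500))  # 10
```

Under these assumptions, five overlapping reads per strand span the region, which gives ten reactions once both strands are covered.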
For all of these practical considerations, it’s easy to see why Sanger sequencing—although no longer the darling of the high-tech dream lab of the future—remains a powerful and cost-effective approach for some types of work commonly required of the MDx laboratory, and is likely to remain so for the foreseeable future.
John Brunstein, PhD, is a member of the MLO Editorial Advisory Board. He serves as President and Chief Science Officer for British Columbia-based PathoID, Inc., which provides consulting for development and validation of molecular assays.