Once upon a time, genetics was simple. Lamarck told us giraffe necks got longer because the animals stretched to reach higher leaves and their offspring inherited the result (although he was a bit hazy on the mechanism). Darwin replaced this with a more complex theory of evolution, and Mendel and his peas (bolstered by what could be politely called selective use of his data) provided evidence for some sort of physical heritable material coding for these traits. People such as Delbrück; Avery/McCarty/MacLeod; Hershey and Chase; and Franklin, Watson, and Crick built on this to establish DNA as the core hereditary material and to arrive at the Central Dogma: sequence read as triplet codons, transcribed to RNA as a transient messenger, and translated on the ribosome into amino acid sequences, leading to fully functioning proteins.
With each new iteration, the story of how genetic information is transmitted has become more complex, and the trend continues. While the basic concept of the Central Dogma remains unchanged, we’re increasingly aware of modifications which, while not changing the underlying DNA sequence, can influence the expression of its biological information content. These are collectively known as epigenetic modifications, and the one we’ll look at in more detail here doesn’t even modify the DNA bases themselves, but rather the proteins involved in packaging the DNA. If you’re going to take 205 cm of nuclear DNA (208 for women) and cram it into an average human cell nucleus with a diameter of around 6 μm, you need to do some serious compaction. A first critical step in this involves the protein family known as histones. Histones carry an overall positive charge, which helps them associate electrostatically with the negatively charged phosphate groups of the DNA backbone. Two copies each of histones H2A, H2B, H3, and H4 form a small “spool” (the histone octamer) around which 146 base pairs of DNA coil about 1.67 turns, followed by about 80 base pairs of unspooled DNA more loosely associated with histone H1 before the structure repeats with another histone octamer and its coil. Sometimes described as “beads on a string”, these structures, called nucleosomes, are the core packing motif for DNA. By organizing them further through coiling and looping to create chromatin, individual chromosomes can be condensed small enough to fit in the nucleus.
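For readers who like to check the arithmetic, the packing figures above can be run as a rough sketch. The one number not taken from the text is the rise of about 0.34 nm per base pair for B-form DNA, which is an assumption added here. The function name packing_estimates is purely illustrative.

```python
# Back-of-envelope check of the packing numbers quoted in the text.
# Assumption (not from the article): B-form DNA rises about 0.34 nm per base pair.
RISE_PER_BP_M = 0.34e-9

DNA_LENGTH_M = 2.05          # 205 cm of nuclear DNA, from the text
NUCLEUS_DIAMETER_M = 6e-6    # about 6 micrometres, from the text
WRAPPED_BP = 146             # base pairs wound around each histone octamer
LINKER_BP = 80               # approximate unspooled DNA between octamers


def packing_estimates():
    """Return (base pairs, nucleosome count, DNA length / nuclear diameter)."""
    base_pairs = DNA_LENGTH_M / RISE_PER_BP_M
    repeat_bp = WRAPPED_BP + LINKER_BP          # one "bead" plus its "string"
    nucleosomes = base_pairs / repeat_bp
    length_ratio = DNA_LENGTH_M / NUCLEUS_DIAMETER_M
    return base_pairs, nucleosomes, length_ratio


base_pairs, nucleosomes, length_ratio = packing_estimates()
print(f"base pairs:   {base_pairs:.2e}")
print(f"nucleosomes:  {nucleosomes:.2e}")
print(f"length ratio: {length_ratio:,.0f} to 1")
```

Under these assumptions the estimate comes out at roughly 6 billion base pairs, on the order of 27 million nucleosomes, and a DNA length some 340,000 times the diameter of the nucleus it has to fit into.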
The closet rule: You can’t have both compact storage and ready accessibility
As anyone who’s ever tried to organize a closet has discovered, there’s a trade-off between compactness of storage and accessibility of the material stored. This holds true for DNA as well. For a gene to be transcribed, its regulatory sequences must be accessible enough for transcriptional enhancer proteins to find their cognate DNA binding sites, attach, and recruit other factors to the gene’s promoter, leading to productive association by an RNA polymerase. Production of transcript by the polymerase also requires temporary local denaturation of the DNA strands to expose one as template (creation of the “transcription bubble”), another step which isn’t going to happen while the DNA is tightly coiled around a histone spool. It seems logical therefore that any biological process which can influence the tightness of association between the DNA and histones might also influence the accessibility of genes for transcription.
One such process is histone acetylation. To understand how this works, we should first consider that rather than the histone octamer being a perfect spool, the individual subunits have dangling “tails” containing lysine residues. These tails wrap into the minor groove of the associated DNA, helping to bind and bend the DNA around the nucleosome core. Normally, the side chains of these lysines carry a terminal -NH3+ group and it’s these charges which contribute much of the electrostatic binding energy to the DNA backbone. In order to get the DNA free of the histone octamer and exposed enough for transcription factors to find their binding sites, this binding energy between the DNA and histones must be overcome. Put another way (and here, the ugly spectre of thermodynamics raises its head; despite best efforts to keep it at bay in this series, at times it’s inescapable)—there’s an equilibrium established under a given set of conditions (temperature, pH) between DNA bound to histones and DNA free of them; that equilibrium is generally in favour of the bound side.
HATs and HDACs
Enter a class of enzyme called a histone acetyltransferase (HAT). These take the common acetyl group donor acetyl-coenzyme A and move its CH3CO- (acetyl) group over to the lysine side chains of the histone tails, forming an amide bond. More importantly, this removes the lysine side chain’s positive charge, leaving it electrostatically neutral. With that major component of binding energy removed, the equilibrium referred to above shifts to favour a larger proportion of free DNA. If you’re having trouble picturing this, think about the force needed to bend the DNA around the histone octamer; remove that force (electrostatic attraction from lysine side chains) and the DNA will tend to “spring” free. Left flopping about in the open, that DNA section is much more readily available to interact with proteins driving transcription. With this in mind, it’s now easy to see how acetylation of histones is associated with locally increased rates of gene transcription. (A less obvious but viable proposed secondary mechanism for histone acetylation to influence transcriptional activity is via direct protein-protein interactions, with the acetyl groups helping recruit transcription factors.)
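For those willing to let thermodynamics in the door for a moment, the equilibrium shift can be sketched as a simple two-state model (DNA either bound or free). This is an illustration only: the binding free energy and the amount lost per acetylated lysine are hypothetical placeholders, not measured values, and real nucleosomes are far from a clean two-state system.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
TEMP_K = 310.0      # roughly body temperature (37 C)

# Hypothetical values, chosen only to illustrate the direction of the shift.
HYPOTHETICAL_DG_UNMODIFIED = -3.0   # kcal/mol, binding with no acetylation
HYPOTHETICAL_DG_PER_ACETYL = 0.5    # kcal/mol of binding lost per acetyl group


def fraction_free(delta_g_binding_kcal):
    """Fraction of DNA unbound in a two-state (bound/free) model.

    delta_g_binding_kcal is the free energy of the bound state relative to
    the free state; a negative value means binding is favoured.
    """
    # K = [free]/[bound] = exp(dG_binding / RT)
    k_free_over_bound = math.exp(delta_g_binding_kcal / (R_KCAL * TEMP_K))
    return k_free_over_bound / (1.0 + k_free_over_bound)


for acetyl_groups in (0, 2, 4, 6):
    dg = HYPOTHETICAL_DG_UNMODIFIED + acetyl_groups * HYPOTHETICAL_DG_PER_ACETYL
    print(f"{acetyl_groups} acetyl groups: dG = {dg:+.1f} kcal/mol, "
          f"fraction free = {fraction_free(dg):.3f}")
```

The absolute numbers mean nothing here. What the sketch shows is that each bit of binding energy removed moves the balance further toward free DNA, and because the dependence is exponential, modest energy changes can produce large changes in accessibility.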
If this is a biological system under control, then we should expect there’s also a system to reverse the acetylation. In fact, there is a class of enzymes known as histone deacetylases (HDACs) which catalyze the hydrolysis of acetyl groups back off the histone lysines. One could now suppose that purely by altering the relative activities of HATs and HDACs in a cell, there should be a global (whole genome) influence on gene expression rates. While that’s likely true, it would be a crude method of regulation at best. What’s more interesting, and what gets more than a little strange, is that there are localized (and heritable) variations in histone acetylation. In keeping with what we learned above, those areas with more histone acetylation are more transcriptionally active than those with less. It turns out, however, that histone acetylation also acts to recruit HATs to the acetylated area, and being localized there, they have a greater chance to act on neighbouring, as-yet unacetylated histones. That is, histone acetylation begets more adjacent histone acetylation. Importantly, during chromosome replication, this means that HATs are recruited to the proximity of the nascent strand where the parental template is acetylated, leading to acetylation (and influence on transcriptional activity) of the progeny DNA. Thus, not only is the underlying DNA sequence inherited between generations, but this form of epigenetic marker can be, too. Indeed, studies have reported that it can be passed down through multiple generations.
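The “acetylation begets acetylation” idea lends itself to a toy simulation. The sketch below is a deliberately crude cartoon, not a model from the literature. It assumes that parental nucleosomes are handed out alternately to the two daughter strands, that the gaps are filled with fresh unacetylated histones, and that one round of HAT recruitment acetylates any nucleosome sitting next to an acetylated neighbour. All function names are made up for illustration.

```python
def replicate(parent_marks, keep_even):
    """Return one daughter's acetylation marks straight after replication.

    Toy assumption: parental nucleosomes go alternately to the two daughter
    strands; the gaps are filled with new, unacetylated histones.
    """
    parity = 0 if keep_even else 1
    return [mark and (i % 2 == parity) for i, mark in enumerate(parent_marks)]


def spread_once(marks):
    """One round of HAT recruitment (synchronous update): an unacetylated
    nucleosome next to an acetylated one becomes acetylated."""
    n = len(marks)
    updated = []
    for i, mark in enumerate(marks):
        left = marks[i - 1] if i > 0 else False
        right = marks[i + 1] if i < n - 1 else False
        updated.append(mark or left or right)
    return updated


def show(marks):
    """Render marks as text: A = acetylated, . = unacetylated."""
    return "".join("A" if m else "." for m in marks)


# A stretch of 14 nucleosomes with an acetylated block in the middle.
parent = [False] * 4 + [True] * 6 + [False] * 4
daughter = replicate(parent, keep_even=True)
restored = spread_once(daughter)

print("parent   ", show(parent))
print("daughter ", show(daughter))
print("restored ", show(restored))
```

In this cartoon the daughter starts with only half the parental marks, yet a single round of neighbour-driven acetylation refills the block. The block also creeps outward by one nucleosome at an edge, which hints at why an opposing activity such as the HDACs would be needed to keep acetylated regions from spreading indefinitely.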
The consequences of this are significant. To understand a functional human genotype, you must know not only the DNA sequence at both copies of a normal somatic gene locus, but also how differential acetylation of these copies influences their relative expression rates. Suddenly, those observations lumped together in genetics courses as “variable penetrance” or “variable expressivity” begin to have additional plausible mechanisms. We can also begin to appreciate that undesirable changes in HAT or HDAC activity, such as in response to environmental chemicals, can act much as mutagens do, resulting in heritable changes while leaving the base DNA sequence unchanged. Once more, our understanding of what exactly constitutes heritable genetic information has grown a little more complex than that accepted by our predecessors.
Real-life effects
That’s all very interesting, but are there known clinical conditions relating to histone acetylation? In a word, yes, although many details are still being worked out. For example, there are actually 11 different HDAC genes conserved among mammals, in four groups based on structure and localization. In animal models, deletion of any of the Group 1 HDACs is embryonic lethal and deletion of a Group 2 HDAC has negative impacts on particular organ types (including heart and skeleton); Group 3 and 4 HDACs appear less critical for development in these models. The specificity of phenotypes arising from particular HDAC deletions suggests they have particular and reproducible patterns of activity, as opposed to being generic and global. On the other side of the enzymatic process, Rubinstein-Taybi syndrome is caused by mutations in CREB binding protein (CBP), a protein with intrinsic HAT activity. Analysis of these mutations indicates they destroy the protein’s HAT activity, and that in turn may be the root cause of the clinical presentation. Mouse models of this, with specific deletion of CBP’s HAT function, show a phenotype related to impairment of long-term (but not short-term) memory formation. The phenotype is rescuable by administration of an HDAC inhibitor, indicating an intriguing and perhaps surprising link between chromatin packaging and long-term memory. Multiple studies have linked dysregulation of histone acetylation with Alzheimer’s disease (AD) in humans, and treatment of mouse AD models with HDAC inhibitors has proven beneficial in reducing symptoms.
If these leads prove to be clinically relevant in humans, a challenge for molecular labs may be how to measure or assess histone acetylation. Other epigenetic modifications such as base methylation directly modify the DNA bases and are detectable by current NGS methods (either through a chemical process known as bisulfite conversion in sequencing-by-synthesis methods, or directly by analysis of raw data in nanopore-based methods). NGS methods, however, examine only the actual nucleic acids, discarding proteins (including histones) during sample preparation. Should histone acetylation become something that needs to be measured in the lab, a different approach would be required. Targeted proteomics would seem the most promising method, but it would need to be applied to appropriate tissue samples, because histone acetylation patterns are local to the tissue examined. This is in contrast to nuclear genomic analysis, which can be conducted on any readily available tissue and still represent the whole organism.
Conclusion
The moral of this month’s topic would seem to be that while our knowledge of genetics and inheritance is broad and, in many ways, actionable (or at least provides understood mechanisms for observed phenotypes), genome sequence alone isn’t the whole story. There remain increasingly complex cellular mechanisms which can modulate base genetic data in heritable fashion that we are only now beginning to grasp.