Squinting through his microscope at salamander cells, late-nineteenth-century biologist Walther Flemming spotted a curious substance deep inside the cells’ nuclei that selectively soaked up the stain he was using. The stuff looked like a skein of wool — until, that is, a cell underwent division. Then the skein separated into fatter, discrete threads: the chromosomes, Greek for “color bodies.”
Flemming’s scientific descendants are still squinting, but with ever more powerful molecular, imaging and computational tools. In a collective global effort, hundreds of researchers are now piecing together the three-dimensional architecture of the nucleus’s entire allotment of chromatin — our DNA and its associated proteins — across space and time.
Driving the project are questions like these: How does our DNA pack itself so neatly within the cell’s tiny nucleus? How does it pack even tighter when it’s time for a cell to divide, and uncoil at just the right spots and moments in different cells to control, with precision, the activity of our 20,000-plus genes?
A major push in this global effort, dubbed the 4D Nucleome program, was initiated by the US National Institutes of Health in 2015 out of a growing realization that parsing the 3-D architecture of the genome will be crucial for answering myriad questions about gene control across the human lifetime, in health and in disease. Add how this 3-D architecture changes with time and you get the fourth dimension. With the recent start of its second phase, the effort’s overall funding amounts to some $280 million, involving dozens of research projects and hundreds of scientists. “If you want to understand how a genome works, or even a chromosome, you have to understand its three-dimensional structure,” says chromosome biologist Job Dekker of the University of Massachusetts Medical School.
The packing problem invites a raft of fundamental questions, says molecular biologist and immunology researcher Ananda L. Roy, the 4D Nucleome project’s program leader. Consider that the roughly two meters’ worth of double-helical DNA in each cell condenses 200,000- to 250,000-fold to fit in the nucleus, which has a diameter of 8 to 10 millionths of a meter or so. “How do you do this folding?” Roy asks. “Does it have a meaning? How is that related to health? Is the genome folding the same way in all cells? How does it change in time?”
The answers, some scientists are betting, will offer novel medical leads. Through studying genome geometry, they think they can uncover ways to develop new categories of treatments, ones that heal by tweaking genome architecture to reestablish health-promoting patterns of gene activity.
Of genetic letters, coils and loops
The scientific community cheered in 2003 when the decade-long, $2.7-billion Human Genome Project delivered the entire linear sequence of the 3 billion DNA letters, or nucleotides — adenines, guanines, cytosines and thymines — that make up our 23 pairs of chromosomes.
It was a momentous achievement, but also an eclipsing one. Though Dekker cheered too, he also says that a fixation on genetic sequencing for decades distracted researchers from the importance of genome geometry and the role it might play in determining which genes are active and which are silent.
“It’s the most amazing thing,” he says. “You have all of this DNA inside one nucleus. It’s a very crowded place. How can it keep everything from becoming entangled … when you think that even if you have your headphones in your pocket, the thing is a total mess every time you take it out of your pocket?”
If it were up to DNA itself, the genome would clump into the mother of all tangles. After all, stuffing the two meters’ worth of a genome’s DNA into a nucleus eight micrometers in diameter is akin to stuffing a 7,500-meter length of spider-silk thread within the confines of a walnut.
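The arithmetic behind these figures can be checked in a few lines. The sketch below is only a back-of-the-envelope calculation using the numbers quoted in this article (two meters of DNA, a nucleus 8 to 10 micrometers across, 7,500 meters of thread). The function name is made up for illustration, and treating compaction as a simple ratio of length to diameter is a simplifying assumption.

```python
# Back-of-the-envelope check of the packing figures quoted in the text.
# Assumption: "fold compaction" is taken as DNA length divided by nuclear diameter.

DNA_LENGTH_M = 2.0                    # roughly two meters of DNA per cell
NUCLEUS_DIAMETERS_M = (8e-6, 10e-6)   # 8 to 10 millionths of a meter
THREAD_LENGTH_M = 7_500.0             # spider-silk thread in the analogy


def fold_compaction(length_m: float, diameter_m: float) -> float:
    """Return how many times longer the DNA is than the nucleus is wide."""
    return length_m / diameter_m


for diameter in NUCLEUS_DIAMETERS_M:
    fold = fold_compaction(DNA_LENGTH_M, diameter)
    print(f"nucleus {diameter * 1e6:.0f} micrometers wide: about {fold:,.0f}-fold")

# Shrinking the 7,500-meter thread by the same factor as the 8-micrometer case
# gives the size of the container in the analogy.
container_m = THREAD_LENGTH_M / fold_compaction(DNA_LENGTH_M, 8e-6)
print(f"container for the thread: about {container_m * 100:.0f} centimeters across")
```

The container works out to roughly three centimeters across, which is about the size of a walnut.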
But Flemming’s chromatin is more than DNA: It’s an intimate assembly of DNA and proteins, particularly ones called histones. Together, these form minuscule spools that have stretches of DNA some 150 nucleotides long wound around them, like molecular fishing lines. In the genome there are millions of these DNA-wound spools, known as nucleosomes, separated by short, naked stretches of DNA to give the look of beads threaded onto a string.
The nucleosome wrappings get you a several-fold compaction of nuclear DNA, but not the 250,000-fold shrinkage Roy speaks of. That takes additional contortions. Nobody knows all of the details of the compaction process, but one way to think of it is to imagine clasping the ends of a string between the thumb and forefinger of each hand and then twisting like crazy. The string undergoes multiple coilings and bucklings until it all fits into a span dramatically shorter than its original outstretched length. The chromatin’s version of such compaction and buckling yields, among other structural features, fibers about 700 nanometers wide, which correspond to the spaghetti-like structure Flemming and his contemporaries observed nearly 150 years ago in non-dividing cells.
But there’s order, too, among all the coils. Researchers know that even during most of a cell’s life, when chromosomes exist in these spaghetti-like forms (as opposed to the super-condensed, stubby chromosomes you see when a cell is dividing), each strand sits in the nucleus in its own discrete territory. They glom together like so many skeins of wool, snuggling close but remaining untangled and neatly separable during division.
Nor is the DNA of individual chromosomes coiled up willy-nilly. Within each chromosome, reiterated many times over, are DNA loops, often termed “topologically associated domains,” or TADs, that are crucial for the genome’s proper functioning: that is, for its pattern of gene activity.
Genome researchers are amped up today by their growing ability to explore and map out the details of this schema. It’s taking their understanding of molecular biology to new depths and opening new medicinal pathways.
Genome, express thyself
One of the Human Genome Project’s surprises was the revelation that our DNA hosts only about 20,000 genes, constituting a mere 1 to 2 percent of the genome’s overall length. Researchers devoted to revealing how the 3-D and 4-D genomes work want to know what the other 98 percent of the genome is doing, and how it helps control the activity of the gene-bearing 2 percent.
Plenty of control is needed: One of the most beguiling questions about the genome is how a human being’s different cell types — from neurons to immune cells to muscle cells — all share the same DNA yet have distinct biological personae.
“What is the reason that all of these cell types exhibit different structures, functions and activities?” says molecular biologist Bing Ren of the University of California, San Diego, a participant in the 4D Nucleome project and coauthor of an overview of 3-D genome architecture in the Annual Review of Cell and Developmental Biology. That architecture, it turns out, is key to determining which genes turn on and off and when and where they do so.
TADs may be key to the process. They range in length from tens of thousands of genetic letters to millions: In a mouse embryonic stem cell, there are some 2,200 of them with an average size of almost 900,000 letters. The TAD count in humans might be more like 15,000, says biologist Richard Young of the Whitehead Institute and MIT. Within each TAD (Young also refers to them as insulated genomic domains, or IGDs) reside specific genes, along with DNA segments that control them: promoters, enhancers and insulators.
A TAD’s loop-like structure is crucial, as it can bring together DNA segments that would otherwise be far apart if that same piece of DNA were stretched out. So promoters and especially enhancers can appear very distant from the gene they activate yet snuggle close when viewed through a 3-D lens — helping to solve a conundrum that geneticists scratched their heads over for decades.
But how are the TADs created to begin with? It could be through something called loop extrusion, a process whose untangling has been one of the most important achievements of the past few years, says biophysicist Erez Lieberman Aiden, head of the Center for Genome Architecture at Baylor College of Medicine in Houston. It turns out that small teams of proteins collaborate to form multitudes of loops rooted at locations on the genome demarcated by specific DNA sequences. The protein teams form bolo-like structures at these genomic signposts, and DNA gets extruded through them, creating the loops. Any genes, promoters, enhancers or other regulatory elements such as insulator segments within a given loop of DNA are thereby brought into proximity with each other, enabling appropriate genetic control.
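The logic of loop extrusion can be pictured with a toy model. The sketch below is a deliberately simplified illustration, not a description of how researchers actually simulate the process: it assumes the genome is a row of numbered positions, that the “signposts” are positions that halt the protein team, and that each side of the team moves outward one position per step. The function name, the positions and the genome length are all hypothetical.

```python
# Toy one-dimensional sketch of loop extrusion (illustrative only).
# Assumptions: the genome is a row of numbered positions, "signposts" are
# positions that stop the extruding protein team, and each side of the
# team moves outward one position per step until it reaches a signpost
# or the end of the genome.


def extrude_loop(load_site: int, signposts: set[int], genome_length: int) -> tuple[int, int]:
    """Return the (left, right) anchors of the loop formed from load_site."""
    left = right = load_site
    # Keep reeling DNA through until both sides are halted.
    while True:
        left_blocked = left in signposts or left == 0
        right_blocked = right in signposts or right == genome_length - 1
        if left_blocked and right_blocked:
            return left, right
        if not left_blocked:
            left -= 1
        if not right_blocked:
            right += 1


# Hypothetical signpost positions on a genome of 120 positions.
SIGNPOSTS = {20, 60, 95}
GENOME_LENGTH = 120

for site in (35, 50, 70):
    print(f"loaded at {site}: loop anchored at {extrude_loop(site, SIGNPOSTS, GENOME_LENGTH)}")
```

In this toy setup, a team that lands anywhere between positions 20 and 60 ends up anchored at that same pair of signposts, so anything sitting between them, such as a gene and its enhancer, winds up inside the same loop.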
Woven into this dynamic of genome structure and control is yet another spectacularly complex layer: the epigenome, chemical marks that are added to the chromatin and influence gene activity. Some of these modify histone proteins in ways that tighten or loosen the local chromatin, thereby puffing out and exposing genes for activation, or coiling them yet tighter and shutting them down. Others, namely methyl groups, stud stretches of DNA and render silent any genes in these locations.
Bringing the genome’s three-dimensional structure to light has required a workshop of observational tools and techniques. Some of the greatest leaps in structural insight have come by way of microscopy-based imaging and methods known as chromosome conformation capture (3C).
In the 3C methods, which Dekker helped to pioneer in the early 2000s and which he and many others have built upon since then, researchers chemically link those places in the genome where bits of DNA lie near each other inside the cell’s nucleus. Then, using DNA sequencing methods and computational techniques, they produce “contact maps” that depict thousands upon thousands — now even millions — of places where genomic pieces just about touch. From such maps emerges a sense of the genome’s three-dimensional conformation and how it changes during a cell’s life cycle and in response to stimuli such as hormones.
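To give a feel for what a contact map is, here is a minimal sketch of the counting step. It assumes the sequencing has already yielded pairs of positions that were linked together, measured in genetic letters along a single chromosome. The function name, the bin size and the example pairs are invented for illustration and are not real data; real pipelines also involve many filtering and normalization steps not shown here.

```python
# Minimal sketch of turning pairs of linked genome positions into a contact map.
# Assumptions: positions are counted in genetic letters along one chromosome,
# and the bin size and example pairs are invented for illustration.


def build_contact_map(pairs, chromosome_length, bin_size):
    """Count contacts between equal-sized bins; returns a symmetric matrix."""
    n_bins = -(-chromosome_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # a contact between i and j is also one between j and i
    return matrix


# Hypothetical linked pairs (not real data).
example_pairs = [(120, 4_800), (300, 4_950), (2_100, 2_400), (150, 4_700)]
contact_map = build_contact_map(example_pairs, chromosome_length=5_000, bin_size=1_000)

for row in contact_map:
    print(" ".join(f"{count:2d}" for count in row))
```

In this made-up example, the first and last bins lie far apart along the sequence yet share three contacts, the kind of signal that hints at a loop bringing the two regions together.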
Early 3C methods revealed contacts only at stretches of DNA that researchers had preselected for study. Since then, Dekker and others have devised increasingly powerful and revealing variations on the theme, some with cute names. One workhorse among these is Hi-C; in 2017 a research group used it to identify almost 2 million unique contact points in egg-producing oocytes of mice. Another is ChIA-PET (chromatin interaction analysis by paired-end tag), which can identify interactions between promoters and other gene-regulating players where proteins called transcription factors attach and help turn genes on or off.
Other powerful genome insights come from a microscope-based method known as FISH, short for fluorescence in situ hybridization. To a cell, scientists add fluorescent probes that attach to specific DNA sequences; once in place, the probes serve as tiny beacons visible with a microscope. By placing and observing the beacon in different genome locations in experiment after experiment, scientists get a composite picture of the genome’s structure the way lights on a Christmas tree reveal the shape of the tree. “This allows you to trace the genome in 3-D,” Aiden says. “This is a transformative capability.”
Test-driving the genomic machine
As fundamental discoveries about genome structure and expression pile up, research momentum has been building. It’s now spawning biomedical applications and business ventures.
“This field is very vibrant,” Ren says. “It’s like a supernova where new stars, new planets, are being formed.” He points to a November 2019 report in Science as an example of what these supernova progeny look like. In it, a team of 26 researchers, Ren included, chronicle how they refined their understanding of variants of genes — called risk variants — linked with late-onset Alzheimer’s disease. Many such risk factors have been found for sundry diseases, often in the 98 percent of the genome that doesn’t contain genes. But precisely what they do is seldom understood.
In the work, Ren and colleagues used techniques that identify the more loose and open regions of chromatin, spots more likely to be genetically active. They examined the patterns in four different types of brain cells: neurons, astrocytes, oligodendrocytes and microglia. The team found, first of all, that the airier and more genetically active chromatin locations differed between the four cell types. They also found that the different cell types use distinct enhancers even when controlling the same gene.
Most tantalizing of all, the researchers saw that Alzheimer’s risk variants largely reside inside enhancers that are specifically used in microglia. The strong implication here is that these risk variants are altering the enhancers’ control over gene activity in ways that raise the risk of Alzheimer’s.
This is intriguing, Ren says, because microglia are cells that clean up cellular debris, including proteins that otherwise build up in the brain and are associated with Alzheimer’s. Malfunctioning microglia have long been implicated in the disorder, and this finding adds heft to the case.
Insights like this weren’t possible before the new tools came along, Ren adds. “We realize now that accessibility of enhancers — meaning if the chromatin is open enough — is being highly regulated in a cell-type-specific manner. That is why you have cell-type-specific gene expression.”
Ren would next like to know whether one could develop drugs for serious diseases to coax abnormal chromatin conformations back to healthy, disease-free patterns. So would Young of the Whitehead Institute and MIT. Young is convinced that the genome’s 60,000 or so enhancer sequences, and their spatial relationships to its 20,000 genes within its 15,000 or so TADs, are a next big thing in pharmaceutical innovation. He has cofounded several companies to pursue this genomic perspective in search of new medicines.
One of them, launched in 2016 with colleague Leonard Zon of Harvard Medical School, is CAMP4 — named after the final encampment for Mount Everest climbers before they start for the summit. Its aim is to identify TADs and other gene regulatory elements involved in a given disease, then apply machine learning to design drugs that could recalibrate gene activity patterns gone awry.
Another of Young’s companies, Omega Therapeutics, is zeroing in on the loop-extrusion process that generates the TADs (or IGDs) to begin with. The notion here is to design “controller” molecules that can reengineer the size and location of TADs — thus changing the genomic neighborhood and packing or opening up chromatin in ways that tamp down disease.
“This is fundamental science about how you fold up the blueprint for life,” Young says. “If you don’t fold it up properly, all hell breaks loose.”
When Flemming published his 1880 paper in which he coined the term chromatin, about all he could say about the genetic material he was observing was that it took on the colors of his aniline stain while other, “achromatic” substances in the nucleus did not. The tools of the time precluded him and his research brethren from discerning the deeper dramas surely unfolding in the nucleus. Now, 140 years later, these wish-list tools are here and in the hands of those just as driven to plumb the genome’s mysteries.
“This,” Aiden says, “is the task of my generation of scientists.”
Ivan Amato is a science writer, podcaster and science café host based in Hyattsville, Maryland.
This article originally appeared in Knowable Magazine on October 12, 2020. Knowable Magazine is an independent journalistic endeavor from Annual Reviews, a nonprofit publisher dedicated to synthesizing and integrating knowledge for the progress of science and the benefit of society.