Enhancers are segments of DNA that increase the expression of a nearby gene. This post provides an overview enhancer biology and detection of enhancers at scale.
This is the first post in a series called Genomics for Statisticians. In this series I will explore some important ideas in modern genomics from the perspective of a non-biologist. When I began research in statistical genomics, I quickly discovered that my knowledge of genomics was not quite up to par. This series is the result of my effort to fill that gap. I hope other researchers in genomics who do not have extensive formal training in biology (e.g., statisticians, computer scientists) will find this series helpful as well. The posts should be reasonably self-contained. My goal is to explore concepts at the level of a college biology class, roughly. The first post in the series in on enhancers.
Introduction
Enhancers are short regions of DNA that increase the expression of a nearby gene.
Most variants responsible for heritability fall outside genes and in enhancers. Therefore, enhancers are important to study and understand.
This post has three parts. First, we review necessary background information on chromatin and transcription. Second, we review enhancer biology. Finally, we review methods for identifying enhancers and linking enhancers to genes at scale.
Review of chromatin and transcription
Structure of DNA, chromatin, and chromosomes
DNA is a double-stranded molecule that stores genetic information. A strand of DNA consists of a bunch of linked nucleotides. Nucleotides come in four types: A, T, G, C.
DNA molecules do not exist on their own in the nuclei of cells. Instead, they are packaged with proteins called histones to form chromatin. Put simply, chromatin = DNA + histones.
There are 5 histone proteins: H1, H2A, H2B, H3, and H4.
A nucleosome is a chromatin structure consisting of a DNA strand wrapped twice around a nucleosome core, anchored in place by an H1 protein. A nucleosome core consists of two copies each of H2A, H2B, H3, and H4.
A nucleosome, which consists of an 8-histone core, a strand of DNA wrapped twice around the histone core, and an H1 histone to help anchor the DNA in place.
Nucleosomes pack together in a highly organized way to form chromosomes.
Nucleosomes are arranged to form chromosomes.
Changes in chromatin structure
DNA in tightly-coiled chromatin cannot be accessed by proteins.
Chromatin can “loosen up,” thereby exposing DNA to proteins.
Chromatin “loosens” or “tightens” by three main processes:
Chromatin remodeling: Proteins called chromatin remodeling complexes physically reposition nucleosomes to expose DNA.
Histone modification: Histones are chemically altered by addition or removal of functional groups.
Example: an acetyl group (CH\(_3\)CO) can be added to the 27th lysine amino acid of H3 to loosen the chromatin. In chemical notation, we write H3 \(\to\) H3K27ac (K stands for “lysine,” and “ac” stands for acetyl group).
Example: a methyl group (CH\(_3\)) can be added to the 4th lysine amino acid of H3 to loosen the chromatin. In chemical notation, we write H3 \(\to\) H3K4me1 (“me1” stands for methyl group).
DNA methylation: Cytosine bases of DNA also can be methylated to form 5-methylcytosine. Heavily methylated DNA is generally not transcribed.
Transcription of genes
Transcription is the copying of a gene into a complimentary RNA molecule.
A promoter is a short sequence of DNA upstream of a gene that helps initiate transcription.
The protein polymerase and other proteins called transcription factors and transcription activator proteins bind to the promoter.
Like a train traveling down railroad tracks, polymerase travels from the promoter into the gene and synthesizes a complimentary strand of RNA.
Review of enhancer biology
What is an enhancer?
An enhancer is a short sequence of DNA that increases the transcription of one or more nearby genes through physical contact (known as a cis-regulatory mechanism).
How do enhancers work?
Enhancers bind transcription factors and transcription activator proteins and bring these proteins close (in physical space) to the promoter of the target gene. This action modulates the expression of the target gene.
Enhancers can regulate expression of their target gene in other ways. For example, some enhancers regulate the release of paused polymerase, and others act through RNA splicing mechanisms.
An enhancer modulating the expression of a gene.
What are some features of enhancers?
Enhancers are located in regions of open (or “loose”) chromatin.
Enhancers are flanked by histones carrying H3K27ac and H3K4me1 modifications.
Enhancers are flanked by methylated DNA.
Enhancers are a few hundred base pairs in length and act over \(\approx\) 30,000 bases.
Genes can be affected by a single enhancer or multiple, “interacting” enhancers; conversely, individual enhancers can regulate multiple genes.
Enhancers can reside in clusters of hundreds of enhancers, forming “super enhancers.”
Methods for enhancer identification at scale
DNA sequence analysis: Enhancers harbor transcription factor binding motifs (i.e., very short stretches of DNA that help bind transcription factors). Furthermore, enhancers are often conserved across species. Thus, we can try to predict whether a region of the genome is an enhancer simply by looking at the corresponding primary sequence.
Biochemical annotations: As we have discussed, several biochemical annotations correlate with enhancer activity. Genome-wide, we can search for (i) histone modifications (e.g., presence H3K27ac and H3K4me1), (ii) transcription factor binding, (iii) open chromatin, (iv) DNA methylation, and (v) the initiation of transcription. Many assays, both bulk-tissue and single-cell, provide us with this information (see, for example, DNase-seq, Pro-seq, and ChIP-seq).
eQTL mapping: For a given SNP and given nearby gene, we can test whether expression of that gene differs significantly across the levels of the SNP. If so, the SNP may lie within an enhancer for that gene. A downside of this approach is that it operates at the resolution of linkage disequilibrium blocks.
3D conformation mapping: There exist assays (e.g., Hi-C) to probe the 3D conformation (in space) of a chromosome. If a given region of DNA is in close proximity to a known promoter, then that region might be an enhancer.
CRISPR-based approaches: See next section.
CRISPR-based methods for enhancer identification at scale
CRISPR-Cas9 is a flexible technology for genome perturbation and editing.
The basic idea is to use CRISPR to “perturb” (in some way) a short sequence of DNA. If this perturbation is associated with a shift in the expression of a nearby gene, then the targeted sequence of DNA is likely an enhancer for that gene.
CRISPR allows us to link enhancers to their target genes and (possibly) establish rigorous causal relationships between enhancer activity and gene expression.
CRISPR-based approaches differ along two main axes:
Number of genes measured: Some CRISPR-based protocols track only a single gene; others track all genes (through, e.g., scRNA-seq).
Mode of perturbation: CRISPR can “perturb” a candidate enhancer in several ways:
CRISPR–Cas9 (nuclease active) makes a single cut inside the enhancer. The cell uses non-homologous end joining to repair the cut. Generally, a few extra bases are inserted or deleted during this process, altering and consequently deactivating the targeted enhancer.
CRISPRi (short for CRISPR interference) “turns off” a candidate enhancer without altering its associated DNA sequence.