CpG islands (CGIs) are a class of DNA sequence conserved at vertebrate gene regulatory elements. A defining feature of CGIs is the absence of DNA methylation (5-methylcytosine, 5mC), an epigenetic modification associated with gene silencing. CGIs have been studied exclusively in the context of heavily methylated vertebrate genomes, and it remains unclear whether 5mC is the primary determinant of CGI regulatory function. Because invertebrate genomes, unlike vertebrate genomes, are typically sparsely methylated, the possibility that they contain CGIs has not been considered. This study aims to establish whether CGIs are a vertebrate-specific innovation or a deeply conserved feature of metazoan gene regulatory elements that exists independently of genomic 5mC content.
Non-methylated CpG island-like sequences (NMIs) were identified in eight invertebrate genomes using BioCAP-seq, a biochemical method based on protein affinity pulldown of non-methylated CpG-rich DNA. We selected invertebrates with variable 5mC levels, spanning from the demosponge Amphimedon queenslandica to the chordate Branchiostoma lanceolatum. Analysis of invertebrate NMIs revealed close similarities to vertebrate CGIs, whether identified experimentally or predicted by sequence-based algorithms. BioCAP-seq signal was enriched at computationally predicted invertebrate CGIs, confirming that invertebrate NMIs carry CGI-like sequence features. Bisulfite sequencing and ATAC-seq confirmed that NMIs are hypomethylated and associated with accessible chromatin, respectively. NMIs were predominantly located at promoters and within gene bodies. Promoter-associated NMIs contained binding motifs for methylation-sensitive and chromatin-remodeling transcription factors, and were more highly conserved than non-NMI promoters (phastCons; p < 0.001). Finally, we examined the functional conservation of invertebrate CGIs by validating the capacity of candidate NMIs to drive transgene expression in the vertebrate zebrafish.
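The abstract does not specify which sequence-based algorithm was used to predict CGIs. As an illustration only, the sketch below assumes the classic Gardiner-Garden and Frommer style criteria (a window of at least 200 bp, GC content of at least 50%, and an observed/expected CpG ratio above 0.6). The function names and default thresholds are hypothetical choices for this example, not the study's pipeline.

```python
"""Illustrative sketch of classic sequence-based CpG island criteria.

Assumption: Gardiner-Garden & Frommer style thresholds; not the study's method.
"""


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are C or G."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("C") + seq.count("G")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count).

    str.count is safe for "CG" because the dinucleotide cannot overlap itself.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def is_cgi_like(seq: str, min_len: int = 200, min_gc: float = 0.5,
                min_oe: float = 0.6) -> bool:
    """Hypothetical test combining length, GC content and CpG obs/exp thresholds."""
    return (
        len(seq) >= min_len
        and gc_fraction(seq) >= min_gc
        and cpg_obs_exp(seq) > min_oe
    )


def scan_windows(seq: str, window: int = 200, step: int = 1) -> list[int]:
    """Return start positions of fixed-size windows that pass is_cgi_like."""
    return [
        start
        for start in range(0, len(seq) - window + 1, step)
        if is_cgi_like(seq[start:start + window], min_len=window)
    ]


if __name__ == "__main__":
    # A GC-rich sequence that is depleted of CpG dinucleotides fails the
    # obs/exp test, whereas an equally GC-rich CpG-dense sequence passes.
    print(is_cgi_like("CCCCGGGG" * 25))  # obs/exp = 0.5
    print(is_cgi_like("CCGG" * 50))      # obs/exp = 1.0
```

The thresholds are exposed as parameters so they can be varied; whether vertebrate-derived defaults suit sparsely methylated invertebrate genomes is not something this sketch addresses.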
In summary, NMIs identified in sparsely methylated invertebrate genomes resemble CGIs in heavily methylated vertebrate genomes, challenging the long-standing assumption that 5mC determines CGI function. Elucidating the epigenetic factors required for CGI evolution provides insight into the fundamental mechanisms that control gene expression.