Chemical-assisted analysis of epigenetic modifications

Xucong Teng ab, Qiushuang Zhang c, Yicong Dai bc, Hongwei Hou b and Jinghong Li *abc
aCenter for BioAnalytical Chemistry, Hefei National Laboratory of Physical Science at Microscale, University of Science and Technology of China, Hefei 230026, China. E-mail: jhli@mail.tsinghua.edu.cn
bBeijing Life Science Academy, Beijing 102209, China
cNew Cornerstone Science Laboratory, Department of Chemistry, Key Laboratory of Bioorganic Phosphorus Chemistry & Chemical Biology, Tsinghua University, Beijing 100084, China

Received 30th April 2025

First published on 30th June 2025


Abstract

Epigenetic modifications, particularly those occurring on nucleic acid bases, play a pivotal role in regulating gene expression and cellular function without altering the underlying nucleic acid sequences. These subtle chemical alterations, such as methylation, hydroxymethylation, and acylation, are intricately linked to various biological processes. The analysis of base modifications poses significant challenges because of their minimal structural differences from unmodified bases, which traditional methods relying on double-stranded complementarity often fail to distinguish effectively. Nevertheless, the distinct chemical properties conferred by these modifications provide an opportunity for the development of novel approaches for their specific recognition. In this review, we elucidate the biological significance of nucleic acid modifications, including their diverse types, genomic distribution, abundance, and functions. We then delve into the principles and applications of chemical-assisted analysis methods, which leverage the unique chemical properties of modified bases to transform them into detectable derivatives. We comprehensively discuss various base conversion strategies, encompassing oxidation, reduction, deamination, addition, substitution, and coupling reactions. Moreover, we address the limitations of current chemical-assisted methods, such as insufficient sensitivity for low-abundance modifications, stringent reaction conditions, variable conversion efficiencies, challenges in single-cell analysis, and the loss of spatial information. Finally, we emphasize the significance of nucleic acid modifications in unraveling biological processes and disease mechanisms, and highlight the potential of chemical-assisted methods in advancing epigenetic research and precision medicine.



Xucong Teng

Xucong Teng received his Bachelor's degree from Tsinghua University in 2016 and received his PhD degree in 2022 under the supervision of Prof. Jinghong Li at Tsinghua University. He then worked as a postdoctoral research fellow at Tsinghua University from 2022 to 2024. Currently, he is an associate researcher at the University of Science and Technology of China. His research focuses on employing nucleic acid biochemistry and single-cell analytical chemistry to explore gene expression regulation networks.


Qiushuang Zhang

Qiushuang Zhang received her Bachelor's degree in Chemical Biology from the Department of Chemistry, Tsinghua University in 2021. Currently, she is pursuing her PhD under the supervision of Prof. Jinghong Li at Tsinghua University. Her research focuses on imaging and spatial-omics technologies for RNA modifications.


Yicong Dai

Yicong Dai obtained her Bachelor's degree from Beijing Normal University in 2018 and received her PhD degree in 2023 under the supervision of Prof. Jinghong Li at Tsinghua University. She then worked as a postdoctoral research fellow at Tsinghua University. Her current research focuses on single-cell imaging and G-quadruplex chemical biology.


Hongwei Hou

Hongwei Hou is currently a principal investigator at the Beijing Life Science Academy, China. He received his PhD in 2005 from the University of Science and Technology of China. His current research focuses on high-throughput technologies in chemical biology.


Jinghong Li

Jinghong Li is an Academician of the Chinese Academy of Sciences and a Cheung Kong Professor in the Department of Chemistry at Tsinghua University. He received his BSc in 1991 from the University of Science and Technology of China and his PhD in 1996 from the Changchun Institute of Applied Chemistry, Chinese Academy of Sciences. His current research interests encompass a broad range of fields, including bioanalytical chemistry, chemical biology, bioelectrochemistry, physical electrochemistry, electrochemical materials science, and nanoscopic electrochemistry.


1. Introduction

Epigenetics refers to the mechanisms that govern the biological processes of DNA and RNA through chemical modification without altering the underlying nucleic acid sequences, thereby establishing the “second code” for genetic information transfer. As one of the key regulatory mechanisms in epigenetics, nucleic acid modification dynamically regulates gene expression patterns by inducing specific chemical alterations in DNA or RNA bases (such as the conjugation of methyl, hydroxymethyl, formyl, and acetyl groups). In both DNA and various RNAs, more than 170 distinct chemical modifications have been identified, constituting the complex epigenome and epitranscriptome.1–3 These modifications play indispensable roles in fundamental biological processes, such as maintaining genome stability and regulating transcription and translation, by altering the structure of nucleic acids and influencing interactions between nucleic acids or their binding to proteins.

Abnormal nucleic acid modifications have been shown to be closely related to various human diseases. For instance, the abnormal distribution of 5-methylcytosine (5mC) in DNA is closely associated with cancers.4 Dysregulation of N6-methyladenosine (m6A) modifications in RNA has been shown to play a critical role in the pathogenesis of neurodegenerative diseases, metabolic disorders, and various malignant tumors.5 The absence of pseudouridine (Ψ) modification can lead to mitochondrial dysfunction.6 Additionally, overexpression of the ac4C writer NAT10 facilitates tumor metastasis.7 Hence, research aimed at revealing the mechanisms of nucleic acid modification-related diseases and discovering therapeutic targets is of increasing value. A primary objective of these studies is to confirm the presence of nucleic acid modifications. Therefore, the development of highly sensitive and specific methods for detecting nucleic acid modifications has become essential for clarifying their biological function and clinical value.8,9

Methods relying on complementary base pairing, such as polymerase chain reaction (PCR) and fluorescence in situ hybridization (FISH), are highly effective for the precise identification and detection of various nucleic acid molecules with diverse sequences. Nevertheless, the structural differences between modified and unmodified bases in nucleic acids are often exceedingly subtle. Most naturally occurring modified bases within cells can still engage in Watson–Crick base pairing and act as templates for nucleic acid synthesis. For instance, once cytosine (C) in DNA is methylated at the C-5 position, it retains its ability to act as a template for DNA replication, producing accurate replication products. Hence, the efficient and precise discrimination of modified versus unmodified bases represents a significant challenge and a key issue in nucleic acid modification analysis.

With the rapid progress of epigenetics and epitranscriptomics, numerous methods for detecting RNA and DNA modifications have emerged in recent years. After nucleic acids are digested into single nucleotides or deoxynucleotides by a nuclease, liquid chromatography–mass spectrometry (LC–MS) can precisely identify the type of modification and quantify its abundance based on differences in the molecular weights of different modified bases. However, LC–MS is unable to provide sequence and locus information.10 Antibody-based affinity enrichment approaches, such as m6A RNA immunoprecipitation and sequencing (RIP-seq)11 and m5C RIP-seq, can capture nucleic acid fragments containing base modifications, but are restricted by antibody preferences and insufficient specificity. In contrast, chemical-assisted methods are based on the differences in chemical properties between the modified bases and their unmodified counterparts, enabling the efficient conversion of modified bases into derivatives with distinctive detection characteristics. After chemical conversion, the final signal readout is achieved through techniques such as PCR,12 Sanger sequencing,13 and next-generation sequencing.14 Approximately 20 important nucleic acid modifications can be detected through chemical-assisted approaches, which offer a methodological foundation for advancements in the fields of epigenetics and epitranscriptomics.15–17 Nevertheless, these methods still encounter numerous challenges, such as insufficient sensitivity for detecting certain modified bases, low reaction efficiency and stringent reaction conditions.18,19 To satisfy the research demands in this field, the development of novel chemical-assisted strategies remains an urgent necessity.

This review provides a concise overview of the characteristics and functions of major nucleic acid modifications, as well as their specific recognition and signal readout strategies. We then concentrate on the research progress in chemical-assisted identification of various types of base modifications, highlighting the fundamental principles and performance of methods that differentiate between modified and unmodified bases. Furthermore, we systematically evaluate the limitations of current chemical-assisted methods in the analysis of nucleic acid modifications and discuss potential future development directions to advance epigenetic research.

2. The biology of nucleic acid modifications

To date, over 170 nucleic acid modifications have been identified.3,20 Among these, approximately 20 types are known to play significant roles in biological processes and have thus been the focus of extensive research. These crucial nucleic acid modifications, such as 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), 5-carboxylcytosine (5caC), N4-acetylcytidine (ac4C), 3-methylcytidine (m3C), N6-methyladenosine (m6A), 1-methyladenosine (m1A), N6-isopentenyladenosine (i6A), inosine (I), 7-methylguanosine (m7G), N2-methylguanosine (m2G), N2,N2-dimethylguanosine (m22G), pseudouridine (Ψ), and dihydrouridine (D), are involved in numerous physiological and pathological processes. The biological functions of these modifications have already been comprehensively reviewed.2,18,20–27 Here, we provide a concise overview of these major nucleic acid modifications, including their molecular structural characteristics, genomic distribution, intracellular abundance, and biological functions (Table 1).
Table 1 Characteristics of major nucleic acid modifications

| Modification | Nucleic acid type | Distribution | Abundance | Major biofunctions | Ref. |
|---|---|---|---|---|---|
| 5mC | DNA | 5mC exists in 70–80% of CpG islands. | 1% | 5mC regulates chromatin structure and gene silencing. | 3 and 4 |
| 5hmC | DNA | 5hmC exists in promoter regions. | Low | 5hmC is a relatively stable oxidation product of 5mC during the DNA demethylation process. It participates in epigenetic regulation, and can be used as a tumor biomarker in liquid biopsy. | 4 |
| 5fC | DNA | 5fC exists in promoter regions. | Low | 5fC, a further oxidation product of 5hmC during the DNA demethylation process, is unstable. It participates in epigenetic regulation and influences gene expression. | 4 |
| 5caC | DNA | 5caC exists in promoter regions. | Low | 5caC, a further oxidation product of 5hmC during the DNA demethylation process, is unstable. It participates in epigenetic regulation and influences gene expression. | 4 |
| m5C | RNA | m5C exists in tRNA, rRNA, lncRNA and mRNA, especially in regions surrounding start codons. | 0.02–0.09% | m5C modulates mRNA stability, localization and translation, lncRNA stability and localization, and protects tRNA from cleavage and decoding. | 28 |
| hm5C | RNA | hm5C exists in tRNA, lncRNA and mRNA, especially in regions surrounding start codons. | Low | hm5C regulates mRNA nuclear export. | 29 |
| f5C | RNA | f5C exists in tRNA, lncRNA and mRNA, especially in regions surrounding start codons. | Low | f5C regulates mitochondrial mRNA decoding. | 30 |
| m3C | RNA | m3C exists in tRNA. | Low | m3C regulates tRNA stability. | 31 and 32 |
| ac4C | RNA | ac4C exists in tRNA, rRNA, and the 3′-UTR and CDS regions of mRNA, especially in regions surrounding start codons. | 0.01–0.1% | ac4C regulates mRNA translation and stability. | 33 |
| m6A | RNA | m6A is abundant in lncRNA and mRNA, especially in regions surrounding stop codons. | 0.1–0.4% | m6A regulates mRNA splicing, transport, translation and degradation, and lncRNA splicing modulation and nuclear export. | 5 and 34 |
| m1A | RNA | m1A exists in tRNA and the 5′-UTR regions of mRNA. | 0.01–0.05% | m1A affects mRNA decay, stability and translation, and tRNA decay. | 35 |
| i6A | RNA | i6A exists in tRNA. | Low | i6A regulates base pairing discrimination between tRNA and mRNA, translational efficiency and fidelity. | 36 |
| I | RNA | I exists in lncRNA, tRNA and the 3′-UTR regions of mRNA. | Low | I regulates gene expression, mRNA structure, protein diversity and decoding. | 37 and 38 |
| m7G | RNA | m7G exists in tRNA and the CDS and 3′-UTR regions of mRNA. | 0.002–0.05% | m7G regulates RNA translation, processing and export. | 39 and 40 |
| m2G | RNA | m2G exists in tRNA, tsRNA and sncRNA. | Low | m2G regulates tRNA folding. | 41 and 42 |
| m22G | RNA | m22G exists in tRNA, tsRNA and sncRNA. | Low | m22G regulates tRNA folding. | 41 and 42 |
| Ψ | RNA | Ψ is abundant in rRNA, tRNA and the CDS and 3′-UTR regions of mRNA. | 0.2–0.6% | Ψ affects mRNA translation and tRNA stability. | 43 |
| D | RNA | D exists in tRNA and mRNA. | High | D affects RNA secondary structures. | 44 and 45 |


2.1 DNA modifications

DNA cytosine methylation plays a pivotal role in regulating gene expression in mammalian cells and is one of the most extensively studied epigenetic modifications.3 5mC is formed through the enzymatic transfer of a methyl group to the C-5 position of cytosine, catalyzed by DNA methyltransferases (Fig. 1), and is predominantly enriched in CpG dinucleotide regions,46 commonly referred to as CpG islands (CGIs), within gene promoters. In the mammalian genome, 60–80% of cytosine residues within CGIs are methylated.47 The presence of 5mC hinders the binding of transcriptional activation-related proteins to DNA and is typically associated with gene silencing.48 Besides regulating gene transcription, DNA methylation also plays key roles in gene imprinting and X chromosome inactivation.4 5mC can be oxidized to 5hmC by intracellular TET family oxidases and further oxidized to 5fC and 5caC46 (Fig. 1). 5fC and 5caC can be actively removed by thymine DNA glycosylase (TDG) through base excision repair (BER),46 thereby restoring cytosine to its unmethylated state. Recent studies suggest that although 5hmC, 5fC and 5caC are not highly abundant, they can persist stably to a certain extent and possess distinct epigenetic regulatory functions.4
Fig. 1 Cytosine modifications in DNA and their interconversions. DNMT1 and DNMT3A/B are DNA methyltransferases. TET, ten-eleven-translocation enzyme. TDG, thymine DNA glycosylase. BER, base excision repair. Red highlights represent modification groups on the bases.

2.2 Adenine modifications of RNA

N6-Methyladenosine (m6A) (Fig. 2) is the most prevalent and well-characterized modification type in mammalian mRNAs.5 This modification is not only widely present in mRNAs but also exists in various noncoding RNAs, accounting for 0.1–0.4% of As in RNAs.49 Throughout the mRNA life cycle, m6A participates in regulating multiple key processes, such as translation efficiency, nuclear export rate, splicing patterns, and degradation kinetics, underscoring its central importance in maintaining cellular function.34 Given the pivotal role of m6A in RNA metabolism and physiological homeostasis, any dysregulation of m6A methylation levels may lead to a series of complex biological consequences. Therefore, studies on m6A have focused on its impact on transcriptome stability and its functional roles under diverse physiological and pathological conditions. In recent years, advancements in analysis technologies have enabled a more comprehensive understanding of the mechanisms by which m6A regulates gene expression networks. This progress not only deepens our insights into epitranscriptomics but also provides valuable implications for developing therapeutic strategies for related diseases.34
Fig. 2 Representative RNA modifications. Red highlights represent modification groups on the bases.

m1A modification occurs at the N-1 position of RNA adenine (A) (Fig. 2), located at the Watson–Crick base pairing interface and imparting a positive charge to this base. Due to this positive charge, m1A is incapable of forming canonical hydrogen bonds with thymine (T) or uracil (U),50 which is markedly different from unmodified A. Although the abundance of m1A in RNA (0.01–0.05%) is lower than that of m6A, the positive charge it carries can profoundly influence protein–RNA interactions and RNA secondary structures.35 Specifically, m1A modification can regulate the binding specificity and efficiency of a specific protein complex by altering the physicochemical properties of a local RNA region.51,52

i6A is a highly specialized and extensively modified A, characterized by an isopentenyl side chain attached to the nitrogen atom of the amino group at the C-6 position of A (Fig. 2). This modification is predominantly localized at position 37 near the anticodon loop of bacterial and eukaryotic tRNA molecules.53 The presence of i6A is crucial for enhancing the recognition between codons and anticodons during translation, thereby promoting overall translation fidelity and efficiency.36 The absence of this critical modification substantially decreases the affinity between tRNA and its cognate codons, potentially leading to the accumulation of misfolded proteins or other forms of cellular dysfunction.53

Inosine (I) is a modified nucleoside derived from adenosine (A) (Fig. 2) through the deamination process mediated by adenosine deaminase acting on RNA (ADARs), also known as adenosine-to-inosine (A-to-I) editing. I preferentially pairs with cytosine (C), introducing an additional possibility of base pairing into RNA molecules, and thereby influencing RNA function and fate.37 Specifically, A-to-I editing events in coding regions may result in codon conversions and alterations in the amino acid sequence, which potentially affects protein function, structure or activity.54 Notably, A-to-I editing is not confined to coding regions but also occurs extensively in noncoding RNA regions, such as introns, UTRs, and long noncoding RNAs (lncRNAs).38
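The consequence of A-to-I editing in a coding region can be illustrated with a short sketch. It assumes that inosine, because it pairs with C, is read as G by the translation and sequencing machinery; the example transcript and the function names are hypothetical, and the codon table is the standard genetic code.

```python
# Standard genetic code in compact form (base order U, C, A, G).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def read_inosine_as_g(rna):
    """Assumption: inosine (I) pairs with C, so it is decoded as G."""
    return rna.replace("I", "G")


def translate(rna):
    """Translate an RNA string codon by codon ('*' marks a stop codon)."""
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))


unedited = "AUGAAGUAA"   # hypothetical transcript: Met-Lys-stop
edited = "AUGIAGUAA"     # the first A of the Lys codon edited to I

print(translate(unedited))                   # MK*
print(translate(read_inosine_as_g(edited)))  # ME*
```

In this toy case a single editing event recodes lysine (AAG) as glutamate (GAG), which is the kind of codon conversion referred to above.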

2.3 Cytosine modifications of RNA

Similar to DNA, in RNA, the carbon atom at position 5 of cytosine can also undergo methylation, a modification referred to as m5C (Fig. 2). Nevertheless, its abundance is considerably lower, accounting for approximately 0.02–0.09%.55,56 m5C exists in mRNAs, rRNAs, tRNAs and lncRNAs, particularly within the UTRs of mRNAs.28 It plays a critical role in the nuclear export of mRNAs.55 The hydroxylation of m5C by ALKBH1 to form hm5C is also involved in mRNA transport.29 Further oxidation of hm5C leads to the formation of f5C,29 a process analogous to the oxidation/demethylation of 5mC in DNA. f5C was initially discovered at position 34 of mammalian mitochondrial methionine transfer RNA and was later found in other coding and noncoding RNAs.30

m3C (Fig. 2) is prevalent in eukaryotic cytoplasmic and mitochondrial tRNAs at position 32 within the anticodon loop.32 This modification plays a crucial role in maintaining the structural stability and functional integrity of tRNAs. A critical interaction exists between residues 32 and 38 on tRNA, which is essential for stabilizing the three-dimensional structure of the anticodon loop. Thus, by strengthening this interaction, m3C not only contributes to enhancing the intrinsic stability of tRNA but also profoundly influences mRNA translation efficiency and protein expression levels. In humans, apart from the C-32 position of the anticodon loop, some specific types of tRNAs also have m3C modifications in their variable loop regions.31 These modifications intricately regulate the function of tRNA, thereby affecting the quality and efficiency of the entire translation process.

ac4C is a modified base found in tRNA,57 rRNA and mRNA. Structurally, it resembles cytidine but features an acetyl group attached to the N-4 of the cytosine base. This modification is primarily located in the anticodon loop of tRNA, particularly at the wobble position adjacent to the anticodon. The presence of ac4C in tRNA is crucial for accurate genetic code decoding during translation.33 The acetyl group increases base-pairing specificity and stability, particularly at the wobble position, which is essential for maintaining the fidelity of protein synthesis.58

2.4 Uracil modifications of RNA

Ψ, an isomeric form of U, is formed by the interconversion of its C-5 and N-1 sites (Fig. 2), and it ranks among the most abundant RNA modifications currently known.43 This modification is ubiquitously distributed across almost all types of RNA, including rRNAs, tRNAs, and mRNAs. In mammalian cells, the Ψ to U ratio (Ψ/U) is approximately 0.2% to 0.6%.59 The PUS family enzymes are responsible for catalyzing the isomerization from U to Ψ.60 Notably, this process is particularly significant under environmental stress conditions such as heat shock, suggesting that Ψ may play a pivotal role in stress response mechanisms.43 Additionally, the presence of Ψ can markedly affect the stability of the RNA secondary structures, as Ψ has different hydrogen bonding modes and stacking properties compared to U, thereby altering the interaction forces within or between RNA molecules.43

D is a modification product formed by the reduction of U, and it is characterized by the formation of a saturated bond between the C-5 and C-6 positions (Fig. 2). Compared with U, this modification leads to increased reactivity at the C-4 position for nucleophilic attack. Additionally, D is a nonplanar molecule, which implies that it may interfere with the base stacking pattern in RNA molecules, thereby influencing normal base pairing and the overall three-dimensional structure of RNA.44 Notably, D is not only one of the most abundant modified nucleosides in eukaryotic tRNAs, but also exists in mRNAs. As a key modification in tRNAs, D is crucial for maintaining the proper folding and stability of tRNAs.45

2.5 Guanine modifications of RNA

m7G is an RNA modification that is widespread in eukaryotes (Fig. 2). It is frequently present in the 5′ cap structure of mRNAs and within tRNAs, accounting for approximately 0.002–0.05% of the total guanine content.61 This modification possesses diverse functions across different types of RNA. At the 5′ end of mRNA, m7G forms the so-called “cap” structure, specifically denoted as m7GpppN39 (here, N represents the first nucleotide immediately following). This m7G cap plays a pivotal role not only in ensuring mRNA stability but also in facilitating pre-mRNA splicing, nuclear export of transcripts, and enhancing translation initiation efficiency.62 By safeguarding mRNA from exonuclease-mediated degradation, the m7G cap ensures the persistence and availability of mRNA molecules in the cytoplasm.62 Internal m7G modifications can enhance mRNA translation efficiency, possibly because of their ability to alter local RNA conformation,63 thereby increasing ribosome accessibility or promoting the binding of other translation-related factors to mRNAs. Additionally, given its positive charge, m7G can regulate protein–RNA interactions through electrostatic effects, contributing to the remodeling of local RNA secondary structures.61 In tRNA, the m7G modification mainly influences tRNA folding and its ability to interact with codons.40

m2G and m22G are modifications that are prevalent in tRNAs (Fig. 2). These modifications contribute to ensuring the proper folding of tRNA, stabilizing its three-dimensional structure, and enhancing the efficiency and accuracy of translation by strengthening the interaction between specific tRNA molecules and the protein synthesis machinery.41,42 While m2G and m22G modifications have long been regarded as primarily related to RNA structural regulation, recent studies have started to disclose their broader biological significance. For instance, some studies have indicated that in sperm-derived tRNA fragments (tsRNAs), the upregulation of m2G can serve as an epigenetic factor to mediate the intergenerational inheritance of metabolic diseases.64

3. Overview of chemical-assisted analysis of nucleic acid modifications

The analysis of nucleic acid modifications confronts two major challenges. First, the molecular structural differences between modified bases and their unmodified counterparts are exceedingly subtle. Most modifications do not disrupt canonical complementary base pairing. Second, many modifications occur at low abundance. When the majority of nucleic acid molecules consist of unmodified bases, the signals of modified bases are often obscured by the background signal. This imposes extremely high demands on the specificity and sensitivity of analytical methods. To fulfill the stringent requirements of nucleic acid modification analysis, strategies for the molecular recognition of base modifications and the subsequent signal readout are essential.

The molecular recognition strategies for base modifications include antibody-based methods, enzyme-assisted methods, metabolic labeling methods, and chemical-assisted methods.9,65–68 Antibody-based methods68 mainly rely on the highly specific binding between antibodies and specific base modifications. Metabolic labeling69 typically involves the introduction of chemically modified nucleotide analogs into living cells, where these analogs are incorporated into newly synthesized nucleic acids by the cellular enzymatic system. This enables subsequent detection and localization of the labeled bases through techniques such as click chemistry. In enzyme-assisted methods,67 natural or engineered enzymes are utilized to specifically label or convert modified bases. In chemical-assisted methods,15 modified bases are specifically converted into distinct structures by treatment with chemical reagents, allowing them to be distinguished from unmodified bases. Chemical-assisted methods leverage the unique chemical reactivity of modified bases, resulting in high sensitivity and selectivity. Therefore, the discussion below focuses on chemical-assisted methods.

After modified bases are identified and converted into an easily detectable structure through chemical-assisted methods, they can be detected using techniques such as LC–MS, fluorescence analysis, quantitative PCR (qPCR), gel electrophoresis, Sanger sequencing, next-generation sequencing (NGS) and nanopore sequencing. (1) LC–MS (Fig. 3(A)). In the analysis of base modifications, nucleic acids are typically degraded to the single nucleotide level, and specific base modifications are identified by detecting the mass differences of these single nucleotides via LC–MS. Chemical-assisted methods can enhance molecular weight differences between modified and unmodified bases, thereby increasing the sensitivity of their recognition. (2) Fluorescence analysis (Fig. 3(B)). A probe with a fluorescent group is chemically conjugated to the modified base, enabling the detection of various base modifications via a microplate reader or a fluorescence microscope. (3) qPCR (Fig. 3(C)). In their native state or after chemical conversion, certain types of base modifications can inhibit the activity of ligases and DNA polymerases. Based on this principle, nucleic acid probes targeting base modification sites can be designed to obtain distinct ligation and amplification products. Subsequently, qPCR can be employed to quantify the presence of different sequences, thereby indirectly evaluating the existence and relative abundance of modified bases. (4) Gel electrophoresis (Fig. 3(D)). After chemical treatment, nucleic acids containing certain modified bases are cleaved at the modification sites, or truncated reverse transcription products are generated. By comparing the positions of standard reference samples with those of experimental samples on the gel, the presence of a specific modification can be inferred. (5) Sanger sequencing (Fig. 3(E)). After chemical conversion, modified base sites are specifically converted into other types of bases or undergo random mutations during reverse transcription. Subsequently, Sanger sequencing signals can directly reveal the positions of the modified bases. (6) NGS (Fig. 3(E)). In contrast to Sanger sequencing, NGS offers high-throughput and cost-effective analysis capabilities. NGS is capable of sequencing millions of nucleic acid fragments simultaneously, and the resulting data are analyzed through bioinformatics tools to identify and locate modified bases. This technology enables quantitative analysis of base modification sites at single-base resolution at both the genome-wide and transcriptome-wide levels, significantly accelerating the progress of epigenetic research. (7) Nanopore sequencing (Fig. 3(F)). This method directly acquires sequence information by detecting changes in electrical current as nucleic acid molecules pass through a nanoscale pore. Since chemical conversion of modified bases influences the speed and manner in which nucleic acid molecules pass through the nanopore, information regarding modification sites and types can be extracted from the electrical signal. The real-time and long-read capabilities of nanopore sequencing make it an ideal option for studying complex base modification patterns.


Fig. 3 The signal readout strategies and chemical-assisted recognition methods are synergistically employed to achieve the analysis of nucleic acid modifications. (A) LC–MS. (B) Fluorescence analysis. (C) qPCR. (D) Gel electrophoresis. (E) Sanger sequencing and NGS. (F) Nanopore sequencing.
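To make the LC–MS readout more concrete, the following sketch computes the monoisotopic mass shift that several modification groups add to the parent base. It is only an illustration: the elemental differences are derived from the group names (for example, methylation replaces H with CH3, a net gain of CH2), the dictionary and function names are hypothetical, and real assignments also depend on chromatographic retention and fragmentation.

```python
# Monoisotopic atomic masses in daltons (standard values).
ATOM_MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}

# Net elemental change relative to the unmodified base (assumed from group names).
MOD_DELTA = {
    "methyl": {"C": 1, "H": 2},                 # e.g. 5mC, m6A: H -> CH3
    "hydroxymethyl": {"C": 1, "H": 2, "O": 1},  # e.g. 5hmC: H -> CH2OH
    "formyl": {"C": 1, "O": 1},                 # e.g. 5fC: H -> CHO
    "carboxyl": {"C": 1, "O": 2},               # e.g. 5caC: H -> COOH
    "acetyl": {"C": 2, "H": 2, "O": 1},         # e.g. ac4C: H -> COCH3
    "isomerization": {},                        # e.g. pseudouridine: same formula as U
}


def mass_shift(delta):
    """Sum the monoisotopic masses of the atoms gained by the modification."""
    return sum(ATOM_MASS[element] * count for element, count in delta.items())


for name, delta in MOD_DELTA.items():
    print(f"{name:14s} {mass_shift(delta):+.4f} Da")
```

The zero shift for the isomerization entry shows why a mass-silent modification such as Ψ cannot be identified by mass alone, and why a chemical derivatization that adds mass to the modified base can help.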

4. Base conversion strategies

4.1 Deamination reactions

4.1.1 C-to-T conversion mediated by cytosine deamination. The amino group at the C-4 position of cytosine plays a critical role in hydrogen bond formation during Watson–Crick base pairing. Consequently, deamination of this group causes base mispairing (Fig. 4(A)) during nucleic acid polymerase-mediated amplification, which can be detected via sequencing. The deamination product of C is U, which pairs with A, leading to a detectable C-to-T conversion signal.
Fig. 4 Bisulfite deamination sequencing for the detection of DNA cytosine modifications. (A) C pairs with G, and U pairs with A. (B) In the presence of bisulfite, C, 5fC, and 5caC are deaminated to yield U. 5mC remains unaffected, while 5hmC undergoes an addition reaction with bisulfite to form the adduct cytosine 5-methylenesulfonate (CMS). (C) Improved bisulfite sequencing is employed for the detection of 5mC, 5hmC, 5fC, and 5caC modifications. Reduced BS-Seq (redBS-Seq) is utilized to detect 5fC in assistance with BS-Seq. Oxidative bisulfite sequencing (oxBS-Seq) is utilized to detect 5hmC in assistance with BS-Seq. Chemically assisted bisulfite sequencing (fCAB-Seq) is utilized to detect 5fC in assistance with BS-Seq. Chemical modification-assisted bisulfite sequencing (CAB-Seq) is utilized to detect 5caC in assistance with BS-Seq. TET-assisted bisulfite sequencing (TAB-Seq) is employed to distinguish between 5hmC and 5mC. Red highlights indicate the functional groups participating in the subsequent reaction.

Bisulfite deamination. Bisulfite sequencing (BS-seq)70 is a classical approach for the quantitative analysis of DNA 5mC at single-base resolution. HSO3− functions as a nucleophile and adds across the C-5–C-6 double bond of an unmodified C, generating a 5,6-dihydrocytosine-6-sulfonic acid intermediate. Subsequently, a deamination reaction occurs, leading to the formation of 5,6-dihydrouracil-6-sulfonic acid. Next, the double bond between C-5 and C-6 is restored through a β-elimination reaction, eventually resulting in a U structure (Fig. 4(B)). During subsequent DNA replication, U pairs with A, achieving C-to-T signal conversion. Due to the electron-donating property of the methyl group at the C-5 position, the 5mC modification enhances the electron density of the C-5–C-6 double bond, making it less susceptible to nucleophilic attack;71 thus, it does not readily undergo a nucleophilic addition reaction with HSO3−. Under bisulfite treatment, 5mC remains unconverted, pairs with G during subsequent amplification, and is still read as a C signal during sequencing. Consequently, the discrimination of 5mC and C at single-base resolution is achieved.14,70,72
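
The C-versus-5mC discrimination described above can be illustrated with a short in silico sketch. The Python below is only an illustration, not a published pipeline: the function names `bisulfite_convert` and `methylation_level` are hypothetical, and the code assumes complete conversion of unmodified C and complete protection of 5mC, which real samples only approach. The per-site estimate C/(C + T) is the conventional read-count ratio.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Return the read expected after bisulfite treatment and amplification.

    Unmodified C is read as T; a C at a methylated position (0-based index)
    stays C. Complete conversion is assumed.
    """
    converted = []
    for index, base in enumerate(sequence):
        if base == "C" and index not in methylated_positions:
            converted.append("T")   # C -> U, amplified and read as T
        else:
            converted.append(base)  # 5mC and non-C bases are unchanged
    return "".join(converted)


def methylation_level(c_reads, t_reads):
    """Fraction of reads that still show C at one cytosine site."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no coverage at this site")
    return c_reads / total
```

For example, `bisulfite_convert("ACGTCCG", {1})` keeps only the methylated C at index 1, and a site covered by 30 C reads and 10 T reads gives a level of 0.75. In standard BS-seq this ratio cannot separate 5mC from other modifications that are also read as C.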

However, recent studies have indicated that 5hmC does not undergo a deamination reaction.73 Instead, 5hmC reacts with bisulfite to form cytosine 5-methylenesulfonate (CMS), which is subsequently read as a C signal during the sequencing process (Fig. 4(B)). In contrast, 5fC and 5caC undergo deamination reactions upon bisulfite treatment (Fig. 4(B)), resulting in T signals during sequencing. For 5fC, the possible mechanism involves the formation of a double-addition intermediate between 5fC and bisulfite, which is subsequently restored to C, and then deaminated under the action of bisulfite. Similarly, 5caC is decarboxylated and transformed to C before deamination. In conclusion, under bisulfite treatment, unmodified C, 5fC, and 5caC exhibit C-to-T signal conversion, while 5mC and 5hmC (the latter as CMS) are still read as C and do not undergo signal conversion. Therefore, direct bisulfite sequencing of genomic DNA cannot distinguish 5mC from 5hmC.

The improved bisulfite sequencing method74–79 enables the differentiation of 5mC and 5hmC signals and facilitates the simultaneous detection of 5fC and 5caC (Fig. 4(C)). The first strategy is based on the differences in signal readout resulting from the conversion among different modifications. For instance, oxBS-seq74,75 employs KRuO4 to selectively oxidize 5hmC to 5fC; subsequently, after bisulfite treatment, the 5hmC locus is read as a T signal. The results are compared with canonical BS-seq, and loci with distinct signals are identified as 5hmC sites. Based on similar principles, redBS-seq76 utilizes NaBH4 to reduce 5fC to 5hmC, followed by BS-seq. The 5fC locus is manifested as a C signal. A comparison of the results indicates that loci with differing signals correspond to 5fC sites. Besides interconversion strategies between modifications, protection strategies for different modifications are also widely employed. For instance, in CAB-seq,77 1-(3-dimethylaminopropyl)-3-ethylcarbodiimide (EDC) is employed to catalyze the condensation reaction of the carboxyl group of 5caC with primary amines, thereby protecting 5caC from deamination reaction during bisulfite treatment. Consequently, the 5caC locus is maintained as a C signal during sequencing. The sequencing results can be compared with those of canonical BS-seq to obtain 5caC loci. In fCAB-seq,78 hydroxylamine is utilized to protect 5fC from deamination during bisulfite treatment, and the 5fC locus is retained as a C signal in BS-seq. The sequencing results are compared with canonical BS-seq data to acquire information on 5fC loci. In TAB-seq,79 β-glucosyltransferase-catalyzed glycosylation is employed to protect 5hmC, followed by oxidation of 5mC to 5caC via the TET enzyme. In subsequent BS-seq, the 5mC locus is read as a T signal, while the 5hmC locus is retained as a C signal, enabling the distinction between 5mC and 5hmC loci.
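
The subtractive logic shared by these methods can be written as a small lookup. The following Python sketch is illustrative only: the names `PRETREATMENT`, `readout`, and `differs_from_bs` are hypothetical, the placeholder state `"protected"` stands for any protected cytosine that is read as C, and every conversion is assumed to be complete. It encodes only the pretreatments and signals stated in the text.

```python
# Cytosine states considered in the text.
STATES = ["C", "5mC", "5hmC", "5fC", "5caC"]

# Signal read after bisulfite treatment. "protected" is a placeholder for a
# chemically or enzymatically protected cytosine, which is read as C.
BISULFITE_SIGNAL = {
    "C": "T", "5mC": "C", "5hmC": "C", "5fC": "T", "5caC": "T",
    "protected": "C",
}

# State changes applied before bisulfite treatment in each method.
PRETREATMENT = {
    "BS-seq": {},
    "oxBS-seq": {"5hmC": "5fC"},        # KRuO4 oxidation
    "redBS-seq": {"5fC": "5hmC"},       # NaBH4 reduction
    "CAB-seq": {"5caC": "protected"},   # EDC-mediated coupling with an amine
    "fCAB-seq": {"5fC": "protected"},   # hydroxylamine protection
    "TAB-seq": {"5hmC": "protected", "5mC": "5caC"},  # glycosylation, then TET
}


def readout(method, state):
    """Base call for a cytosine state under the given method."""
    converted = PRETREATMENT[method].get(state, state)
    return BISULFITE_SIGNAL[converted]


def differs_from_bs(method):
    """States whose signal differs from canonical BS-seq."""
    return [s for s in STATES if readout(method, s) != readout("BS-seq", s)]
```

Under these assumptions, `differs_from_bs("oxBS-seq")` returns only 5hmC and `differs_from_bs("redBS-seq")` only 5fC, which mirrors the comparison with canonical BS-seq described above. In TAB-seq, 5hmC is the only state still read as C.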

Whole-genome bisulfite sequencing (WGBS) has been extensively employed to profile DNA methylations.80–82 Nevertheless, the stringent bisulfite treatment conditions cause significant DNA degradation, thereby limiting its application in low-input samples.83 Additionally, the presence of high-GC regions or highly structured regions often results in incomplete deamination,84 leading to false-positive signals. Beyond DNA, bisulfite sequencing can also be utilized for the identification of m5C modifications in RNA.72 Compared with DNA 5mC modification, the abundance of m5C modifications in RNA is much lower, and RNA is more susceptible to degradation, thereby making the detection of RNA m5C more challenging. Previous studies adopted milder BS-seq conditions,72,85 such as lowering the reaction temperature and extending the reaction time, to reduce RNA degradation. However, these milder conditions often lead to incomplete deamination and many false-positive signals.84 Therefore, achieving an optimal balance between deamination efficiency and degradation rate has consistently been a challenge for BS-seq.

In BS-seq, two competing pathways exist:83 the U-BS adduct can be transformed into U, or spontaneous depyrimidination may lead to DNA/RNA degradation. A high concentration of the bisulfite reagent accelerates the conversion, thereby allowing the BS reaction to reach completion within a short time and reducing DNA degradation. BS-seq is typically conducted using 3 to 5 M sodium bisulfite.39 Nevertheless, the solubility of sodium bisulfite in water is limited.86 To increase the bisulfite concentration and enhance the deamination efficiency, some researchers employed a mixture of sodium bisulfite, ammonium bisulfite, and sulfite (∼10 M).39 However, this mixture requires a relatively high temperature for dissolution and tends to precipitate during cooling, resulting in overly viscous solutions that are difficult to handle. In the recently developed UBS-seq method,84 researchers utilized high concentrations of mixed ammonium bisulfite and sulfite reagents along with elevated reaction temperatures to accelerate the deamination and significantly shorten the reaction time, thereby minimizing DNA and RNA degradation. Additionally, higher temperatures facilitate the disruption of DNA and RNA secondary structures, reducing background signals. The optimized conditions of UBS-seq effectively minimize false-positive signals and enable rapid and accurate analysis of 5mC/m5C modification in trace amounts of DNA/RNA samples.

However, a fundamental limitation of BS-seq is that nearly 95% of Cs in the mammalian genome are unmethylated and are converted to T signals during bisulfite treatment, thereby reducing the complexity of DNA sequences. This can result in incompleteness and bias in the data.87 Therefore, a variety of bisulfite-free methods, such as pyridine-borane-mediated reductive deamination, have recently been developed.


Pyridine-borane-mediated reductive deamination. Because the carboxyl and aldehyde groups on 5caC and 5fC are electron-withdrawing, the electron density on their pyrimidine rings is lower than that of unmodified C;88 consequently, they are more susceptible to reduction. 5caC and 5fC can undergo reductive deamination by pyridine borane treatment to generate D, a process that does not occur on C, 5mC or 5hmC. D pairs with A during subsequent replication (Fig. 5(A)), leading to C-to-T conversion with an efficiency of approximately 98%88 (Fig. 5(B)). Based on this principle and the protecting group strategy, researchers have developed a series of pyridine-borane-based sequencing methods for the detection of 5mC, 5hmC, 5fC, and 5caC.88 After protecting 5fC with hydroxylamine or protecting 5caC with EDC, pyridine-borane-mediated reduction and sequencing can be performed to quantify 5caC and 5fC at single-base resolution (Fig. 5(C)). Additionally, by further combining the interconversion of different modifications, the analysis of 5mC and 5hmC can also be achieved (Fig. 5(C)). In the TAPS method,88 5mC and 5hmC can be oxidized to 5caC by TET oxidase and then reduced by pyridine borane to complete the C-to-T conversion. In the TAPSβ method,89 5hmC is protected from oxidation and pyridine-borane-mediated reduction by glycosylation. Thus, only 5hmC remains as a C signal after TET oxidase treatment and pyridine-borane-mediated reduction, leading to targeted 5hmC detection compared with TAPS. In the CAPS method,89 KRuO4 is employed to oxidize 5hmC to 5fC. After pyridine-borane-mediated reduction, only 5mC persists as a C signal, achieving 5mC detection compared with TAPS.
Fig. 5 Pyridine-borane-mediated deamination and sequencing for the detection of DNA cytosine modifications. (A) C pairs with G, and D pairs with A. (B) 5fC and 5caC are reductively deaminated to generate D through treatment with pyridine borane, while C, 5mC, and 5hmC remain unaffected. (C) Pyridine borane deamination and sequencing is employed to detect 5mC, 5hmC, 5fC, and 5caC modifications. The utilization of hydroxylamine to protect 5fC followed by pyridine borane treatment leads to a distinct readout signal at the 5fC site compared to normal pyridine borane sequencing. The application of EDC-catalyzed protection of 5caC followed by pyridine borane treatment results in a different readout signal at the 5caC site compared to normal pyridine borane sequencing. All cytosine modifications in TET-assisted pyridine borane sequencing (TAPS) are read as T. The comparison of the CAPS and TAPS results provides information on the 5mC modification. The comparison of TAPSβ with TAPS results enables the acquisition of 5hmC modification information. Red highlights indicate the functional groups participating in the subsequent reaction.

Compared with bisulfite treatment, pyridine-borane-mediated reduction does not induce signal conversion in unmodified C and retains the complexity of DNA sequences, thereby facilitating the generation of higher-quality DNA methylation maps.88 Moreover, the conditions of pyridine-borane-mediated reduction are sufficiently mild to cause minimal damage to nucleic acid fragments.88 In this manner, long-read methylation mapping methods have the potential to be developed using nanopore sequencing technology, due to the retention of DNA fragments longer than 10 kb. Recently, pyridine-borane-mediated reduction and deamination sequencing (Fig. 5(B)) have also been applied to identify f5C modifications (f5C-seq),90 realizing the profiling of transcriptomic f5C at single-base resolution.
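
The difference in retained sequence complexity can be made concrete with a toy comparison. The Python sketch below is an illustration under stated assumptions: the names `BS_SIGNAL`, `TAPS_SIGNAL`, `read`, and `fraction_changed` are hypothetical, conversions are treated as complete, and the template is an arbitrary example in which 1 of 20 cytosines is methylated.

```python
# Base calls for each cytosine state, as described in the text.
BS_SIGNAL = {"C": "T", "5mC": "C", "5hmC": "C", "5fC": "T", "5caC": "T"}
TAPS_SIGNAL = {"C": "C", "5mC": "T", "5hmC": "T", "5fC": "T", "5caC": "T"}


def read(template, signal):
    """Base calls for a template given as a list of base or modification labels."""
    return "".join(signal.get(base, base) for base in template)


def fraction_changed(template, signal):
    """Fraction of cytosine positions whose base call is no longer C."""
    cytosines = [base for base in template if base in signal]
    if not cytosines:
        raise ValueError("template contains no cytosines")
    changed = sum(1 for base in cytosines if signal[base] != "C")
    return changed / len(cytosines)


# Arbitrary example: 19 unmodified C and one 5mC, flanked by A and G.
TEMPLATE = ["A"] + ["C"] * 19 + ["5mC", "G"]
```

With this template, bisulfite readout changes 19 of the 20 cytosine positions, whereas TAPS readout changes only the single methylated one, so most of the original four-letter sequence is retained for alignment.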


Other deamination strategies. Based on a principle similar to that of pyridine borane, f5C can be reduced and deaminated by NaCNBH3 to dihydro-5-hydroxymethyluridine under strongly acidic conditions, and ca5C can be reductively deaminated to dihydrouridine (Fig. 6(A)), thereby triggering C-to-T signal conversion; C, m5C, and hm5C remain unaffected during this process.91 In this manner, f5C and ca5C can be detected through borohydride-based conversion, but the conversion efficiency is not sufficiently high. Therefore, the borohydride-based method for the identification of cytosine modifications in RNA still needs further development.
Fig. 6 Other deamination strategies for the detection of cytosine modifications. (A) Reduction and deamination of ca5C and f5C. (B) Photocatalytic conversion of 5caC to D under the catalysis of [Ir(dF(CF3)ppy)2(dtbpy)]Cl. Red highlights indicate the functional groups participating in the subsequent reaction.

Owing to the electron-withdrawing effect of the carboxyl group, the electron density on the pyrimidine ring of DNA 5caC is lower than that of unmodified C, making it more susceptible to reduction. Under the catalysis of the photocatalyst [Ir(dF(CF3)ppy)2(dtbpy)]Cl, light-induced single electron transfer and decarboxylation occur, followed by hydrolysis to generate D (Fig. 6(B)), thereby realizing C-to-T signal conversion.92 This method can almost completely convert 5caC to D with high selectivity and has minimal effects on unmodified bases, as well as on 5mC, 5hmC, and 5fC. Since this reaction occurs only in single-stranded DNA (ssDNA), the detection of 5mC in genomic DNA requires prior high-temperature denaturation treatment to transform double-stranded DNA (dsDNA) into ssDNA.

4.1.2 A-to-G conversion mediated by adenine deamination. The amino group at the C-6 position of A is an essential functional group for hydrogen bond formation in Watson–Crick pairing. Consequently, the product resulting from the deamination reaction of this group pairs with C (Fig. 7(A)), leading to A-to-G signal conversion.
Fig. 7 Nitrite deamination sequencing for the detection of RNA m6A modifications. (A) A pairs with T, and I pairs with C. (B) A is deaminated under sodium nitrite treatment to generate I, while m6A remains unaffected by deamination. Besides A, C and G are also deaminated under sodium nitrite treatment, generating U and X, respectively. (C) The use of glyoxal protects G from deamination under sodium nitrite treatment and facilitates the deamination of A. Red highlights indicate the functional groups participating in the subsequent reaction.

Sodium nitrite deamination. Under acidic conditions, unmodified A in RNA reacts with nitrite to form a diazonium salt intermediate, which is subsequently converted to hypoxanthine (the base of inosine, I), and this is read as G during sequencing (Fig. 7(B)). In contrast, m6A cannot undergo this deamination reaction due to the presence of the N-6 methyl group; thus, it is still read as A during sequencing.93,94 However, both G and C also undergo deamination under nitrite treatment (Fig. 7(B)), resulting in the conversion of G to xanthosine (X) and C to U. Moreover, the nitrite-induced deamination of G occurs more rapidly than that of A; therefore, this version of the nitrite-based method is not ideal for detecting RNA m6A modifications. Recently, it has been reported that glyoxal can reversibly protect G from deamination (Fig. 7(C)), while the addition of glyoxal significantly enhances the deamination of A.95 The possible mechanism is that glyoxal reacts with A to form a hemiaminal, which serves as an efficient catalyst for A deamination. The nitrite concentration (750 mM NaNO2), reaction temperature and time (16 °C for 8 h) have been further optimized, leading to a higher A-to-I conversion rate (99%) and lower G-to-X (3%) and C-to-U (4%) conversion rates.95 Therefore, this optimized nitrite-based strategy enables quantitative analysis of transcriptomic m6A at single-base resolution.
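
To show how a conversion rate below 100% enters site-level quantification, the sketch below estimates the m6A fraction at one adenosine site from read counts. It is a minimal illustration, not the published analysis: the function name `m6a_fraction` is hypothetical, and the model assumes that unmethylated A is read as G with probability `conversion` (0.99, the A-to-I rate quoted above) while m6A is always read as A.

```python
def m6a_fraction(a_reads, g_reads, conversion=0.99):
    """Estimate m6A stoichiometry at one adenosine site.

    Assumed model: the observed A fraction u satisfies
        u = m + (1 - m) * (1 - conversion)
    where m is the m6A fraction, so m = (u - (1 - conversion)) / conversion.
    """
    total = a_reads + g_reads
    if total == 0:
        raise ValueError("no coverage at this site")
    unconverted = a_reads / total
    estimate = (unconverted - (1.0 - conversion)) / conversion
    return min(1.0, max(0.0, estimate))  # clip sampling noise to [0, 1]
```

With 505 A reads and 495 G reads, the uncorrected A fraction is 0.505 and the corrected estimate is about 0.5. The correction is small at 99% conversion but grows quickly if conversion is less efficient.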

4.2 Condensation reactions

4.2.1 DNA 5fC and RNA f5C. The adjacent 4-amino and 5-formyl groups of 5fC form an o-aminobenzaldehyde-like motif, making 5fC a potential substrate for the Friedländer reaction. After screening numerous substrates, 1,3-indanedione was selected to undergo cyclization with 5fC; the cyclized product is read as T during sequencing (Fig. 8(A)), leading to C-to-T signal conversion.96 This may be attributed to the fact that the original N-4 position is no longer a proton donor, thereby breaking the hydrogen bond between C and G. In the fC-CET method,96 5fC is labeled with azide-conjugated 1,3-indanedione, and then coupled with DBCO-S-S-Biotin via click chemistry. Subsequently, the fragments containing 5fC are affinity-enriched, followed by detection of 5fC loci using NGS. Recently, TET-assisted m5C-TAC-seq has also been employed for the detection of RNA m5C,97 in which m5C is oxidized to f5C via the TET oxidase, followed by labeling with a 1,3-indanedione derivative and sequencing. However, since the oxidation of m5C by the TET enzyme also generates hm5C and ca5C, the yield of f5C is only approximately 50%; therefore, only semi-quantitative analysis of m5C can be achieved, and this method is not suitable for low-input RNA samples.
Fig. 8 The condensation reactions lead to 5fC-to-T base conversion. 5fC undergoes a condensation reaction with (A) 1,3-indanedione or (B) malononitrile to produce a cyclisation product that is read as a T signal during sequencing. Red highlights indicate the functional groups participating in the subsequent reaction.

In addition, due to the low water solubility of 1,3-indanedione and its derivatives, although fC-CET is highly selective and robust in bulk samples, it is not applicable to low-input samples, such as single-cell samples. Moreover, before the subsequent sequencing library preparation, multiple purification steps are necessary to remove 1,3-indanedione, which may lead to a certain degree of sample loss. Therefore, a reagent with high water solubility that does not interfere with the subsequent library preparation process is necessary. Malononitrile fulfills this requirement and can undergo a condensation reaction with 5fC to yield a cyclized product through a mechanism similar to that of 1,3-indanedione98 (Fig. 8(B)). The N-4 position of the cyclization product lacks a hydrogen atom, which disrupts the hydrogen bond between C and G, thereby causing the C-to-T signal conversion. Malononitrile exhibits high solubility (approximately 1 M) under various pH conditions, and the labeling reaction does not lead to significant degradation. More significantly, malononitrile does not influence the activity of several commonly used DNA polymerases, even at high concentrations; thus, additional purification steps are not necessary. Based on this principle, CLEVER-seq for DNA 5fC sequencing has been developed,98 enabling genomic 5fC analysis at single-base resolution and single-cell level. An RNA f5C sequencing method based on malononitrile-mediated conversion, termed Mal-seq,99 has also been established; however, its conversion efficiency (only approximately 50%) is lower than that of DNA 5fC, making it challenging to detect low-stoichiometric f5C modifications.

4.3 Addition and substitution reactions

4.3.1 RNA Ψ. N-Cyclohexyl-N′-(2-morpholinoethyl)-carbodiimide metho-p-toluenesulfonate (CMC) can be added to the N-1 and N-3 positions of Ψ (Fig. 9(A)), the N-3 position of U, and the N-1 position of G.100 Subsequently, all adducts except the N-3 position of Ψ are removed under basic conditions (pH = 10.3), leading to the selective labeling of Ψ. The addition of the bulky group results in significant steric hindrance, thereby causing the termination of reverse transcription. The truncated cDNA products can be identified by NGS or other methods. Technologies such as Ψ-seq,101 Pseudo-seq,102 and PSI-seq100 are based on the selective reaction of CMC with Ψ and utilize NGS for transcriptome-wide analysis of Ψ at single-base resolution. In the CeU-seq method,103 Ψ is selectively labeled with azide-coupled CMC, and then coupled to a DBCO-Biotin probe via click chemistry. Subsequently, the Ψ locus is detected through affinity enrichment followed by NGS. This method enhances the sensitivity for identifying low-stoichiometric Ψ modifications. Besides NGS, the principle of CMC-mediated selective labeling can also be combined with qPCR,104 gel electrophoresis,105 and other technologies106 for analyzing specific Ψ loci at lower costs. However, CMC-based methods have relatively low labeling efficiency and selectivity for Ψ, and the lack of stoichiometric information complicates the distinction between the true Ψ signal and background noise.
Fig. 9 Addition or substitution reactions for base modification detection. (A) The addition of CMC to the N-3 site of Ψ forms an addition product (Ψ-CMC) that is more stable under weakly basic conditions, resulting in the termination of the reverse transcription at this site. (B) The reaction of Ψ with bisulfite generates a Ψ-bisulfite adduct (Ψ-BS), causing one base deletion at this site during reverse transcription. (C) The substitution reaction of Ψ with 2-bromoacrylamide generates a cyclisation product (nce1,2-Ψ) that is read as a C signal during sequencing. (D) I undergoes an addition reaction with acrylonitrile or acrylamide, causing the termination of reverse transcription. (E) i6A undergoes an addition reaction with I2 followed by an intramolecular substitution to generate a cyclization product, leading to stochastic mutations in the sequencing signal. (F) Allyl groups are ligated to m6A in the presence of MjDim1 enzyme and allyl-SAM, subsequently undergoing an addition reaction with I2 followed by an intramolecular substitution to generate a cyclisation product, causing stochastic mutations in the sequencing signal. Red highlights indicate the functional groups participating in the subsequent reaction.

Ψ can react with bisulfite to cause the ring opening of ribose (Fig. 9(B)), subsequently generating a special Ψ-bisulfite adduct.107 During reverse transcription, the cDNA product has a deletion of 1 or 2 nucleotides near the adduct locus. Based on this principle, RBS-seq enables single-base resolution identification of Ψ across the transcriptome.85 However, under canonical bisulfite treatment, the reaction efficiency is low, resulting in a low base deletion rate at the Ψ locus and leading to a significant underestimation of Ψ modifications. Additionally, all unmodified Cs are converted to Us under bisulfite treatment, which reduces sequence complexity and causes a low mapping rate as well as inaccurate Ψ identification.85 To enhance the conversion rate of Ψ, the concentration of the nucleophile in the reaction system needs to be increased. Early studies have demonstrated that in high-concentration sodium bisulfite solutions, the main components are bisulfite ion dimers and pyrosulfite ions (S2O52−),108 which exhibit no nucleophilic activity toward Ψ, whereas the sulfite ion is an effective nucleophile. Therefore, by increasing the proportion of sulfite and reducing that of bisulfite, the concentration of effective nucleophiles rises, thereby increasing the base deletion rate at the Ψ locus.109 The deamination of C tends to occur under acidic conditions,109 while the addition reaction of Ψ can proceed under basic conditions.109 Therefore, further enhancing the ratio of sulfite can significantly inhibit C-to-U conversion without influencing Ψ conversion. With the optimized BS-seq conditions, the PRAISE109 (2 M K2SO3/0.36 M NaHSO3) and BID-seq110 (2.4 M Na2SO3 and 0.36 M NaHSO3) methods achieved quantitative analysis of transcriptomic Ψ modifications with enhanced sensitivity. In the pseU-TRACE111 method, two DNA probes are designed to be complementary to both sides of the Ψ locus in the cDNA sequence. 
In the absence of bisulfite treatment, the cDNA product remains intact, and a one-base gap exists between the two probes, which prevents efficient ligation of the two DNA probes, resulting in few ligation products. For bisulfite-treated RNA, the adduct at the Ψ locus causes a base deletion in the cDNA product at the corresponding locus, bringing the two DNA probes into direct adjacency and facilitating their efficient ligation to generate a long DNA strand. The ligated long DNA strand can then be amplified by qPCR to distinguish between modified and unmodified loci. In the BIHIND method,112 two DNA probes are designed directly on both sides of the RNA Ψ locus with a one-base gap. After bisulfite treatment, under the action of Bst DNA polymerase and SplintR ligase, DNA probes at the U locus are effectively ligated, whereas ligation at the Ψ locus is inhibited. Finally, the signals are differentiated via qPCR.

However, it is challenging to detect consecutive Ψ sites or densely modified Ψ sites. To address this issue, the BACS method based on Ψ-to-C conversion has been developed.113 The most notable difference between Ψ and U lies in the free N-1 position of Ψ, which is highly reactive toward Michael addition acceptors such as acrylonitrile, acrylamide, and other acrylic compounds.114,115 The selective labeling of Ψ by acrylonitrile has been widely utilized to distinguish Ψ from U via LC–MS.116 Nevertheless, the simple N-1 adduct formed between Ψ and the acrylic compound does not induce mutations during reverse transcription.113 A Michael addition acceptor with an α-halogen group can trigger a ring formation reaction with Ψ. For example, Ψ can form a ring with 2-bromoacrylamide113 (Fig. 9(C)), and the ring-formed product pairs with G during the subsequent reverse transcription process, achieving a signal conversion of Ψ to C with a conversion rate of 86.6%,113 while the conversion rate of unmodified U is less than 1%. Based on this principle, the BACS method enables higher resolution and more accurate quantitative analysis of Ψ modifications.

4.3.2 RNA I. I can directly pair with C and be read as G.38 Thus, this abundant modification can be detected through direct sequencing technology by comparing the cDNA sequence with the corresponding genomic DNA sequence.38 However, the direct sequencing method is unable to distinguish I from naturally occurring A-to-G mutations, and it is difficult to exclude false signals introduced during amplification and sequencing processes.38 Therefore, additional chemical-assisted conversion is necessary to confirm the presence of I. Under weakly basic conditions, the N-1 position of I is prone to deprotonation,117 enabling the Michael addition of acrylonitrile to generate N-1 cyanoethyl inosine (ce1I) (Fig. 9(D)). ce1I occupies the hydrogen bond pairing site and leads to the termination of reverse transcription at this adduct, while unmodified A does not undergo this reaction. Although Ψ can also undergo an addition reaction with acrylonitrile, this adduct does not induce mutagenesis during reverse transcription and thus does not affect the detection of I sites. Based on this principle, the ICE method38 in combination with Sanger sequencing has been employed to detect I loci by comparing G/A signals before and after acrylonitrile treatment. Before acrylonitrile treatment, the I locus is read as a G signal, and the A locus is read as an A signal, resulting in a mixture of G/A signals; after acrylonitrile treatment, reverse transcription terminates at the I locus, and the G signal disappears, leaving only the A signal; thus, the I modification sites are identified. However, for I loci with a 100% modification ratio, the corresponding signal cannot be identified using this method. In addition, because the method is applied to specific loci, it restricts the discovery of new I modification loci. By combining the ICE principle with NGS, ICE-seq37,118 enables the identification of I sites across the transcriptome. 
Nevertheless, this method cannot enrich for I-containing transcripts, which limits its sensitivity. To overcome this issue, researchers have labeled the I locus with fluorescein-conjugated acrylamide and subsequently enriched the labeled RNA with a fluorescein antibody (Fig. 9(D)) before sequencing the corresponding locus.115 Furthermore, nano-ICE-seq119 leverages distinct signals in nanopore direct RNA sequencing (dRNA-Seq) generated before and after acrylonitrile addition, achieving the identification of I sites.

Ψ can also be labelled by acrylamide. Deprotonation at the N-1 position is a prerequisite for the addition reaction; therefore, the labeling rates of I and Ψ are pH-dependent.115 Since the pKa value at the N-1 position of I (pKa = 8.7) is slightly lower than that of Ψ (pKa = 9.5),120 the labelling efficiency of I is higher at pH 8.5–8.6, while the interference of Ψ is minimal. In fact, Ψ is predominantly found in rRNA and tRNA. The ratio of I to Ψ in mRNA is approximately 500:1; therefore, Ψ does not significantly interfere with the identification of I sites in mRNA.
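
The pH dependence described above follows from the Henderson–Hasselbalch relation. The Python sketch below uses the two pKa values quoted in the text; the function names `deprotonated_fraction` and `selectivity` are hypothetical. It treats the deprotonated fraction as a rough proxy for reactivity, which is an assumption: actual labeling rates also depend on the intrinsic nucleophilicity of each anion and on reaction kinetics.

```python
PKA_INOSINE_N1 = 8.7        # value quoted in the text
PKA_PSEUDOURIDINE_N1 = 9.5  # value quoted in the text


def deprotonated_fraction(pka, ph):
    """Henderson-Hasselbalch fraction of an acidic N-H site that is deprotonated."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def selectivity(ph):
    """Ratio of deprotonated I to deprotonated pseudouridine at a given pH."""
    return (deprotonated_fraction(PKA_INOSINE_N1, ph)
            / deprotonated_fraction(PKA_PSEUDOURIDINE_N1, ph))
```

At pH 8.6 this gives roughly 44% deprotonated I against roughly 11% deprotonated Ψ, a ratio of about 4. Raising the pH deprotonates both sites and erodes the difference, which is consistent with labeling at pH 8.5–8.6.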

4.3.3 RNA i6A and m6A. I2 can selectively undergo electrophilic addition reactions with the carbon–carbon double bond in the isopentenyl group of i6A121 (Fig. 9(E)). Subsequently, two cyclization products are generated through intramolecular nucleophilic substitution, transforming i6A from a hydrogen bond donor to a hydrogen bond acceptor. These cyclization products are unable to form Watson–Crick base pairs and instead generate T/C/G mutation signals, thereby indicating the modification sites of i6A. IMCRT tRNA-seq121 exploits this principle to achieve semi-quantitative analysis of i6A at single-base resolution.

m6A-SAC-seq122 utilizes chemically modified allyl-SAM as a cofactor and adds allyl groups containing carbon–carbon double bonds to m6A under the catalysis of the MjDim1 enzyme (Fig. 9(F)). Subsequently, two types of cyclization products are generated upon I2 treatment, which induces mutations of the sequencing signal at the m6A locus, thus enabling the identification of m6A modifications.

4.4 Reduction reactions

4.4.1 Reduction of groups on aromatic rings.
DNA 5fC. The aldehyde group on 5fC is susceptible to reduction, and NaBH4 is capable of reducing this group to a hydroxyl group,78 thereby achieving the specific conversion of 5fC to 5hmC (Fig. 10(A)). Based on the difference in deamination behavior between 5fC and 5hmC under bisulfite treatment, distinct modification sites can be further distinguished through subsequent sequencing, as demonstrated by the redBS-seq method.76
Fig. 10 Reduction reactions for base modification detection. (A) 5fC is selectively reduced to 5hmC by NaBH4. (B) ac4C in RNA is reduced to tetrahydro-N4-acetylcytidine by NaCNBH3, which is read as a T signal during sequencing. Under basic treatment, ac4C undergoes deacetylation to produce C, which is not reduced by NaCNBH3 and is read as a C signal during sequencing. (C) m1A in RNA is reduced by NaBH4, and the reduction product has an increased read-through rate and is read as a T signal during sequencing. m1A undergoes Dimroth rearrangement to generate m6A, which also has an increased read-through rate and is read as an A signal during sequencing. (D) m7G in RNA is reduced by NaBH4 to generate an abasic site. Subsequent detection can be performed using aniline-induced cleavage, aldehyde probe capture and enrichment, or mismatches at the abasic site during reverse transcription. (E) D is reduced to THU by NaBH4, resulting in the termination of reverse transcription or a mismatch signal of C. Red highlights indicate the functional groups participating in the subsequent reaction.
4.4.2 Reduction of pyrimidine rings.
RNA ac4C. Due to the presence of the acetyl group on ac4C, the electron density on the pyrimidine ring decreases, thereby facilitating its reduction compared to unmodified C.123 Treatment of ac4C with NaBH4 can reduce it to tetrahydro-N4-acetylcytidine, while unmodified C is not affected by NaBH4. During reverse transcription, adenine nucleotides tend to be incorporated opposite tetrahydro-N4-acetylcytidine, leading to C-to-T signal conversion.123 On this basis, the ac4C and C loci can be differentiated. However, the signal conversion rate at the ac4C site is low (approximately 50%) because of the limited reduction efficiency of NaBH4.

It is reported that ac4C can be protonated under acidic conditions,124 and the protonated form of ac4C is more susceptible to reduction by borohydrides.124 When treated with NaCNBH3 under acidic conditions (Fig. 10(B)), ac4C is reduced more rapidly and completely than with NaBH4, attaining a reduction efficiency of up to 90%.124 Moreover, the mismatch rate closely tracks the modification ratio of ac4C. Under borohydride treatment, other modifications such as m1A, m7G, and D can also undergo reduction reactions,125–127 leading to base mismatches, including the generation of new T signals, thereby interfering with the detection of ac4C. To address this issue, the property that ac4C deacetylates to form C under alkaline treatment can be exploited. By comparing the detection results before and after alkaline treatment, the ac4C site can be differentiated from other modification sites (Fig. 10(B)). Based on these principles, ac4C-seq124,128 has been developed to profile transcriptomic ac4C maps at single-base resolution. With sufficient sequencing depth, ac4C-seq can detect and quantify modifications with high accuracy and precision.
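
The before-and-after comparison described above can be sketched as a simple decision rule. This Python example is hypothetical throughout: the function name `classify_site`, the labels it returns, and the `min_drop` threshold of 0.05 are illustrative assumptions rather than parameters of ac4C-seq, and a real analysis would test differences statistically against sequencing depth.

```python
def classify_site(mismatch_reduced, mismatch_deacetylated, min_drop=0.05):
    """Label a site from its mismatch rates in two libraries.

    mismatch_reduced: mismatch rate after NaCNBH3 reduction.
    mismatch_deacetylated: mismatch rate when alkaline deacetylation
        precedes the reduction.
    min_drop: hypothetical threshold on the loss of signal on deacetylation.
    """
    drop = mismatch_reduced - mismatch_deacetylated
    if drop >= min_drop:
        # Signal depends on the acetyl group, as expected for ac4C.
        return "ac4C candidate"
    if mismatch_reduced >= min_drop:
        # Signal survives deacetylation, so it has another origin.
        return "other reducible modification or background"
    return "unmodified"
```

A site with mismatch rates of 0.45 and 0.02 in the two libraries would be labeled an ac4C candidate, whereas a site at 0.30 and 0.29 would be attributed to another source of mismatches.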


RNA m1A. Under physiological conditions, m1A carries a positive charge, and the electron density on its pyrimidine ring is lower than that of unmodified A;129 thus, it is more susceptible to reduction reactions. Meanwhile, as the methylation modification at the N-1 position in m1A disrupts the hydrogen bond site required for the Watson–Crick base pairing, the m1A modification can lead to either mismatch or termination during reverse transcription,130 resulting in a significantly lower readthrough rate compared to unmodified A. For a small proportion of full-length reverse transcription products, a certain degree of A-to-U mutation occurs at the corresponding m1A modification site.126

Red-m1A-seq126 employs NaBH4 to reduce the double bond between N-1 and C-6 of m1A (Fig. 10(C)), thereby enhancing the A-to-U mutation rate and the read-through rate. Additionally, m1A can undergo Dimroth rearrangement to generate m6A under alkaline treatment.129 This specific property is utilized to distinguish m1A from other base modifications that exhibit increased mutation rates under NaBH4 reduction conditions, such as ac4C, m7G, and D. RNA is prone to degradation under alkaline conditions; thus, a balance must be struck between the efficiency of Dimroth rearrangement and the rate of RNA degradation. To this end, after NaBH4 reduction, the alkaline treatment condition involves initially incubating RNA at 95 °C in 0.15 M Tris buffer (pH 8.8) to facilitate Dimroth rearrangement while avoiding extensive RNA degradation; then 0.1 M NaHCO3 buffer (pH 9.2) is added, and the mixture is incubated at 95 °C for 15 min to promote RNA degradation into suitable fragment sizes (40–60 bp) for library preparation.126 Because m6A is read as A during sequencing, the Dimroth rearrangement product can serve as a control to detect m1A with greater sensitivity and accuracy. Nevertheless, despite the high efficiency of NaBH4 in reducing m1A, the actual read-through rate during reverse transcription remains below 60%.126 Even when employing reverse transcriptases with higher efficiency, the A-to-U mutation rate in the resulting cDNA is still less than 60% and exhibits significant fluctuations due to the influence of sequences surrounding the m1A sites. This limitation restricts its application in the quantitative detection of m1A.


RNA m7G. m7G carries a positive charge and has a lower electron density on its purine ring compared to unmodified G,131 making it more susceptible to reduction. Utilizing this property, it is feasible to distinguish m7G from G. When treated with NaBH4, the double bond between N-7 and C-8 of m7G is reduced, and its reduction product is prone to further depurination to form an abasic site131 (Fig. 10(D)). Under aniline treatment, the abasic sites in RNA are cleaved via β-elimination, generating RNA fragments containing 5′ phosphate groups at the N+1 site of m7G.132 In contrast, unmodified G does not react with NaBH4 and does not initiate cleavage after these treatments. Based on this principle, TRAC-seq125,133 employs AlkB enzyme to treat RNA, thereby demethylating modifications like m1A and m3C, while leaving m7G unaffected. This can enhance the efficiency of reverse transcription and reduce the interference of other modified bases. Subsequently, after NaBH4 treatment and aniline treatment, the m7G modification sites can be identified through NGS. Pre-enrichment of RNA fragments containing m7G by an anti-m7G antibody can further enhance the detection sensitivity and accuracy.125

However, this cleavage-based approach is not applicable to very short RNAs, such as miRNAs, since overly short cleaved RNA fragments cannot be reliably mapped to the transcriptome. To solve this problem, the BoRed-seq method134 employs a biotin-conjugated aldehyde reaction probe (ARP), N-(aminooxyacetyl)-N′-(D-biotinoyl)hydrazine, to couple with the abasic site generated after NaBH4 treatment (Fig. 10(D)). Subsequently, the coupling product is enriched using streptavidin magnetic beads for NGS. Through anti-m7G antibody enrichment and MS identification, the sequencing results can be cross-validated, enabling the accurate identification of multiple m7G modification sites in miRNA.

Abasic sites cause stochastic mismatches during reverse transcription. By leveraging this principle, m7G-seq135 and m7G-map-seq136 enable the detection of m7G modifications at single-base resolution. m7G-map-seq compares the cDNA library with and without NaBH4 reduction, identifying m7G modification sites based on the increased mutation rate in the former compared to the latter. Based on m7G-map-seq, m7G-seq additionally employs biotin-coupled hydrazine to capture and enrich the abasic sites converted from m7Gs, which enhances the detection of m7G sites with low modification ratios. Notably, when RNA is treated with NaBH4, it also increases the mutation rate at ac4C, m1A, and D sites, potentially interfering with the identification of m7G sites. Therefore, it is also necessary to further verify m7G modification sites using anti-m7G antibody enrichment methods. However, due to the incomplete reduction and depurination of m7G by NaBH4, only a fraction of m7G is converted to abasic sites, and only a fraction of these converted abasic sites can be conjugated with biotin probes, resulting in a final mutation rate of only 20–60%.135 Consequently, these methods are not suitable for quantitative analysis of m7G.

Recently, m7G-quant-seq137 has enhanced the reduction and depurination reaction conditions (Fig. 10(D)). Treating with a high concentration (∼800 mM) of KBH4 at room temperature for 4 h can fully convert m7G to its reduction product, which is more efficient than the treatment with NaBH4. The depurination of the reduced product is further enhanced by reacting it in a pH = 2.9 buffer solution (100 mM NaOAc/AcOH buffer) at 45 °C for 4 h. The optimized two-step conditions can almost completely convert m7G sites into abasic sites. Moreover, since G-to-T mutation is the predominant type of mutation introduced by the selected reverse transcriptase, increasing the ratio of dATP/dNTP during reverse transcription can significantly raise both the G-to-T mutation rate and the total mutation rate. Thus, m7G-quant-seq enables the quantitative analysis of transcriptomic m7G at single-base resolution.
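As a rough illustration of how a mutation rate might be turned into a modification fraction, the following Python sketch assumes a linear relationship between the observed mutation rate and the m7G fraction, bounded by the rate at an unmodified site and the rate at a fully modified site. The linear model, the function name, and all parameter names are assumptions for illustration; the calibration actually used by m7G-quant-seq may differ.

```python
def estimate_m7g_fraction(observed_rate: float,
                          background_rate: float = 0.0,
                          full_conversion_rate: float = 1.0) -> float:
    """Estimate the modified fraction at a G site from its mutation rate.

    Assumes (hypothetically) a linear calibration between:
      background_rate      -- mutation rate at an unmodified G
      full_conversion_rate -- mutation rate at a 100% m7G site
    """
    if not 0.0 <= background_rate < full_conversion_rate <= 1.0:
        raise ValueError("need 0 <= background_rate < full_conversion_rate <= 1")
    fraction = (observed_rate - background_rate) / (
        full_conversion_rate - background_rate)
    # Clamp to [0, 1] so sampling noise cannot yield impossible fractions.
    return min(1.0, max(0.0, fraction))
```

Under this assumption, incomplete conversion lowers `full_conversion_rate` and therefore compresses the usable dynamic range, which is one way to see why near-complete conversion matters for quantification.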


RNA D. The D modification in RNA can be identified on the basis of the reverse transcription termination caused by the hydrolysis of D under strongly alkaline conditions.138 Since severe RNA degradation occurs under such conditions, relatively mild NaBH4 reduction strategies have been developed to achieve D conversion (Fig. 10(E)). When NaBH4 and D react at 0 °C for a short time at a stoichiometric ratio of 1:1, the reduction product is tetrahydrouridine (THU).139 However, under stronger reduction conditions, such as a 2-hour reaction at room temperature with a NaBH4:D stoichiometric ratio of 2:1, the ring-opening product N-(β-D-ribofuranosyl)-N-(γ-hydroxypropyl) urea is obtained.140 The reaction does not occur at U sites. Under acidic conditions (pH = 3), a primary amine derivative can undergo a nucleophilic substitution reaction with THU to replace the 4-OH group and form a conjugate, thereby achieving fluorescent labeling of the D site. Using this principle, Rho-seq141 reduces D to THU and introduces a bulky rhodaminyl group to induce the termination of reverse transcription, and then identifies the D modification site by NGS. Similar to Rho-seq, D-seq142 directly uses THU-induced reverse transcription termination combined with NGS to identify D modifications without coupling bulky groups.

However, methods that rely on reverse transcription termination are not applicable for the identification of D modifications in highly modified RNAs (e.g., tRNA), as the numerous other modifications and secondary structures can also cause the termination of reverse transcription, leading to a considerable number of false-positive signals. It was recently reported that THU, generated by NaBH4 treatment (Fig. 10(E)), leads to a certain degree of T-to-C conversion during reverse transcription. Based on this principle, an NGS method for detecting D modifications via base conversion has been developed.143 However, the conversion rate caused by THU is relatively low, and a certain degree of transcription inhibition limits the detection sensitivity and accuracy of this method.

4.5 Oxidation reaction

4.5.1 Oxidation of groups on aromatic rings.
DNA 5hmC. Mild oxidants like K2RuO4144 and KRuO4145 can selectively oxidize the hydroxymethyl group on 5hmC to an aldehyde group, converting it to 5fC (Fig. 11(A)). oxBS-seq74,75 combines this principle with BS-seq to achieve the identification and differentiation of several modifications. Additionally, the aldehyde group on 5fC is more reactive and prone to condensation reactions, resulting in the conversion of sequencing signals or coupling of functional groups. Therefore, after the oxidation of 5hmC to 5fC, the 5hmC site can be detected indirectly through the detection of 5fC. For instance, by labeling 5fC with hydroxylamine coupled to a fluorophore146 or azido-coupled 1,3-ninhydrin,144 the labeled product can be identified through fluorescence analysis or NGS. For the original 5fC present in DNA, it can be protected by hydroxylamine from participating in the coupling reaction, thereby eliminating its interference in the identification of 5hmC.
Fig. 11 Oxidation reactions for base modification detection. (A) 5hmC is selectively oxidized to 5fC by KRuO4 or K2RuO4. (B) 5mC undergoes one-electron oxidation to generate 5fC, with polytungstate serving as a photocatalyst. (C) 5hmC is oxidized to thT by peroxytungstic acid, and the sequencing signal is read as T. (D) m2G and m22G are oxidized by photocatalysis, causing an increase in the mismatch rate at the m2G site and a decrease in the mismatch rate at the m22G site. Red highlights indicate the functional groups participating in the subsequent reaction.

DNA 5mC. A key requirement for the detection of 5mC is to selectively label the C-5 methyl group of 5mC rather than the C-5 methyl group of T. Compared with 5mC, the redox potential of T is as high as 0.53 V.147 Thus, single-electron transfer is more favorable at 5mC. With sodium decatungstate serving as the photocatalyst, 5mC undergoes single-electron oxidation to 5fC under illumination (Fig. 11(B)), which is subsequently detected through the specific chemical conversion of 5fC.148 Nevertheless, 5hmC can also undergo single-electron oxidation to generate 5fC. Moreover, the methyl group on T is also oxidized to a certain extent at the oligonucleotide level, and a small amount of 5fC naturally exists in DNA, both of which interfere with the detection results.
4.5.2 Oxidation of the pyrimidine rings.
DNA 5hmC and RNA hm5C. It is well established that peroxytungstate can oxidize the carbon–carbon double bond of the allyl alcohol group.149 5hmC is specifically oxidized by peroxytungstate to form trihydroxythymidine (thT)150 because the 5-hydroxymethyl group of 5hmC constitutes the characteristic allyl alcohol group (Fig. 11(C)). The thT site is cleaved upon treatment with piperidine and can be identified via gel electrophoresis based on fragment size.150 Further studies indicated that thT pairs with A during DNA replication, causing a C-to-T transition, thereby enabling the identification of 5hmC modification sites at single-base resolution via NGS.151 Nevertheless, peroxytungstate oxidation is only effective in ssDNA and is significantly inhibited in dsDNA.

RNA hm5C can be effectively oxidized to thT by peroxytungstate, and then pair with A during reverse transcription to achieve C-to-T signal conversion. WO-seq,152 based on peroxytungstate oxidation, has been employed for the identification of RNA hm5C sites (Fig. 11(C)). Additionally, TAWO-seq,152 which initially utilizes TET oxidase to oxidize RNA m5C to hm5C followed by peroxytungstate oxidation sequencing, can also be utilized to detect RNA m5C.


RNA m2G and m22G. Both m2G and m22G have a lower oxidation potential than unmodified G because of their electron-donating methyl groups.153 These groups increase the electron density on the purine ring, thereby enabling chemically selective oxidation.153 m22G disrupts the formation of Watson–Crick hydrogen bonds because of the presence of two methyl groups on the 2-amino group,41 thus causing significant mismatches during reverse transcription. Based on this principle, m22G can be specifically detected. In contrast, m2G does not cause mismatches during reverse transcription and cannot be directly identified using the mismatch principle.153 PhOxi-seq153 utilizes visible-light-mediated organic photoredox catalysis to selectively oxidize m22G and m2G (Fig. 11(D)). m2G is oxidized to m2-2,5-diamino-4H-imidazol-4-one (m2-Iz) and m2-8-oxoguanine (m2-OG). m2-Iz leads to a G-to-T conversion of the sequencing signal, while m2-OG causes a G-to-C conversion of the signal. After selective photooxidation, a remarkable increase in the mutation rate can be observed at the m2G site. For m22G, m22-Iz and m22-OG are generated after selective photooxidation, providing additional hydrogen bond formation sites. This enhances the reverse transcription read-through rate and significantly reduces the mutation rate, serving as a characteristic signal for detecting m22G modification. However, the processing conditions of PhOxi-seq can also increase the mutation rate at m1G sites. Therefore, m2G and m1G sites cannot be distinguished by PhOxi-seq alone, requiring the use of m1G demethylase or other methods for differentiation. It is worth noting that PhOxi-seq cannot achieve quantitative analysis because of its low conversion efficiency.

4.6 DNA/RNA site-directed cleavage

4.6.1 Hot piperidine-induced cleavage. Hot piperidine treatment leads to DNA cleavage at the 5fC modification site. Based on this principle, the cleaved band can be identified through gel electrophoresis.154 By using KRuO4 to selectively oxidize 5hmC to 5fC, the 5hmC site would also be cleaved upon hot piperidine treatment.154 However, this method is only applicable for the identification of modifications at specific sites.
4.6.2 Aniline-induced RNA fragmentation. AlkAniline-Seq132 is capable of simultaneously detecting m3C, m7G, and D modifications in RNA with single-base resolution. Under alkaline conditions (pH = 9.2), RNA m3C/m7G/D undergoes base removal reactions to generate abasic sites, leading to RNA fragmentation (Fig. 12). Subsequently, alkaline phosphatase is employed to remove all 5′ and 3′ phosphate groups, ensuring that all the phosphate groups required for adaptor ligation during the subsequent library construction step originate from the cleavage sites. After the removal of all interfering phosphate groups, aniline treatment triggers the cleavage of abasic sites, generating a new 5′ phosphorylation site at the N+1 position. Finally, the modification sites are identified through library construction, amplification, and sequencing. Since these three modifications derive from different parental bases, they can be easily distinguished upon read alignment. However, AlkAniline-Seq detects only a low degree of D signal because the formation of abasic sites from D is incomplete under these conditions.
Fig. 12 AlkAniline-Seq for the analysis of RNA m3C/m7G/D.
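The read-out logic of AlkAniline-Seq, in which reads begin at position N+1 and the parental base at position N indicates the modification type, can be sketched in Python as follows. This is an illustrative simplification rather than the published analysis pipeline; the function name, the 0-based coordinate convention, and the `min_reads` threshold are assumptions.

```python
from __future__ import annotations

from collections import Counter

# Parental base at position N -> candidate modification.
# "T" covers references written in the DNA alphabet, "U" those in the RNA alphabet.
PARENT_TO_MODIFICATION = {"C": "m3C", "G": "m7G", "T": "D", "U": "D"}


def call_cleavage_sites(reference: str, read_starts: list[int],
                        min_reads: int = 10) -> dict[int, tuple[str, int]]:
    """Map 5' read starts (0-based, located at N+1) back to position N.

    Returns {N: (candidate modification, supporting read count)}.
    min_reads is an illustrative cut-off, not a published value.
    """
    calls: dict[int, tuple[str, int]] = {}
    for start, count in Counter(read_starts).items():
        site = start - 1  # cleavage leaves the 5' phosphate one nt downstream
        if count < min_reads or not 0 <= site < len(reference):
            continue
        modification = PARENT_TO_MODIFICATION.get(reference[site].upper())
        if modification is not None:  # A has no candidate in this scheme
            calls[site] = (modification, count)
    return calls
```

Because the assignment depends only on the reference base at N, a sketch of this kind cannot by itself separate a true modification from nonspecific cleavage at the same base, so comparison with untreated controls would still be needed.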

Hydrazine can undergo nucleophilic addition reactions with U and m3C,155 and RNA is subsequently cleaved at these modification sites upon aniline treatment, generating fragments with 5′ phosphorylation at the N+1 positions. Under high salt conditions (3 M NaCl), this reaction occurs exclusively at m3C.155 Based on this principle, the HAC-seq155 method has been developed to detect m3C at single-base resolution. To enhance the accuracy of the method, different control groups need to be designed. In the control group, RNA is demethylated to prevent other RNA modifications from influencing the efficiency of reverse transcription; in the HAC group, RNA is treated with hydrazine and aniline without prior demethylation. In the DM-HAC group, RNA demethylation is carried out first, followed by hydrazine and aniline treatment. The cleavage rate of the corresponding sites is analyzed through high-throughput sequencing. At the m3C modification site, the cleavage rate of the HAC group should be significantly higher than that of the control group and the DM-HAC group.
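The three-group comparison described above can be expressed as a short Python sketch. The function names and the threshold `min_delta` are hypothetical, and the cleavage rate is defined here simply as the share of reads that start at N+1 among all reads reaching the site, which is an assumption rather than the published HAC-seq metric.

```python
def cleavage_rate(cleaved_reads: int, read_through_reads: int) -> float:
    """Share of reads starting at N+1 among all reads reaching the site."""
    total = cleaved_reads + read_through_reads
    return cleaved_reads / total if total else 0.0


def is_candidate_m3c(control: float, hac: float, dm_hac: float,
                     min_delta: float = 0.20) -> bool:
    """Flag a C site cleaved in the HAC group but not in either comparison group.

    control -- demethylated RNA, no hydrazine/aniline treatment
    hac     -- hydrazine/aniline treatment without prior demethylation
    dm_hac  -- demethylation followed by hydrazine/aniline treatment
    min_delta is an illustrative threshold, not a published value.
    """
    return hac - max(control, dm_hac) >= min_delta
```

Requiring the HAC signal to exceed the DM-HAC signal is what ties the cleavage to the methyl group: a site that is still cleaved after demethylation is unlikely to be m3C.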

U generates an abasic site upon hydrazine hydrate treatment,156 which can be cleaved upon aniline treatment. However, A, T, C, and Ψ do not generate abasic sites upon hydrazine treatment. HydraPsiSeq156 takes advantage of the resistance of Ψ by first cleaving RNA at U sites through a hydrazine/aniline treatment, and then quantifying Ψ by high-throughput sequencing. RNA fragmentation also occurs at m3C/m7G/D sites under hydrazine/aniline treatment, but it does not affect the readout of Ψ. Incomplete cleavage at U or other modified U bases, however, can cause a certain degree of false-positive signals.156
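Since Ψ is detected as the absence of cleavage at a U position, one way to picture the read-out is a protection score that compares the cleavage signal at each U with the typical signal at the other U positions of the same transcript. The Python sketch below uses a transcript-wide median for this purpose; this normalization, the function name, and the input format are assumptions for illustration and do not reproduce the scoring used by HydraPsiSeq.

```python
from statistics import median


def protection_scores(sequence: str, cleavage_counts: list[int]) -> dict[int, float]:
    """Score each U (0-based index) by how protected it is from cleavage.

    cleavage_counts[i] is the number of cleavage events assigned to position i.
    A score near 1 means almost no cleavage (candidate pseudouridine);
    a score of 0 means cleavage at or above the median level for U.
    """
    if len(sequence) != len(cleavage_counts):
        raise ValueError("sequence and cleavage_counts must have equal length")
    u_positions = [i for i, base in enumerate(sequence.upper()) if base in ("U", "T")]
    if not u_positions:
        return {}
    typical = median(cleavage_counts[i] for i in u_positions)
    if typical == 0:
        return {i: 0.0 for i in u_positions}  # no usable cleavage signal
    return {i: max(0.0, 1.0 - cleavage_counts[i] / typical) for i in u_positions}
```

In this simplified picture, incomplete cleavage at an ordinary U lowers its count and raises its score, which mirrors the false-positive risk noted above.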

5. Functional group coupling strategies

For different modifications, orthogonal-reactive groups can be developed. Based on these groups, other functional groups can be coupled to achieve the identification of modification sites. For example, after coupling with a fluorophore, the modified sites can be identified through fluorescence analysis; after coupling with a biotin group, they can be identified by affinity enrichment using streptavidin-coated magnetic beads. In this process, smaller functional groups containing azido or alkynyl groups can also be coupled to enhance the efficiency of the first step, followed by linking a biotin group through an efficient click reaction. Additionally, coupling bulky groups can further increase the detection sensitivity of modified bases in LC–MS or nanopore sequencing.

5.1 Coupling based on condensation reactions

5.1.1 DNA 5hmC. 5hmC possesses adjacent hydroxymethyl and amino groups, which can undergo condensation reactions with aromatic aldehydes to form 1,3-O,N-heterocycles.168 Initially, the 4-amino group in 5hmC reacts with the aldehyde group to form a Schiff base. Subsequently, the hydroxyl group acts as a nucleophile and undergoes nucleophilic addition to the C=N double bond, resulting in the formation of the cyclized product (Fig. 13(A)). Fluorescence analysis of 5hmC can be accomplished when fluorophores are coupled to aromatic aldehydes. However, the labeling efficiency of this method is relatively low, not exceeding 75%.168
Fig. 13 Condensation reactions for coupling functional groups. (A) Condensation of 5hmC with aromatic aldehydes. (B) The aldehyde group of f5C undergoes condensation reactions with primary amine derivatives, 2,3,3-trimethylindole derivatives, and 2-thioaniline. The aldehyde and amino groups of 5fC participate in condensation reactions with 2-(5-chlorobenzo[d]thiazol-2-yl)acetonitrile (CBAN) and a Wittig reagent, resulting in the formation of cyclized products. (C) Condensation of 5caC with primary amines catalyzed by EDC or 4-(4,6-dimethoxy-1,3,5-triazin-2-yl)-4-methylmorpholinium chloride (DMTMM). (D) A Wittig reagent capable of selectively labeling 5fU while exhibiting no reactivity toward 5fC. (E) Condensation of RNA m5C, hm5C, f5C, ca5C with bromoacetyl derivatives. Red highlights indicate the functional groups participating in the subsequent reaction.
5.1.2 DNA 5fC and RNA f5C. The aldehyde group of 5fC is highly reactive and prone to nucleophilic attack.13,96,158–163,165,168,183–185 Compounds containing groups such as amine, hydroxylamine, hydrazine, o-phenylene diamine, and indole are designed to target and couple with 5fC. The aldehyde group of 5fC can undergo condensation reactions with hydroxylamine, which is commonly utilized as a protective group for 5fC to safeguard it from deamination,78 reduction88 and other chemical transformations. Hydroxylamine can also be employed for fluorescent labeling159 or affinity enrichment158 of 5fC when coupled to different functional groups. Based on a similar principle, 2,3,3-trimethylindole derivatives (Fig. 13(B)) can also undergo condensation reactions with aldehyde groups.162 Besides 5fC, these reagents can also react with 5fU and abasic sites containing aldehyde groups, potentially causing some interference. RNA f5C can also undergo condensation reactions with these nucleophiles to couple functional groups. Additionally, after the condensation reaction of 2-thioaniline with f5C on RNA (Fig. 13(B)), the product undergoes photoinduced cyclization and emits fluorescence, thereby achieving fluorescent labeling of f5C.165 In addition, 2-(5-chlorobenzo[d]thiazol-2-yl)acetonitrile (CBAN)163 and Wittig reagents164,166 can also condense with 5fC/f5C to form similar cyclized products (Fig. 13(B)). These reagents can be coupled with fluorescent labeling groups to realize the identification of DNA 5fC and RNA f5C.
5.1.3 DNA 5caC. The modified carboxyl group on 5caC is capable of undergoing condensation reactions with primary amines to form amide bonds under the catalysis of EDC.77 Thus, 5caC can be catalyzed by EDC to react with a nucleophile coupled with a functional group (Fig. 13(C)). Nevertheless, biotin-conjugated primary amines cannot efficiently form amide bonds with 5caC under EDC catalysis. Alternatively, 5caC can be condensed with azido-conjugated primary amines, and subsequent click reactions can be utilized to couple the biotin group.77 Additionally, the amide bond is more stable and can serve as a protective group to protect 5caC.77 Based on this principle, a bisulfite-based 5caC site identification method, CAB-seq,77 has been developed. However, the reaction efficiency of this process is relatively low, with a maximum yield of only 65%. Moreover, primary amines also tend to react with other active aldehyde sites (5fC, 5fU, abasic sites), causing interference. To address this issue, hydroxylamine can be employed to block other sites with active aldehyde groups (5fC, 5fU, abasic sites).157 Recently, 4-(4,6-dimethoxy-1,3,5-triazin-2-yl)-4-methylmorpholinium chloride (DMTMM) (Fig. 13(C)) has been utilized to catalyze the coupling of the carboxyl group on 5caC with primary amine derivatives.157 The 5caC site can be labeled and affinity-enriched using a disulfide-bonded primary amine and biotin probe. Although this method enables direct biotin labeling, the labeling efficiency remains relatively low (32%).
5.1.4 DNA 5fU. Due to the strong intramolecular hydrogen bond between the 4-amino group and the 5-carbonyl oxygen on DNA 5fC, as well as the electron-donating property of the 4-amino group further weakening the electrophilic property of its aldehyde group, the aldehyde group of DNA 5fC is less reactive than that of DNA 5fU,157,167,186 which lacks an intramolecular hydrogen bond. Based on this principle, a coumarin-derived phosphorus ylide selectively reacts with 5fU but not with 5fC or abasic sites (Fig. 13(D)), achieving selective fluorescent labeling of 5fU.167
5.1.5 RNA m5C. RNA m5C, hm5C, f5C and ca5C can utilize their 4-amino and N-3 atoms to form rings with bromoacetyl derivatives (Fig. 13(E)), which significantly enhances ionization efficiency and thereby improves detection sensitivity in LC–MS.187

5.2 Coupling based on addition and substitution reactions

5.2.1 DNA 5hmC. Unlike other deamination reactions of cytosine modifications, DNA 5hmC is transformed into cytosine-5-methylenesulfonate (CMS) through exo-methylene amidines during bisulfite treatment.188 In the presence of thiols like glutathione (GSH),188 sulfur-substituted 5hmC adducts are formed (Fig. 14(A)). If biotin is conjugated to the amino group of GSH and enriched by streptavidin magnetic beads, the modification sites can be detected by nanopore sequencing.169 The labeling process of 5hmC using this method is relatively straightforward, but the labeling efficiency is not high (30–65%) due to a portion of the adduct existing in the form of CMS.188
Fig. 14 Addition or substitution reactions for coupling functional groups. (A) 5hmC reacts with bisulfite to form the adduct CMS. In the presence of thiol derivatives or sulfite derivatives, corresponding thiol addition products (left) or sulfite addition products (right) are formed, respectively. (B) The oxidation of RNA m6A by FTO results in the formation of hm6A, followed by the introduction of a reactive sulfhydryl group via DTT for biotin coupling. (C) RNA m6A generates Nm6A under NO/O2 oxidation, which is subsequently reduced by thiourea dioxide (TDO) to generate Am6A and coupled to biotin through a reactive amino group. (D) RNA i6A undergoes coupling via an ene reaction with 4-phenyl-1,2,4-triazoline-3,5-dione (PTAD). The oxidation of RNA i6A by oxoammonium cation leads to the removal of the modifying functional group to regenerate A. (E) The addition reaction of I with acryl cyanide, acrylamide derivatives, and CMC. Red highlights indicate the functional groups participating in the subsequent reaction.

Alkyl sulfinates have superior solubility and stability compared to thiols.170 If alkyl sulfinates (1 M sodium ethosulfonate) are employed instead of thiol compounds for nucleophilic substitution, and the amount of NaHSO3 added (50 mM) is decreased to minimize the formation of CMS, the labeling efficiency could be raised to 80%.170 The labeling efficiency of sulfinates coupled with biotin can also reach 67%.170

5.2.2 RNA m6A. In m6A-SEAL-seq,69 FTO is employed to oxidize the methyl group of m6A to a hydroxymethyl group (Fig. 14(B)), thereby generating hm6A. The more reactive hydroxyl group then reacts with DTT to produce N6-dithiolsitolmethyladenosine (dm6A). The free sulfhydryl group can react with methanethiosulfonate (MTSEA) to couple the fluorophore or biotin group, ultimately achieving the specific labeling and detection of m6A.

Upon treatment with nitrite, m6A generates the adduct Nm6A93 (Fig. 14(C)). Based on this principle, m6A-ORL-seq180 employs an aqueous NO solution to nitrosate m6A in the presence of oxygen, and then uses thiourea dioxide (TDO) to reduce the NO group in the adduct to a reactive amino group. This reactive amino group can be labeled and enriched with an aromatic aldehyde probe coupled to biotin.

5.2.3 RNA i6A. 4-Phenyl-1,2,4-triazoline-3,5-dione (PTAD) (Fig. 14(D)), Selectfluor, and nitroso derivatives can undergo rapid ene ligation reactions with the isopentenyl group of RNA i6A and couple the azide group.121,174,189 Through the click reaction, fluorophores can be further coupled to achieve the fluorescence analysis of i6A modifications.

In addition, in the presence of oxoammonium cation (Fig. 14(D)), RNA i6A undergoes oxidation to form α,β-conjugated imines. This product is prone to hydrolysis and eventually undergoes deisopentenylation to regenerate A. The removal of the isopentenyl group leads to a shift in molecular weight, which can be identified by HPLC or MS.175

5.2.4 RNA I. An acrylonitrile171 or acrylamide173 derivative coupled to an azide group is capable of specifically labeling I (Fig. 14(E)). Subsequently, through the click reaction, biotin or a fluorophore can be further coupled, enabling detection of I by RT-qPCR171 or fluorescence analysis.173 Additionally, CMC can be added to the N-1 position of I to enhance its ionization efficiency in LC–MS, thereby increasing the detection sensitivity.172

6. Limitations of chemical-assisted methods

Chemical-assisted methods have enabled the detection of nearly 20 significant nucleic acid modifications, demonstrating broader applicability compared to antibody-based methods, metabolic labeling methods, and enzyme-assisted methods. These methods have played a pivotal role in advancing the fields of epigenetics and epitranscriptomics. Nevertheless, these techniques still have several limitations, including low sensitivity for detecting low-abundance modifications, stringent reaction conditions, suboptimal reaction efficiency, high sample input requirements, and the inability to provide spatial information. These issues restrict the application of chemical-assisted methods in certain research contexts, particularly in the studies of low-abundance modifications, single-cell samples, or contexts requiring spatial information.10,19,190 Therefore, the development of more sensitive, more efficient, and comprehensive modification detection technologies remains an important direction for future research. Here, we provide a detailed analysis of the limitations of existing chemical-assisted methods to guide the advancement of new technologies. With continuous technological progress, detection technologies based on chemical-assisted methods are expected to overcome these limitations and offer a more powerful toolkit for the investigation of nucleic acid modifications.

6.1 Insufficient sensitivity for the detection of low-abundance modifications

Nucleic acid modifications typically exist at low abundance, which poses significant challenges for accurate detection. For example, 5mC, the most abundant DNA modification in mammals, accounts for approximately 1% of total DNA Cs,47 whereas Ψ, the most abundant modification in RNA, accounts for only 0.2–0.6% of all Us.59 The modification ratios of other types are even lower. Chemical-assisted methods may lack sufficient sensitivity to accurately detect these modifications, especially when the modification levels approach the detection limit. For example, in the case of m1A detection, its low abundance in RNA makes it difficult for classical chemical-assisted methods to reliably distinguish modified A from unmodified A,126 potentially leading to false-negative results. In addition, although ac4C-seq can convert ac4C to N4-acetyltetrahydrocytosine with nearly 100% conversion efficiency, the conversion products generate only 50% of the expected mutation signals during reverse transcription,124,128 thus reducing the detection sensitivity for ac4C modifications. Therefore, analytical methods with higher sensitivity and lower detection limits are needed to achieve accurate quantification of low-abundance modifications.
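The interplay between modification level, signal yield, and sequencing depth can be made concrete with a simple binomial model. In the Python sketch below, each read at a site is assumed to show the mutation signal independently with probability equal to the site stoichiometry multiplied by the signal yield of the chemistry; background errors are ignored. The model, the function name, and the example numbers are illustrative assumptions, not values taken from any specific method.

```python
from math import comb


def detection_probability(depth: int, stoichiometry: float,
                          signal_yield: float, min_mismatch_reads: int) -> float:
    """Probability of observing at least min_mismatch_reads signal reads.

    depth              -- number of reads covering the site
    stoichiometry      -- fraction of molecules modified at the site (0 to 1)
    signal_yield       -- fraction of modified molecules that give a signal (0 to 1)
    min_mismatch_reads -- reads required before the site is called
    """
    if depth < 0 or min_mismatch_reads < 0:
        raise ValueError("depth and min_mismatch_reads must be non-negative")
    if not (0.0 <= stoichiometry <= 1.0 and 0.0 <= signal_yield <= 1.0):
        raise ValueError("stoichiometry and signal_yield must lie in [0, 1]")
    p = stoichiometry * signal_yield  # per-read probability of a signal
    upper = min(min_mismatch_reads, depth + 1)
    # Binomial probability of seeing fewer than the required number of reads.
    p_below = sum(comb(depth, k) * p**k * (1.0 - p) ** (depth - k)
                  for k in range(upper))
    return 1.0 - p_below
```

Under these assumptions, halving the signal yield has the same effect on the per-read probability as halving the stoichiometry, so a chemistry that yields only 50% of the expected signal needs correspondingly deeper sequencing to call the same site.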

6.2 Stringent reaction conditions destroy the integrity of nucleic acids

Chemical-assisted methods typically involve stringent reaction conditions, such as strong acids, strong bases, or high temperatures. These conditions may compromise the structural integrity of nucleic acid samples, thereby influencing the accuracy of detection results. The detection of DNA 5mC or RNA m5C via bisulfite sequencing illustrates this issue.72,191 The fundamental principle involves using bisulfite to convert unmethylated C to U, while methylated C remains unaltered. Nevertheless, notable limitations of this approach include the lengthy reaction time, severe DNA or RNA degradation, and incomplete C-to-U conversion in certain sequences. Recently, UBS-seq has been reported to employ high-concentration bisulfite reagents to accelerate the bisulfite reaction by approximately 13 times, thereby reducing DNA damage and background noise.84 Additionally, a chemically cooperative catalysis-assisted m6A sequencing method (CAM-seq) has recently been developed, enabling high-sensitivity, low-background, and low-input whole-transcriptome sequencing of m6A in RNA at single-base resolution.192 This method employs an N-nitrosation strategy based on small-molecule cooperative catalysis under mild deamination conditions. Therefore, future research should focus on the development of aqueous reaction systems at room temperature and near-neutral pH, such as the utilization of light or small-molecule cooperative catalysis reactions instead of strong chemical reagents.

6.3 The chemical conversion efficiency of certain modified bases remains insufficient

Conversion efficiency is a critical factor in the detection of nucleic acid modifications through chemical-assisted approaches. Insufficient conversion leaves a fraction of modified sites undetected, thereby compromising the accuracy and reliability of the results. In practice, conversion efficiency is affected by various factors, such as the purity of the nucleic acid sample, the concentration of chemical reagents, the reaction temperature, the reaction duration, and the type and abundance of the modification. For instance, in the IMCRT tRNA-seq method for i6A sequencing analysis, the highest i6A conversion efficiency is only 40%.121 Similarly, when red-m1A-seq is used to detect m1A, the maximum conversion efficiency reaches only 80%.126 These examples highlight the need for more efficient chemical-assisted strategies for these modifications. Additionally, in some chemical-assisted methods, the conversion efficiency varies with the sequence context, leading to the underestimation of certain modifications. For instance, in the HAC-seq method,155 which detects m3C, the nucleophilic addition reaction between m3C and hydrazine can be affected by the local RNA structure, resulting in low conversion efficiency and potentially generating false-negative results at specific loci.
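
As a rough illustration of how incomplete conversion leads to underestimation, the sketch below corrects an observed signal fraction for a known conversion efficiency and background rate. The linear model, the function name `corrected_fraction` and its parameters are assumptions made for this example; real methods usually calibrate against spike-in standards, and sequence-dependent efficiencies would require a per-context correction that is not modelled here.

```python
def corrected_fraction(observed, efficiency, background=0.0):
    """Estimate the true modification fraction from an observed signal fraction.

    Assumed model: observed = f * efficiency + (1 - f) * background,
    where f is the true modification fraction at the site.
    """
    if not background < efficiency <= 1.0:
        raise ValueError("need background < efficiency <= 1")
    f = (observed - background) / (efficiency - background)
    # Clip to the physically meaningful range; noise can push f outside [0, 1].
    return min(1.0, max(0.0, f))


if __name__ == "__main__":
    # With 40% conversion efficiency, a 20% observed signal implies about 50% modification.
    print(corrected_fraction(0.20, 0.40))
```

The correction also shows why low efficiency is costly: the smaller the gap between efficiency and background, the more any measurement noise in the observed fraction is amplified in the corrected estimate.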

6.4 Difficulty in single-cell sequencing

In recent years, the advancement of single-cell sequencing technology has offered new avenues for the investigation of gene expression and epigenetic heterogeneity within cells. Unveiling the intercellular heterogeneity of nucleic acid modifications is essential for comprehending their biological functions and regulatory mechanisms. Nevertheless, single-cell sequencing technology still encounters challenges when applied to nucleic acid modification detection. Chemical-assisted approaches typically require substantial amounts of nucleic acid samples, which restricts their application in single-cell or low-input analysis. Researchers have attempted various bisulfite-based sequencing methods to detect DNA methylation in single cells;193 however, due to the sparsity of single-cell sequencing data and PCR errors, the quality of the data produced by these methods is unsatisfactory. Antibody-based and base editing enzyme-based methods can enable single-cell analysis of m6A modifications. Nevertheless, antibody-based methods cannot achieve single-base resolution,194 while base editing enzyme-based methods require exogenously expressed proteins and are not suitable for clinical samples.195 To date, single-cell m6A sequencing based on chemical-assisted methods has not been reported.
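
The effect of data sparsity on single-cell quantification can be sketched with a standard Wilson score interval for a binomial proportion. This is a generic statistical illustration rather than a method taken from the cited studies; the function name `wilson_interval` and the example read counts are assumptions.

```python
from math import sqrt


def wilson_interval(modified_reads, total_reads, z=1.96):
    """Wilson score interval for the modification level at one site.

    `z` is the normal quantile (1.96 corresponds to roughly 95% confidence).
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    p = modified_reads / total_reads
    denom = 1.0 + z * z / total_reads
    centre = (p + z * z / (2 * total_reads)) / denom
    half = z * sqrt(p * (1 - p) / total_reads + z * z / (4 * total_reads**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # The same 50% observed level at single-cell-like and bulk-like depths.
    for modified, total in ((1, 2), (50, 100)):
        low, high = wilson_interval(modified, total)
        print(f"{modified}/{total} reads: {low:.2f} to {high:.2f}")
```

With only a couple of reads per site, the interval spans most of the range from 0 to 1, so per-site modification levels in single cells are poorly constrained unless reads are pooled across sites or cells.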

6.5 Difficulty in implementing cell imaging and spatial omics analysis

Spatial information plays a pivotal role in understanding the functions and mechanisms of nucleic acid modifications. Nevertheless, chemical-assisted approaches typically require the extraction of DNA or RNA from tissues or cells, which destroys the spatial information of the sample and makes in situ detection infeasible. Recently, our group developed the TadA8.20 base editing enzyme-assisted single-base resolution N6-methyladenosine RNA imaging method (TARS).196 This method uses the base editing enzyme TadA8.20 to specifically convert A in RNA to I, while m6A sites remain unaltered. By combining a single-base gap-filling and ligation strategy with fluorescent ddNTPs, the A and m6A forms of specific RNA loci in single cells are visualized and quantified. This method introduces a novel concept for base modification imaging: (1) highly efficient and specific base conversions are achieved in situ under mild conditions (below 45 °C, without strong acids or bases); (2) the base conversions generate distinct mutation signals recognizable by reverse transcriptase (such as A-to-I or C-to-U conversion); (3) base modification imaging can be accomplished using in situ imaging technology for single-base mutations of nucleic acids. Among chemical-assisted methods, the fC-CET method achieves nearly 100% 5fC-to-T conversion under mild conditions,96 and the BACS method achieves approximately 87% Ψ-to-C conversion under mild conditions.113 These methods are therefore suitable starting points for the development of 5fC and Ψ imaging methods. However, for other types of base modifications, milder chemical-assisted approaches still need to be developed.

Spatial epitranscriptomics represents a highly promising field for future research, as it is crucial for understanding the functions and regulatory mechanisms of nucleic acid modifications in tissues or cells. For instance, specific modifications may be concentrated in particular cell types or tissue regions, and this spatial heterogeneity could be associated with specific biological functions or diseases.197 Nevertheless, spatial epitranscriptomic analysis poses challenges for chemical-assisted approaches, and no high-performance method is currently available for spatial omics analysis of base modifications. Based on the inhibitory effect of m6A loci on padlock probe ligation, MiP-Seq has been employed for spatial omics analysis of RNA m6A modifications.198 However, this method defines intracellular spots that fail to generate signals as m6A modifications, so its calls may be influenced by background noise, leading to substantial errors. In situ sequencing-based spatial omics technologies, such as STARmap,199 are highly compatible with nucleic acid imaging. Therefore, building on imaging methods for base modifications, spatial epitranscriptomic technology holds great potential for future development.

7. Conclusions and perspectives

Nucleic acid modifications play a crucial role in diverse biological processes and are intimately associated with human diseases. These base modifications are extensively distributed within the genome and transcriptome, and their biological functions differ markedly. Accurate quantification and precise localization are essential for evaluating the biological significance of these modifications and uncovering disease mechanisms and therapeutic targets. Such analyses offer direct answers to questions regarding the existence of specific RNA modifications and their dynamic changes during biological processes. Because modified and unmodified bases differ only subtly in structure, the specific identification and detection of these base modifications remain highly challenging. Chemical-assisted methods leverage the effect of modifying groups on the chemical reactivity of bases and use specific chemical reagents for selective reactions, thereby enabling the identification and comprehensive quantification of modifications. This review outlines the fundamental biological characteristics of nucleic acid modifications, including their structures, abundance, and biological functions, and describes how chemical-assisted approaches can be integrated with LC–MS, fluorescence analysis, gel electrophoresis, sequencing, and other techniques to constitute highly sensitive and highly specific analytical methods for nucleic acid modifications. We detail how diverse types of chemical reactions, including oxidation, reduction, deamination, addition, substitution, and coupling reactions, can differentiate modified bases from unmodified ones. These chemical-assisted methods are summarized in Table 2. Furthermore, we discuss the limitations of chemical-assisted methods and propose directions for future development.
Table 2 Summary of chemical-assisted strategies for epigenetic analysis

| Modification | Method name | Chemical | Reaction efficiency | Reaction type | Detection strategy | Ref. |
| --- | --- | --- | --- | --- | --- | --- |
| ac4C | | NaBH4 | 50% | Reduction | Sanger | 123 |
| ac4C | ac4C-seq | NaCNBH3 | 90% | Reduction | NGS | 124 and 128 |
| 5caC | | 4-(4,6-Dimethoxy-1,3,5-triazin-2-yl)-4-methylmorpholinium chloride (DMTMM)/biotinylated primary amine | 32% | Coupling | NGS | 157 |
| 5caC | | [Ir(dF(CF3)ppy)2(dtbbpy)]Cl | 85–99% | Reduction | NGS | 92 |
| 5caC | CAB-seq | EDC, primary amine | 0–65% | Addition | Sanger | 77 |
| D | Rho-seq | NaBH4, rhodamine 110 chloride | | Reduction and substitution | NGS | 141 |
| D | D-seq | NaBH4 | | Reduction | NGS | 142 |
| D | | NaBH4 | | Reduction | NGS | 143 |
| D | | NaBH4/Cy3-hydrazide or Cy5-hydrazide | | Reduction and substitution | LC–MS | 127 |
| 5fC | ARP-seq | ARP | | Coupling | NGS | 158 |
| 5fC | fCAB-seq | Et-ONH2 | | Coupling | NGS | 78 |
| 5fC | | Modified hydroxylamine (BODIPY-B, BODIPY-L or Coumarin-HA)/p-anisidine | 83–97% | Coupling | FL | 159 |
| 5fC | | Naphthalimide hydroxylamine probe | ∼100% | Coupling | FL | 160 |
| 5fC | CLED-seq | Biotin-ONH2 | 95% | Coupling | NGS | 161 |
| 5fC/5fU | | 2,3,3-Trimethylindole derivatives | 80% | Coupling | FL | 162 |
| 5fC | | 2-(5-Chlorobenzo[d]thiazol-2-yl)acetonitrile (CBAN) | ∼100% | Coupling | FL/NGS | 163 |
| 5fC | | azi-BP | | Coupling | qPCR/Sanger | 13 |
| 5fC | | YC-CN (phosphorus ylide) | 99% | Coupling | FL | 164 |
| 5fC | fC-CET | 1,3-Indandione azido derivative | 99% | Coupling | NGS | 96 |
| 5fC | CLEVER-seq | Malononitrile | 86% | Coupling | NGS | 98 |
| 5fC | | Malononitrile | NA | Coupling | qPCR | 12 |
| 5fC | redBS-seq | NaBH4 | 80% | Reduction | NGS | 76 |
| 5fC/5mC/5hmC/5caC | | Pyridine borane | NA | Deamination | NGS | 88 |
| f5C | | 2-thioaniline | 85% | Coupling | FL | 165 |
| f5C | paC-Seq | Cyanomethylene triphenylphosphorane | 96% | Coupling | NGS | 166 |
| f5C | Mal-seq | Malononitrile | 50–60% | Coupling | NGS | 99 |
| f5C/ca5C | | NaCNBH3 | 27%/33% | Reduction | NGS | 91 |
| f5C | | Pyridine borane | NA | Reduction | NGS | 90 |
| 5fU | | YU (phosphorus ylide) | 85% | Coupling | FL | 167 |
| 5hmC | | Benzaldehyde derivatives | 35–75% | Coupling | FL | 168 |
| 5hmC | TAB-seq | βGT/UDP-Glc | NA | Coupling | NGS | 79 |
| 5hmC | oxBS-seq | KRuO4 | 94% | Oxidation | NGS | 74 and 75 |
| 5hmC/5fC | | KRuO4, piperidine | NA | Oxidation | GE | 154 |
| 5hmC | Ox-Labeling | KRuO4, fluorescent dyes containing amino groups | 90% | Oxidation and coupling | FL | 146 |
| 5hmC | | K2RuO4, azido derivative of 1,3-indandione | 94% | Oxidation and coupling | NGS | 144 |
| 5hmC | CAM-seq | KRuO4, azi-BP | 88% | Oxidation and coupling | NGS | 145 |
| 5hmC | | Peroxotungstate | 79% (ssDNA)/7% (dsDNA) | Oxidation | GE | 150 |
| 5hmC | | Peroxotungstate | 99% (ssDNA) | Oxidation | NGS | 151 |
| 5hmC | | Bisulfite, RSH | ∼70% | Substitution | Nanopore | 169 |
| 5hmC | | Bisulfite, RSO2Na | 80% | Substitution | LC–MS | 170 |
| hm5C/m5C | WO-seq/TAWO-seq | Peroxotungstate | 50–60% | Oxidation | NGS | 152 |
| I | Nano ICE-Seq | Acrylonitrile | NA | Addition and substitution | Nanopore | 119 |
| I | ICE-seq | Acrylonitrile | 98% | Addition and substitution | NGS | 37 and 118 |
| I | | AtoI_N3 | 71% | Addition and coupling | NGS | 171 |
| I | Inosine Chemical Erasing (ICE) | Acrylonitrile | 90% | Addition and substitution | Sanger | 38 |
| I | | CMCT (N-cyclohexyl-N′-β-(4-methylmorpholinium)ethylcarbodiimide p-toluenesulfonate) | 93.40% | Addition and substitution | LC–MS | 172 |
| I | | Acrylamidofluorescein | 50% | Addition and coupling | FL | 115 |
| I | | EPhAA | 60% | Addition and coupling | FL | 173 |
| i6A | | TMSN3, selectfluor | 89% | Addition and coupling | FL | 174 |
| i6A | IMCRT tRNA-seq | I2, PTAD | 40% | Addition and substitution | NGS | 121 |
| i6A | | Oxoammonium cation, NH2OH | 90% | Addition and substitution | LC–MS | 175 |
| m1A | red-m1A-seq | NaBH4 | 80% | Reduction | NGS | 126 |
| m22G | PhOxi-seq | Riboflavin, selectfluor | 100% | Oxidation | NGS | 153 |
| m3C | HAC-seq | N2H4, C6H5NH2 | NA | Cleavage | NGS | 155 |
| m3C/m7G/D | AlkAniline-seq | Aniline | NA | Cleavage | NGS | 132 |
| m5C | | Sodium bisulfite | 100% | Deamination | NGS | 72 |
| m5C | MethylC-Seq | NaHSO3 | NA | Deamination | NGS | 14 |
| m5C | BS-seq | NaHSO3 | NA | Deamination | NGS | 176 |
| m5C | BS-seq | NaHSO3 | 99.80% | Deamination | NGS | 177 |
| m5C | | V2O5 | NA | Oxidation and cleavage | GE | 178 |
| m5C | | Chloroacetaldehyde (CAA) | 98% | Addition and substitution | FL | 179 |
| m5C | m5C-TAC-seq | TET, 1,3-indandione azido derivative, DBCO-S-S-PEG3-biotin | 48% | Oxidation and coupling | NGS | 97 |
| 5mC/m5C | UBS-seq | NH4HSO3, (NH4)2SO3 | | | NGS | 84 |
| m6A | NOseq | NaNO2 | 10–50% | Deamination | NGS | 93 |
| m6A | Nitrite-Seq | NaNO2 | 100% | Deamination | NGS | 94 |
| m6A | m6A-ORL-seq | NO, thiourea dioxide (TDO), biotinylated aryl aldehyde probe | 88% | Deamination | NGS | 180 |
| m6A | GLORI | NaNO2, glyoxal | 99% | Deamination | NGS | 95 |
| m6A | m6A-SEAL-seq | Dithiothreitol, methanethiosulfonate-biotin (MTSEA-biotin) | 80% | Coupling | NGS | 69 |
| m6A | m6A-SAC-seq | MjDim1, allylic-SAM, I2 | 75% | Addition and substitution | NGS | 122 |
| m7G | Bo-Seq | NaBH4 | 60% | Reduction and cleavage | GE | 181 |
| m7G | TRAC-seq | NaBH4, aniline | NA | Reduction | NGS | 125 and 133 |
| m7G | m7G-MaP-seq | NaBH4 | NA | Reduction | NGS | 136 |
| m7G | m7G-seq | NaBH4, biotin-hydrazide | NA | Reduction | NGS | 135 |
| m7G | BoRed-seq | NaBH4, N-(aminooxyacetyl)-N′-(D-biotinoyl)hydrazine | NA | Reduction | NGS | 134 |
| m7G | m7G-quant-seq | KBH4 | 97% | Reduction | NGS | 137 |
| Ψ | | N-Cyclohexyl-N′-(2-morpholinoethyl)carbodiimide metho-p-toluenesulfonate (CMC) | NA | Addition | FL | 106 |
| Ψ | | N-Cyclohexyl-N′-(2-morpholinoethyl)carbodiimide (CMC) | NA | Addition | GE | 105 |
| Ψ | nanoCMC-seq | N-Cyclohexyl-N′-(2-morpholinoethyl)carbodiimide metho-p-toluenesulphonate (CMC) | NA | Addition | Nanopore | 182 |
| Ψ | Ψ-seq | N-Cyclohexyl-N′-β-(4-methylmorpholinium)ethylcarbodiimide (CMC) | NA | Addition | NGS | 101 |
| Ψ | Pseudo-seq | N-Cyclohexyl-N′-(2-morpholinoethyl)carbodiimide metho-p-toluenesulphonate (CMC) | NA | Addition | NGS | 102 |
| Ψ | PSI-seq | 1-Cyclohexyl-(2-morpholinoethyl)carbodiimide metho-p-toluene sulfonate (CMCT) | NA | Addition | NGS | 100 |
| Ψ | | N-Cyclohexyl-N′-(2-morpholinoethyl)carbodiimide (CMC) | NA | Addition | qPCR | 104 |
| Ψ | CeU-Seq | N3-CMC, DBCO-(PEG)4-biotin | NA | Addition and coupling | NGS | 103 |
| Ψ | HydraPsiSeq | Hydrazine, aniline | NA | Cleavage | NGS | 156 |
| Ψ | BID-seq | Na2SO3, NaHSO3 | 50% | Addition | NGS | 110 |
| Ψ | PRAISE | K2SO3, NaHSO3, hydroquinone | 90.60% | Addition | NGS | 109 |
| Ψ | BIHIND-qPCR/BIHIND-seq | NaHSO3 | NA | Addition | NGS | 112 |
| Ψ | pseU-TRACE | Na2SO3, NaHSO3 | 60% | Addition | qPCR | 111 |
| Ψ/m1A/m5C | RBS-Seq | NaHSO3 | 50% | Deamination and addition | NGS | 85 |
| Ψ | BACS | 2-Bromoacrylamide | 87.60% | Addition | NGS | 113 |
| Ψ | | Methyl vinyl sulfone (MVS) | 85% | Addition | LC–MS | 114 |
| Ψ | | Acrylonitrile | 50% | Addition | LC–MS | 116 |


Over the past decade, significant progress in chemical-assisted approaches for the detection of nucleic acid modifications has substantially advanced epigenetics and epitranscriptomics. Nevertheless, chemical-assisted methods still encounter key challenges, such as insufficient detection sensitivity, stringent reaction conditions, low conversion efficiency, and the loss of spatial information. Future work should focus on milder and more efficient chemical strategies that can be integrated with single-cell sequencing and in situ imaging technologies, while fully leveraging spatial omics methods. Only by surmounting these technical bottlenecks can the precise regulatory mechanisms of nucleic acid modifications in physiological and pathological processes be fully revealed, enabling the identification of new epigenetic targets for disease diagnosis and treatment. Additionally, nucleic acid modifications exhibit structural diversity, and new types of modifications are constantly being discovered. Although this review mainly focuses on known modifications, many new modifications have been identified recently. For instance, 1,N6-dimethyladenosine (m1,6A), which is prevalent in mammalian cells, may be involved in tRNA modification-mediated translation regulation.200 Another example is 3-(3-amino-3-carboxypropyl)uridine (acp3U), a modification found in glycosylated RNAs that mediates the connection between RNA and sugar chains.201 These discoveries highlight the need to develop new methods with high precision and sensitivity for characterizing these newly discovered modifications, which is crucial for understanding their physiological functions.

With the continuous progress of technologies, it is expected that within the next five years, single-cell resolution, comprehensive modification coverage, and spatially resolved dynamic modification maps will be achieved. This will significantly boost the development of precision medicine and epigenetic therapy. It is hoped that this review stimulates the curiosity and creativity of researchers in this field and offers new directions for future exploration.

Conflicts of interest

There are no conflicts to declare.

Data availability

No primary research results, software or code have been included and no new data were generated or analysed as part of this review.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (no. 22034004, 22027807), the National Key Research and Development Program of China (no. 2021YFA1200104), the New Cornerstone Science Foundation, China Postdoctoral Science Foundation (no. BX20230179 and 2023M741974) and the Beijing Life Science Academy Initiative Scientific Research Program (no. 2023000CA0050, 2023100CC0240, 2023100CC0250 and 2024100CA0080).

References

  1. Y. Chen, T. Hong, S. Wang, J. Mo, T. Tian and X. Zhou, Chem. Soc. Rev., 2017, 46, 2844–2872.
  2. M. Frye, S. R. Jaffrey, T. Pan, G. Rechavi and T. Suzuki, Nat. Rev. Genet., 2016, 17, 365–372.
  3. R. J. Schmitz, Z. A. Lewis and M. G. Goll, Trends Genet., 2019, 35, 818–827.
  4. Z. D. Smith, S. Hetzel and A. Meissner, Nat. Rev. Genet., 2024, 26, 7–30.
  5. P. C. He and C. He, EMBO J., 2021, 40, e105977.
  6. B. Wang, D. Shi, S. Yang, Y. Lian, H. Li, M. Cao, Y. He, L. Zhang, C. Qiu, T. Liu, W. Wen, Y. Ma, L. Shi, T. Cheng, L. Shi, W. Yuan, Y. Chu and J. Shi, Blood, 2024, 144, 657–671.
  7. L. Liao, Y. He, S.-J. Li, X.-M. Yu, Z.-C. Liu, Y.-Y. Liang, H. Yang, J. Yang, G.-G. Zhang, C.-M. Deng, X. Wei, Y.-D. Zhu, T.-Y. Xu, C.-C. Zheng, C. Cheng, A. Li, Z.-G. Li, J.-B. Liu and B. Li, Cell Res., 2023, 33, 355–371.
  8. J. Mo, X. Weng and X. Zhou, Acc. Chem. Res., 2023, 56, 2788–2800.
  9. N. Plongthongkum, D. H. Diep and K. Zhang, Nat. Rev. Genet., 2014, 15, 647–661.
  10. G. Ammann, M. Berg, J. F. Dalwigk and S. M. Kaiser, Acc. Chem. Res., 2023, 56, 3121–3131.
  11. X. Wang, Z. Lu, A. Gomez, G. C. Hon, Y. Yue, D. Han, Y. Fu, M. Parisien, Q. Dai, G. Jia, B. Ren, T. Pan and C. He, Nature, 2013, 505, 117–120.
  12. J. Liu, W. Yang, X. Zhang, Y. Wang and X. Zhou, Chem. Commun., 2021, 57, 13796–13798.
  13. Y. Wang, C. Liu, X. Zhang, W. Yang, F. Wu, G. Zou, X. Weng and X. Zhou, Chem. Sci., 2018, 9, 3723–3728.
  14. R. Lister, M. Pelizzola, R. H. Dowen, R. D. Hawkins, G. Hon, J. Tonti-Filippini, J. R. Nery, L. Lee, Z. Ye, Q.-M. Ngo, L. Edsall, J. Antosiewicz-Bourget, R. Stewart, V. Ruotti, A. H. Millar, J. A. Thomson, B. Ren and J. R. Ecker, Nature, 2009, 462, 315–322.
  15. Z. Zhao, W. Yan and X. Weng, Bioorg. Med. Chem., 2024, 111, 117838.
  16. Y. Wang, X. Zhang, H. Liu and X. Zhou, Chem. Soc. Rev., 2021, 50, 13481–13497.
  17. A. Okamoto, Org. Biomol. Chem., 2009, 7, 21–26.
  18. Y. Motorin and M. Helm, Acc. Chem. Res., 2023, 57, 275–288.
  19. Y. Kong, E. A. Mead and G. Fang, Nat. Rev. Genet., 2023, 24, 363–381.
  20. S. Nachtergaele and C. He, Annu. Rev. Genet., 2018, 52, 349–372.
  21. I. A. Roundtree, M. E. Evans, T. Pan and C. He, Cell, 2017, 169, 1187–1200.
  22. E. Peer, G. Rechavi and D. Dominissini, Curr. Opin. Chem. Biol., 2017, 41, 93–98.
  23. K. Boulias and E. L. Greer, Nat. Rev. Genet., 2022, 24, 143–160.
  24. D. Wiener and S. Schwartz, Nat. Rev. Genet., 2020, 22, 119–131.
  25. H. Sun, K. Li, C. Liu and C. Yi, Nat. Rev. Mol. Cell Biol., 2023, 24, 714–731.
  26. B. S. Zhao, I. A. Roundtree and C. He, Nat. Rev. Mol. Cell Biol., 2016, 18, 31–42.
  27. W. V. Gilbert, T. A. Bell and C. Schaening, Science, 2016, 352, 1408–1412.
  28. S. Blanco and M. Frye, Curr. Opin. Cell Biol., 2014, 31, 1–7.
  29. I. Orsolic, A. Carrier and M. Esteller, Trends Genet., 2023, 39, 74–88.
  30. S. Delaunay, G. Pascual, B. Feng, K. Klann, M. Behm, A. Hotz-Wagenblatt, K. Richter, K. Zaoui, E. Herpel, C. Münch, S. Dietmann, J. Hess, S. A. Benitah and M. Frye, Nature, 2022, 607, 593–603.
  31. A. G. Arimbasseri, J. Iben, F.-Y. Wei, K. Rijal, K. Tomizawa, M. Hafner and R. J. Maraia, RNA, 2016, 22, 1400–1410.
  32. S. D’Silva, S. J. Haider and E. M. Phizicky, RNA, 2011, 17, 1100.
  33. S. Sharma, J.-L. Langhendries, P. Watzinger, P. Kötter, K.-D. Entian and D. L. J. Lafontaine, Nucleic Acids Res., 2015, 43, 2242–2258.
  34. X. Jiang, B. Liu, Z. Nie, L. Duan, Q. Xiong, Z. Jin, C. Yang and Y. Chen, Signal Transduction Targeted Ther., 2021, 6, 74.
  35. J. Smoczynski, M.-J. Yared, V. Meynier, P. Barraud and C. Tisné, Acc. Chem. Res., 2024, 57, 429–438.
  36. T. N. Lamichhane, N. H. Blewett, A. K. Crawford, V. A. Cherkasova, J. R. Iben, T. J. Begley, P. J. Farabaugh and R. J. Maraia, Mol. Cell. Biol., 2013, 33, 2918–2929.
  37. M. Sakurai, H. Ueda, T. Yano, S. Okada, H. Terajima, T. Mitsuyama, A. Toyoda, A. Fujiyama, H. Kawabata and T. Suzuki, Genome Res., 2014, 24, 522–534.
  38. M. Sakurai, T. Yano, H. Kawabata, H. Ueda and T. Suzuki, Nat. Chem. Biol., 2010, 6, 733–740.
  39. Y. Furuichi, Proc. Jpn. Acad., Ser. B, 2015, 91, 394–409.
  40. M. P. Guy and E. M. Phizicky, RNA Biol., 2015, 11, 1608–1618.
  41. Q. Dai, G. Zheng, M. H. Schwartz, W. C. Clark and T. Pan, Angew. Chem., Int. Ed., 2017, 56, 5017–5020.
  42. P. V. Sergiev, A. A. Bogdanov and O. A. Dontsova, Nucleic Acids Res., 2007, 35, 2295–2301.
  43. J. Karijolich, C. Yi and Y.-T. Yu, Nat. Rev. Mol. Cell Biol., 2015, 16, 581–585.
  44. J. Dalluge, Nucleic Acids Res., 1996, 24, 1073–1079.
  45. N. Dyubankova, E. Sochacka, K. Kraszewska, B. Nawrot, P. Herdewijn and E. Lescrinier, Org. Biomol. Chem., 2015, 13, 4960–4966.
  46. X. J. Wu and Y. Zhang, Nat. Rev. Genet., 2017, 18, 517–534.
  47. Z. D. Smith and A. Meissner, Nat. Rev. Genet., 2013, 14, 204–220.
  48. A. Bird, Genes Dev., 2002, 16, 6–21.
  49. Y. Fu, D. Dominissini, G. Rechavi and C. He, Nat. Rev. Genet., 2014, 15, 293–306.
  50. M. Safra, A. Sas-Chen, R. Nir, R. Winkler, A. Nachshon, D. Bar-Yaacov, M. Erlacher, W. Rossmanith, N. Stern-Ginossar and S. Schwartz, Nature, 2017, 551, 251–255.
  51. D. Dominissini, S. Nachtergaele, S. Moshitch-Moshkovitz, E. Peer, N. Kol, M. S. Ben-Haim, Q. Dai, A. Di Segni, M. Salmon-Divon, W. C. Clark, G. Q. Zheng, T. Pan, O. Solomon, E. Eyal, V. Hershkovitz, D. Han, L. C. Doré, N. Amariglio, G. Rechavi and C. He, Nature, 2016, 530, 441–446.
  52. X. Y. Li, X. S. Xiong, K. Wang, L. X. Wang, X. T. Shu, S. Q. Ma and C. Q. Yi, Nat. Chem. Biol., 2016, 12, 311–316.
  53. X. N. Lin, B. X. Gai, L. Liu and L. Cheng, Bioorg. Med. Chem., 2024, 110, 117838.
  54. W. Slotkin and K. Nishikura, Genome Med., 2013, 5, 105.
  55. Y. S. Chen, W. L. Yang, Y. L. Zhao and Y. G. Yang, Wiley Interdiscip. Rev.: RNA, 2021, 12, e1639.
  56. S. M. Huber, P. van Delft, L. Mendil, M. Bachman, K. Smollett, F. Werner, E. A. Miska and S. Balasubramanian, ChemBioChem, 2015, 16, 752–755.
  57. Y. Zhang, Y. Lei, Y. Dong, S. Chen, S. Sun, F. Zhou, Z. Zhao, B. Chen, L. Wei, J. Chen and Z. Meng, Pharmacol. Ther., 2024, 253, 108576.
  58. G. H. Jin, M. Q. Xu, M. S. Zou and S. W. Duan, Mol. Ther.–Nucleic Acids, 2020, 20, 13–24.
  59. J. Cerneckis, Q. Cui, C. A. He, C. Q. Yi and Y. H. Shi, Trends Pharmacol. Sci., 2022, 43, 522–535.
  60. A. C. Rintala-Dempsey and U. Kothe, RNA Biol., 2017, 14, 1185–1196.
  61. L. S. Zhang, C. Liu, H. H. Ma, Q. Dai, H. L. Sun, G. Z. Luo, Z. J. Zhang, L. D. Zhang, L. L. Hu, X. Y. Dong and C. He, Mol. Cell, 2019, 74, 1304–1316.
  62. R. J. Jackson, C. U. T. Hellen and T. V. Pestova, Nat. Rev. Mol. Cell Biol., 2010, 11, 113–127.
  63. L. Malbec, T. Zhang, Y. S. Chen, Y. Zhang, B. F. Sun, B. Y. Shi, Y. L. Zhao, Y. Yang and Y. G. Yang, Cell Res., 2019, 29, 927–941.
  64. Y. Zhang, X. Zhang, J. Shi, F. Tuorto, X. Li, Y. Liu, R. Liebers, L. Zhang, Y. Qu, J. Qian, M. Pahima, Y. Liu, M. Yan, Z. Cao, X. Lei, Y. Cao, H. Peng, S. Liu, Y. Wang, H. Zheng, R. Woolsey, D. Quilici, Q. Zhai, L. Li, T. Zhou, W. Yan, F. Lyko, Y. Zhang, Q. Zhou, E. Duan and Q. Chen, Nat. Cell Biol., 2018, 20, 535–540.
  65. L.-S. Zhang, Q. Dai and C. He, Acc. Chem. Res., 2023, 57, 47–58.
  66. J. Xiong, J. Wu, Y. Liu, Y.-J. Feng and B.-F. Yuan, TrAC, Trends Anal. Chem., 2024, 172, 117606.
  67. Y. Zhang, L. Lu and X. Li, Exp. Mol. Med., 2022, 54, 1601–1616.
  68. X. Li, X. Xiong and C. Yi, Nat. Methods, 2016, 14, 23–31.
  69. Y. Wang, Y. Xiao, S. Dong, Q. Yu and G. Jia, Nat. Chem. Biol., 2020, 16, 896–903.
  70. S. J. Cokus, S. Feng, X. Zhang, Z. Chen, B. Merriman, C. D. Haudenschild, S. Pradhan, S. F. Nelson, M. Pellegrini and S. E. Jacobsen, Nature, 2008, 452, 215–219.
  71. Y. Motorin, F. Lyko and M. Helm, Nucleic Acids Res., 2010, 38, 1415–1430.
  72. M. Schaefer, T. Pollex, K. Hanna and F. Lyko, Nucleic Acids Res., 2008, 37, e12.
  73. Y. Huang, W. A. Pastor, Y. H. Shen, M. Tahiliani, D. R. Liu and A. Rao, PLoS One, 2010, 5, e8888.
  74. M. J. Booth, M. R. Branco, G. Ficz, D. Oxley, F. Krueger, W. Reik and S. Balasubramanian, Science, 2012, 336, 934–937.
  75. M. J. Booth, T. W. B. Ost, D. Beraldi, N. M. Bell, M. R. Branco, W. Reik and S. Balasubramanian, Nat. Protoc., 2013, 8, 1841–1851.
  76. M. J. Booth, G. Marsico, M. Bachman, D. Beraldi and S. Balasubramanian, Nat. Chem., 2014, 6, 435–440.
  77. X. Lu, C.-X. Song, K. Szulwach, Z. Wang, P. Weidenbacher, P. Jin and C. He, J. Am. Chem. Soc., 2013, 135, 9315–9317.
  78. C.-X. Song, K. E. Szulwach, Q. Dai, Y. Fu, S.-Q. Mao, L. Lin, C. Street, Y. Li, M. Poidevin, H. Wu, J. Gao, P. Liu, L. Li, G.-L. Xu, P. Jin and C. He, Cell, 2013, 153, 678–691.
  79. M. Yu, G. C. Hon, K. E. Szulwach, C.-X. Song, L. Zhang, A. Kim, X. Li, Q. Dai, Y. Shen, B. Park, J.-H. Min, P. Jin, B. Ren and C. He, Cell, 2012, 149, 1368–1380.
  80. A. Adey and J. Shendure, Genome Res., 2012, 22, 1139–1143.
  81. H. Kobayashi, T. Sakurai, F. Miura, M. Imai, K. Mochiduki, E. Yanagisawa, A. Sakashita, T. Wakai, Y. Suzuki, T. Ito, Y. Matsui and T. Kono, Genome Res., 2013, 23, 616–627.
  82. K. Shirane, H. Toh, H. Kobayashi, F. Miura, H. Chiba, T. Ito, T. Kono and H. Sasaki, PLoS Genet., 2013, 9, 10.
  83. K. Tanaka and A. Okamoto, Bioorg. Med. Chem. Lett., 2007, 17, 1912–1915.
  84. Q. Dai, C. Ye, I. Irkliyenko, Y. Wang, H.-L. Sun, Y. Gao, Y. Liu, A. Beadell, J. Perea, A. Goel and C. He, Nat. Biotechnol., 2024, 42, 1559–1570.
  85. V. Khoddami, A. Yerra, T. L. Mosbruger, A. M. Fleming, C. J. Burrows and B. R. Cairns, Proc. Natl. Acad. Sci. U. S. A., 2019, 116, 6784–6789.
  86. M. Shiraishi and H. Hayatsu, DNA Res., 2004, 11, 409–415.
  87. N. Olova, F. Krueger, S. Andrews, D. Oxley, R. V. Berrens, M. R. Branco and W. Reik, Genome Biol., 2018, 19, 19.
  88. Y. Liu, P. Siejka-Zielińska, G. Velikova, Y. Bi, F. Yuan, M. Tomkova, C. Bai, L. Chen, B. Schuster-Böckler and C.-X. Song, Nat. Biotechnol., 2019, 37, 424–429.
  89. Y. Liu, Z. Hu, J. Cheng, P. Siejka-Zielińska, J. Chen, M. Inoue, A. A. Ahmed and C.-X. Song, Nat. Commun., 2021, 12, 618.
  90. Y. Wang, Z. Chen, X. Zhang, X. Weng, J. Deng, W. Yang, F. Wu, S. Han, C. Xia, Y. Zhou, Y. Chen and X. Zhou, ACS Chem. Biol., 2021, 17, 77–84.
  91. C. N. Link, S. Thalalla Gamage, D. Gallimore, R. Kopajtich, C. Evans, S. Nance, S. D. Fox, T. Andresson, R. Chari, J. Ivanic, H. Prokisch and J. L. Meier, Biochemistry, 2022, 61, 535–544.
  92. B. J. Mortishire-Smith, S. M. Becker, A. Simeone, L. Melidis and S. Balasubramanian, J. Am. Chem. Soc., 2023, 145, 10505–10511.
  93. S. Werner, A. Galliot, F. Pichot, T. Kemmer, V. Marchand, M. V. Sednev, T. Lence, J.-Y. Roignant, J. König, C. Höbartner, Y. Motorin, A. Hildebrandt and M. Helm, Nucleic Acids Res., 2021, 49, e23.
  94. Y. Mahdavi-Amiri, K. Chung Kim Chung and R. Hili, Chem. Sci., 2021, 12, 606–612.
  95. C. Liu, H. Sun, Y. Yi, W. Shen, K. Li, Y. Xiao, F. Li, Y. Li, Y. Hou, B. Lu, W. Liu, H. Meng, J. Peng, C. Yi and J. Wang, Nat. Biotechnol., 2022, 41, 355–366.
  96. B. Xia, D. Han, X. Lu, Z. Sun, A. Zhou, Q. Yin, H. Zeng, M. Liu, X. Jiang, W. Xie, C. He and C. Yi, Nat. Methods, 2015, 12, 1047–1050.
  97. L. Lu, X. Zhang, Y. Zhou, Z. Shi, X. Xie, X. Zhang, L. Gao, A. Fu, C. Liu, B. He, X. Xiong, Y. Yin, Q. Wang, C. Yi and X. Li, Mol. Cell, 2024, 84, 2984–3000.
  98. C. Zhu, Y. Gao, H. Guo, B. Xia, J. Song, X. Wu, H. Zeng, K. Kee, F. Tang and C. Yi, Cell Stem Cell, 2017, 20, 720–731.
  99. A. Li, X. Sun, A. E. Arguello and R. E. Kleiner, ACS Chem. Biol., 2022, 17, 503–508.
  100. T. Preiss, A. F. Lovejoy, D. P. Riordan and P. O. Brown, PLoS One, 2014, 9, e110799.
  101. S. Schwartz, D. A. Bernstein, M. R. Mumbach, M. Jovanovic, R. H. Herbst, B. X. León-Ricardo, J. M. Engreitz, M. Guttman, R. Satija, E. S. Lander, G. Fink and A. Regev, Cell, 2014, 159, 148–162.
  102. T. M. Carlile, M. F. Rojas-Duran, B. Zinshteyn, H. Shin, K. M. Bartoli and W. V. Gilbert, Nature, 2014, 515, 143–146.
  103. X. Li, P. Zhu, S. Ma, J. Song, J. Bai, F. Sun and C. Yi, Nat. Chem. Biol., 2015, 11, 592–597.
  104. Z. Lei and C. Yi, Angew. Chem., Int. Ed., 2017, 56, 14878–14882.
  105. W. Zhang and T. Pan, Methods, 2022, 203, 1–4 CrossRef CAS PubMed .
  106. M. Sun, X. Fang, B. Lin, J. Mo, F. Wang, X. Zhou and X. Weng, Chem. Commun., 2024, 60, 4088–4091 RSC .
  107. A. M. Fleming, A. Alenko, J. P. Kitt, A. M. Orendt, P. F. Flynn, J. M. Harris and C. J. Burrows, J. Am. Chem. Soc., 2019, 141, 16450–16460 CrossRef CAS PubMed .
  108. R. Maylor, J. B. Gill and D. C. Goodall, J. Chem. Soc., Dalton Trans., 1972, 18, 2001–2006 RSC .
  109. M. Zhang, Z. Jiang, Y. Ma, W. Liu, Y. Zhuang, B. Lu, K. Li, J. Peng and C. Yi, Nat. Chem. Biol., 2023, 19, 1185–1195 CrossRef CAS PubMed .
  110. Q. Dai, L.-S. Zhang, H.-L. Sun, K. Pajdzik, L. Yang, C. Ye, C.-W. Ju, S. Liu, Y. Wang, Z. Zheng, L. Zhang, B. T. Harada, X. Dou, I. Irkliyenko, X. Feng, W. Zhang, T. Pan and C. He, Nat. Biotechnol., 2022, 41, 344–354 CrossRef PubMed .
  111. X. Fang, R. Zhao, Y. Wang, M. Sun, J. Xu, S. Long, J. Mo, H. Liu, X. Li, F. Wang, X. Zhou and X. Weng, Nucleic Acids Res., 2024, 52, e49 CrossRef CAS PubMed .
  112. Y. Zhao, X. Ma, C. Ye, W. Li, K. Pajdzik, Q. Dai, H.-L. Sun and C. He, ACS Chem. Biol., 2024, 19, 1813–1819 CrossRef CAS PubMed .
  113. H. Xu, L. Kong, J. Cheng, K. Al Moussawi, X. Chen, A. Iqbal, P. A. C. Wing, J. M. Harris, S. Tsukuda, A. Embarc-Buh, G. Wei, A. Castello, S. Kriaucionis, J. A. McKeating, X. Lu and C.-X. Song, Nat. Methods, 2024, 21, 2024–2033.
  114. G. Emmerechts, P. Herdewijn and J. Rozenski, J. Chromatogr. B: Anal. Technol. Biomed. Life Sci., 2005, 825, 233–238.
  115. S. D. Knutson, T. M. Ayele and J. M. Heemstra, Bioconjugate Chem., 2018, 29, 2899–2903.
  116. J. Mengel-Jorgensen, Nucleic Acids Res., 2002, 30, 135e.
  117. M. Yoshida and T. Ukita, Biochim. Biophys. Acta, 1968, 157, 455.
  118. T. Suzuki, H. Ueda, S. Okada and M. Sakurai, Nat. Protoc., 2015, 10, 715–732.
  119. S. Ramasamy, V. J. Sahayasheela, S. Sharma, Z. Yu, T. Hidaka, L. Cai, V. Thangavel, H. Sugiyama and G. N. Pandian, ACS Chem. Biol., 2022, 17, 2704–2709.
  120. J. J. Fox, I. Wempen, A. Hampton and I. L. Doerr, J. Am. Chem. Soc., 1958, 80, 1669–1675.
  121. Y. Li, H. Zhou, S. Chen, Y. Li, Y. Guo, X. Chen, S. Wang, L. Wang, Y. Gan, S. Zhang, Ya. Y. Zheng, J. Sheng, Z. Zhou and R. Wang, Nucleic Acids Res., 2024, 52, 2808–2820.
  122. L. Hu, S. Liu, Y. Peng, R. Ge, R. Su, C. Senevirathne, B. T. Harada, Q. Dai, J. Wei, L. Zhang, Z. Hao, L. Luo, H. Wang, Y. Wang, M. Luo, M. Chen, J. Chen and C. He, Nat. Biotechnol., 2022, 40, 1210–1219.
  123. J. M. Thomas, C. A. Briney, K. D. Nance, J. E. Lopez, A. L. Thorpe, S. D. Fox, M.-L. Bortolin-Cavaille, A. Sas-Chen, D. Arango, S. Oberdoerffer, J. Cavaille, T. Andresson and J. L. Meier, J. Am. Chem. Soc., 2018, 140, 12667–12670.
  124. A. Sas-Chen, J. M. Thomas, D. Matzov, M. Taoka, K. D. Nance, R. Nir, K. M. Bryson, R. Shachar, G. L. S. Liman, B. W. Burkhart, S. T. Gamage, Y. Nobe, C. A. Briney, M. J. Levy, R. T. Fuchs, G. B. Robb, J. Hartmann, S. Sharma, Q. Lin, L. Florens, M. P. Washburn, T. Isobe, T. J. Santangelo, M. Shalev-Benami, J. L. Meier and S. Schwartz, Nature, 2020, 583, 638–643.
  125. S. Lin, Q. Liu, V. S. Lelyveld, J. Choe, J. W. Szostak and R. I. Gregory, Mol. Cell, 2018, 71, 244–255.
  126. K. Pajdzik, R. Lyu, X. Dou, C. Ye, L.-S. Zhang, Q. Dai and C. He, RNA, 2024, 30, 548–559.
  127. J. Kaur, M. Raj and B. S. Cooperman, RNA, 2011, 17, 1393–1400.
  128. S. Thalalla Gamage, A. Sas-Chen, S. Schwartz and J. L. Meier, Nat. Protoc., 2021, 16, 2286–2307.
  129. J. B. Macon and R. Wolfenden, Biochemistry, 1968, 7, 3453–3458.
  130. A. E. Cozen, E. Quartley, A. D. Holmes, E. Hrabeta-Robinson, E. M. Phizicky and T. M. Lowe, Nat. Methods, 2015, 12, 879–884.
  131. W. Wintermeyer and H. G. Zachau, FEBS Lett., 1975, 58, 306–309.
  132. V. Marchand, L. Ayadi, F. G. M. Ernst, J. Hertler, V. Bourguignon-Igel, A. Galvanin, A. Kotter, M. Helm, D. L. J. Lafontaine and Y. Motorin, Angew. Chem., Int. Ed., 2018, 57, 16785–16790.
  133. S. Lin, Q. Liu, Y.-Z. Jiang and R. I. Gregory, Nat. Protoc., 2019, 14, 3220–3242.
  134. L. Pandolfini, I. Barbieri, A. J. Bannister, A. Hendrick, B. Andrews, N. Webster, P. Murat, P. Mach, R. Brandi, S. C. Robson, V. Migliori, A. Alendar, M. d’Onofrio, S. Balasubramanian and T. Kouzarides, Mol. Cell, 2019, 74, 1278–1290.
  135. L.-S. Zhang, C. Liu, H. Ma, Q. Dai, H.-L. Sun, G. Luo, Z. Zhang, L. Zhang, L. Hu, X. Dong and C. He, Mol. Cell, 2019, 74, 1304–1316.
  136. C. Enroth, L. D. Poulsen, S. Iversen, F. Kirpekar, A. Albrechtsen and J. Vinther, Nucleic Acids Res., 2019, 47, e126.
  137. L.-S. Zhang, C.-W. Ju, C. Liu, J. Wei, Q. Dai, L. Chen, C. Ye and C. He, ACS Chem. Biol., 2022, 17, 3306–3312.
  138. F. Xing, S. L. Hiley, T. R. Hughes and E. M. Phizicky, J. Biol. Chem., 2004, 279, 17850–17860.
  139. A. R. Hanze, J. Am. Chem. Soc., 1967, 89, 6720–6725.
  140. P. Cerutti and N. Miller, J. Mol. Biol., 1967, 26, 55–60.
  141. O. Finet, C. Yague-Sanz, L. K. Krüger, P. Tran, V. Migeot, M. Louski, A. Nevers, M. Rougemaille, J. Sun, F. G. M. Ernst, L. Wacheul, M. Wery, A. Morillon, P. Dedon, D. L. J. Lafontaine and D. Hermand, Mol. Cell, 2022, 82, 404–419.
  142. J. Coller, A. S. Draycott, C. Schaening-Burgos, M. F. Rojas-Duran, L. Wilson, L. Schärfen, K. M. Neugebauer, S. Nachtergaele and W. V. Gilbert, PLoS Biol., 2022, 20, e3001622.
  143. N. J. Yu, W. Dai, A. Li, M. He and R. E. Kleiner, bioRxiv, 2023, preprint, DOI: 10.1101/2023.11.03.565399.
  144. H. Zeng, B. He, B. Xia, D. Bai, X. Lu, J. Cai, L. Chen, A. Zhou, C. Zhu, H. Meng, Y. Gao, H. Guo, C. He, Q. Dai and C. Yi, J. Am. Chem. Soc., 2018, 140, 13190–13194.
  145. Y. Wang, X. Zhang, F. Wu, Z. Chen and X. Zhou, Chem. Sci., 2019, 10, 447–452.
  146. J. Hu, Y. Chen, X. Xu, F. Wu, X. Xing, Z. Xu, J. Xu, X. Weng and X. Zhou, Bioorg. Med. Chem. Lett., 2014, 24, 294–297.
  147. M. M. Simpson, C. C. Lam, J. M. Goodman and S. Balasubramanian, Angew. Chem., Int. Ed., 2023, 62, e202304756.
  148. T. Yan, Y. Chen, B. Mortishire-Smith, A. Simeone, A. Hofer and S. Balasubramanian, Angew. Chem., Int. Ed., 2024, 64, e202413593.
  149. Z. Raciszewski, J. Am. Chem. Soc., 1960, 82, 1267–1277.
  150. A. Okamoto, K. Sugizaki, A. Nakamura, H. Yanagisawa and S. Ikeda, Chem. Commun., 2011, 47, 11231–11233.
  151. G. Hayashi, K. Koyama, H. Shiota, A. Kamio, T. Umeda, G. Nagae, H. Aburatani and A. Okamoto, J. Am. Chem. Soc., 2016, 138, 14178–14181.
  152. F. Yuan, Y. Bi, P. Siejka-Zielinska, Y.-L. Zhou, X.-X. Zhang and C.-X. Song, Chem. Commun., 2019, 55, 2328–2331.
  153. K. Chung Kim Chung, Y. Mahdavi-Amiri, C. Korfmann and R. Hili, J. Am. Chem. Soc., 2022, 144, 5723–5727.
  154. W. Mao, J. Hu, T. Hong, X. Xing, S. Wang, X. Chen and X. Zhou, Org. Biomol. Chem., 2013, 11, 3568–3572.
  155. J. Cui, Q. Liu, E. Sendinc, Y. Shi and R. I. Gregory, Nucleic Acids Res., 2021, 49, e27.
  156. V. Marchand, F. Pichot, P. Neybecker, L. Ayadi, V. Bourguignon-Igel, L. Wacheul, D. L. J. Lafontaine, A. Pinzano, M. Helm and Y. Motorin, Nucleic Acids Res., 2020, 48, e110.
  157. Y. Xie, Y. Wang, Z. He, W. Yang, B. Fu, G. Zou, X. Zhang, J. Huang and X. Zhou, Anal. Chem., 2020, 92, 12710–12715.
  158. E.-A. Raiber, D. Beraldi, G. Ficz, H. E. Burgess, M. R. Branco, P. Murat, D. Oxley, M. J. Booth, W. Reik and S. Balasubramanian, Genome Biol., 2012, 13, 69.
  159. P. Guo, S. Yan, J. Hu, X. Xing, C. Wang, X. Xu, X. Qiu, W. Ma, C. Lu, X. Weng and X. Zhou, Org. Lett., 2013, 15, 3266–3269.
  160. C. Liu, X. Luo, Y. Chen, F. Wu, W. Yang, Y. Wang, X. Zhang, G. Zou and X. Zhou, Anal. Chem., 2018, 90, 14616–14621.
  161. J.-H. Ding, G. Li, J. Xiong, F.-L. Liu, N.-B. Xie, T.-T. Ji, M. Wang, X. Guo, Y.-Q. Feng, W. Ci and B.-F. Yuan, Anal. Chem., 2024, 96, 4726–4735.
  162. B. Samanta, J. Seikowski and C. Höbartner, Angew. Chem., Int. Ed., 2015, 55, 1912–1916.
  163. C. Liu, Y. Wang, W. Yang, F. Wu, W. Zeng, Z. Chen, J. Huang, G. Zou, X. Zhang, S. Wang, X. Weng, Z. Wu, Y. Zhou and X. Zhou, Chem. Sci., 2017, 8, 7443–7447.
  164. Q. Zhou, K. Li, L.-L. Li, K.-K. Yu, H. Zhang, L. Shi, H. Chen and X.-Q. Yu, Anal. Chem., 2019, 91, 9366–9370.
  165. R. L. Wang, X. Y. Jin, D. L. Kong, Z. G. Chen, J. Liu, L. Liu and L. Cheng, Adv. Synth. Catal., 2019, 361, 5406–5411.
  166. X. Y. Jin, Z. R. Huang, L. J. Xie, L. Liu, D. L. Han and L. Cheng, Angew. Chem., Int. Ed., 2022, 61, e202210652.
  167. Q. Zhou, K. Li, Y.-H. Liu, L.-L. Li, K.-K. Yu, H. Zhang and X.-Q. Yu, Chem. Commun., 2018, 54, 13722–13725.
  168. S. Yan, X. Xu, P. Guo, J. Hu, C. Wang, R. Huang, X. Weng, Y. Du and X. Zhou, RSC Adv., 2013, 3, 12066–12068.
  169. M. A. Eckert, P. Q. Vu, K. Zhang, D. Kang, M. M. Ali, C. Xu and W. Zhao, Theranostics, 2013, 3, 583–594.
  170. Q. Wu, S. M. Amrutkar and F. Shao, Bioconjugate Chem., 2018, 29, 245–249.
  171. Y. Li, M. Göhl, K. Ke, C. D. Vanderwal and R. C. Spitale, Org. Lett., 2019, 21, 7948–7951.
  172. W.-B. Tao, N.-B. Xie, Q.-Y. Cheng, Y.-Q. Feng and B.-F. Yuan, Chin. Chem. Lett., 2023, 34, 108243.
  173. S. D. Knutson, M. M. Korn, R. P. Johnson, L. R. Monteleone, D. M. Dailey, C. S. Swenson, P. A. Beal and J. M. Heemstra, Chem. – Eur. J., 2020, 26, 9874–9878.
  174. S. Wang, Y. Li, Y. Gan, H. Zhou and R. Wang, Tetrahedron Lett., 2022, 100, 153873.
  175. H. P. Cheng, X. H. Yang, L. Lan, L. J. Xie, C. Chen, C. Liu, J. Chu, Z. Y. Li, L. Liu, T. Q. Zhang, D. Q. Luo and L. Cheng, Angew. Chem., Int. Ed., 2020, 59, 10645–10650.
  176. S. Hussain, J. Aleksic, S. Blanco, S. Dietmann and M. Frye, Genome Biol., 2013, 14, 215.
  177. T. Huang, W. Chen, J. Liu, N. Gu and R. Zhang, Nat. Struct. Mol. Biol., 2019, 26, 380–388.
  178. S. Bareyt and T. Carell, Angew. Chem., Int. Ed., 2007, 47, 181–184.
  179. M. Giel-Pietraszuk, M. Insińska-Rak, A. Golczak, M. Sikorski, M. Barciszewska and J. Barciszewski, Acta Biochim. Pol., 2015, 62, 281–286.
  180. Y. Xie, S. Han, Q. Li, Z. Fang, W. Yang, Q. Wei, Y. Wang, Y. Zhou, X. Weng and X. Zhou, Chem. Sci., 2022, 13, 12149–12157.
  181. S. D’Ambrosi, R. García-Vílchez, D. Kedra, P. Vitali, N. Macias-Cámara, L. Bárcena, M. Gonzalez-Lopez, A. M. Aransay, S. Dietmann, A. Hurtado and S. Blanco, RNA Biol., 2024, 21, 476–493.
  182. O. Begik, M. C. Lucas, L. P. Pryszcz, J. M. Ramirez, R. Medina, I. Milenkovic, S. Cruciani, H. Liu, H. G. S. Vieira, A. Sas-Chen, J. S. Mattick, S. Schwartz and E. M. Novoa, Nat. Biotechnol., 2021, 39, 1278–1291.
  183. L. Xu, Y. C. Chen, J. Chong, A. Fin, L. S. McCoy, J. Xu, C. Zhang and D. Wang, Angew. Chem., Int. Ed., 2014, 53, 11223–11227.
  184. J. L. Hu, X. W. Xing, X. W. Xu, F. Wu, P. Guo, S. Y. Yan, Z. H. Xu, J. H. Xu, X. C. Weng and X. Zhou, Chem. – Eur. J., 2013, 19, 5836–5840.
  185. L. Xu, Y. C. Chen, S. Nakajima, J. Chong, L. F. Wang, L. Lan, C. Zhang and D. Wang, Chem. Sci., 2014, 5, 567–574.
  186. W. Hirose, K. Sato and A. Matsuda, Angew. Chem., Int. Ed., 2010, 49, 8392–8394.
  187. W. Huang, M.-D. Lan, C.-B. Qi, S.-J. Zheng, S.-Z. Wei, B.-F. Yuan and Y.-Q. Feng, Chem. Sci., 2016, 7, 5495–5502.
  188. W. W. Li, L. Z. Gong and H. Bayley, Angew. Chem., Int. Ed., 2013, 52, 4350–4355.
  189. J. Zhang, Y. Li, S. Wang and R. Wang, Tetrahedron Lett., 2021, 74, 153162.
  190. S. Oberdoerffer and W. V. Gilbert, Nat. Rev. Mol. Cell Biol., 2024, 26, 237–248.
  191. K. Tanaka, K. Tainaka, T. Kamei and A. Okamoto, J. Am. Chem. Soc., 2007, 129, 5612–5620.
  192. P. Wang, C. Ye, M. Zhao, B. Jiang and C. He, Nat. Chem., 2025, DOI: 10.1038/s41557-025-01801-3.
  193. J. Ahn, S. Heo, J. Lee and D. Bang, Biomolecules, 2021, 11, 1013.
  194. Y. Li, Y. Wang, M. Vera-Rodriguez, L. C. Lindeman, L. E. Skuggen, E. M. K. Rasmussen, I. Jermstad, S. Khan, M. Fosslie, T. Skuland, M. Indahl, S. Khodeer, E. K. Klemsdal, K.-X. Jin, K. T. Dalen, P. Fedorcsak, G. D. Greggains, M. Lerdrup, A. Klungland, K. F. Au and J. A. Dahl, Nat. Biotechnol., 2023, 42, 591–596.
  195. M. Tegowski, A. K. Prater, C. L. Holley and K. D. Meyer, Nat. Neurosci., 2024, 27, 2512–2520.
  196. Q. Zhang, Y. Dai, X. Teng and J. Li, Angew. Chem., Int. Ed., 2024, 64, e202420977.
  197. A. Rao, D. Barkley, G. S. França and I. Yanai, Nature, 2021, 596, 211–220.
  198. X. Wu, W. Xu, L. Deng, Y. Li, Z. Wang, L. Sun, A. Gao, H. Wang, X. Yang, C. Wu, Y. Zou, K. Yan, Z. Liu, L. Zhang, G. Du, L. Yang, D. Lin, J. Yue, P. Wang, Y. Han, Z. Fu, J. Dai and G. Cao, Nat. Biomed. Eng., 2024, 8, 872–889.
  199. X. Wang, W. E. Allen, M. A. Wright, E. L. Sylwestrak, N. Samusik, S. Vesuna, K. Evans, C. Liu, C. Ramakrishnan, J. Liu, G. P. Nolan, F.-A. Bava and K. Deisseroth, Science, 2018, 361, eaat5691.
  200. X.-J. You, S. Zhang, J.-J. Chen, F. Tang, J. He, J. Wang, C.-B. Qi, Y.-Q. Feng and B.-F. Yuan, Nucleic Acids Res., 2022, 50, 9858–9872.
  201. Y. Xie, P. Chai, N. A. Till, H. Hemberger, C. G. Lebedenko, J. Porat, C. P. Watkins, R. M. Caldwell, B. M. George, J. Perr, C. R. Bertozzi, B. A. Garcia and R. A. Flynn, Cell, 2024, 187, 5228–5237.

Footnote

These authors contributed equally to the manuscript.

This journal is © The Royal Society of Chemistry 2025