Poster Presentation 44th Lorne Genome Conference 2023

A deep learning model for accurate identification of RNA modification sites (#145)

Korawich Uthayopas 1,2,3, Alex G. C. de Sá 1,2,3,4, David B. Ascher 1,2,3,4
  1. Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria, 3052, Australia
  2. Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, 3004, Australia
  3. School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, 4072, Australia
  4. Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville, Victoria, 3010, Australia

RNA modifications are post-transcriptional events in which RNA-binding proteins alter a particular nucleotide in a transcribed RNA strand, affecting its activity, location, and stability [1]. To date, more than 100 types of RNA modification have been identified, some of which have been linked to the development of cancers, cardiovascular disorders, and other diseases [2-4]. Huge technological advances in recent years have radically expanded our ability to detect these modifications [5]; however, most analysis pipelines are inherently restricted to known modification motifs. In this study, we develop a deep learning framework for the accurate identification of RNA sites likely to undergo any of seven modification types: N6-methyladenosine (m6A), pseudouridine (ψ), 1-methyladenosine (m1A), 2'-O-methyladenosine (Am), 2'-O-methylcytidine (Cm), 2'-O-methylguanosine (Gm), and 2'-O-methyluridine (Um). Data were curated from publicly available experimental datasets [6] and represented using one-hot encoding of sequences, chemical and conservation properties [7], and optimised versions of transformer-based machine learning techniques developed for natural language processing (RNABERT) [8]. Across cross-validation and independent blind tests, the model performed strongly, providing a powerful base for better understanding RNA modification sites and for use in genome-wide predictive mapping.
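
As an illustration of the one-hot sequence representation mentioned above, the following minimal Python sketch encodes a window of nucleotides around a candidate site. It is not the authors' implementation: the function names (`site_window`, `one_hot_encode`), the flank size, the `N` padding character, and the A, C, G, U column order are all assumptions made for this example, since the abstract does not specify them.

```python
# Illustrative sketch only; names, flank size, and padding are assumptions,
# not details taken from the abstract.
ALPHABET = "ACGU"  # assumed column order for the one-hot vectors


def site_window(seq, pos, flank=2, pad="N"):
    """Return the subsequence centred on 0-based `pos` with `flank`
    nucleotides on each side, padded with `pad` at sequence ends."""
    left = max(0, pos - flank)
    right = min(len(seq), pos + flank + 1)
    left_pad = pad * (flank - (pos - left))
    right_pad = pad * (flank - (right - 1 - pos))
    return left_pad + seq[left:right] + right_pad


def one_hot_encode(seq):
    """Encode each nucleotide as a 4-element 0/1 row (A, C, G, U).
    DNA-style T is treated as U; any other symbol (e.g. the N padding)
    becomes an all-zero row."""
    rows = []
    for base in seq.upper().replace("T", "U"):
        row = [0, 0, 0, 0]
        idx = ALPHABET.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    window = site_window("GGACU", pos=2, flank=2)  # adenosine at the centre
    for base, row in zip(window, one_hot_encode(window)):
        print(base, row)
```

In a full pipeline, a matrix like this would presumably be combined with the chemical, conservation, and RNABERT-derived features before being passed to the network; how the features are combined is not described in the abstract.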

  1. Frye, M., et al., RNA modifications: what have we learned and where are we headed? Nat Rev Genet, 2016. 17(6): p. 365-72.
  2. Cayir, A., RNA modifications as emerging therapeutic targets. Wiley Interdiscip Rev RNA, 2022. 13(4): p. e1702.
  3. Yanas, A. and K.F. Liu, RNA modifications and the link to human disease. Methods Enzymol, 2019. 626: p. 133-146.
  4. Cui, L., et al., RNA modifications: importance in immune cell biology and related diseases. Signal Transduct Target Ther, 2022. 7(1): p. 334.
  5. Zhang, Y., L. Lu, and X. Li, Detection technologies for RNA modifications. Exp Mol Med, 2022. 54(10): p. 1601-1616.
  6. Xuan, J.J., et al., RMBase v2.0: deciphering the map of RNA modifications from epitranscriptome sequencing data. Nucleic Acids Res, 2018. 46(D1): p. D327-D334.
  7. Chen, Z., et al., iLearnPlus: a comprehensive and automated machine-learning platform for nucleic acid and protein sequence analysis, prediction and visualization. Nucleic Acids Res, 2021. 49(10): p. e60.
  8. Akiyama, M. and Y. Sakakibara, Informative RNA base embedding for RNA structural alignment and clustering by deep representation learning. NAR Genom Bioinform, 2022. 4(1): p. lqac012.