Linking RNA Processing and Function

  Ling-Ling Chen (1,2)

  1. State Key Laboratory of Molecular Biology, Shanghai Key Laboratory of Molecular Andrology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
  2. School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China

  Correspondence: linglingchen{at}sibcb.ac.cn

Abstract

RNA processing is critical for eukaryotic mRNA maturation and function. It appears there is no exception for other types of RNAs. Long noncoding RNAs (lncRNAs) represent a subclass of noncoding RNAs, have sizes of >200 nucleotides (nt), and participate in various aspects of gene regulation. Although many lncRNAs are capped, polyadenylated, and spliced just like mRNAs, others are derived from primary transcripts of RNA polymerase II and stabilized by forming circular structures or by ending with small nucleolar RNA–protein complexes. Here we summarize the recent progress in linking the processing and function of these unconventionally processed lncRNAs; we also discuss how directional RNA movement is achieved using the radial flux movement of nascent precursor ribosomal RNA (pre-rRNA) in the human nucleolus as an example.

Surprisingly, the human genome is pervasively transcribed (>80%), and >98% of this transcriptional output represents non-protein-coding RNAs (ncRNAs) (Derrien et al. 2012; Mudge et al. 2013). Long ncRNAs (lncRNAs), which are longer than 200 nucleotides (nt) and lack protein-coding potential, have emerged as a major class of eukaryotic regulatory transcripts involved in multiple layers of gene expression. Statistics from Human GENCODE Release version 28 suggest that the human genome contains more than 16,000 lncRNA genes, but other estimates for the number of lncRNA genes exceed 100,000 in humans (Zhao et al. 2016).

Over the past decades, the study of lncRNA biogenesis and regulation has greatly improved our understanding of the overall regulatory RNA diversity and function, ranging from the classical mRNA-like lncRNAs (e.g., Xist [Engreitz et al. 2013; Simon et al. 2013; Chu et al. 2015; McHugh et al. 2015; Minajigi et al. 2015; Chen et al. 2016; Creamer and Lawrence 2017], CCAT1-L [Xiang et al. 2014], NORAD [Lee et al. 2016; Tichon et al. 2016, 2018; Munschauer et al. 2018]) to other types of unconventionally formatted linear (e.g., NEAT1 [Clemson et al. 2009; Sunwoo et al. 2009; Souquere et al. 2010; Adriaens et al. 2016; Wang et al. 2018], MALAT1 [Wilusz et al. 2008; Tripathi et al. 2010; Nakagawa et al. 2012; Zhang et al. 2012; Arun et al. 2016; Malakar et al. 2017], sno-lncRNAs [Yin et al. 2012; Xing et al. 2017], SPAs [Wu et al. 2016; Lykke-Andersen et al. 2018]) and circular RNAs (Salzman et al. 2012; Hansen et al. 2013; Jeck et al. 2013; Memczak et al. 2013; Zhang et al. 2013). Compared to microRNAs, which are ∼22-nt RNAs that mainly direct post-transcriptional repression of mRNA targets in eukaryotes (Bartel 2018), lncRNAs exhibit a surprisingly wide range of sizes, shapes, and functions (Wu et al. 2017; Kopp and Mendell 2018; Uszczynska-Ratajczak et al. 2018; Carlevaro-Fita and Johnson 2019; Yao et al. 2019a). It is now well-known that lncRNAs participate in the regulation of genetic flow of protein expression from chromatin organization to transcription regulation in the nucleus to modulation of mRNA stability, translation, and post-translation in the cytoplasm. These diverse functional potentials depend on the processing, subcellular localization, and formation of structural modules of individual lncRNAs to partner with associated proteins, which may undergo rapid changes depending on local or cellular environments.

Most annotated lncRNAs transcribed from genomic intergenic regions (lincRNAs) by RNA polymerase II (Pol II) are capped, polyadenylated, and spliced just like mRNAs (Cabili et al. 2011; Derrien et al. 2012; Djebali et al. 2012; Quinn and Chang 2016). Although exhibiting tissue- or cell type–specific expression (Cabili et al. 2011; Goff and Rinn 2015; Ulitsky 2016), they differ from mRNAs by being less evolutionarily conserved and less abundant and containing fewer exons. lincRNAs are generally more nuclear localized than their mRNA counterparts (Derrien et al. 2012; Cabili et al. 2015), in part because some lincRNAs are transcribed by deregulated Pol II, weakly spliced (Mele et al. 2017), inefficiently polyadenylated, and degraded by the nuclear exosome on chromatin (Schlackow et al. 2017). Thus, in principle, functional lincRNAs must escape this targeted nuclear surveillance process to accumulate to high levels in specific cell types.

In addition to lincRNAs, recent studies have uncovered other types of lncRNAs that are processed from primary Pol II transcripts and are stabilized by distinct mechanisms. In this review, we briefly summarize our current understanding of the biogenesis of these unconventionally processed lncRNAs and emphasize the importance of linking their processing to their functions in innate immunity and in regulating the structure and function of human nuclear subdomains. To be functional, these different types of lncRNAs must be translocated to distinct subcellular compartments. Although the mechanisms of RNA trafficking remain incompletely understood, we discuss some insights into this question using the radial flux of nascent precursor ribosomal RNA (pre-rRNA) in the nucleolus as a model.

MALAT1, NEAT1, AND NONPOLYADENYLATED TRANSCRIPTOMES

Earlier studies from the Spector laboratory reported alternative 3′-end processing of MALAT1 (Wilusz et al. 2008) and NEAT1_long (the long isoform of NEAT1 [Sunwoo et al. 2009]), which are two abundant nuclear-enriched lncRNAs that are localized to nuclear speckles and paraspeckles, respectively (Hutchinson et al. 2007; Clemson et al. 2009; Sasaki et al. 2009; Sunwoo et al. 2009). These RNAs are processed at their 3′ ends by recognition and cleavage of tRNA-like structures by RNase P (which is best known to process the 5′ ends of tRNAs) (Wilusz et al. 2008; Sunwoo et al. 2009). RNase P cleavage leads to the formation of mature 3′ ends, which are subsequently protected by a conserved stable U-A·U triple-helical RNA structure (· denotes the Hoogsteen face and - denotes the Watson–Crick face) (Wilusz et al. 2012; Brown et al. 2012). A similar triple-helical structure, but not formed by RNase P processing and called a nuclear retention element (NRE), has also been found at the 3′ end of the PAN lncRNA, which is expressed from Kaposi's sarcoma–associated herpesvirus (KSHV), and in RNAs from other viruses (Mitton-Fry et al. 2010; Tycowski et al. 2012). In addition to RNase P–mediated 3′ processing of lncRNAs, a group of lncRNA transcripts containing miRNAs (lnc-pri-miRNAs) use Microprocessor cleavage to terminate transcription, resulting in unstable lncRNAs without 3′-end poly(A) tails (Dhir et al. 2015).

Inspired by the alternative 3′-end processing of MALAT1 and NEAT1, we began to explore the nonpolyadenylated (poly(A)−) transcriptomes of human cells (Yang et al. 2011). In this method, total RNAs collected from human cells were incubated with oligo(dT) beads to select poly(A)+ RNAs. The unbound, flow-through RNAs from the oligo(dT) beads were collected and subjected to multiple rounds of rRNA depletion. Both poly(A)+ RNAs and poly(A)−/ribo− RNAs were then subjected to RNA-seq library preparation and sequencing (Yang et al. 2011). In addition to the replication-dependent histone mRNAs, this non-poly(A) RNA-seq unexpectedly identified hundreds of abundant RNA signals that did not align fully to annotated genes but derived from either introns or exons, which were termed “excised introns” and “excised exons” (Yang et al. 2011). Later, these “excised exons” were identified as circular RNAs produced from backsplicing of exons of pre-mRNAs (Salzman et al. 2012; Zhang et al. 2014a); “excised introns” were characterized as circular intronic RNAs (ciRNAs) (Zhang et al. 2013) and snoRNA-ended lncRNAs (sno-lncRNAs) (Yin et al. 2012). Further, SPA (5′ snoRNA capped and 3′ polyadenylated) lncRNAs were uncovered using RNA immunoprecipitation (RIP) of fibrillarin (a key protein component of the box C/D snoRNP complex) followed by RNA-seq (Wu et al. 2016).

IDENTIFICATION OF DIFFERENT TYPES OF CIRCULAR RNAs FROM NON-POLY(A) TRANSCRIPTOMES

Covalently closed circular RNAs were first observed by electron microscopy in plants and eukaryotic cells in the 1970s (Sanger et al. 1976; Hsu and Coca-Prados 1979). This was followed by observations of RNA circles with scrambled exons, which were thought to be “by-products” of aberrant splicing with little functional potential (Nigro et al. 1991; Cocquerelle et al. 1992; Capel et al. 1993).

Genome-wide studies of rRNA-depleted and non-poly(A) transcriptomes, as well as rRNA-depleted and RNase R-enriched transcriptomes, together with specific bioinformatics algorithms for circular RNA detection (Kristensen et al. 2019), have revealed widespread expression of circular RNAs in metazoans (Yang et al. 2011; Gardner et al. 2012; Salzman et al. 2012, 2013; Jeck et al. 2013; Memczak et al. 2013; Zhang et al. 2013, 2014a; Guo et al. 2014; Westholm et al. 2014; Ivanov et al. 2015) and in plants (Wang et al. 2014; Lu et al. 2015). These include two types of circular RNAs derived from the primary Pol II transcripts.

The first type, circular intronic RNAs (ciRNAs), derives from lariat introns in mammalian cells. Their formation depends on a consensus sequence containing a 7-nt GU-rich motif near the 5′ splice site and an 11-nt C-rich motif at the branchpoint site, which together inhibit debranching. The resulting RNA circles are covalently ligated through a 2′,5′-phosphodiester bond at the joining site and lack the linear part stretching from the 3′ end of the intron to the branchpoint (Fig. 1A). Hundreds of ciRNAs were identified in human cells, including the human embryonic carcinoma cell line PA1 and the human embryonic stem cell (hESC) line H9. Some abundantly expressed ciRNAs, including ci-ankrd52 and ci-sirt7, were found to accumulate primarily in the nucleus and to interact with the elongating Pol II complex. Depleting these ciRNAs led to decreased transcription of the parental ANKRD52 or SIRT7 genes. These results suggest a cis-regulatory role of intronic noncoding sequences on their parent coding genes (Fig. 1A; Zhang et al. 2013). Stable intronic RNAs derived from lariats were also found in oocytes of Xenopus tropicalis (Gardner et al. 2012; Talhouarne and Gall 2014) and Drosophila melanogaster (Jia Ng et al. 2018).

Figure 1.

    Production, structure, and degradation of circular RNAs. (A) A model for circular intronic RNA (ciRNA) processing. ciRNAs are derived from lariat introns and are covalently ligated via a 2′,5′-phosphodiester bond. Their formation depends on a consensus sequence containing a 7-nt GU-rich motif near the 5′ splice site and an 11-nt C-rich motif at the branchpoint site. (B) A model for circular RNA (circRNA) processing. circRNAs are produced by spliceosome-catalyzed backsplicing of exon(s). Generally, flanking intronic complementary sequences (ICSs; most are Alu elements) promote exon circularization, whose efficiency can be regulated by competition between ICSs across introns. (C) Nascent circRNA production correlates with Pol II elongation rate. In cells, the efficiency of backsplicing from pre-mRNA is relatively low, but because of their stability, circRNAs can accumulate to high abundance at the steady-state level. (D) A model of alternative circularization. Alternative formation of inverted repeated ICS pairs and the competition between the pairs can lead to multiple circRNAs produced from a single gene locus. (The red dashed arrows indicate potential competition between different ICS pairs.) (E) Linking circRNA processing and structure to innate immunity regulation. Under normal conditions (left), NF90/NF110 promotes circRNA production by stabilizing the intronic RNA pairs formed by flanking ICSs, which juxtaposes the circRNA-forming exon(s). Many examined circRNAs form intramolecular short imperfect double-stranded RNAs (dsRNAs) that act as inhibitors of the innate immune dsRNA receptor PKR and of NF90/NF110. Upon viral infection (right), circRNAs are globally reduced at the production level because of export of NF90/NF110, as well as at the steady-state level because of rapid turnover upon RNase L activation. This global reduction of circRNAs may free PKR and NF90/NF110 to engage in antiviral immune responses. Misregulation of this process is found in patients with autoimmune diseases, including systemic lupus erythematosus (SLE).

    Of note, in contrast to higher eukaryotic cells, it has recently been shown that excised introns in yeast can be stabilized in linear form and can regulate cell growth (Morgan et al. 2019; Parenteau et al. 2019). Thirty-four excised linear introns that remain associated with components of the spliceosome were found in Saccharomyces cerevisiae. These differ from classical spliceosomal introns in having a short distance between their lariat branchpoint and the 3′ splice site (Morgan et al. 2019). Such linear but stable introns were shown to promote resistance to starvation by enhancing the repression of ribosomal protein genes that are downstream from the nutrient-sensing TORC1 and protein kinase A (PKA) pathways (Morgan et al. 2019).

    A second type of circular RNAs (circRNAs) are produced from backsplicing of exons of pre-mRNAs in thousands of gene loci in eukaryotes. For these, exons from pre-mRNAs have their ends covalently joined via 3′, 5′-phosphodiester bonds. Although the expression level of most circRNAs is low, some circRNAs have been reported to accumulate to levels as high as or even higher than that of their linear cognate mRNAs (Salzman et al. 2013; Rybak-Wolf et al. 2015; You et al. 2015; Zhang et al. 2016b). Their biogenesis requires spliceosomal machinery and can be modulated by both cis- and trans-factors. At the molecular level, some abundant circRNAs were shown to modulate gene expression by titrating miRNAs (Hansen et al. 2013; Memczak et al. 2013; Piwecka et al. 2017) or proteins (Chen et al. 2017; Li et al. 2017; Xia et al. 2018; Liu et al. 2019), regulating transcription (Li et al. 2015; Conn et al. 2017), interfering with splicing (Ashwal-Fluss et al. 2014; Zhang et al. 2014a, 2016a), or even acting as templates for translation (Legnini et al. 2017; Pamudurti et al. 2017; Yang et al. 2017). However, because of their circular conformation and sequence overlap with linear mRNA counterparts, challenges exist at multiple levels, from annotations to functional studies, to understand the expression and functions of circRNAs (Li et al. 2018). In the past several years, we have focused on understanding the regulation of their life cycles, which led us to uncover important functions in innate immune responses (see below).

    CHARACTERIZATION OF circRNA BIOGENESIS

    During exon backsplicing, a downstream splice-donor site is covalently ligated to an upstream splice-acceptor site (Salzman et al. 2012; Jeck et al. 2013; Memczak et al. 2013; Salzman et al. 2013). How does the spliceosome overcome the sterically unfavorable reaction between backspliced exons? The detailed mechanism of circRNA biogenesis had remained to be explored, despite a noted association with Alu elements or other complementary sequences in flanking introns of circle-forming exons (Capel et al. 1993; Jeck et al. 2013). By developing a computational algorithm (CIRCexplorer) (Zhang et al. 2014a), we identified thousands of circRNAs in non-poly(A) data sets derived from H9 and HeLa cells [described as “excised exons” (Yang et al. 2011)]. Using circular RNA recapitulation assays that contain flanking intronic complementary sequences in circle-forming vectors, we have shown that exon circularization is in general promoted by flanking intronic complementary sequences (ICSs), most being Alu elements in human (Fig. 1B). The efficiency of exon circularization is regulated by competition between RNA pairing across flanking introns or within individual introns (Zhang et al. 2014a). Given that the great majority of circRNAs are derived from the middle exons of genes (Zhang et al. 2014a) and that backsplicing requires the canonical spliceosomal machinery (Ashwal-Fluss et al. 2014; Starke et al. 2015; Wang and Wang 2015; Liang et al. 2017), it has been proposed that backsplicing events compete with linear RNA splicing (Fig. 1B; Ashwal-Fluss et al. 2014; Zhang et al. 2014a). It would therefore be important to address the kinetics of backsplicing in cells.
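    The defining signature of backsplicing described above, a downstream splice-donor site joined to an upstream splice-acceptor site, can be illustrated with a minimal sketch. This is not the CIRCexplorer algorithm; the `Junction` record, its field names, and the restriction to plus-strand coordinates are assumptions made purely for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Junction:
    """Hypothetical splice-junction record on the plus strand of one gene."""
    donor: int     # coordinate of the splice-donor (5' splice) site
    acceptor: int  # coordinate of the splice-acceptor (3' splice) site joined to it

def is_backsplice(j: Junction) -> bool:
    # Canonical splicing joins a donor to a DOWNSTREAM acceptor.
    # Backsplicing joins a donor to an UPSTREAM acceptor, so on the
    # plus strand the acceptor coordinate is smaller than the donor's.
    return j.acceptor < j.donor

def split_junctions(junctions):
    """Partition junctions into (linear, backspliced) lists."""
    linear = [j for j in junctions if not is_backsplice(j)]
    back = [j for j in junctions if is_backsplice(j)]
    return linear, back

# Toy example with made-up coordinates
linear, back = split_junctions([
    Junction(donor=3000, acceptor=5000),  # canonical: donor upstream of acceptor
    Junction(donor=5200, acceptor=2800),  # backsplice: donor downstream of acceptor
])
```

    A real detector must also handle minus-strand genes, read alignment ambiguity at the junction, and annotation of exon boundaries, none of which is modeled here.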

    To understand how circRNA biogenesis is linked to transcription and canonical splicing, we studied circRNA processing using metabolic tagging of nascent RNAs with 4-thiouridine (4sU), followed by purification of labeled nascent RNAs and RNA-seq (4sUDRB-seq). We found that backsplicing from pre-mRNA is inefficient in cells (<1% of canonical splicing) and that many backsplicing events occur post-transcriptionally (Fig. 1C). However, circRNAs are stable, and some can accumulate to high steady-state levels (Fig. 1C). For example, consistent with the slow division rates of neurons, we observed a remarkable increase in the number and abundance of circRNAs upon differentiation of human embryonic stem (ES) cells into forebrain neurons (Zhang et al. 2016b).

    Despite the slow processing of circRNAs in cells, the potential alternative formation of inverted repeated ICS pairs and the competition between these pairs lead to the production of multiple circular RNAs from a single gene, a phenomenon that we termed “alternative circularization” (Fig. 1D). Alternative circularization occurs widely; for example, >50% of highly expressed circRNA gene loci (mapped backsplice junction reads ≥ 0.1 RPM [reads per million mapped reads]) in multiple examined cell lines can produce more than one circRNA (Zhang et al. 2014a; Zhang et al. 2016a). We have annotated patterns of alternative circularization, including different types of alternative backsplicing and alternative splicing events, in circRNAs from a range of commonly used human cell lines (Zhang et al. 2016a). Compared to linear cognate RNAs, circRNAs exhibit distinct patterns of alternative backsplicing and alternative splicing. Quantification of the RNA pairing capacity of orientation-opposite ICSs across circRNA-flanking introns using a complementary sequence index revealed that, among all types of complementary sequences, short interspersed nuclear elements (SINEs), especially Alu elements in human, contribute the most to circRNA formation, and that their diverse distribution across species results in the increased complexity of circRNA expression during species evolution (Dong et al. 2017). These findings together reveal the complexity of post-transcriptional regulation in mammalian transcriptomes.
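    The RPM threshold and the notion of a locus producing more than one circRNA can be made concrete with a small sketch. The input layout (tuples of gene, circRNA identifier, and backsplice junction read count), the function names, and the choice to count only circRNAs that themselves reach the threshold are assumptions for illustration, not the published pipeline.

```python
from collections import defaultdict

def rpm(junction_reads: int, total_mapped_reads: int) -> float:
    """Reads per million mapped reads for one backsplice junction."""
    return junction_reads * 1_000_000 / total_mapped_reads

def alternative_circularization_fraction(circs, total_mapped_reads, min_rpm=0.1):
    """Fraction of highly expressed circRNA loci that yield more than one circRNA.

    circs: iterable of (gene, circ_id, backsplice_junction_reads) tuples.
    Assumption: only circRNAs with RPM >= min_rpm are counted for a locus.
    """
    per_gene = defaultdict(set)
    for gene, circ_id, reads in circs:
        if rpm(reads, total_mapped_reads) >= min_rpm:
            per_gene[gene].add(circ_id)
    if not per_gene:
        return 0.0
    multi = sum(1 for ids in per_gene.values() if len(ids) > 1)
    return multi / len(per_gene)

# Toy data (made-up counts): 10 million mapped reads in total
toy = [
    ("geneA", "circA1", 5),
    ("geneA", "circA2", 2),
    ("geneB", "circB1", 3),
    ("geneC", "circC1", 0),  # below threshold, so geneC is not counted
]
fraction = alternative_circularization_fraction(toy, total_mapped_reads=10_000_000)
```

    In this toy data set, geneA yields two circRNAs above the threshold and geneB yields one, so half of the counted loci show alternative circularization.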

    PRODUCTION, STRUCTURE, AND DEGRADATION OF circRNAs REGULATE INNATE IMMUNE RESPONSES

    Because circRNAs share sequences with their linear cognate mRNAs, it has been challenging to study the functions of individual circRNAs: available methods are inadequate for distinguishing exons in circRNAs from those in linear cognate mRNAs (Li et al. 2018). We speculated that understanding the life cycle differences between circular and linear RNAs might provide some insights into their functions.

    Although they share the same cis-elements, circRNAs from the same loci exhibit cell type– and tissue-specific expression patterns, indicating the participation of protein factors in circRNA biogenesis. To identify such factors, we performed a genome-wide siRNA screen targeting all unique human genes with an efficient, in-house-developed dual-color circular/linear RNA expression reporter. In this vector, backsplicing produces a translatable circular mCherry RNA and canonical splicing produces a translatable linear EGFP RNA. This screen identified 103 proteins that affect mCherry expression but not EGFP expression (Li et al. 2017). In addition to protein candidates that have known roles in splicing regulation, we identified multiple factors related to host immune responses that are involved in circRNA production (Li et al. 2017). One such factor is the human interleukin enhancer binding factor 3 (ILF3), whose gene produces multiple mRNA isoforms by alternative splicing to express factors including nuclear factor 90 (NF90) and nuclear factor 110 (NF110). iCLIP and mutagenesis experiments confirmed that NF90/NF110 directly bind to inverted-repeated Alu elements juxtaposing circRNA-forming exon(s) to promote circRNA production by stabilizing intronic RNA pairs in the nucleus (Fig. 1E). Interestingly, mature circRNAs as a group were found to associate with NF90/NF110 in the cytoplasm (Li et al. 2017). It is known that upon viral infection, NF90/NF110 are rapidly exported to the cytoplasm, where they participate in innate immunity (Harashima et al. 2010). After infection, we observed a global reduction of nascent circRNA production, which was consistent with a dissociation of NF90/NF110 from circRNA–protein complexes for antiviral activity. In line with this, overexpression of endogenous circRNAs facilitated viral infection of human cells (Li et al. 2017). These findings indicated that circRNA biogenesis is unfavorable for innate immune responses, thus linking endogenous circRNA production to innate immunity regulation (Fig. 1E; Li et al. 2017).

    Remarkably, not only is nascent circRNA production limited globally, but the steady-state level of circRNAs is also globally and rapidly (with a turnover of ∼1 h) reduced upon treatment with poly(I:C), which mimics pathogenic dsRNAs, or upon encephalomyocarditis virus (EMCV) infection (Liu et al. 2019). This global reduction of steady-state circRNAs is catalyzed by RNase L, an endonuclease that becomes activated upon activation of the innate immune response and cleaves viral and host mRNAs as one way to limit viral spread (Han et al. 2014; Huang et al. 2014). Structural analysis by an optimized in-cell SHAPE-MaP assay (Smola et al. 2015) revealed that endogenous circRNAs tend to form 16- to 26-bp imperfect RNA duplexes and act as inhibitors of a group of nucleic acid receptors with antiviral activities, including the IFN-inducible isoform of adenosine deaminase acting on RNA 1 (ADAR1p150), NF90/NF110, and double-stranded RNA (dsRNA)-activated protein kinase (PKR) (Liu et al. 2019). Among the examined proteins, circRNAs exhibit the highest binding preference for PKR and regulate PKR activation. PKR undergoes autophosphorylation and activation by long dsRNAs (>33 bp), but this activation is blocked by short dsRNAs (16 to 33 bp) (Zheng and Bevilacqua 2004). circRNAs, but not their linear cognate mRNAs, are inhibitors of the dsRNA-induced activation of PKR in a sequence-independent but dsRNA structure-dependent manner. Depleting RNase L in cells resulted in delayed PKR activation, whereas introducing dsRNA-containing circRNAs into cells rendered them susceptible to EMCV infection (Liu et al. 2019). This regulation is physiologically important, and its misregulation has been linked to the autoimmune disease systemic lupus erythematosus (SLE). For example, augmented PKR phosphorylation and circRNA reduction were found in peripheral blood mononuclear cells (PBMCs) derived from patients with SLE. Importantly, introducing dsRNA-containing circRNAs, but not their linear cognate mRNAs, into PBMCs or T cells derived from SLE patients attenuated the aberrant PKR activation cascade.
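    The duplex-length ranges quoted above can be summarized as a toy classifier. It encodes only the two thresholds stated in the text (activation by dsRNAs longer than 33 bp, inhibition by 16- to 33-bp dsRNAs); the function name and the string labels are hypothetical, and real PKR regulation depends on RNA structure and context that a length cutoff cannot capture.

```python
def pkr_effect(duplex_bp: int) -> str:
    """Classify a dsRNA duplex by length, using only the ranges quoted in the text."""
    if duplex_bp > 33:
        return "activating"      # long dsRNAs trigger PKR autophosphorylation
    if 16 <= duplex_bp <= 33:
        return "inhibitory"      # short dsRNAs block PKR activation
    return "not covered"         # the text gives no range below 16 bp

# The 16- to 26-bp imperfect duplexes reported for endogenous circRNAs
# all fall inside the inhibitory range under this simplification.
circrna_duplex_effects = {n: pkr_effect(n) for n in range(16, 27)}
```

    Under this simplification, every duplex length reported for endogenous circRNAs maps to the inhibitory class, which matches the observation that circRNAs act as PKR inhibitors.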

    Collectively, by studying the processing, structure, and degradation of circRNAs, we have discovered that endogenous circRNAs as a group can dampen innate immune responses (Fig. 1E; Li et al. 2017; Liu et al. 2019). These findings are consistent with a recent report that patients with a genetic defect leading to the accumulation of intron lariat-derived RNA circles are more susceptible to viral infections (Zhang et al. 2018). In contrast to endogenous circRNAs, there are conflicting reports on whether exogenously produced circular RNAs themselves trigger immune responses (Chen et al. 2017; Chen et al. 2019; Wesselhoeft et al. 2019). Future studies are warranted to clarify the modes of action of in vitro-synthesized versus endogenous circular RNAs in the regulation of innate immunity and in related applications.

    IDENTIFICATION OF snoRNA-ENDED LONG NONCODING RNAs

    Although it is generally believed that most introns or intron fragments are unstable (Rodriguez-Trelles et al. 2006), non-poly(A) RNA-seq identified a number of nonannotated non-poly(A) RNA signals that mapped to intronic regions in hESC H9 and HeLa cells (Yang et al. 2011). Intriguingly, careful analysis of these nonpolyadenylated reads revealed that some excised introns are stabilized by small nucleolar ribonucleoprotein complexes (snoRNPs) at each end; these were named sno-lncRNAs (snoRNA-ended long noncoding RNAs) (Fig. 2A). snoRNAs are a family of conserved nuclear RNAs (70–200 nt) that function in the modification of small nuclear RNAs (snRNAs) or processing of ribosomal RNAs (Kiss 2001). Cotranscriptional binding of core proteins is essential to protect the termini of mature snoRNAs from exonucleolytic degradation (Samarsky et al. 1998; Kufel et al. 2000). In many higher eukaryotes, the great majority of snoRNAs are produced from introns, and usually one intron contains only one snoRNA (Weinstein and Steitz 1999; Dieci et al. 2009). When one intron contains two snoRNA genes, the sequences between the snoRNAs are not degraded after splicing, leading to the accumulation of lncRNAs flanked by snoRNA sequences but lacking 5′ caps and 3′ poly(A) tails (Fig. 2A; Yin et al. 2012). Dozens of sno-lncRNAs have been found in mammalian genomes; their expression is species-specific because species-specific alternative splicing determines whether one intron contains a single snoRNA or two snoRNAs (Zhang et al. 2014b; Xing and Chen 2018).

    Further application of fibrillarin-RIP and RNA-seq identified 5′-end snoRNP-capped, 3′-polyadenylated (SPA) lncRNAs. SPA processing is associated with kinetic competition between the 5′ to 3′ exonuclease XRN2 and Pol II elongation speed downstream from polyadenylation signals in the nucleus. Following cleavage/polyadenylation of an upstream gene, the downstream uncapped pre-SPA RNA is trimmed by XRN2 until this exonuclease reaches the cotranscriptionally assembled snoRNP complex (Fig. 2B). A more recent study has revealed that 5′ to 3′ exonuclease activity in the cytoplasmic nonsense-mediated decay pathway, initiated by SMG6-mediated endonucleolytic cleavage, can also trigger the production of box C/D snoRD86-containing SPAs in the cytoplasm (c-SPAs) (Lykke-Andersen et al. 2018).

    INVOLVEMENT OF snoRNA-ENDED lncRNAs IN PRADER–WILLI SYNDROME

    Among the snoRNA-ended lncRNAs, we found that five sno-lncRNAs (1000–3000 nt in size, named sno-lncRNA1 to sno-lncRNA5) and two SPAs (35,000 nt for SPA1 and 16,000 nt for SPA2) are conspicuously missing in Prader–Willi syndrome (PWS), a neurodevelopmental genetic disorder with elusive molecular causes (Cassidy et al. 2012; Yin et al. 2012; Chamberlain 2013; Wu et al. 2016; Aman et al. 2018). In normal hESCs, these abundant PWS region snoRNA-ended lncRNAs accumulate in cis to form one 1- to ∼2-µm³ RNA–protein punctum per nucleus. Importantly, these lncRNAs interact with different RNA binding proteins (RBPs) including TAR DNA-binding protein 43 (TDP43), heterogeneous nuclear ribonucleoprotein M (hnRNPM), and RBP fox-1 homolog 2 (RBFOX2) in normal hESCs and other examined human cell lines. Super-resolution microscopy and iCLIP further showed strong preferences between different sno-ended lncRNAs and their interacting RBPs: SPA1 preferentially interacted with TDP43 and sno-lncRNAs preferentially bound RBFOX2, whereas SPA2 showed no preference among the examined RBPs. Generation of a PWS cellular model in hESCs by depleting these lncRNAs using CRISPR-Cas9 revealed the mislocalization of their associated RBPs and an altered pattern of alternative splicing of more than 300 mRNAs without affecting the expression level of the individual corresponding mRNAs (Fig. 2C). Importantly, some genes with altered splicing regulation are associated with synaptosome and neurotrophin signaling pathways (Yin et al. 2012; Wu et al. 2016). These studies together indicate that the absence of these snoRNA-related lncRNAs is linked to PWS pathogenesis (Fig. 2C).

    Figure 2.

      The biogenesis and diversity of snoRNA-ended lncRNAs. (A) A model for snoRNA-ended long noncoding RNA (sno-lncRNA) processing. Typically, when one intron contains two snoRNA genes, the cotranscriptional formation of snoRNPs at the ends can protect the intronic sequences from exonuclease trimming after debranching during splicing, leading to the formation of sno-lncRNAs. (B) A model for 5′-end snoRNP-capped, 3′-polyadenylated (SPA) lncRNA processing. Like that of sno-lncRNA, SPA processing requires an intact snoRNA at the 5′ end. A weak poly(A) signal located downstream from the coding region of the mRNA is also necessary for generating readthrough transcripts of pre-SPAs. (C) snoRNA-ended lncRNAs are derived from the critical region deleted in PWS patients. (Top) sno-lncRNAs and SPAs derived from the PWS deletion region (human 15q11-q13) interact with multiple RBPs including TDP43, RBFOX2, and hnRNPM in the nucleus of human ES cells. (Bottom) Minimal chromosome deletions reported in four PWS individuals (Cases 1–4) (Sahoo et al. 2008; de Smith et al. 2009; Duker et al. 2010; Bieth et al. 2015). (D) Subtypes of snoRNA-ended lncRNAs. sno-lncRNAs and SPAs can be ended with either box C/D or box H/ACA snoRNAs.

      Interestingly, the lncRNA 116HG from the Prader–Willi locus in mouse also forms similar cloud-like nuclear accumulations (Powell et al. 2013; Coulson et al. 2018), but currently existing PWS mouse models do not fully recapitulate the PWS patient phenotypes (Ding et al. 2008), highlighting the importance of developing human PWS cellular models for future studies of PWS pathogenesis.

      Future work is warranted to examine whether hESCs lacking all these lncRNAs would have any measurable effect on neuronal functions, especially hypothalamic neurons, including not only the morphological phenotypes but also alterations in transcriptomes and alternative splicing patterns. In addition, the contributions of sno-lncRNAs and SPAs to the molecular and disease phenotype at this locus should also be pursued by identifying additional proteins, DNAs and RNAs that are associated with individual sno-lncRNAs and SPAs in hESCs and differentiated neurons. Nevertheless, these findings expand the diversity of lncRNAs and provide previously unappreciated insights into PWS pathogenesis.

      REGULATION OF RNA POLYMERASE I TRANSCRIPTION BY SLERT IN THE HUMAN NUCLEOLUS

      There are two main classes of snoRNAs: box C/D snoRNAs and box H/ACA snoRNAs in eukaryotes according to their conserved sequence motifs (Reichow et al. 2007). Thus, four different subtypes of sno-lncRNAs that each contain the same or different box C/D or box H/ACA snoRNPs at the ends might be expected to exist (Fig. 2D; Zhang et al. 2014b). In addition to the PWS region sno-lncRNAs that are capped by a box C/D snoRNA at each end (Yin et al. 2012), SLERT (snoRNA-ended lncRNA enhances pre-ribosomal RNA transcription) is a sno-lncRNA that contains box H/ACA snoRNAs at its ends and is highly expressed in multiple human cell lines (Fig. 3A; Xing et al. 2017). Of note, sno-lncRNAs containing a box C/D or a box H/ACA snoRNP at each end were also observed in human cells (Zhang et al. 2014b). Similarly, in addition to the 5′ box C/D snoRNA-ended PWS region SPAs and c-SPAs (Wu et al. 2016; Lykke-Andersen et al. 2018), box H/ACA snoRNA-ended SPAs were also detected in several human cell lines (Luo, We, Chen, et al., unpubl. data) (Fig. 2D).

      SLERT is generated from the intron of the transforming growth factor beta regulator 4 (TBRG4) gene locus and alters RNA polymerase I (Pol I) transcription of ribosomal RNAs (rRNAs) (Xing et al. 2017). Alternative splicing leading to skipping of exons 4 and 5 of the TBRG4 locus results in two box H/ACA snoRNAs embedded within one intron, which subsequently forms SLERT (Fig. 3A). Unlike PWS region sno-lncRNAs, SLERT does not localize to its own transcription site but instead accumulates in the nucleolus, which is the largest nuclear subdomain in which rRNA biogenesis takes place. Translocation of SLERT from its transcription site to the nucleolus depends on its two box H/ACA snoRNAs (Fig. 3B).

      Figure 3.

        The biogenesis and function of SLERT. (A) Alternative splicing of the TBRG4 locus. Alternative splicing of tbrg4 pre-mRNA generates either two snoRNAs and the tbrg4 mRNA (top) or a snoRNA-ended lncRNA that enhances preribosomal RNA transcription (SLERT) and a short isoform of tbrg4 (rapidly degraded) with skipped exons 4 and 5. (B) SLERT requires its box H/ACA snoRNA ends to be translocated to the nucleolus. Each of the indicated SLERT or SLERT mutants was expressed in HeLa cells, followed by co-staining of WT-SLERT, WT-SLERTMUT, or egfp-SLERT (green) and nucleolar marker nucleolin (red). (C) The Pol I complexes are located within DDX21 rings, shown by SIM in PA1 cells. (D) SLERT interacts with DDX21 rings but not Pol I complexes. A representative image of co-staining DDX21 (green), RPA194 (red), and SLERT (blue) by SIM and a plot profile of the image are shown. (E) SLERT modulates DDX21 in Pol I transcription regulation. In the nucleolus, a ring-shaped arrangement of DDX21 surrounds multiple Pol I complexes and inhibits Pol I activity. Binding by SLERT allosterically alters individual DDX21 molecules, leading to reduced interaction between DDX21 and Pol I, which subsequently allows the Pol I complexes to occupy the actively transcribed rDNAs. (A–E, Adapted and modified from Xing et al. 2017.)

        Once localized in the nucleolus, SLERT directly interacts with the DEAD-box family protein 21 (DDX21) via a 143-nt-long internal region within the two snoRNAs. DDX21 is a DEAD-box RNA helicase that is involved in multiple steps of ribosome biogenesis by contacting both rRNA and snoRNAs and is thought to modulate rRNA transcription, processing, and modification (Holmstrom et al. 2008; Calo et al. 2015; Sloan et al. 2015). Applying super-resolution structured illumination microscopy (SIM) to examine DDX21 localization in live and fixed human cells in detail, we found that DDX21 is largely enriched in the nucleolus and forms dozens of ring-like structures encircling Pol I complexes (Fig. 3C). Further biochemical analyses showed that DDX21 interacts with subunits of Pol I complexes and prevents them from loading onto rDNAs in an RNA helicase activity-independent manner, thereby inhibiting pre-rRNA transcription.

        Depleting one snoRNA of SLERT by CRISPR–Cas9 abolished SLERT expression in examined human cells and led to suppressed Pol I transcription. Under SIM observation, SLERT does not directly interact with Pol I, but instead specifically localizes to individual DDX21 rings (Fig. 3D). Mechanistically, SLERT binding to DDX21 alters the conformation of individual DDX21 molecules, suppressing the interaction between DDX21 and Pol I and thereby releasing Pol I to engage rDNAs for active pre-rRNA transcription (Fig. 3E).

        ULTRA-STRUCTURE ORGANIZATION OF THE HUMAN NUCLEOLUS

        The nucleolus serves primarily as the site of rRNA biogenesis. It has been well-established that the mammalian nucleolus is highly organized and comprises three morphologically distinct subregions, named fibrillar centers (FCs), dense fibrillar components (DFCs), and the granular component (GC), which are distinguished by their electron density under electron microscopy (EM) (Fig. 4A; Koberna et al. 2002; Boisvert et al. 2007). Recent single-molecule images have further revealed the tripartite nucleolar organization (Szczurek et al. 2016; Khan et al. 2018). Continuous Pol I transcription at the FC produces nascent pre-rRNAs and causes the subsequent radial flux of pre-rRNAs through the DFC for processing and modification to produce 28S and 18S rRNAs, and then into the GC for ribosome assembly and finally into the nucleoplasm (Boisvert et al. 2007; McStay and Grummt 2008). A human nucleolus is assembled around active nucleolar organizer regions (NORs), which are composed of clusters of tandem repeats of ribosomal DNA (rDNA), each with a long intergenic spacer (IGS) of ∼30 kb and a pre-ribosomal RNA (pre-rRNA) coding region of ∼14 kb (McStay and Grummt 2008).

        Figure 4.

          Ultra-structure organization of the human nucleolus and nascent pre-rRNA translocation is required for proper pre-rRNA processing and function. (A) Electron microscopy shows the localization of the nucleolus in thin-sectioned permeabilized HeLa cells. (B) Representative SIM images of nucleoli and three nucleolar subregions in live HeLa cells. (C) Cross-correlation of aligned and averaged images shows that the max-cross sections of individual DFC regions contain six FBL clusters. The minimum distance between two clusters is ∼180 nm; the diameter of FBL cluster is ∼133 nm; the distance between the center of individual clusters and the center of DFC is ∼247.5 nm; the detached distance between two adjacent PF clusters is ∼180 nm. (D) A schematic of RNA smFISH probes to detect transcribing pre-rRNA. The 5′ ETS-1 probe detects nt 1–414 of pre-rRNAs; the 5′ ETS-2 probe detects nt 498–977 of pre-rRNA. (E) The 5′ ETS-1 probe detects 47S pre-rRNAs that are largely distributed outside of the FC, whereas the 5′ ETS-2 probe detects pre-rRNAs that are mainly located at the FC/DFC border, shown by SIM. (F) FBL knockdown (KD) results in impaired localization of nascent pre-rRNAs shown by the highest colocalization signal between RPA194 and 5′ ETS-1. (G) FBL KD results in aberrant accumulation of 47S and 34S pre-rRNAs, accompanied by reduced 28S and 18S rRNAs. (H) Cy3-labeled 5′ ETS-1 (magenta) is sorted to the mNeonGreen-FBL-FL droplets (green) in vitro. (I) The IDR length in the GAR domain of FBL promotes FBL self-association, which confers the capability of pre-rRNA sorting and processing in which FBL is involved. (Left) Increased GAR domain length in FBL mutants leads to augmented FBL self-association shown by increased multimerization. (Middle) Increased IDR length in the GAR domain of FBL mutants promotes translocation of pre-rRNAs into the DFC region. Twenty cells were analyzed under each condition by boxplot. (Right) Increased IDR length in the GAR domain of FBL positively correlates with proper 47S pre-rRNA processing, shown by northern blots. (A, Adapted from Koberna et al. 2002; B–I, adapted and modified from Yao et al. 2019b.)

          The SIM observation that DDX21 forms dozens of ring-like structures surrounding Pol I complexes (Fig. 3C) raised an intriguing question: how are DDX21 rings positioned in the tripartite organization of the mammalian nucleolus? Other questions remaining to be addressed include where Pol I transcription occurs (Mais and Scheer 2001; Cheutin et al. 2002; Huang 2002; Boisvert et al. 2007) and how nascent pre-rRNAs migrate in the nucleolus. A time-dependent migration of Br-U-labeled pre-rRNAs from FCs to DFCs was observed in HeLa cells (Thiry et al. 2000), but the mechanism underlying this observation had remained unclear.

          We applied CRISPR–Cas9-mediated knock-in of fluorescently tagged proteins to visualize FC, DFC, and GC subnucleolar organization under SIM, which allowed us to uncover previously uncharacterized nucleolar ultra-structures (Fig. 4B; Yao et al. 2019b). A human nucleolus consists of dozens of FC/DFC units that are assembled around two to three copies of active rDNAs at the border of each FC/DFC, where Pol I complexes are located and Pol I transcription occurs. Pre-rRNA processing factors, such as fibrillarin (FBL), form 18 to 24 clusters that are further assembled into a polyhedron-like shell of the DFC surrounding the FC. On average, each spherical PF cluster is ∼133 nm in diameter, and each DFC region has an outer diameter of ∼628 nm and an inner diameter of ∼362 nm (Fig. 4C). Consistent with this model, Pol I complex subunits are enriched at the FC border as clusters (Yao et al. 2019b), which is in striking contrast to previous models in which Pol I complexes were thought to be distributed throughout the entire FC region (Cheutin et al. 2002). Further, an active NOR contains not only active rDNAs but also transcriptionally inert rDNAs, consistent with the earlier observation of discontinuous transcribed rDNA clusters in rDNA spreads (McKnight and Miller 1976; McStay and Grummt 2008). Collectively, these observations represent a substantial advance in our understanding of nucleolar spatial organization.
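The reported dimensions can be checked against one another with simple arithmetic. The sketch below assumes an idealized geometry in which the DFC is a spherical shell concentric with the FC. That simplification is ours, not a claim from the source, and the function names are hypothetical. Under this assumption, the shell thickness and the mid-shell radius can be compared with the FBL cluster diameter and the cluster-center-to-DFC-center distance quoted in the Figure 4C legend.

```python
# Average values quoted in the text and in the Figure 4C legend (nm).
DFC_OUTER_DIAMETER = 628.0
DFC_INNER_DIAMETER = 362.0
FBL_CLUSTER_DIAMETER = 133.0
CLUSTER_CENTER_TO_DFC_CENTER = 247.5


def shell_thickness(outer_d, inner_d):
    """Thickness of an idealized concentric spherical shell (assumption)."""
    return (outer_d - inner_d) / 2.0


def shell_mid_radius(outer_d, inner_d):
    """Radius halfway between the inner and outer surfaces of the shell."""
    return (outer_d / 2.0 + inner_d / 2.0) / 2.0


thickness = shell_thickness(DFC_OUTER_DIAMETER, DFC_INNER_DIAMETER)
mid_radius = shell_mid_radius(DFC_OUTER_DIAMETER, DFC_INNER_DIAMETER)

print(f"shell thickness: {thickness} nm (cluster diameter: {FBL_CLUSTER_DIAMETER} nm)")
print(f"mid-shell radius: {mid_radius} nm (reported: {CLUSTER_CENTER_TO_DFC_CENTER} nm)")
```

In this idealized picture, the shell is one cluster diameter thick and the cluster centers sit at the middle of the shell, which fits the description of a single layer of FBL clusters forming the DFC.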

          NASCENT pre-rRNA TRANSLOCATION IS REQUIRED FOR PROPER pre-rRNA PROCESSING AND FUNCTION

          An increasing number of RNAs have emerged as important modulators of nuclear structure and function by acting in cis or in trans (Bergmann and Spector 2014; Chen 2016; Engreitz et al. 2016; Tomita et al. 2017; Yao et al. 2019a). What mechanisms does the cell use to keep nascent RNAs from sticking together while also promoting directional sorting in trans? It has been a challenge to address this question in single cells because of the low abundance of most nascent RNAs that usually undergo rapid processing. Remarkably, we observed that the 5′ termini of the nascent 47S pre-rRNAs are translocated to the DFCs, whereas pre-rRNAs are still being transcribed at the border of FC and DFC (Fig. 4D,E), indicating that the relatively high abundance of nascent 47S pre-rRNA and its radial flux mode of processing in FC/DFCs can be an attractive model to study how nascent RNA directional sorting is achieved.

          Earlier studies in yeast revealed that eukaryotic pre-rRNA processing is complex. The early step was thought to involve the assembly of small subunit (SSU) processomes including UTPa, UTPb, and U3 snoRNP (Barandun et al. 2018). UTPa complexes were reported to bind first to nascent 35S pre-rRNA to chaperone pre-rRNA and U3 snoRNA to initiate SSU assembly (Hunziker et al. 2016). Such an initial binding between UTPa and the 5′ ETS appeared to be required for the subsequent recruitment of the UTPb complex and the U3 snoRNP components (Henras et al. 2015; Sharma and Lafontaine 2015; Barandun et al. 2018). These studies in yeast might not directly reflect processing of nascent pre-rRNA in the human nucleolus because yeast cells have bipartite nucleoli containing merged FC/DFCs and GC (Thiry and Lafontaine 2005), which are distinct from the tripartite nucleoli in human cells, where pre-rRNA processing takes place in DFCs.

          Use of shRNA-mediated knockdown of factors in UTPa/b and snoRNP complexes combined with SIM-based screening has revealed that FBL plays a key role in promoting the movement of the 5′ end of 47S pre-rRNA from its transcription site to the DFC. Depletion of FBL dramatically blocks the translocation of 5′ ends of transcribing pre-rRNAs and results in the accumulation of pre-rRNAs at the transcription sites at the border of FC/DFCs (Fig. 4F), together with aberrant pre-rRNA processing shown as 34S pre-rRNA accumulation (Fig. 4G).

          FBL is well-known as an rRNA 2′-O-methyltransferase in the snoRNP particles that participate in pre-rRNA processing (Tollervey et al. 1991, 1993; Tafforeau et al. 2013) and Pol I transcription (Tessarz et al. 2014; Loza-Muller et al. 2015). Its amino terminus contains a glycine- and arginine-rich (GAR) domain highly enriched with intrinsically disordered regions (IDRs) that are required for Xenopus laevis FBL phase separation (Feric et al. 2016); its carboxyl terminus contains an RNA binding domain (RBD) and the α domain for methyltransferase activity (MD domain).

          How does FBL control the movement of the 5′ end of 47S pre-rRNA to the DFC? First, distinct from other components in snoRNP complexes, FBL specifically and strongly interacts with the 5′ end of 47S pre-rRNA as revealed by PAR-CLIP (Kishore et al. 2013) and by in vitro binding and phase-separation assays (Yao et al. 2019b). These observations indicate an additional function of FBL in nascent pre-rRNA sorting beyond its classical role in U3 snoRNP. Second, the GAR domain is necessary and sufficient for human FBL self-aggregation into droplets at physiological concentrations (Yao et al. 2019b). Importantly, both in vitro and in vivo lines of evidence suggest a model whereby the binding and sorting of nascent 47S pre-rRNA utilize different FBL domains: The MD domain binds to the 5′ end of 47S pre-rRNA, and the RNA-interacting FBL then moves toward the DFC by GAR domain self-association (Fig. 4H,I). Such pre-rRNA sorting strongly correlates with FBL self-association via the strength of IDRs in FBL and is required for proper pre-rRNA processing in cells (Fig. 4I). For example, rescue of FBL KD in cells with FBL mutants containing GAR domains of different lengths but an intact MD domain shows that increasing GAR length enhances the formation of FBL polymers, enhances 5′-end pre-rRNA sorting to the DFC, and increases proper processing of 47S pre-rRNA (Fig. 4I). Importantly, this sorting and translocation process is required for proper ribosome production, thus representing yet another example illustrating the importance of linking RNA processing to function.

          More broadly, as many RBPs associated with nascent pre-mRNA processing events are known to contain structurally disordered regions (Banani et al. 2017; Gueroussov et al. 2017; Shin and Brangwynne 2017; Ying et al. 2017), a sorting mechanism similar to this liquid–liquid phase-separation-controlled sorting of nascent rRNA is likely used by the cell to keep other types of nascent RNAs from unwanted self-aggregation.

          CONCLUSION

          It has been well-established that Pol II transcription, nascent pre-mRNA splicing, capping, polyadenylation, mRNA export, and surveillance are seamlessly integrated (Moore and Proudfoot 2009). Studies of unconventionally processed lncRNAs including circular RNAs (Fig. 1), snoRNA-related RNAs (Figs. 2 and 3), and the “housekeeping” pre-rRNAs (Fig. 4) have revealed no exception for ncRNAs: Throughout the maturation process of each distinct class of RNA, transcription and processing are crucially important for subcellular localization and function.

          One may argue that all examples illustrated so far are unconventionally processed ncRNAs. Beyond these examples, however, it has been well-established that Pol II transcription and the 3′-end alternative processing of the NEAT1 lncRNA act together to modulate paraspeckle morphology and function (Mao et al. 2011; Naganuma et al. 2012; Hirose et al. 2014; Wang et al. 2018; Yamazaki et al. 2018). Emerging studies have also revealed that both cis- and trans-factors are required for the subcellular localization and functions of Pol II-transcribed mRNA-like lincRNAs (Hacisuleyman et al. 2014; Zhang et al. 2014b; Lubelsky and Ulitsky 2018; Shukla et al. 2018).

          Recent studies using the most robust methods available have greatly advanced our understanding of cellular functions of lncRNAs (Kopp and Mendell 2018; Yao et al. 2019a). Beyond linking RNA processing and function, how lncRNAs are structured and what structural conformations lncRNAs adopt for their interacting partners have remained mysteries. Their large sizes and flexible nature have endowed them with previously underappreciated functional potentials; however, this has also presented unexpected experimental challenges to understanding their regulation at multiple levels. Future studies aimed at understanding in greater detail the regulation of expression, subcellular localization patterns, interaction partners, and conformational information of lncRNAs, as well as understanding how the biogenesis and turnover of RNAs are linked to their subcellular localization, will provide greater insight into their cellular functions.

          This article is distributed under the terms of the Creative Commons Attribution-NonCommercial License, which permits reuse and redistribution, except for commercial purposes, provided that the original author and source are credited.

          REFERENCES
