Chapter III RNA sequencing

Source: Internet
Author: User

RNA sequencing (RNA-seq), also known as whole transcriptome shotgun sequencing (WTSS), is a method for studying the transcriptome based on second-generation (next-generation) sequencing. It can quickly reveal which RNAs are present in a sample at a given moment, and in what quantity.

    • RNA-seq helps reveal alternatively spliced transcripts of a gene, post-transcriptional modifications, gene fusions, mutations/SNPs, and changes in gene expression over time or between groups.

    • In addition to mRNA transcripts, RNA-seq can profile other RNA populations, including total RNA, small RNA such as miRNA, tRNA, and ribosomal RNA.

    • RNA-seq can be used to determine exon/intron boundaries and to validate or correct previously annotated 5' and 3' gene boundaries.

    • Recent developments in RNA-seq include single-cell sequencing and in situ sequencing of fixed tissue.

Library Preparation

Preparing a cDNA library from RNA usually consists of the following steps: RNA extraction and isolation, RNA type selection or depletion, and cDNA synthesis. The details vary between sequencing platforms.

Transcriptome assembly

There are two strategies for assembling a transcriptome from sequencing data:

    1. De novo assembly: This method reconstructs the transcriptome without a reference genome and is typically used when the genome is unknown, incomplete, or substantially different from the available reference. The challenges of assembling from reads alone include: 1) deciding which reads should be joined into contiguous sequences (contigs), 2) robustness to sequencing errors and other artifacts, 3) computational efficiency. The main algorithms for de novo assembly have moved from overlap graphs to de Bruijn graphs; assemblers that use de Bruijn graphs include Velvet, Trinity, Oases, and Bridger. Quality metrics for a de novo assembly include the median contig length, the number of contigs, and the N50.
    2. Reference-guided (genome-based) assembly: This method relies on sequence alignment. Reads are aligned to the reference genome, and because they derive from mature, spliced mRNA, reads that span exon junctions align to the genome as discontinuous segments (see figure). In general, an alignment algorithm has two steps: 1) align a short portion of the read (a seed), 2) extend the seed to find the best alignment using dynamic programming, sometimes in combination with known annotations. Software tools associated with this approach include Bowtie, TopHat (which builds on Bowtie's results to align across splice junctions), Subread, STAR, Sailfish, Kallisto, and GMAP. Note that Sailfish and Kallisto quantify reads against a set of transcripts by k-mer matching or pseudoalignment rather than by full alignment to the genome. The quality of a reference-guided assembly is evaluated in two main ways: 1) with de novo assembly metrics (for example, N50); 2) by comparison with known transcripts, splice junctions, genomes, and protein sequences.
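
To make the de Bruijn graph idea concrete, the sketch below builds a graph from a few short reads and walks an unbranched path to recover a contig. It is a toy illustration only: the function names, the reads, and the choice of k = 4 are assumptions made for this example, and real assemblers such as Velvet or Trinity additionally handle sequencing errors, branching, coverage, and both strands.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def walk_unbranched(graph, start):
    """Extend a contig from `start` while each node has one distinct successor."""
    contig = start
    node = start
    seen = {start}
    while True:
        successors = set(graph.get(node, []))
        if len(successors) != 1:
            break  # dead end or branch point
        node = successors.pop()
        if node in seen:
            break  # avoid looping forever on a cycle
        seen.add(node)
        contig += node[-1]
    return contig


# Hypothetical overlapping reads (toy data, not from a real experiment).
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = de_bruijn_graph(reads, k=4)
contig = walk_unbranched(graph, "ATG")
print(contig)
```

The three reads overlap by four bases each, so the walk stitches them into a single ten-base contig.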

Regarding assembly quality, the current situation is: 1) the apparent quality of an assembly varies depending on which metric is used; 2) software that scores well in one species does not necessarily perform well in others; 3) combining different approaches may be the most reliable option.
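
The N50 metric mentioned above can be computed in a few lines. The sketch below assumes the common definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name and the contig lengths are made up for illustration.

```python
from statistics import median


def n50(contig_lengths):
    """Return the N50 of a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths in bases.
lengths = [100, 200, 300, 400, 500]
print("contigs:", len(lengths))
print("median length:", median(lengths))
print("N50:", n50(lengths))
```

Here the two longest contigs (500 and 400) already cover more than half of the 1500 assembled bases, so the N50 is 400 while the median length is 300.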

Gene expression

Expression is quantified to study cellular changes in response to external stimuli, differences between healthy and diseased states, and other research questions. Gene expression is often used as a proxy for protein abundance, but the two do not always correspond because of post-transcriptional events such as RNA interference and nonsense-mediated decay.

Expression is quantified by counting the reads that map to each locus in the transcriptome assembly step. It can be quantified at the exon or gene level, using either assembled contigs or annotated reference transcripts. RNA-seq read counts obtained this way have been validated against older techniques, namely expression microarrays and qPCR. Tools for counting include HTSeq, featureCounts, Rcount, maxcounts, FIXSEQ, and Cuffquant. The raw read counts are then converted into metrics suitable for hypothesis testing, regression, and other analyses. The parameters of this conversion are:

Library size (sequencing depth): Although the sequencing depth is specified in advance when running multiple RNA-seq experiments, it still varies widely between experiments. Read counts are therefore typically converted to fragments, reads, or counts per million mapped reads (FPM, RPM, or CPM) to adjust for the total number of reads (the library size) generated in a single experiment.

Gene length: At equal transcript expression, a longer gene yields more fragments, reads, or counts than a shorter gene. Dividing FPM by the gene length in kilobases gives fragments per kilobase of transcript per million mapped reads (FPKM). When comparing groups of genes across samples, FPKM is converted to transcripts per million (TPM) by dividing each FPKM by the sum of all FPKM values in the sample and multiplying by one million.

Total sample RNA: Because the same amount of RNA is extracted from each sample, a sample with more total RNA will show less RNA per gene. Such genes appear to have decreased expression, which leads to false positives in downstream analysis.

Variance of each gene's expression: The variance is modeled to account for sampling error (important for genes with low read counts). It can be estimated using a normal, Poisson, or negative binomial distribution.
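
The library-size and gene-length adjustments described above can be sketched as follows. This is a minimal illustration using the definitions given in the text; the function names, gene names, counts, and lengths are all hypothetical, and it treats every counted fragment as mapped.

```python
def cpm(counts):
    """Counts per million: scale each gene's count by the library size."""
    library_size = sum(counts.values())
    return {gene: c / library_size * 1e6 for gene, c in counts.items()}


def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    per_million = cpm(counts)
    return {gene: per_million[gene] / (lengths_bp[gene] / 1000) for gene in counts}


def tpm(counts, lengths_bp):
    """Transcripts per million: FPKM rescaled so each sample sums to 1e6."""
    values = fpkm(counts, lengths_bp)
    total = sum(values.values())
    return {gene: v / total * 1e6 for gene, v in values.items()}


# Hypothetical fragment counts and gene lengths for one sample.
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

tpm_values = tpm(counts, lengths_bp)
for gene in counts:
    print(gene, round(tpm_values[gene]))
```

In this toy sample geneB has three times the raw count of geneA but is also three times as long, so both end up with the same TPM. Because TPM values always sum to one million within a sample, they are easier to compare across samples than FPKM.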

Differential expression and absolute quantification of transcripts

RNA-seq is often used to compare gene expression between conditions, such as drug-treated versus untreated, and to find which genes are up- or down-regulated in each condition. In theory, RNA-seq can count the number of transcripts in each cell. In practice, the reads for each gene are counted with a read-counting tool, and samples are then compared to identify differentially expressed genes. Many packages are available for this type of analysis; commonly used tools include the Bioconductor packages DESeq and edgeR, both of which use models based on the negative binomial distribution.
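
The role of the negative binomial model can be illustrated with a rough sketch. Under one common parameterization the variance of a gene's count is mean + dispersion × mean², so a dispersion above zero means the counts vary more than a Poisson model would predict. The code below uses a simple method-of-moments estimate on replicate counts and a pseudocount-stabilized log2 fold change. It is not the estimation procedure used by DESeq or edgeR, and all names and numbers are assumptions for illustration.

```python
import math
from statistics import mean, variance


def dispersion_moments(replicate_counts):
    """Method-of-moments dispersion: solve var = mean + alpha * mean**2 for alpha."""
    m = mean(replicate_counts)
    v = variance(replicate_counts)  # sample variance, needs >= 2 replicates
    if m == 0:
        return 0.0
    # Clip at zero: counts no more variable than Poisson get dispersion 0.
    return max(0.0, (v - m) / (m * m))


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of mean counts; the pseudocount avoids division by zero."""
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


# Hypothetical, already library-size-normalized replicate counts for one gene.
control = [7, 7, 7]
treated = [31, 31, 31]
print(log2_fold_change(treated, control))

# Hypothetical overdispersed replicates: variance 25 exceeds the mean 10.
print(dispersion_moments([5, 10, 15]))
```

A real analysis would also test whether the fold change is significant given the estimated dispersion, which is what the dedicated packages do.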

Standard RNA-seq analysis cannot give absolute quantities, because it only reports each RNA's level relative to all transcripts. If the total amount of RNA in the cells changes between conditions, relative normalization will misrepresent the changes in individual transcripts. Absolute quantification of mRNA is possible by adding spike-ins (RNA samples of known concentration) to the RNA-seq experiment.
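
One simple way to use spike-ins is to fit a calibration between the measured signal and the known input amount, then apply it to the endogenous genes. The sketch below assumes a purely proportional relationship (a least-squares line through the origin), which is a simplification; the function names, units, and values are hypothetical.

```python
def fit_scale(measured, known_amounts):
    """Least-squares slope through the origin: known ~= scale * measured."""
    numerator = sum(x * y for x, y in zip(measured, known_amounts))
    denominator = sum(x * x for x in measured)
    if denominator == 0:
        raise ValueError("spike-in measurements are all zero")
    return numerator / denominator


# Hypothetical spike-in data: measured FPKM and known molecules added per cell.
spike_fpkm = [10.0, 20.0, 40.0]
spike_molecules = [100.0, 200.0, 400.0]

scale = fit_scale(spike_fpkm, spike_molecules)

# Apply the calibration to a hypothetical endogenous gene measured at 5 FPKM.
gene_fpkm = 5.0
estimated_molecules = scale * gene_fpkm
print(scale, estimated_molecules)
```

Because the calibration is fitted separately in each sample, a global change in total RNA no longer distorts the per-gene estimates in the way relative normalization does.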

Gene co-expression network analysis

Gene co-expression network analysis uses dynamic changes in gene expression to calculate co-expression relationships between genes. From these relationships it builds a model of transcriptional regulation that captures which genes regulate which, and in what direction. The goal is to identify the regulatory network and the key genes among all expressed genes of one or more species at different developmental stages, in different tissues, or under different conditions or treatments.
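
A minimal co-expression network can be sketched by correlating expression profiles and keeping strongly correlated gene pairs as edges. The example below uses Pearson correlation with a fixed cutoff of 0.9, which is an assumption for illustration; established methods use more careful thresholding or weighting, and correlation alone does not give the direction of regulation. Gene names and values are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def coexpression_edges(expression, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression values across four samples or time points.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
edges = coexpression_edges(expression)
print(edges)
```

Only geneA and geneB, whose profiles rise together, are connected; geneC correlates weakly with both and stays isolated.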

Single nucleotide polymorphism (SNP) discovery

RNA-seq is limited to detecting sequence variation in transcribed regions, mainly exons, and cannot detect variation in intron regions. Although there is some correlation between exon and intron mutations, only whole genome sequencing can capture SNPs from all regions.

The only way to be certain of an individual's mutations is to compare the transcriptome sequence with that individual's genomic DNA. This makes it possible to distinguish a truly homozygous gene from skewed expression of one allele, and it also provides information about genes that were not expressed in the transcriptome experiment. An R package named CummeRbund can be used to generate visualizations of expression data.

RNA editing (post-transcriptional changes)

Comparing an individual's genome and transcriptome sequences can also help detect post-transcriptional editing. If the individual is homozygous at a position in the gene but the gene's transcripts carry a different sequence there, a post-transcriptional editing event is inferred.
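
The comparison described above can be sketched as a filter over per-site data: keep sites where the genome is homozygous but a meaningful fraction of RNA reads show a different base. The data layout, the function name, and the `min_reads` and `min_fraction` thresholds are assumptions for illustration; a real pipeline would also need to exclude sequencing and alignment errors.

```python
def candidate_editing_sites(genotypes, rna_base_counts, min_reads=10, min_fraction=0.1):
    """Flag homozygous genomic sites where RNA reads carry a different base."""
    candidates = []
    for site, (allele1, allele2) in sorted(genotypes.items()):
        if allele1 != allele2:
            continue  # heterozygous: a second base in the RNA is expected
        counts = rna_base_counts.get(site, {})
        depth = sum(counts.values())
        if depth < min_reads:
            continue  # too few RNA reads to judge
        for base, n in sorted(counts.items()):
            if base != allele1 and n / depth >= min_fraction:
                candidates.append((site, allele1, base, n / depth))
    return candidates


# Hypothetical genotypes (from DNA) and RNA base counts at three sites.
genotypes = {
    ("chr1", 100): ("A", "A"),
    ("chr1", 200): ("A", "G"),
    ("chr1", 300): ("C", "C"),
}
rna_base_counts = {
    ("chr1", 100): {"A": 30, "G": 20},
    ("chr1", 200): {"A": 25, "G": 25},
    ("chr1", 300): {"C": 50},
}
candidates = candidate_editing_sites(genotypes, rna_base_counts)
print(candidates)
```

Only the first site is reported: the genome there is homozygous A, yet 40% of the RNA reads show G. The heterozygous site is skipped because both bases are already explained by the DNA.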

Fusion Gene Detection

Fusion genes arise from structural modifications in the genome and have attracted attention because of their relationship with cancer. RNA-seq can analyze a sample's entire transcriptome in an unbiased way, which makes it a powerful tool for discovering these events, which are common in cancer.

The method follows from aligning short transcriptome reads to a reference genome. Most short reads fall within a single complete exon, and a smaller but still substantial set maps to known exon-exon junctions. The remaining unmapped reads are then analyzed further to see whether they match an exon-exon junction whose exons come from different genes. Such a match is evidence of a possible fusion event; however, because the reads are short, this approach can be noisy. An alternative is to use paired-end reads: for a true fusion, a potentially large number of read pairs will have their two ends mapped to different exons, which gives better coverage of these events (see figure).
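
The paired-end idea can be sketched by counting read pairs whose two mates map to different genes and reporting gene pairs with enough support. The input format (one gene name per mate, `None` for unmapped), the `min_pairs` threshold, and the gene names are assumptions for illustration; dedicated fusion callers also check breakpoints, orientation, and mapping quality.

```python
from collections import Counter


def fusion_candidates(read_pairs, min_pairs=2):
    """Count discordant read pairs per gene pair and keep well-supported ones."""
    support = Counter()
    for gene1, gene2 in read_pairs:
        if gene1 is None or gene2 is None or gene1 == gene2:
            continue  # unmapped mate, or both mates in the same gene
        # Sort the names so (A, B) and (B, A) count as the same candidate.
        support[tuple(sorted((gene1, gene2)))] += 1
    return {pair: n for pair, n in support.items() if n >= min_pairs}


# Hypothetical mapping results: the gene hit by each mate of a read pair.
read_pairs = [
    ("geneA", "geneB"),
    ("geneB", "geneA"),
    ("geneA", "geneA"),
    ("geneC", None),
    ("geneD", "geneE"),
]
fusions = fusion_candidates(read_pairs)
print(fusions)
```

Requiring at least two supporting pairs discards the single geneD/geneE pair, which could easily be a mapping artifact, while the geneA/geneB candidate is kept.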

Applications
    • Transcript structure studies (gene boundary identification, alternative splicing studies, etc.)

    • Transcript variation studies (e.g. gene fusions, coding-region SNP studies)

    • Functional studies of non-coding regions (non-coding RNA, microRNA precursor research, etc.)

    • Gene expression level studies and discovery of novel transcripts
