Using STAR for transcriptome alignment

Source: Internet
Author: User

Reference article: http://weibo.com/p/23041883f77c940102vbkd?sudaref=passport.weibo.com

Software link: https://github.com/alexdobin/STAR/

Accurate alignment of high-throughput RNA-seq data is a challenging and still unsolved problem because of the non-contiguous transcript structure, the relatively short read lengths, and the constantly increasing throughput of sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitations, and mapping biases.

Results: To align our large (> 80 billion reads) ENCODE transcriptome RNA-seq dataset, we developed STAR (Spliced Transcripts Alignment to a Reference), software based on a previously undescribed RNA-seq alignment algorithm. The algorithm uses sequential maximum mappable seed search in uncompressed suffix arrays, followed by a seed clustering and stitching procedure. STAR is more than 50 times faster than other aligners, aligning 550 million paired-end reads per hour to the human genome on an ordinary 12-core server, while also improving alignment sensitivity and precision. In addition to unbiased detection of canonical splice junctions, STAR can detect non-canonical splices and chimeric (fusion) transcripts, and can align full-length RNA sequences. Using 454 sequencing of RT-PCR amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, confirming the high precision of the STAR mapping strategy.

Availability and implementation: STAR is implemented as standalone C++ code. STAR is free, open-source software released under the GPLv3 license.

1: A STAR analysis has two steps. The first is genomeGenerate, which builds the genome index (similar to building the index for TopHat); the second is the sequence alignment itself.

2: The first step, genomeGenerate, only needs to be run once per genome:

STAR --runMode genomeGenerate --runThreadN <threads> --genomeFastaFiles /home/share/genome/homo_sapiens/ucsc/hg19/sequence/wholegenomefasta/genome.fa --sjdbGTFfile /home/share/genome/homo_sapiens/ucsc/hg19/annotation/genes/genes.gtf --sjdbOverhang 89

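--runMode: the mode STAR runs in. The default is alignment (alignReads), so setting this parameter to genomeGenerate is essential in the first step

--runThreadN: number of threads to run

--genomeDir: path of the directory where the index files are stored. This parameter is important: create the directory in advance and make sure you have read and write permission for it. It is not set in the command above, in which case STAR falls back to its default, ./GenomeDir/

--genomeFastaFiles: the genome sequence file(s) in FASTA format

--sjdbGTFfile: the GTF annotation file

--sjdbOverhang: the read length minus 1 (for example, 89 for 90 bp reads). It is the number of bases on each side of an annotated splice junction that STAR uses when building the splice-junction database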
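The "read length minus 1" rule for --sjdbOverhang can be sketched in shell as follows. This is only an illustration: the function name sjdb_overhang is hypothetical, and it assumes an uncompressed FASTQ file with exactly four lines per record. It takes the longest read in the file, so it still gives a sensible value if reads were trimmed to different lengths.

```shell
# Hypothetical helper: print a value for --sjdbOverhang, computed as
# (longest read length - 1) from an uncompressed four-line FASTQ file.
sjdb_overhang() {
    # In four-line FASTQ, the sequence is line 2 of every record (NR % 4 == 2).
    awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
         END { print max - 1 }' "$1"
}
```

For example, `sjdb_overhang reads_1.fastq` (file name is a placeholder) prints the number to pass to --sjdbOverhang when generating the index.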

5: Run the alignment

STAR does more than align reads: it can also report splice junctions and chimeric (fusion) transcripts, write its output as SAM or BAM, and optionally sort the BAM output. The ENCODE parameters are also included in the alignment command.

STAR --runThreadN <threads> --readFilesIn /home/fanyc/rna-seq/raw_data/srr993723.sra_1.fastq /home/fanyc/rna-seq/raw_data/srr993723.sra_2.fastq --quantMode TranscriptomeSAM --outSAMtype BAM SortedByCoordinate --outFileNamePrefix /home/fanyc/rna-seq/star/23 --outFilterType BySJout --outFilterMultimapNmax <max_multimappers> --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.04 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --chimSegmentMin 20

The command above combines the ENCODE parameters with BAM output sorted by coordinate. It also reports splice junctions and fusion transcripts.

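--readFilesIn: the raw sequencing reads used as input (for paired-end data, the read 1 file followed by the read 2 file)

--outSAMtype BAM SortedByCoordinate: write the output in BAM format, sorted by coordinate

--chimSegmentMin 20: report fusion (chimeric) transcripts; 20 is the minimum number of aligned bases required in each chimeric segment

--outFileNamePrefix: prefix of the output files

--quantMode TranscriptomeSAM: also output the alignments in transcript coordinates, for transcript-level quantification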
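With this many options, it can help to review the full command before starting a long run. The sketch below wraps the same options in a function that only prints the command unless DRY_RUN=0 is set. The names run and star_align are hypothetical helpers, not part of STAR, and the thread count and --outFilterMultimapNmax value are arguments supplied by the caller because the command above does not give them.

```shell
# Hypothetical helper: print the command when DRY_RUN is unset or 1,
# execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the alignment options listed above.
# Arguments: $1 threads, $2 multimapper limit, $3 output prefix,
#            $4 read 1 FASTQ, $5 read 2 FASTQ.
star_align() {
    run STAR \
        --runThreadN "$1" \
        --readFilesIn "$4" "$5" \
        --quantMode TranscriptomeSAM \
        --outSAMtype BAM SortedByCoordinate \
        --outFileNamePrefix "$3" \
        --outFilterType BySJout \
        --outFilterMultimapNmax "$2" \
        --alignSJoverhangMin 8 \
        --alignSJDBoverhangMin 1 \
        --outFilterMismatchNmax 999 \
        --outFilterMismatchNoverLmax 0.04 \
        --alignIntronMin 20 \
        --alignIntronMax 1000000 \
        --alignMatesGapMax 1000000 \
        --chimSegmentMin 20
}
```

Calling `star_align 8 10 out/sample_ r1.fastq r2.fastq` (all values are placeholders) prints the assembled command line; `DRY_RUN=0 star_align ...` would execute it, provided STAR is installed and a genome index is available.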

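6: Generated files (each name is preceded by the --outFileNamePrefix value):

Chimeric.out.junction: fusion (chimeric) transcript junctions

Aligned.sortedByCoord.out.bam: the alignments, sorted by coordinate

Aligned.toTranscriptome.out.bam: the alignments in transcript coordinates

SJ.out.tab: the detected splice junctions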
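As a quick check of the splice-junction output, the sketch below counts annotated and novel junctions in SJ.out.tab. It assumes the tab-separated column layout described in the STAR manual, where column 6 is 1 for annotated junctions and 0 otherwise, and column 7 is the number of uniquely mapped reads crossing the junction. The function name summarise_junctions and the read-count threshold are illustrative choices, not part of STAR.

```shell
# Hypothetical helper: count annotated vs. novel junctions in SJ.out.tab
# that are supported by at least $2 uniquely mapped reads (default 1).
summarise_junctions() {
    awk -F '\t' -v min="${2:-1}" '
        $7 >= min { if ($6 == 1) annotated++; else novel++ }
        END { printf "annotated=%d novel=%d\n", annotated, novel }
    ' "$1"
}
```

With the prefix used above, the file to pass would be /home/fanyc/rna-seq/star/23SJ.out.tab, because STAR puts the prefix directly in front of each output file name.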
