An RNA-seq Workflow (Reposted from a Zhihu Live)

Data analysis process

From Meng Hao's Live "QuickStart Bioinformatics". Amazing!

① The first step is quality control, using FastQC to analyze the sequencing results.

For Illumina second-generation sequencing data, quality control covers two things. The first is removing poorly sequenced bases and reads, which is quality control proper. The second is removing the short adapter sequences ligated to the fragments so they can attach to the flow cell (the glass slide), which is adapter trimming.

-t 8 tells FastQC to run with 8 threads, so up to 8 files are processed in parallel.

After that, a zip and an HTML file are generated for each sequence file.
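The FastQC command line itself is not shown here. Below is a minimal sketch of how it might be invoked, assuming `fastqc` is installed and on the PATH. The function names `run_fastqc` and `fastqc_reports` and the example paths are hypothetical. The second helper only predicts the report names FastQC writes for common `.fastq(.gz)`/`.fq(.gz)` inputs, so it can be checked without running FastQC.

```shell
# Hypothetical wrapper around FastQC (assumes fastqc is on PATH).
# -t 8: 8 threads, i.e. up to 8 files processed in parallel
# -o:   directory for the reports; FastQC does not create it, so make it first
run_fastqc() {
    outdir=$1
    shift
    mkdir -p "$outdir"
    fastqc -t 8 -o "$outdir" "$@"
}

# Print the two report names FastQC produces for one input file:
# the .gz and .fastq/.fq suffixes are dropped and _fastqc.html/.zip appended.
fastqc_reports() {
    base=$(basename "$1")
    base=${base%.gz}
    case $base in
        *.fastq) base=${base%.fastq} ;;
        *.fq)    base=${base%.fq} ;;
    esac
    printf '%s\n' "${base}_fastqc.html" "${base}_fastqc.zip"
}
```

For example, `run_fastqc qc_reports sample_1.fastq.gz sample_2.fastq.gz` would write `sample_1_fastqc.html`, `sample_1_fastqc.zip` and so on into `qc_reports/`.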

For example:

The file contains 2,500,000 reads. They come from many different genes, but the sequencer's read length is 150 bp, so every read has the same length.
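Both numbers (how many reads a file holds and whether they all share one length) can be checked directly with a small awk sketch. It assumes an uncompressed FASTQ with four lines per record on standard input. The function name `read_stats` is hypothetical.

```shell
# Count reads and report the shortest and longest read length.
# Assumes 4-line FASTQ records: line 2 of each record is the sequence.
read_stats() {
    awk 'NR % 4 == 2 {
            n++
            len = length($0)
            if (n == 1 || len < min) min = len
            if (len > max) max = len
         }
         END { printf "reads=%d min_len=%d max_len=%d\n", n, min, max }'
}
```

On a gzipped file it would be used as `zcat "$fastq_1" | read_stats`. If every read is 150 bp, `min_len` and `max_len` both come out as 150.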

For the 2.5 million reads, the quality score Q at each position is summarized as a box plot. The lowest point of the box at each position should stay above Q20; positions that fall below it need to be removed.
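Q here is the Phred quality score, Q = -10 log10(P), where P is the probability that the base call is wrong. Q20 therefore corresponds to a 1% error rate. The sketch below decodes a quality string, assuming the Phred+33 encoding used by current Illumina FASTQ files. The function names are hypothetical.

```shell
# Decode a Phred+33 quality string into space-separated Q scores.
# The string is passed through the environment so awk does not
# interpret backslashes in it.
qual_to_phred() {
    QUAL="$1" awk 'BEGIN {
        q = ENVIRON["QUAL"]
        for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i   # character -> ASCII code
        out = ""
        for (p = 1; p <= length(q); p++)
            out = out (p > 1 ? " " : "") (ord[substr(q, p, 1)] - 33)
        print out
    }'
}

# Convert a Phred score Q into the error probability P = 10^(-Q/10).
phred_to_error() {
    awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
}
```

For example, `qual_to_phred 'I5+'` decodes three bases as Q40, Q20 and Q10, and `phred_to_error 20` gives the 1% error rate behind the Q20 threshold.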

At around position 145, Q drops below this threshold because sequencing becomes unstable toward the end of the read, so everything after position 145 is discarded.
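Choosing the cut point can be approximated in code. FastQC draws a box plot per position. As a simpler stand-in, the hypothetical `first_low_position` function averages Q at each position over all reads. It then prints the first position whose mean falls below a threshold (20 by default). It assumes Phred+33 qualities and four-line FASTQ records on standard input.

```shell
# Print the first read position whose mean Phred score is below THRESHOLD
# (default 20); print nothing if every position passes.
first_low_position() {
    awk -v thr="${1:-20}" '
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
    NR % 4 == 0 {                          # line 4 of each record: quality string
        for (p = 1; p <= length($0); p++) {
            sum[p] += ord[substr($0, p, 1)] - 33
            cnt[p]++
        }
        if (length($0) > maxlen) maxlen = length($0)
    }
    END {
        for (p = 1; p <= maxlen; p++)
            if (sum[p] / cnt[p] < thr) { print p; exit }
    }'
}
```

A mean is less strict than the bottom of FastQC's box, so this is only a quick check rather than a replacement for looking at the plot.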

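This is the per-base sequence content plot, which shows the base composition at each position (and so also reflects GC content). Normally the A and T curves are about the same, as are the C and G curves, but the first 10 bp are unstable and need to be removed.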
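The per-position composition behind this plot can be tabulated with a short sketch. The function name `base_composition` is hypothetical, and it expects four-line FASTQ records on standard input. It prints the position followed by the percentages of A, C, G and T. In a well-behaved library the A/T and C/G columns should be close, while the first positions deviate.

```shell
# Per-position base composition: position, %A, %C, %G, %T (tab-separated).
base_composition() {
    awk '
    NR % 4 == 2 {                          # line 2 of each record: the sequence
        seq = toupper($0)
        for (p = 1; p <= length(seq); p++) {
            count[p, substr(seq, p, 1)]++
            total[p]++
        }
        if (length(seq) > maxlen) maxlen = length(seq)
    }
    END {
        for (p = 1; p <= maxlen; p++)
            printf "%d\t%.1f\t%.1f\t%.1f\t%.1f\n", p,
                100 * count[p, "A"] / total[p], 100 * count[p, "C"] / total[p],
                100 * count[p, "G"] / total[p], 100 * count[p, "T"] / total[p]
    }'
}
```

Bases other than A, C, G and T (such as N) are counted in the total but not reported, so a row can sum to less than 100.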

This plot shows, for each position, the percentage of reads in which adapter sequence has been detected. The x-axis runs over positions 1-150 bp and the y-axis is the percentage. For some reason, in this sample most of the detected content is adapter.
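The sketch below is a simplified version of this calculation. For each position, it reports the cumulative percentage of reads in which the adapter starts at or before that position. It only looks for an exact, full-length match of AGATCGGAAGAGC (the adapter used in the cutadapt step), whereas FastQC searches for a shorter adapter fragment. The function name `adapter_content` is hypothetical.

```shell
# Cumulative % of reads whose (exact, full-length) adapter match starts at or
# before each position. Optional argument: adapter sequence.
adapter_content() {
    awk -v ad="${1:-AGATCGGAAGAGC}" '
    NR % 4 == 2 {
        n++
        i = index(toupper($0), ad)         # 1-based start of first match, 0 if none
        if (i > 0) start[i]++
        if (length($0) > maxlen) maxlen = length($0)
    }
    END {
        if (n == 0) exit
        for (p = 1; p <= maxlen; p++) {
            cum += start[p]
            printf "%d\t%.1f\n", p, 100 * cum / n
        }
    }'
}
```

Because adapters that run off the end of a read are missed, this sketch underestimates adapter content near the 3' end compared with FastQC.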

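This plot (sequence duplication levels) mainly reflects the quality of library construction. Library preparation usually involves 6-8 rounds of PCR, which sometimes over-amplifies certain fragments. When duplication is too high, duplicates need to be removed (deduplication). In RNA-seq, however, deduplication is usually not performed, because highly expressed genes naturally yield many identical reads.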
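A rough measure of duplication is the percentage of reads whose exact sequence already appeared earlier in the file. The sketch below computes that. It is a simplification, not FastQC's own duplication algorithm. The function name `duplication_rate` is hypothetical, and it keeps every distinct sequence in memory.

```shell
# Percentage of reads that are exact repeats of an earlier read's sequence.
duplication_rate() {
    awk '
    NR % 4 == 2 { n++; if (seen[$0]++) dup++ }   # seen[] counts each sequence
    END { printf "%.1f\n", (n ? 100 * dup / n : 0) }'
}
```

For RNA-seq a high value here is expected for the reason given above, so it is informational rather than a trigger for deduplication.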

② Next, use fastx_trimmer to trim the head and tail of each read.

zcat "$fastq_1" | fastx_trimmer -f 11 -l 140 -z -o "$out_fastq_1" &

zcat decompresses $fastq_1, the first input file, and pipes the decompressed reads into fastx_trimmer.

The fastx_trimmer parameters are:
-f 11 is the first base to keep, so the first 10 bp are cut off.
-l 140 is the last base to keep, so everything after 140 bp is removed.
-z compresses the output with gzip.
-o writes the result to the given file.
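To make the semantics of fastx_trimmer's -f (first base to keep) and -l (last base to keep) concrete, the awk sketch below does the same positional cut on an uncompressed FASTQ stream. It keeps bases FIRST to LAST (1-based, inclusive) of both the sequence and the quality line, and leaves the header lines alone. The function name `trim_positions` is hypothetical. Unlike fastx_trimmer, it neither validates the input nor compresses the output itself.

```shell
# Keep bases FIRST..LAST (1-based, inclusive) of every read and its qualities.
trim_positions() {
    awk -v f="$1" -v l="$2" '
    NR % 4 == 2 || NR % 4 == 0 { $0 = substr($0, f, l - f + 1) }   # sequence, quality
    { print }'
}
```

Roughly mirroring the command above: `zcat "$fastq_1" | trim_positions 11 140 | gzip > "$out_fastq_1"`.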

About $: in a bash prompt, $ marks an ordinary (non-root) user. In a command, it is the variable-reference operator. For example, after a=10, echo $a prints 10.
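The lines below show $ in use; the file names are hypothetical. Braces matter when a variable name is followed by characters that could belong to a name. "$out_fastq_1" refers to a variable named out_fastq_1, while "${out}_1" appends _1 to the value of out. Quoting the expansion also keeps file names containing spaces intact.

```shell
a=10
echo "$a"                           # prints 10

out=trimmed                         # hypothetical output prefix
out_fastq_1="${out}_1.fastq.gz"     # braces end the name "out" before "_1"
echo "$out_fastq_1"                 # prints trimmed_1.fastq.gz
```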

③ Use cutadapt to remove adapters from both ends of the paired-end data (read 1 and read 2).

After trimming, the adapters are removed with cutadapt:

nohup cutadapt --times 1 -e 0.1 -O 3 --quality-cutoff 6 -m 50 -a AGATCGGAAGAGC -A AGATCGGAAGAGC -o "$out_fastq_1" -p "$out_fastq_2" "$fastq_1" "$fastq_2" > "$log_file" 2>&1 &

nohup runs the command immune to hangups, so it keeps running after you log out. The trailing & runs it in the background.

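2>&1 redirects standard error (file descriptor 2) to wherever standard output (file descriptor 1) currently points, here $log_file. As a result, cutadapt's report and any error messages end up in the same log. It is easy to confuse with $1, which is the first argument passed to a shell script. The shell's special variables are as follows (from: https://www.cnblogs.com/kaituorensheng/p/4002697.html):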
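The usual idiom for sending both output streams of a command into one log file is `> file 2>&1`. The small sketch below shows it with a hypothetical function `both_streams` that writes one line to each stream.

```shell
# Write one line to stdout and one to stderr.
both_streams() {
    echo "report line"              # stdout (file descriptor 1)
    echo "warning line" >&2         # stderr (file descriptor 2)
}

log_file=$(mktemp)
# > points fd 1 at the log; 2>&1 then points fd 2 at the same place.
both_streams > "$log_file" 2>&1
cat "$log_file"                     # prints both lines
```

The order matters. Writing `2>&1 > "$log_file"` would point stderr at the terminal (where stdout pointed at that moment) before stdout is moved to the file.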

$# is the number of arguments passed to the script.
$0 is the name of the script itself.
$1 is the first argument passed to the script, and $2 is the second.
$@ is the list of all arguments passed to the script.
$* shows all arguments passed to the script as a single string. Unlike the positional variables $1 to $9, it is not limited to nine arguments.
$$ is the process ID of the running script.
$? is the exit status of the last command: 0 means no error, any other value indicates an error.
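Inside a shell function the positional parameters behave like a script's own ($0 is the exception and keeps the script's name), which makes them easy to try out. The function `show_args` below is a hypothetical illustration.

```shell
show_args() {
    echo "count=$#"                 # number of arguments
    echo "first=$1 second=$2"       # positional parameters
    echo "all=$*"                   # all arguments as one string
    echo "script=$0"                # script name, not changed by the function call
    echo "pid=$$"                   # process ID of the running shell
}

show_args a b c

status=0
false || status=$?                  # $? holds the exit status of false, i.e. 1
echo "status=$status"
```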

The cutadapt parameters are:
--times 1: remove an adapter from each read only once.
-e 0.1: allow an error rate of up to 10% when matching the adapter.
-O 3: the adapter must overlap the read by at least 3 bases to be trimmed (6 is also commonly used).
--quality-cutoff 6: trim low-quality bases from the 3' end before adapter removal.
-m 50: discard reads shorter than 50 bp after trimming, since such short reads may not be of good quality.
-a and -A: the common Illumina adapter. It is given twice because these are paired-end results and each file needs its adapter removed separately, -a for read 1 and -A for read 2.
-o and -p: the output files for read 1 and read 2.
$fastq_1 and $fastq_2: the input files, which are the output of the previous step.
> $log_file: writes the log to a file.
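The ideas behind -a, -O and -m can be illustrated with a simplified 3' adapter trimmer. This sketch is not cutadapt's algorithm. It allows no mismatches, whereas cutadapt's -e 0.1 permits a 10% error rate via alignment, and it handles a single read. The function cuts the read at the leftmost position where either the whole adapter occurs or the read's end matches the start of the adapter over at least MIN_OVERLAP bases. It then drops the read if fewer than MIN_LEN bases remain. The function name and argument order are hypothetical.

```shell
# Usage: trim_adapter ADAPTER MIN_OVERLAP SEQUENCE [MIN_LEN]
# Prints the trimmed read, or nothing if it ends up shorter than MIN_LEN.
trim_adapter() {
    awk -v ad="$1" -v minov="$2" -v seq="$3" -v minlen="${4:-0}" 'BEGIN {
        n = length(seq); m = length(ad); cut = n + 1   # default: no adapter found
        for (i = 1; i <= n; i++) {
            ov = n - i + 1                             # bases from i to the read end
            if (ov >= m) {
                if (substr(seq, i, m) == ad) { cut = i; break }   # full adapter
            } else if (ov >= minov && substr(seq, i) == substr(ad, 1, ov)) {
                cut = i; break                         # adapter runs off the 3-prime end
            }
        }
        kept = substr(seq, 1, cut - 1)
        if (length(kept) >= minlen) print kept         # like -m: drop short reads
    }'
}
```

With a minimum overlap of 3, a read ending in `AGA` is trimmed but one ending in only `AG` is kept as-is. This is why a small -O value removes more partial adapters at the cost of occasionally trimming genuine sequence.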
