Amplicon analysis walkthrough, part 1: quality control, experiment design, and paired-end read merging

Source: Internet
Author: User

This article uses HiSeq2500 PE250 data, currently the most popular type of amplicon sequencing data, as an example, and walks through a custom analysis pipeline built on the current mainstream tools QIIME and USearch. The sequencing data, experiment design file, and intermediate files generated during the analysis can be downloaded directly from Baidu cloud. Link: http://pan.baidu.com/s/1hs1PXcw password: y33d. Running the course code requires at least a Linux platform with QIIME 1 installed.

Preparation before analysis

# Create a working directory and enter it; with the -p parameter, mkdir reports no error if the directory already exists

mkdir -p example_PE250
cd example_PE250
# Create subdirectories for temporary files and results
mkdir -p temp result
1. Sequencing data

16S amplicon sequencing data currently comes mainly from 250 bp paired-end (PE250) runs on the HiSeq2500, which combine long reads with a low price and are therefore cost-effective. HiSeqX PE150 and MiSeq PE300 are also common, but PE150 reads are short and give low resolution, while PE300 is expensive and the quality at the read ends is too low. In addition, a large amount of earlier research used 454 sequencing, but that equipment has been discontinued. PacBio long reads can sequence the full-length 16S gene (about 1.5 kb) directly and represent the future trend.

Sequencing companies usually return both raw data and clean data. Raw data is the direct output of sequencing. Clean data is the raw data after removing reads that contain adapter sequence or a high proportion of uncertain bases (N), and it is what is usually used for quality evaluation and subsequent analysis. FastQC is commonly used for quality evaluation. The sequencing result files are generally accompanied by an evaluation report, and if the quality is too poor the sample is sequenced again.

For this step, the user must upload the two data files PE250_1.fq.gz and PE250_2.fq.gz to the working directory. They contain 2,500,000 paired-end 250 bp reads in FASTQ format. (Tip: you can download them on Windows and upload them to the server using FileZilla or a similar tool.)

The quality evaluation requires FastQC to be installed, and this step can be skipped. If FastQC is already installed on the system, you can run it directly:

fastqc -t 2 *.fq.gz

-t sets the number of threads. It is recommended to use as many threads as there are data files to speed up the evaluation. *.fq.gz is the input, and the * wildcard lets you specify multiple files at once. Two files are generated for each data file, as shown below:
PE250_*_fastqc.html # web page evaluation report
PE250_*_fastqc.zip # compressed archive of the text and images used by the web page report
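To make the quality scores behind such a report more concrete, the following is a minimal Python sketch of how a mean quality per read position could be computed from FASTQ records. It is not how FastQC is implemented. The helper names `read_fastq` and `mean_quality_per_position` are hypothetical, and the sketch assumes Phred+33 quality encoding and exactly four lines per record.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        header, sequence, _plus, quality = record
        yield header, sequence, quality


def mean_quality_per_position(records, offset=33):
    """Return the mean Phred score at each read position.

    Assumption: qualities are encoded as Phred+33 ASCII characters.
    """
    totals, counts = [], []
    for _header, _sequence, quality in records:
        for i, char in enumerate(quality):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(char) - offset
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


# Tiny in-memory example with two made-up reads.
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGT\n+\nII5#\n"
)
means = mean_quality_per_position(read_fastq(example))
print(means)  # [40.0, 40.0, 30.0, 21.0]
```

For a real compressed file, the same functions could be given a handle opened with `gzip.open("PE250_1.fq.gz", "rt")` from the standard library.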
The data quality is as follows.

Figure 1: box plots of the quality distribution for the left-end reads and the right-end reads

The quality of the left-end reads is relatively high (the green, yellow, and red areas in the figure represent excellent, good, and poor quality, respectively). The quality of the right-end reads is somewhat lower, and their boxes extend into the red poor-quality area, but the median red line stays within the green high-quality area. This result is already above average. In PE250 sequencing the quality at the tail of the right-end reads drops sharply, but as long as the left-end reads are good, errors can be corrected when the paired-end reads are merged, so the data can generally be used with confidence.

2. Experiment design

The experiment design file is called the mapping file in QIIME, and you can download the example file mappingfile.txt. Your own experiment design must follow the format of the example. If the file is incorrect, the later steps cannot run. QIIME comes with a tool to check whether the file is written correctly.
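As a rough illustration of the kind of format rules such a check involves, the Python sketch below inspects the header row and sample IDs of a tab-separated mapping file. It assumes the standard QIIME 1 header layout (`#SampleID`, `BarcodeSequence`, and `LinkerPrimerSequence` first, `Description` last). The function `check_mapping_header`, the `group` column, and all sample values are hypothetical, and the sketch covers only a small part of what the QIIME validator checks.

```python
import csv
import io

# Assumed standard QIIME 1 header layout.
REQUIRED_LEADING = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence"]
REQUIRED_LAST = "Description"


def check_mapping_header(handle):
    """Return a list of problems found in a tab-separated mapping file."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])
    problems = []
    if header[: len(REQUIRED_LEADING)] != REQUIRED_LEADING:
        problems.append("first columns must be " + ", ".join(REQUIRED_LEADING))
    if not header or header[-1] != REQUIRED_LAST:
        problems.append("last column must be " + REQUIRED_LAST)
    if len(set(header)) != len(header):
        problems.append("duplicate column names")
    # Remaining rows: collect sample IDs, skipping comment lines.
    sample_ids = [row[0] for row in reader if row and not row[0].startswith("#")]
    if len(set(sample_ids)) != len(sample_ids):
        problems.append("duplicate sample IDs")
    return problems


# Made-up example files; barcodes and primers are placeholders.
good_text = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tgroup\tDescription\n"
    "S1\tACGTACGT\tACGTACGTACGT\tWT\tsample one\n"
    "S2\tTGCATGCA\tACGTACGTACGT\tKO\tsample two\n"
)
bad_text = "#SampleID\tgroup\nS1\tWT\nS1\tKO\n"

print(check_mapping_header(io.StringIO(good_text)))  # []
for problem in check_mapping_header(io.StringIO(bad_text)):
    print(problem)
```

The real validator remains the authoritative check; a sketch like this is only useful for understanding why a file is rejected.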
# Activate the work environment
source activate qiime1
# To deactivate the work environment, run the commented line below. Deactivate it when QIIME is not in use, otherwise your other programs may encounter errors
# source deactivate
# Check the experiment design file for errors
validate_mapping_file.py -m mappingfile.txt
The run outputs three files:
mappingfile_corrected.txt # automatically corrected experiment design; errors are fixed automatically, but you must confirm that the result is what you intended
mappingfile.html # web page report that highlights the error locations; download it to view
mappingfile.log # run log
If the file contains no mistakes, the run prints "No errors or warnings were found in mapping file." If an error is reported, view the generated web page report, which highlights the errors, then correct the file and check it again until it passes. For more instructions, read the help page: http://qiime.org/scripts/validate_mapping_file.html

3. Merging paired-end reads

The first task of the analysis is to merge the paired-end reads. Because the ends of the two reads in a pair overlap as reverse complements, the pair can be merged into the sequence of the amplified region. In addition, base calls in the overlapping region can be corrected by keeping the base with the highest sequencing quality.

Use the join_paired_ends.py script to merge the two files into one. The -f and -r parameters are the input left-end and right-end read files, and the compressed *.gz format is supported. -m selects the merging method; the default is fastq-join, and SeqPrep can also be selected, which is better but slower. -o is the output directory. For more instructions, read the help page: http://qiime.org/scripts/join_paired_ends.html
# Merge the paired-end reads
join_paired_ends.py -f PE250_1.fq.gz -r PE250_2.fq.gz -m fastq-join -o temp/PE250_join
After merging, the output directory temp/PE250_join contains three files, as shown below:
fastqjoin.join.fastq # merged reads
fastqjoin.un1.fastq # left-end reads that could not be merged
fastqjoin.un2.fastq # right-end reads that could not be merged
Downstream analysis usually operates only on fastqjoin.join.fastq.
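The merging idea described in this section (find the overlap between the left-end read and the reverse complement of the right-end read, then keep the higher-quality base at each overlapping position) can be sketched in Python as follows. This is a simplified illustration and not the algorithm used by fastq-join or SeqPrep. The function names, the `min_overlap` and `max_diff_fraction` defaults, and the example sequences are all assumptions made for the sketch.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd_seq, fwd_qual, rev_seq, rev_qual,
               min_overlap=6, max_diff_fraction=0.2):
    """Merge one read pair; qualities are lists of integer Phred scores.

    Returns (sequence, qualities), or None if no acceptable overlap exists.
    """
    rc_seq = reverse_complement(rev_seq)
    rc_qual = rev_qual[::-1]
    # Try the longest possible overlap first, then shorter ones.
    for overlap in range(min(len(fwd_seq), len(rc_seq)), min_overlap - 1, -1):
        cut = len(fwd_seq) - overlap
        fwd_tail = fwd_seq[cut:]
        rc_head = rc_seq[:overlap]
        mismatches = sum(a != b for a, b in zip(fwd_tail, rc_head))
        if mismatches > max_diff_fraction * overlap:
            continue
        seq = list(fwd_seq[:cut])
        qual = list(fwd_qual[:cut])
        for i in range(overlap):
            # Keep whichever base call has the higher quality.
            if fwd_qual[cut + i] >= rc_qual[i]:
                seq.append(fwd_tail[i])
                qual.append(fwd_qual[cut + i])
            else:
                seq.append(rc_head[i])
                qual.append(rc_qual[i])
        seq.extend(rc_seq[overlap:])
        qual.extend(rc_qual[overlap:])
        return "".join(seq), qual
    return None


# Made-up 20 bp fragment read from both ends with 14 bp reads (8 bp overlap).
fragment = "ACGTTGCAAGGCTTAACCGG"
fwd_seq = "ACGTTGCAAGGCAT"            # one low-quality error at index 12
fwd_qual = [40] * 12 + [5, 40]
rev_seq = reverse_complement(fragment[6:])
rev_qual = [40] * 14

merged = merge_pair(fwd_seq, fwd_qual, rev_seq, rev_qual)
print(merged[0] == fragment)  # True: the error is corrected from the other read
```

Real merging tools also handle quality recalculation in the overlap and other details that this sketch leaves out.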
