FASTX-Toolkit Usage Instructions

Source: Internet
Author: User

Raw high-throughput sequencing data comes off the sequencer as FASTQ files. Each record spans four lines: an identifier line, the sequence, a separator line beginning with "+", and one line of per-base quality values that corresponds to the sequence. The first step in processing such data is quality control: removing adapters, filtering out low-quality reads, trimming low-quality bases from the 3' and 5' ends, discarding reads that contain too many N bases, and so on. Many quality-control programs exist for high-throughput sequencing data; this article introduces a long-established one, FASTX-Toolkit. It is a package containing a number of quality-control commands, whose parameters and usage are explained below:
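
The four-line record layout can be read with a few lines of Python. This is only a minimal sketch for illustration and is not part of FASTX-Toolkit; the function name read_fastq is hypothetical, and the sketch assumes that each sequence and each quality string sits on a single line.

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, quality) tuples from a FASTQ text handle.

    Assumes exactly four lines per record (no wrapped sequences).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # separator line starting with '+'
        qual = handle.readline().rstrip("\n")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield header[1:], seq, qual  # drop the leading '@'


if __name__ == "__main__":
    example = io.StringIO("@r1\nACGT\n+\nIIII\n")
    for name, seq, qual in read_fastq(example):
        print(name, seq, qual)
```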

1. fastq_quality_converter [-h] [-a] [-n] [-z] [-i INFILE] [-o OUTFILE]  convert quality values for visual inspection
[-h] = print help
[-a] = output ASCII quality scores (default).
[-n] = output numeric quality scores.
[-z] = compress output with gzip.
[-i INFILE] = input file in FASTA/FASTQ format.
[-o OUTFILE] = output FASTA/FASTQ file.
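
The difference between the ASCII and numeric quality forms can be shown with a short Python sketch. It is not the toolkit's code: the function names are hypothetical, and the Phred+33 offset is an assumption made for the example (older Illumina files use an offset of 64).

```python
def ascii_to_numeric(quality, offset=33):
    """Decode an ASCII quality string into a list of numeric scores."""
    return [ord(ch) - offset for ch in quality]


def numeric_to_ascii(scores, offset=33):
    """Encode numeric scores back into an ASCII quality string."""
    return "".join(chr(score + offset) for score in scores)


if __name__ == "__main__":
    scores = ascii_to_numeric("II5+!")
    print(" ".join(str(s) for s in scores))  # numeric form
    print(numeric_to_ascii(scores))          # ASCII form
```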

2. fastq_masker [-h] [-v] [-q N] [-r C] [-z] [-i INFILE] [-o OUTFILE]  mask low-quality bases
[-q N] = quality threshold; bases with a quality below this threshold are masked. Default is 10.
[-r C] = replace low-quality bases with the character C. Default is N.
[-z] = compress output with gzip.
[-i INFILE] = input FASTQ file
[-o OUTFILE] = output file
[-v] = verbose: report the number of sequences. If -o is given, the report is printed to stdout; otherwise it is printed to stderr.
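
The masking rule can be sketched in Python as follows. This is an illustration of the idea, not the tool's implementation; the function name mask_low_quality and the Phred+33 offset are assumptions.

```python
def mask_low_quality(seq, qual, threshold=10, replacement="N", offset=33):
    """Replace every base whose quality is below `threshold`.

    `threshold` plays the role of -q and `replacement` the role of -r.
    The Phred+33 offset is an assumption of this sketch.
    """
    return "".join(
        base if ord(q) - offset >= threshold else replacement
        for base, q in zip(seq, qual)
    )


if __name__ == "__main__":
    print(mask_low_quality("ACGT", "I+!I"))
```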

3. fastq_quality_filter [-h] [-v] [-q N] [-p N] [-z] [-i INFILE] [-o OUTFILE]  filter out low-quality sequences
[-q N] = minimum quality score to keep
[-p N] = minimum percentage of bases in each read that must have a quality of at least -q
[-z] = compress output with gzip
[-v] = verbose: report the number of sequences. If -o is given, the report is printed to stdout; otherwise it is printed to stderr.
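
How the two thresholds interact can be sketched in Python. This is a sketch under stated assumptions, not the tool's code: the function name is hypothetical, Phred+33 encoding is assumed, and the real program may round the percentage differently.

```python
def passes_quality_filter(qual, min_quality, min_percent, offset=33):
    """Return True if enough bases reach the minimum quality.

    `min_quality` plays the role of -q, `min_percent` the role of -p.
    """
    if not qual:
        return False  # assumption: an empty read never passes
    good = sum(1 for q in qual if ord(q) - offset >= min_quality)
    # Compare as integers to avoid floating-point rounding.
    return good * 100 >= min_percent * len(qual)


if __name__ == "__main__":
    # Four of five bases have quality 40, one has quality 10.
    print(passes_quality_filter("IIII+", min_quality=20, min_percent=80))
```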

4. fastq_quality_trimmer [-h] [-v] [-t N] [-l N] [-z] [-i INFILE] [-o OUTFILE]  trim the ends of reads
[-t N] = quality threshold; bases with a quality below N are trimmed from the end (3' end) of the read
[-l N] = minimum length; reads shorter than this after trimming are discarded
[-z] = compress output with gzip
[-v] = verbose: report the number of sequences. If -o is given, the report is printed to stdout; otherwise it is printed to stderr.
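
End trimming by quality can be sketched in Python as below. The function name, the Phred+33 offset, and the decision to drop reads that end up empty are assumptions of this sketch rather than documented behaviour of the tool.

```python
def trim_low_quality_end(seq, qual, threshold, min_length=0, offset=33):
    """Trim low-quality bases from the end of a read.

    `threshold` plays the role of -t and `min_length` the role of -l.
    Returns (sequence, quality), or None if the read is discarded.
    """
    end = len(seq)
    # Walk back from the last base until one meets the threshold.
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    if end == 0 or end < min_length:
        return None  # assumption: empty or too-short reads are dropped
    return seq[:end], qual[:end]


if __name__ == "__main__":
    print(trim_low_quality_end("ACGTAC", "IIII+!", threshold=20))
```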

5. fastq_to_fasta [-h] [-r] [-n] [-v] [-z] [-i INFILE] [-o OUTFILE]  convert FASTQ to FASTA
[-r] = rename sequence identifiers to serial numbers
[-n] = keep sequences that contain N; by default they are discarded
[-z] = compress output with gzip
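
The conversion and the effect of the two switches can be sketched in Python. This is an illustration only: the function name is hypothetical, the input is assumed to be a list of lines with four lines per record, and numbering the kept reads consecutively is an assumption about how renaming works.

```python
def fastq_to_fasta(lines, rename=False, keep_n=False):
    """Convert FASTQ lines (four per record) into FASTA lines.

    `rename` plays the role of -r and `keep_n` the role of -n.
    """
    out = []
    kept = 0
    for i in range(0, len(lines) - 3, 4):
        header, seq = lines[i], lines[i + 1]
        if not keep_n and "N" in seq.upper():
            continue  # default: drop reads containing N
        kept += 1
        # Assumption: renamed reads are numbered in output order.
        name = str(kept) if rename else header[1:]
        out.append(">" + name)
        out.append(seq)
    return out


if __name__ == "__main__":
    fastq = ["@r1", "ACGT", "+", "IIII", "@r2", "ANGT", "+", "IIII"]
    print("\n".join(fastq_to_fasta(fastq, keep_n=True)))
```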

6. fastx_trimmer [-h] [-f N] [-l N] [-t N] [-m MINLEN] [-z] [-v] [-i INFILE] [-o OUTFILE]  choose which bases of each read to keep
[-f N] = first base to keep. Default is 1 (the first base).
[-l N] = last base to keep. By default the entire read is kept.
[-t N] = trim N bases from the end of the sequence.
[-m MINLEN] = discard sequences shorter than MINLEN.
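
The position-based options map naturally onto string slicing. The Python sketch below illustrates this with hypothetical function names; positions are 1-based and inclusive, as in the option descriptions, and the code is not taken from the toolkit.

```python
def trim_by_position(seq, first=1, last=None):
    """Keep bases `first`..`last` (1-based, inclusive), like -f and -l."""
    end = len(seq) if last is None else last
    return seq[first - 1:end]


def trim_from_end(seq, n, min_length=0):
    """Remove `n` bases from the end, like -t; apply a minimum length, like -m.

    Returns the trimmed sequence, or None if it is shorter than `min_length`.
    """
    trimmed = seq[:max(len(seq) - n, 0)]
    if len(trimmed) < min_length:
        return None
    return trimmed


if __name__ == "__main__":
    print(trim_by_position("ACGTACGT", first=3, last=6))
    print(trim_from_end("ACGTACGT", 3))
```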

7. fastx_quality_stats [-h] [-N] [-i INFILE] [-o OUTFILE]  compute quality statistics for a FASTQ file
[-i INFILE] = input FASTQ file
[-o OUTFILE] = output text file name
[-N] = use the new output format; the old format is used by default
Old output format: each item below is one column of the output file
column = column number (for example, 1 to 36 for 36-cycle reads)
count = number of bases in this column
min = minimum base quality in this column
max = maximum base quality in this column
sum = sum of the base qualities in this column
mean = mean base quality in this column
Q1 = first-quartile quality value
med = median quality value
Q3 = third-quartile quality value
IQR = interquartile range (Q3 - Q1)
lW = 'left-whisker' value (for boxplotting).
rW = 'right-whisker' value (for boxplotting).
A_Count = number of A bases in this column
C_Count = number of C bases in this column
G_Count = number of G bases in this column
T_Count = number of T bases in this column
N_Count = number of N bases in this column
max-count = maximum number of bases (in all cycles)
New output format:
cycle = cycle number (called 'column' in the old format)
max-count = maximum number of bases
For each base type in the cycle (ALL/A/C/G/T/N):
count = number of bases in this cycle
min = minimum base quality in this cycle
max = maximum base quality in this cycle
sum = sum of the base qualities in this cycle
mean = mean base quality in this cycle
Q1 = first-quartile quality value
med = median quality value
Q3 = third-quartile quality value
IQR = interquartile range (Q3 - Q1)
lW = 'left-whisker' value (for boxplotting).
rW = 'right-whisker' value (for boxplotting).
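
To make the statistics concrete, the Python sketch below computes them for one column of a small set of reads. It is only an approximation of what the program reports: the function name is hypothetical, Phred+33 encoding is assumed, the quartiles are taken by simple index into the sorted values, and the whiskers use the conventional box-plot rule (1.5 x IQR, clipped to min and max). The real tool may compute quartiles and whiskers differently.

```python
from collections import Counter


def column_stats(reads, column, offset=33):
    """Quality statistics for one 1-based column.

    `reads` is a list of (sequence, quality) pairs.
    """
    idx = column - 1
    covered = [(seq[idx], ord(qual[idx]) - offset)
               for seq, qual in reads if len(seq) > idx]
    if not covered:
        raise ValueError("no read reaches this column")
    quals = sorted(q for _, q in covered)
    bases = Counter(b for b, _ in covered)
    count = len(quals)
    # Assumption: quartiles by index into the sorted quality values.
    q1, med, q3 = quals[count // 4], quals[count // 2], quals[3 * count // 4]
    iqr = q3 - q1
    stats = {
        "column": column, "count": count,
        "min": quals[0], "max": quals[-1],
        "sum": sum(quals), "mean": sum(quals) / count,
        "Q1": q1, "med": med, "Q3": q3, "IQR": iqr,
        # Assumption: conventional whiskers, clipped to the observed range.
        "lW": max(quals[0], q1 - 1.5 * iqr),
        "rW": min(quals[-1], q3 + 1.5 * iqr),
    }
    for base in "ACGTN":
        stats[base + "_Count"] = bases[base]
    return stats


if __name__ == "__main__":
    reads = [("ACGT", "IIII"), ("ACGA", "5III"),
             ("TCGT", "+I5I"), ("NCGT", "!III")]
    for key, value in column_stats(reads, 1).items():
        print(key, value)
```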

8. fastq_quality_boxplot_graph.sh [-i INPUT.TXT] [-t TITLE] [-p] [-o OUTPUT]  draw a box plot of the base quality distribution
[-p] = generate a PostScript (.PS) file; a PNG image is produced by default
[-i INPUT.TXT] = input file, which should be the output of fastx_quality_stats
[-o OUTPUT] = name of the output file
[-t TITLE] = title of the output image

9. fastx_nucleotide_distribution_graph.sh [-i INPUT.TXT] [-t TITLE] [-p] [-o OUTPUT]  draw the base distribution
[-p] = generate a PostScript (.PS) file; a PNG image is produced by default.
[-i INPUT.TXT] = input file, which should be the output of fastx_quality_stats
[-o OUTPUT] = name of the output file.
[-t TITLE] = title of the output image

10. fastx_clipper [-h] [-a ADAPTER] [-D] [-l N] [-n] [-d N] [-c] [-C] [-o] [-v] [-z] [-i INFILE] [-o OUTFILE]  remove adapter sequences
  [-a ADAPTER] = adapter sequence (default = CCTTAAGG)
  [-l N]       = discard reads shorter than N bases; default is 5
  [-d N]       = keep the adapter and the N bases after it; default is -d 0
  [-c]         = discard sequences that do not contain the adapter.
  [-C]         = keep only sequences that do not contain the adapter.
  [-k]         = report adapter-only sequences.
  [-n]         = keep sequences that contain N; by default they are discarded
  [-v]         = verbose: report the number of sequences
  [-z]         = compress output with gzip.
  [-D]         = output debug information.
  [-M N]       = require a minimum adapter alignment length of N; if fewer than N bases align with the adapter, the read is not clipped
  [-i INFILE]  = input file
  [-o OUTFILE] = output file
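
The way the length, adapter-presence and minimum-match options combine can be sketched in Python. This is a deliberately simplified illustration with hypothetical function names: it looks only for exact matches of the adapter (or of an adapter prefix at the end of the read), whereas the real program aligns the adapter to the read, so results can differ.

```python
def clip_adapter(seq, adapter="CCTTAAGG", min_match=1):
    """Return (clipped_sequence, adapter_found) using exact matching only."""
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos], True  # full adapter inside the read
    # Otherwise look for an adapter prefix at the end of the read,
    # at least `min_match` bases long (the role of -M).
    longest = min(len(adapter) - 1, len(seq))
    for k in range(longest, max(min_match, 1) - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:len(seq) - k], True
    return seq, False


def clipper_keeps(seq, adapter="CCTTAAGG", min_length=5,
                  only_clipped=False, only_unclipped=False, min_match=1):
    """Apply the -l, -c and -C style rules; return the read or None."""
    clipped, found = clip_adapter(seq, adapter, min_match)
    if only_clipped and not found:
        return None  # like -c: drop reads without the adapter
    if only_unclipped and found:
        return None  # like -C: drop reads with the adapter
    if len(clipped) < min_length:
        return None  # like -l: drop reads that are too short
    return clipped


if __name__ == "__main__":
    print(clipper_keeps("ACGTACGTAACCTTAAGGTT"))
```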



To reprint this article, please contact the original author for authorization, and indicate that the article comes from Chengchao's ScienceNet blog.
Link: http://blog.sciencenet.cn/blog-1509670-848270.html
