Detailed description of parameters in BLAST+

Source: Internet
Author: User
[Repost] Detailed description of the parameters of BLAST+ blastn. Reposted from lidaof; last edited by lidaof.

Compared with the legacy BLAST, the new BLAST+ splits programs such as "blastn" and "blastx" out of the single "blastall" command into separate commands, which makes it easier to customize the parameters of each one.

I have collected the parameters I use most often when running blastn, summarized as follows:

blastn -db database_name -query input_file -out output_file -evalue evalue -max_target_seqs num_sequences -num_threads int_value -outfmt format_string

blastn -db database_name -query input_file -out output_file -evalue evalue -max_target_seqs num_sequences -num_threads int_value -outfmt "7 qacc sacc evalue length pident"

For example:

blastn -db plant_rna -query test.fa -out test.out -evalue 0.00001 -max_target_seqs 5 -num_threads 4 -outfmt "7 qacc sacc evalue length pident"
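
The -outfmt value contains spaces, so it has to reach blastn as a single quoted argument. The following is a minimal sketch of one way to check that before launching a search. The function name blastn_search is hypothetical, and the database and file names are simply the placeholder values from the example above.

```shell
#!/bin/sh
# Sketch: wrap the example search in a function so the same argument list can
# be printed for inspection or executed. db, query and out hold the placeholder
# names from the example; blastn_search is a hypothetical helper name.
db="plant_rna"
query="test.fa"
out="test.out"

# blastn_search PREFIX...: runs PREFIX followed by the full blastn command line.
blastn_search() {
    "$@" blastn \
        -db "$db" \
        -query "$query" \
        -out "$out" \
        -evalue 0.00001 \
        -max_target_seqs 5 \
        -num_threads 4 \
        -outfmt "7 qacc sacc evalue length pident"
}

# Dry run: print one argument per line. The format string shows up on a single
# line, which confirms that it is passed as one argument.
blastn_search printf '%s\n'

# Real run (needs BLAST+ installed and the database already built):
# blastn_search command
```

Passing printf as the prefix only prints the arguments; passing the shell builtin "command" as the prefix would execute blastn itself.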

blastn: this needs little explanation; it compares nucleotide sequences against nucleotide sequences.

-db: specifies the database used for the BLAST search. For details, see the previous article.

-query: the input sequence(s) used as the query, in FASTA format.

-out: the output result file.

-evalue: sets the E-value cutoff.

-max_target_seqs: sets the maximum number of target sequences to keep (with the old blastall I used -b 5 -v 5 for this; please correct me if my understanding is wrong).

-num_threads: specifies the number of CPU threads to use (this depends on your system; equivalent to the old -a parameter).

-outfmt "7 qacc sacc evalue length pident": this is the most popular feature of the new BLAST+. It controls the output format directly, so no separate parser is needed. The 7 selects tabular output with comment lines. The fields to output are listed after the 7, separated by spaces, and the whole format string is enclosed in double quotation marks. Here qacc is the accession of the query sequence, sacc is the accession of the target (subject) sequence, evalue is the E-value, length is the alignment length, and pident is the percentage of identical matches. The other available options are as follows:

*** Formatting options
-outfmt <String>
alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = XML Blast output,
6 = tabular,
7 = tabular with comment lines,
8 = Text ASN.1,
9 = Binary ASN.1
10 = Comma-separated values

Options 6, 7, and 10 can be additionally configured to produce
a custom format specified by space delimited format specifiers.
The supported format specifiers are:
(listed in full in the -help output further down)
When not provided, the default value is:
'qseqid sseqid pident length mismatch gapopen qstart qend sstart send
evalue bitscore', which is equivalent to the keyword 'std'
Default = '0'
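
Because formats 6 and 7 are plain tab-separated text, ordinary command-line tools can post-process them. The sketch below assumes the column order "qacc sacc evalue length pident" used earlier and keeps only hits with at least 90 percent identity. The rows are hand-written sample values, not real search results, and the function names sample_rows and filter_hits are hypothetical.

```shell
#!/bin/sh
# Hand-written sample in the column order "qacc sacc evalue length pident".
# The values are invented for illustration only.
sample_rows() {
    printf '# comment line: format 7 prefixes such lines with a hash\n'
    printf 'q1\ts1\t1e-50\t200\t98.50\n'
    printf 'q1\ts2\t2e-10\t80\t85.00\n'
    printf 'q2\ts3\t3e-30\t150\t92.10\n'
}

# Skip comment lines, keep hits whose pident (column 5) is at least 90, and
# print query accession, subject accession and percent identity.
filter_hits() {
    awk -F'\t' '!/^#/ && $5 + 0 >= 90 { print $1, $2, $5 }'
}

sample_rows | filter_hits
```

With a real result file, the same filter could be applied as filter_hits < test.out, provided the file was written with the same field list.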

The following detailed help information can be printed by running blastn with the -help parameter:

blastn -help

blastn [-h] [-help] [-import_search_strategy filename]
[-export_search_strategy filename] [-task task_name] [-db database_name]
[-dbsize num_letters] [-gilist filename] [-negative_gilist filename]
[-entrez_query entrez_query] [-db_soft_mask filtering_algorithm]
[-subject subject_input_file] [-subject_loc range] [-query input_file]
[-out output_file] [-evalue evalue] [-word_size int_value]
[-gapopen open_penalty] [-gapextend extend_penalty]
[-perc_identity float_value] [-xdrop_ungap float_value]
[-xdrop_gap float_value] [-xdrop_gap_final float_value]
[-searchsp int_value] [-penalty penalty] [-reward reward] [-no_greedy]
[-min_raw_gapped_score int_value] [-template_type type]
[-template_length int_value] [-dust DUST_options]
[-filtering_db filtering_database]
[-window_masker_taxid window_masker_taxid]
[-window_masker_db window_masker_db] [-soft_masking soft_masking]
[-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
[-best_hit_score_edge float_value] [-window_size int_value]
[-off_diagonal_range int_value] [-use_index boolean] [-index_name string]
[-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines]
[-outfmt format] [-show_gis] [-num_descriptions int_value]
[-num_alignments int_value] [-html] [-max_target_seqs num_sequences]
[-num_threads int_value] [-remote] [-version]

DESCRIPTION
Nucleotide-Nucleotide BLAST 2.2.23+

OPTIONAL ARGUMENTS
-h
Print USAGE and DESCRIPTION; ignore other arguments
-help
Print USAGE, DESCRIPTION and ARGUMENTS description; ignore other arguments
-version
Print version number; ignore other arguments

*** Input query options
-query <File_In>
Input file name
Default = '-'
-query_loc <String>
Location on the query sequence (Format: start-stop)
-strand <String, 'both', 'minus', 'plus'>
Query strand(s) to search against database/subject
Default = 'both'

*** General search options
-task <String, Permissible values: 'blastn' 'blastn-short' 'dc-megablast'
'megablast' 'vecscreen'>
Task to execute
Default = 'megablast'
-db <String>
BLAST database name
* Incompatible with: subject, subject_loc
-out <File_Out>
Output file name
Default = '-'
-evalue <Real>
Expectation value (E) threshold for saving hits
Default = '10'
-word_size <Integer, >=4>
Word size for wordfinder algorithm (length of best perfect match)
-gapopen <Integer>
Cost to open a gap
-gapextend <Integer>
Cost to extend a gap
-penalty <Integer, <=0>
Penalty for a nucleotide mismatch
-reward <Integer, >=0>
Reward for a nucleotide match
-use_index <Boolean>
Use MegaBLAST database index
-index_name <String>
MegaBLAST database index name

*** BLAST-2-Sequences options
-subject <File_In>
Subject sequence(s) to search
* Incompatible with: db, gilist, negative_gilist, db_soft_mask
-subject_loc <String>
Location on the subject sequence (Format: start-stop)
* Incompatible with: db, gilist, negative_gilist, db_soft_mask, remote

*** Formatting options
-outfmt <String>
alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = XML Blast output,
6 = tabular,
7 = tabular with comment lines,
8 = Text ASN.1,
9 = Binary ASN.1
10 = Comma-separated values

Options 6, 7, and 10 can be additionally configured to produce
a custom format specified by space delimited format specifiers.
The supported format specifiers are:
qseqid means Query Seq-id
qgi means Query GI
qacc means Query accession
sseqid means Subject Seq-id
sallseqid means All subject Seq-id(s), separated by a ';'
sgi means Subject GI
sallgi means All subject GIs
sacc means Subject accession
sallacc means All subject accessions
qstart means Start of alignment in query
qend means End of alignment in query
sstart means Start of alignment in subject
send means End of alignment in subject
qseq means Aligned part of query sequence
sseq means Aligned part of subject sequence
evalue means Expect value
bitscore means Bit score
score means Raw score
length means Alignment length
pident means Percentage of identical matches
nident means Number of identical matches
mismatch means Number of mismatches
positive means Number of positive-scoring matches
gapopen means Number of gap openings
gaps means Total number of gaps
ppos means Percentage of positive-scoring matches
frames means Query and subject frames separated by a '/'
qframe means Query frame
sframe means Subject frame
When not provided, the default value is:
'qseqid sseqid pident length mismatch gapopen qstart qend sstart send
evalue bitscore', which is equivalent to the keyword 'std'
Default = '0'
-show_gis
Show NCBI GIs in deflines?
-num_descriptions <Integer, >=0>
Number of database sequences to show one-line descriptions for
Default = '500'
-num_alignments <Integer, >=0>
Number of database sequences to show alignments for
Default = '250'
-html
Produce HTML output?

*** Query filtering options
-dust <String>
Filter query sequence with DUST (Format: 'yes', 'level window linker', or
'no' to disable)
Default = '20 64 1'
-filtering_db <String>
BLAST database containing filtering elements (i.e.: repeats)
-window_masker_taxid <Integer>
Enable WindowMasker filtering using a Taxonomic ID
-window_masker_db <String>
Enable WindowMasker filtering using this repeats database.
-soft_masking <Boolean>
Apply filtering locations as soft masks
Default = 'true'
-lcase_masking
Use lower case filtering in query and subject sequence(s)?

*** Restrict search or results
-gilist <String>
Restrict search of database to list of GI's
* Incompatible with: negative_gilist, remote, subject, subject_loc
-negative_gilist <String>
Restrict search of database to everything except the listed GIs
* Incompatible with: gilist, remote, subject, subject_loc
-entrez_query <String>
Restrict search with the given Entrez query
* Requires: remote
-db_soft_mask <Integer>
Filtering algorithm ID to apply to the BLAST database as soft masking
* Incompatible with: subject, subject_loc
-perc_identity <Real, 0..100>
Percent identity
-culling_limit <Integer, >=0>
If the query range of a hit is enveloped by that of at least this many
higher-scoring hits, delete the hit
* Incompatible with: best_hit_overhang, best_hit_score_edge
-best_hit_overhang <Real, (>=0 and =<0.5)>
Best Hit algorithm overhang value (recommended value: 0.1)
* Incompatible with: culling_limit
-best_hit_score_edge <Real, (>=0 and =<0.5)>
Best Hit algorithm score edge value (recommended value: 0.1)
* Incompatible with: culling_limit
-max_target_seqs <Integer, >=1>
Maximum number of aligned sequences to keep

*** Discontiguous MegaBLAST options
-template_type <String, 'coding', 'coding_and_optimal', 'optimal'>
Discontiguous MegaBLAST template type
* Requires: template_length
-template_length <Integer, Permissible values: '16' '18' '21'>
Discontiguous MegaBLAST template length
* Requires: template_type

*** Statistical options
-dbsize <Int8>
Effective length of the database
-searchsp <Int8, >=0>
Effective length of the search space

*** Search strategy options
-import_search_strategy <File_In>
Search strategy to use
* Incompatible with: export_search_strategy
-export_search_strategy <File_Out>
File name to record the search strategy used
* Incompatible with: import_search_strategy

*** Extension options
-xdrop_ungap <Real>
X-dropoff value (in bits) for ungapped extensions
-xdrop_gap <Real>
X-dropoff value (in bits) for preliminary gapped extensions
-xdrop_gap_final <Real>
X-dropoff value (in bits) for final gapped alignment
-no_greedy
Use non-greedy dynamic programming extension
-min_raw_gapped_score <Integer>
Minimum raw gapped score to keep an alignment in the preliminary gapped and
traceback stages
-ungapped
Perform ungapped alignment only?
-window_size <Integer, >=0>
Multiple hits window size, use 0 to specify 1-hit algorithm
-off_diagonal_range <Integer, >=0>
Number of off-diagonals to search for the 2nd hit, use 0 to turn off
Default = '0'

*** Miscellaneous options
-parse_deflines
Should the query and subject defline(s) be parsed?
-num_threads <Integer, >=1>
Number of threads to use in the BLAST search
Default = '1'
* Incompatible with: remote
-remote
Execute search remotely?
* Incompatible with: gilist, negative_gilist, subject_loc, num_threads
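
The help text above gives -num_threads a default of 1 and lists it as incompatible with -remote. As a rough sketch, the thread count could be derived from the number of online CPUs and left out for remote searches. The helper name thread_args is hypothetical, and the code assumes that getconf _NPROCESSORS_ONLN is available, falling back to 1 when it is not.

```shell
#!/bin/sh
# Sketch: print the thread-related blastn arguments, one per line.
# thread_args is a hypothetical helper; pass "yes" for a remote search.
thread_args() {
    if [ "$1" = "yes" ]; then
        # -remote and -num_threads are listed as incompatible, so emit only -remote.
        printf '%s\n' "-remote"
    else
        # Number of online CPUs, or 1 (the blastn default) if getconf fails.
        printf '%s\n' "-num_threads" "$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
    fi
}

thread_args no
thread_args yes
```

Using every available CPU is only one possible policy; on a shared machine a smaller fixed value may be the better choice.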
