A detailed introduction to BLASTP, PSI-BLAST, and PHI-BLAST
BLASTP, PSI-BLAST, and PHI-BLAST are the BLAST programs that compare a protein sequence against protein sequences.
1. BLASTP: standard protein-protein comparison
Standard protein BLAST is designed for protein searches.
BLASTP takes an amino acid query sequence and searches a protein database for similar sequences. As with the other BLAST programs, the aim is to find regions of similarity.
2. PSI-BLAST: protein-protein comparison with higher sensitivity
PSI-BLAST is designed for more sensitive protein-protein similarity searches.
Position-Specific Iterated (PSI)-BLAST is a highly sensitive variant of BLASTP that is very effective at finding distantly related proteins or new members of a protein family. When a standard BLASTP search fails to find significant hits, or returns only hits to predicted or putative sequences (described as "hypothetical protein" or "similar to ..."), try PSI-BLAST.
3. PHI-BLAST: Pattern-Hit Initiated BLAST
PHI-BLAST performs a restricted protein pattern search.
PHI-BLAST (Pattern-Hit Initiated BLAST) searches a protein database with a protein query and reports only those alignments that contain a specified pattern which also occurs in the query sequence.
The PHI pattern syntax is described in more detail here: http://www.ncbi.nlm.nih.gov/blast/html/PHIsyntax.html (though the explanation there is not very clear)
Lc. Note: the translation is rough; please bear with me.
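To get a feel for what a restricted pattern search means, here is a minimal Python sketch. It assumes the pattern is written in PROSITE-style notation (elements separated by "-", x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats), translates it into a regular expression, and locates it in a sequence. The function names, the example pattern, and the example sequence are made up for illustration; this is not how PHI-BLAST itself is implemented, and it covers only the basic notation.

```python
import re

# One pattern element: a residue letter, x, [allowed] or {forbidden},
# optionally followed by a repeat count such as (3) or (2,4).
_ELEMENT = re.compile(
    r"^(?P<core>\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])"
    r"(?:\((?P<repeat>\d+(?:,\d+)?)\))?$"
)


def pattern_to_regex(pattern):
    """Translate a PROSITE-style pattern into a regular expression string."""
    pieces = []
    for element in pattern.strip().split("-"):
        parsed = _ELEMENT.match(element)
        if parsed is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core = parsed.group("core")
        if core == "x":                      # x matches any residue
            core = "."
        elif core.startswith("{"):           # {P} means "anything but P"
            core = "[^" + core[1:-1] + "]"
        repeat = parsed.group("repeat")
        pieces.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(pieces)


def find_pattern(pattern, sequence):
    """Return (start, end) spans, 0-based and end-exclusive, of each match."""
    regex = re.compile(pattern_to_regex(pattern))
    return [(m.start(), m.end()) for m in regex.finditer(sequence.upper())]


# Hypothetical pattern and sequence, for illustration only.
regex_text = pattern_to_regex("[LIVMF]-G-E-x-[GAS]-x(2,3)-R")
hits = find_pattern("[LIVMF]-G-E-x-[GAS]-x(2,3)-R", "AALGEQGTTRK")
print(regex_text, hits)
```

Only sequences in which the pattern occurs would be candidates for alignment; everything else is skipped, which is what makes the search "restricted".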
Detailed usage of local BLAST
blastall -p blastn -i myrna.fasta -d humanrna.fasta -o myresult.blastout -a 2 -F F -T T -e 1e-10
The explanations are as follows:
blastall: the name of the program that runs BLAST locally from the command line. (Tip: running blastall on its own prints help for all of its parameters, in English.)
-p: short for program. It selects which subprogram to run, depending on the kind of comparison you need: nucleotide against nucleotide, protein against protein, a translated nucleotide sequence against protein, and so on. For example, blastn compares nucleotide with nucleotide and blastp compares protein with protein; there are five subprograms in all.
-i: short for input. This is your own sequence file (FASTA format) to be compared.
-d: short for database, the target database to compare against; in the example it is humanrna.fasta (don't forget to run formatdb on it first).
-o: short for output, the name of the result file. Name it however you like; it may include a path (the -i and -d parameters above may include paths too).
* Note: the four parameters above are the essential ones. The parameters below are optional settings you can adjust to get better results; if you leave them out, blastall falls back on its default values.
-a: the number of CPUs to use. My machine has two CPUs, so I pass -a 2 to run the search in parallel and speed it up. If your computer has a single CPU you can omit this parameter; the default is 1.
-F: short for filter. blastall includes a filter for simple repeats and low-complexity sequence; the default is T. (Several parameters take one of two values, T or F: T is true, meaning the feature is switched on; F is false, meaning it is switched off.)
-T: controls HTML output, that is, whether the BLAST result file is written in HTML format. The default is F. If you want to view the results in a browser such as IE, I recommend T.
-e: the expectation value (E-value) threshold. The default is 10; I use 1e-10.
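If you launch searches from a script, the options explained above can be assembled programmatically. The following is a minimal Python sketch that rebuilds the example command; the function name and its keyword arguments are hypothetical, and the sketch only builds and prints the command rather than running it, since it assumes nothing about whether blastall is installed on your machine.

```python
import shlex


def build_blastall_command(program, query, database, output,
                           processors=1, filter_query=True,
                           html=False, evalue=10.0):
    """Return the blastall argument list for the options discussed above."""
    return [
        "blastall",
        "-p", program,                        # subprogram, e.g. blastn
        "-i", query,                          # query file in FASTA format
        "-d", database,                       # database prepared with formatdb
        "-o", output,                         # result file
        "-a", str(processors),                # number of CPUs
        "-F", "T" if filter_query else "F",   # low-complexity filter
        "-T", "T" if html else "F",           # HTML output
        "-e", format(evalue, "g"),            # E-value threshold
    ]


cmd = build_blastall_command(
    "blastn", "myrna.fasta", "humanrna.fasta", "myresult.blastout",
    processors=2, filter_query=False, html=True, evalue=1e-10,
)
print(shlex.join(cmd))
# To actually run it (requires blastall on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than one long string, avoids shell quoting problems when file names contain spaces.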
blastall usage
A. Formatting a sequence database
Format the sequence database: formatdb
Brief introduction to formatdb:
formatdb accepts input in ASN.1 and FASTA formats. Whether the database contains nucleotide or protein sequences, and whether you plan to use blastall, blastpgp, or megablast, this step is essential.
formatdb command-line arguments:
formatdb - prints all formatdb parameters with a short description of each (see Appendix II).
Description of the main parameters:
-i Input file: the source database to be formatted. Optional
-p Type of file: whether the database holds protein or nucleotide sequences. T = protein, F = nucleotide [T/F] Optional, default = T
-a Input file is a database in ASN.1 format (otherwise FASTA is expected). T = true, F = false [T/F] Optional, default = F
-o Parse options. T = true: parse the sequence IDs and create indexes; F = false: do not [T/F] Optional, default = F
formatdb -i ecoli.nt -p F -o T
Running this command generates 7 files in the current directory for BLAST searches to use. Once formatdb has finished, ecoli.nt itself is no longer needed and can be removed. At this point blastall can be used directly.
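A common slip is passing the wrong -p value to formatdb. Here is a minimal Python sketch that guesses the sequence type from the FASTA text and builds the matching formatdb command. The function names are hypothetical, and the 90% threshold on the letters A, C, G, T, U, and N is a rough heuristic assumed for illustration, not something formatdb does itself.

```python
def looks_like_nucleotide(fasta_text, threshold=0.9):
    """Guess whether FASTA text holds nucleotide sequences (heuristic)."""
    residues = "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")   # skip header lines
    ).upper()
    if not residues:
        raise ValueError("no sequence data found")
    nucleotide_count = sum(residues.count(base) for base in "ACGTUN")
    return nucleotide_count / len(residues) >= threshold


def build_formatdb_command(path, is_protein, parse_seqids=False,
                           asn1_input=False):
    """Return the formatdb argument list for the main parameters."""
    cmd = ["formatdb", "-i", path, "-p", "T" if is_protein else "F"]
    if asn1_input:
        cmd += ["-a", "T"]      # input is ASN.1 rather than FASTA
    if parse_seqids:
        cmd += ["-o", "T"]      # parse sequence IDs and build indexes
    return cmd


# Made-up FASTA snippet standing in for the contents of ecoli.nt.
example_fasta = ">seq1\nACGTACGTNNACGT\n"
is_nucleotide = looks_like_nucleotide(example_fasta)
formatdb_cmd = build_formatdb_command(
    "ecoli.nt", is_protein=not is_nucleotide, parse_seqids=True
)
print(" ".join(formatdb_cmd))
```

For short or unusual sequences the guess can be wrong, so it is best treated as a sanity check before formatting rather than a replacement for knowing your data.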
B. Brief analysis of common blastall parameters
-p Program Name [String]
The name of the program to run; choose from blastn, blastp, blastx, tblastn, and tblastx as needed.
-d Database [String] default = nr
The name of the sequence database to search; the default is nr.
-i Query File [File In] default = stdin
The file containing the query sequence; the default is stdin. In this example it is Test.txt.
-e Expectation value (E) [Real] default = 10.0
The expectation value threshold; the default is 10.0. It describes the number of matches expected to occur by chance when searching a database of a given size.
-m alignment view options [Integer] default = 0
The display options are listed here; one of them is illustrated with an example afterwards.
0 = pairwise: shows each alignment in detail (default)
1 = query-anchored, showing identities
2 = query-anchored, no identities
3 = flat query-anchored, showing identities
4 = flat query-anchored, no identities
5 = query-anchored, no identities and blunt ends
6 = flat query-anchored, no identities and blunt ends
7 = XML BLAST output
8 = tabular (tab-delimited) output
9 = tabular output with comment lines
10 = ASN, text
11 = ASN, binary
The use of -m 8 is illustrated below:
a_query b_sbjct 97.61 585 3 3 309 886 94498 95078 0.0 1017
a_query b_sbjct 100.00 303 0 0 913 1215 95092 95394 2e-172 601
a_query b_sbjct 100.00 209 0 0 1 209 94196 94404 3e-116 414
a_query b_sbjct 100.00 123 0 0 1234 1356 95413 95535 6e-65 244
a_query b_sbjct 100.00 0 0 94096 94136 5e-16 81.8
a_query b_sbjct 100.00 0 0 251 285 94440 94474 2e-12 69.9
a_query b_sbjct 100.00 0 0 885 913 95747 95775 7e-09 58.0
a_query a_query 97.61 585 3 3 309 886 403 983 0.0 1017
a_query a_query 100.00 303 0 0 913 1215 997 1299 2e-172 601
a_query a_query 100.00 209 0 0 1 209 101 309 3e-116 414
a_query a_query 100.00 123 0 0 1234 1356 1318 1440 6e-65 244
a_query a_query 100.00 0 0 1 5e-16 81.8
a_query a_query 100.00 0 0 251 285 345 379 2e-12 69.9
a_query a_query 100.00 0 0 885 913 1652 1680 7e-09 58.0
The result has 12 columns:
query id, subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
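Because the tabular format has a fixed set of 12 columns, it is easy to process in a script. Below is a minimal Python sketch that turns each line into a named record and keeps only the hits under an E-value cutoff. The class and function names are hypothetical, the three rows are complete rows adapted from the sample output, and the cutoff of 1e-100 is chosen only for illustration. The sketch splits on any whitespace, which assumes the sequence IDs contain no spaces.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query_id: str
    subject_id: str
    percent_identity: float
    alignment_length: int
    mismatches: int
    gap_openings: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float


def parse_hit(line):
    """Parse one line of tabular output into a Hit record."""
    fields = line.split()
    if len(fields) != 12:
        raise ValueError(f"expected 12 columns, got {len(fields)}")
    return Hit(
        fields[0], fields[1], float(fields[2]),
        *(int(value) for value in fields[3:10]),   # the seven integer columns
        float(fields[10]), float(fields[11]),
    )


sample = """\
a_query b_sbjct 97.61 585 3 3 309 886 94498 95078 0.0 1017
a_query b_sbjct 100.00 303 0 0 913 1215 95092 95394 2e-172 601
a_query b_sbjct 100.00 123 0 0 1234 1356 95413 95535 6e-65 244
"""

# Lines starting with "#" are skipped so that commented output also parses.
hits = [
    parse_hit(line)
    for line in sample.splitlines()
    if line.strip() and not line.startswith("#")
]
strong = [hit for hit in hits if hit.evalue <= 1e-100]
for hit in strong:
    print(hit.subject_id, hit.q_start, hit.q_end, hit.bit_score)
```

Rejecting lines that do not have exactly 12 fields makes truncated or damaged output fail loudly instead of silently shifting columns.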
-o BLAST report output file [File Out] Optional default = stdout
The file the BLAST report is written to; the default is stdout.
-F Filter query sequence (DUST with blastn, SEG with others) [String] default = T
Query sequence filtering, which masks low-complexity regions that would otherwise distort the results. With blastn the query is filtered by the DUST program; with the other programs it is filtered by SEG. Readers who want the details of DUST and SEG can look them up.
-G Cost to open a gap (zero invokes default behavior) [Integer] default = 0
The gap opening penalty; the default of 0 invokes the program's default behavior.
-E Cost to extend a gap (zero invokes default behavior) [Integer] default = 0
The gap extension penalty; the default of 0 invokes the program's default behavior.
-T Produce HTML output [T/F] default = F
Write the output as a web page (HTML).
-X X dropoff value for gapped alignment (in bits) (zero invokes default behavior)
blastn 30, megablast 20, tblastx 0, all others 15 [Integer] default = 0
-I Show GI's in deflines [T/F] default = F
Whether to show GI numbers in the definition lines; they are not shown by default.
-q Penalty for a nucleotide mismatch (blastn only) [Integer] default = -3
The penalty for a mismatched base in a nucleotide alignment (blastn only); the default is -3.
-r Reward for a nucleotide match (blastn only) [Integer] default = 1
The reward for a matched base in a nucleotide alignment (blastn only); the default is 1.
-g Perform gapped alignment (not available with tblastx) [T/F] default = T
Whether to perform gapped alignment (not available with tblastx); on by default.
-a Number of processors to use [Integer] default = 1
The number of processors to use; the default is a single processor.
-B Number of concatenated queries, for blastn and tblastn [Integer] Optional default = 0
The number of concatenated query sequences, for blastn and tblastn; the default of 0 means a single query.
-M Matrix [String] default = BLOSUM62
The scoring matrix; the default is BLOSUM62.
-W Word size, default if zero (blastn 11, megablast 28, all others 3) [Integer] default = 0
The word size used for the initial matches.
-w Frame shift penalty (OOF algorithm for blastx) [Integer] default = 0
The frame shift penalty.
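The scoring parameters above (-r, -q, -G, -E) combine into a raw alignment score for a nucleotide comparison. Here is a minimal Python sketch, assuming the mismatch penalty is given as a negative number and that a gap of length k costs gap_open + k * gap_extend. The function name is made up for illustration, and the gap costs of 5 and 2 are illustrative assumptions, since the parameter list only says that 0 invokes the default behavior.

```python
def raw_score(matches, mismatches, gap_lengths=(),
              reward=1, penalty=-3, gap_open=5, gap_extend=2):
    """Raw score of a nucleotide alignment under affine gap costs.

    gap_lengths lists the length of each gap in the alignment.
    gap_open and gap_extend are assumed values, not confirmed defaults.
    """
    score = matches * reward + mismatches * penalty
    for length in gap_lengths:
        score -= gap_open + gap_extend * length
    return score


# A perfect ungapped 303-base match, like the 100% identity row
# of length 303 in the tabular example.
perfect = raw_score(matches=303, mismatches=0)

# 20 matches, 1 mismatch, and one gap of length 2.
gapped = raw_score(matches=20, mismatches=1, gap_lengths=[2])
print(perfect, gapped)
```

Raising the mismatch penalty or the gap costs makes the search stricter, which suits closely related sequences; lowering them tolerates more divergence.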