NCBI BLAST+ Local Installation


See: http://blog.shenwei.me/local-blast-installation/

Tutorial: installing NCBI BLAST+ locally on Linux

This article is intended for beginners (ideally with some basic Linux experience); experienced users can skip it. It does not cover installation on Windows, first because the approach is the same, and second because BLAST runs more efficiently on Linux, which is also more stable and will not freeze. So please use a Linux server. I doubt you could bear to let your beloved laptop run a program for dozens of hours.

Please don't be put off by the length: the explanations are detailed (okay, long-winded) only so that beginners can follow them.

——————-[Get mentally prepared, and let's begin] ——————-

1. Install and configure the BLAST+ programs

Download the latest BLAST+ executables from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ (do not download the source code; compiling it is very slow). Choose the precompiled version, e.g. ncbi-blast-2.2.30+-x64-linux.tar.gz. If the server has network access, you can download it directly with wget; otherwise, download it on your own machine and transfer it to the server with an SFTP client.

wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.2.30+-x64-linux.tar.gz

Decompress:

tar -zxvf ncbi-blast-2.2.30+-x64-linux.tar.gz
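Before extracting, it can be worth checking that the tarball arrived intact. The functions below are a minimal sketch and not part of BLAST+: blast_tarball and blast_url rebuild the file name and URL used above from a version number, and verify_tarball compares the tarball's MD5 with a companion .md5 file. That NCBI publishes such a .md5 file next to each tarball, and that older releases live in per-version directories instead of LATEST, are assumptions here. All function names are hypothetical.

```shell
# Sketch only: hypothetical helpers, not part of BLAST+.
# Assumption: NCBI keeps a <tarball>.md5 file next to each tarball and
# stores older releases in per-version directories (e.g. .../2.2.30/).
BLAST_FTP="ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+"

# blast_tarball VERSION -> name of the precompiled 64-bit Linux tarball
blast_tarball() {
    printf 'ncbi-blast-%s+-x64-linux.tar.gz\n' "$1"
}

# blast_url VERSION [DIR] -> download URL; DIR defaults to LATEST
blast_url() {
    printf '%s/%s/%s\n' "$BLAST_FTP" "${2:-LATEST}" "$(blast_tarball "$1")"
}

# verify_tarball FILE -> exit status 0 only if FILE matches FILE.md5
verify_tarball() {
    local expected actual
    # Take the first 32-hex-digit string, whatever layout the .md5 file uses
    expected=$(grep -oE '[0-9a-f]{32}' "$1.md5" 2>/dev/null | head -n 1)
    actual=$(md5sum < "$1" | cut -d ' ' -f 1)
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

Usage, on a server with network access:

wget "$(blast_url 2.2.30)" "$(blast_url 2.2.30).md5"
verify_tarball ncbi-blast-2.2.30+-x64-linux.tar.gz && tar -zxvf ncbi-blast-2.2.30+-x64-linux.tar.gz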

(Strictly speaking, you could already run the programs by their absolute paths after decompression, but for future convenience let's keep configuring.) For unified management, move the directory to where I install local programs (see my post "Setting up a good directory structure on Linux") and rename it to a fixed name without the version number, so that upgrades do not force you to edit configuration files.

mv ncbi-blast-2.2.30+ ~/local/app/
cd ~/local/app/                    # enter the local program installation directory
mv ncbi-blast-2.2.30+ blast        # rename the directory

BLAST+ is now installed in ~/local/app/blast (as you know, ~ is the user's home directory and can be replaced by the environment variable $HOME).

cd blast
pwd    # show the absolute path of the current directory
/db/home/shenwei/local/app/blast
ls     # list the contents of the current directory
bin  ChangeLog  doc  LICENSE  ncbi_package_info  README
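The extract, move and rename steps can also be folded into a single extraction. The sketch below uses tar's --strip-components=1 to unpack the top-level ncbi-blast-X.Y.Z+ directory straight into a version-free PREFIX/blast directory. install_blast is a hypothetical helper name, and it assumes the tarball has a single top-level directory, as the listing above suggests.

```shell
# install_blast TARBALL PREFIX  (hypothetical helper)
# Unpack a BLAST+ tarball into PREFIX/blast, dropping the versioned
# top-level directory so paths in ~/.bashrc survive upgrades.
install_blast() {
    local tarball="$1" dest="$2/blast"
    mkdir -p "$dest" || return 1
    # --strip-components=1 removes the leading ncbi-blast-X.Y.Z+/ component
    tar -zxf "$tarball" -C "$dest" --strip-components=1
}
```

Usage: install_blast ncbi-blast-2.2.30+-x64-linux.tar.gz ~/local/app

Extracting a newer release on top of an old one overwrites the shared files but leaves behind any file the new release no longer ships; removing ~/local/app/blast first gives a clean upgrade.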

Add the absolute path of the directory containing the BLAST+ executables (bin) to the environment variable $PATH, so the programs can be called directly by name. Edit ~/.bashrc and append the following line:

export PATH=/db/home/shenwei/local/app/blast/bin:$PATH

If you do not use an editor such as vi/vim, you can run the following command to append the line above to ~/.bashrc (look closely: unlike above, the $ is escaped):

echo "export PATH=/db/home/shenwei/local/app/blast/bin:\$PATH" >> ~/.bashrc

Make the configuration take effect in the current shell:

source ~/.bashrc
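Running the echo command twice appends the line twice. A small guard avoids that. The sketch below (add_to_path_rc is a hypothetical name) appends the export line only if exactly the same line is not already present, using grep -qxF for a fixed-string, whole-line match.

```shell
# add_to_path_rc DIR [RCFILE]  (hypothetical helper)
# Append "export PATH=DIR:$PATH" to RCFILE (default ~/.bashrc) unless
# exactly that line is already there, so re-running setup is harmless.
add_to_path_rc() {
    local dir="$1" rc="${2:-$HOME/.bashrc}"
    # \$PATH keeps a literal $PATH in the file, expanded at login time
    local line="export PATH=$dir:\$PATH"
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    if ! grep -qxF "$line" "$rc" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc"
    fi
}
```

Usage: add_to_path_rc /db/home/shenwei/local/app/blast/bin && source ~/.bashrc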

You can now run BLAST+ programs such as blastn directly by name. Try entering blastn -version and check that it prints something like the following:

$ blastn -version
blastn: 2.2.30+
Package: blast 2.2.30, build ...
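To check the PATH setup from a script rather than by eye, the sketch below (blast_version is a hypothetical helper) looks the program up with command -v and prints only the version field from the first line of the -version output. It assumes that first line has the form "blastn: 2.2.30+", as in the output above.

```shell
# blast_version [PROGRAM]  (hypothetical helper)
# Print the version of a BLAST+ program found on $PATH (default blastn),
# or fail with a message if the PATH setup has not taken effect.
blast_version() {
    local prog="${1:-blastn}"
    if ! command -v "$prog" >/dev/null 2>&1; then
        echo "$prog not found on PATH" >&2
        return 1
    fi
    # Assumed first line of output: "blastn: 2.2.30+"; print the 2nd field
    "$prog" -version | awk 'NR == 1 { print $2 }'
}
```

Usage: blast_version blastn, or blast_version makeblastdb to check another program from the same bin directory.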

——————-[Take a break and encourage yourself] ——————-

2. Configure the local BLAST databases

Keeping local copies of the BLAST databases can greatly improve efficiency when you need to run a large number of alignments.

I store the database files in ~/data/blast. Create and edit the NCBI BLAST global configuration file .ncbirc in your home directory (if you are not comfortable editing files on the server, write the content into a local text file ncbirc.txt, upload it to the server, and rename it with mv ncbirc.txt .ncbirc). Its content is as follows:

$ cat ~/.ncbirc    # view the file; its content is shown below
; Start the section for BLAST configuration
[BLAST]
; Specifies the path where BLAST databases are installed
BLASTDB=/db/home/shenwei/data/blast
; Specifies the data sources to use for automatic resolution
; for sequence identifiers
DATA_LOADERS=blastdb
; Specifies the BLAST database to use to resolve protein sequences
BLASTDB_PROT_DATA_LOADER=/db/home/shenwei/data/blast/nr
; Specifies the BLAST database to use to resolve nucleotide sequences
BLASTDB_NUCL_DATA_LOADER=/db/home/shenwei/data/blast/nt
BATCH_SIZE=10G

; Windowmasker settings
[WINDOW_MASKER]
WINDOW_MASKER_PATH=/db/home/shenwei/data/blast/windowmasker
; end of file

Once this is configured, you can specify a database by its name alone (e.g. nr) instead of its absolute path.
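If you would rather not copy the file around via a local text editor, a heredoc can write it directly on the server. The sketch below (write_ncbirc is a hypothetical helper) generates a trimmed version of the configuration above for a given database directory and refuses to overwrite an existing file. The BATCH_SIZE line is left out, so add it back if you rely on it.

```shell
# write_ncbirc DBDIR [FILE]  (hypothetical helper)
# Write a trimmed .ncbirc (default ~/.ncbirc) pointing BLAST+ at DBDIR.
# Refuses to overwrite an existing file.
write_ncbirc() {
    local db="$1" rc="${2:-$HOME/.ncbirc}"
    if [ -e "$rc" ]; then
        echo "$rc already exists; not overwriting" >&2
        return 1
    fi
    # Unquoted EOF: $db is expanded inside the heredoc
    cat > "$rc" <<EOF
; Start the section for BLAST configuration
[BLAST]
; Specifies the path where BLAST databases are installed
BLASTDB=$db
; Data sources used to resolve sequence identifiers
DATA_LOADERS=blastdb
BLASTDB_PROT_DATA_LOADER=$db/nr
BLASTDB_NUCL_DATA_LOADER=$db/nt

; Windowmasker settings
[WINDOW_MASKER]
WINDOW_MASKER_PATH=$db/windowmasker
EOF
}
```

Usage: write_ncbirc /db/home/shenwei/data/blast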

3. Download the database files

Once configured, use the update_blastdb.pl script that ships with BLAST+ to download the preformatted nr and nt databases. Downloading the FASTA sequence files and building the databases yourself is not recommended: first, the FASTA files are larger; second, the sequences can be extracted from the database files whenever needed (blastdbcmd -db nr -dbtype prot -entry all -outfmt "%f" -out nr.fa); and most importantly, building the databases takes a long time. The following command downloads everything automatically:

update_blastdb.pl nt nr

Reminder: the files are large and the download takes a long time, so it is best to run the task in the background. See the section "Running many background tasks with screen" in my Shell notes. A simpler approach is the nohup command (the command below also uses time to report the total time consumed):

nohup time update_blastdb.pl nt nr > log &

How can you tell whether the download has finished? 1. Check whether the log file reports completion; 2. Check whether update_blastdb.pl is still running: run ps -aef | grep update_blastdb.pl | grep -v grep, and if there is no output, it is no longer running.
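An alternative to grepping the ps output is to remember the process ID when the job starts. In the sketch below, start_download (a hypothetical helper) runs update_blastdb.pl under nohup with stdout and stderr going to a log file and saves $! next to it; download_running then uses kill -0, which sends no signal and only tests whether that PID still exists. This swaps the author's ps | grep check for a PID file, and it assumes the PID is not reused by another process in the meantime.

```shell
# start_download LOG DB...  (hypothetical helper)
# Run update_blastdb.pl in the background under nohup, sending stdout and
# stderr to LOG and saving the background PID in LOG.pid.
start_download() {
    local log="$1"
    shift
    nohup update_blastdb.pl "$@" > "$log" 2>&1 &
    echo "$!" > "$log.pid"
}

# download_running LOG -> exit status 0 while the recorded PID still exists
download_running() {
    local pid
    pid=$(cat "$1.pid" 2>/dev/null) || return 1
    # kill -0 sends no signal; it only checks that the process exists
    kill -0 "$pid" 2>/dev/null
}
```

Usage, from ~/data/blast:

start_download log nt nr
download_running log && echo "still downloading"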

After the download completes, decompress all the tar.gz files. Because tar extracts only one archive per invocation, loop over them with a wildcard:

time (for f in *.tar.gz; do tar -zxvf "$f"; done) > log2 &
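Since tar -zxvf extracts only one archive per call (any further names on the command line are treated as members to extract from the first archive), the loop can be wrapped in a reusable function. extract_all is a hypothetical name. It stops at the first archive that fails to extract, so a truncated download gets noticed.

```shell
# extract_all DIR  (hypothetical helper)
# tar extracts one archive per call, so loop over every *.tar.gz in DIR,
# unpacking each into DIR and stopping at the first failure.
extract_all() {
    local dir="$1" f
    for f in "$dir"/*.tar.gz; do
        # With no matches the pattern stays literal; skip it
        [ -e "$f" ] || continue
        tar -zxf "$f" -C "$dir" || { echo "failed: $f" >&2; return 1; }
    done
}
```

Usage: extract_all ~/data/blast > log2 2>&1 &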

Hint: to update the databases in the future, download and decompress them again in the same way.

Frequently used BLAST databases of your own (such as a reference genome) can also be placed in this directory, so you will not need to type their absolute paths when calling them.

——————-[Praise yourself] ——————-

4. Basic usage

Plenty of Chinese tutorials turn up in a quick Google search. For authoritative information, refer to the "BLAST Command Line Applications User Manual". Details of specific parameters can be looked up with blastn -help.

Tip: BLAST offers several output formats. Format 11 (the BLAST archive format) contains the most information, and the blast_formatter program can convert it into any of the other formats. So save your alignment results in format 11.

1) If the nt database is available locally, search it directly (explanations of some of the parameters, in Chinese, can be found in my post "blast+ usage"). In the commands below, seq.fa stands for your query file:

blastn -query seq.fa -db nt -outfmt 11 -out seq.fa.blastn@nt.11 -num_threads 8

Naming the output file seq.fa.blastn@nt.11 is a personal habit: "query file name.BLAST program@database.output format". Isn't that intuitive?

Convert the format (e.g. to a custom tabular format):

blast_formatter -archive seq.fa.blastn@nt.11 -outfmt "7 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore staxids salltitles" -out seq.fa.blastn@nt.7

Convert to the default (pairwise) format:

blast_formatter -archive seq.fa.blastn@nt.11 -outfmt 0 -out seq.fa.blastn@nt.0

2) If you do not have a local nt database, add the -remote option (which cannot be combined with -num_threads) to search online at NCBI. This is of course slower, so it only suits small amounts of data:

blastn -query seq.fa -db nt -outfmt 11 -out seq.fa.blastn@nt.11 -remote
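The naming habit and the conversion steps can be wrapped in small functions so the file names stay consistent. This is a sketch with hypothetical function names (out_name, search_nt, convert_archive): it builds "query.program@db.format" names, runs a local blastn search into a format-11 archive, and converts an archive with blast_formatter, naming the result after the first word of the new -outfmt value.

```shell
# Hypothetical helpers for the "query.program@db.format" naming habit.

# out_name QUERY PROGRAM DB FORMAT -> QUERY.PROGRAM@DB.FORMAT
out_name() {
    printf '%s.%s@%s.%s\n' "$1" "$2" "$3" "$4"
}

# search_nt QUERY [THREADS]: local blastn against nt, saved as format 11
search_nt() {
    local query="$1" threads="${2:-8}"
    blastn -query "$query" -db nt -outfmt 11 -num_threads "$threads" \
        -out "$(out_name "$query" blastn nt 11)"
}

# convert_archive ARCHIVE OUTFMT: reformat a format-11 archive; the new
# file swaps the trailing ".11" for the first word of OUTFMT (e.g. ".7")
convert_archive() {
    local archive="$1" fmt="$2"
    blast_formatter -archive "$archive" -outfmt "$fmt" \
        -out "${archive%.11}.${fmt%% *}"
}
```

Usage:

search_nt seq.fa 8
convert_archive seq.fa.blastn@nt.11 "7 qseqid sseqid pident length evalue bitscore"
convert_archive seq.fa.blastn@nt.11 0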

3) Build a database from your own sequences

makeblastdb -in db.fasta -dbtype nucl -parse_seqids -out dbname

If the database will be used frequently, move its files into the database directory configured earlier. Then, when running BLAST from any other directory, you can refer to it by name alone (without an absolute path).

mv dbname.* ~/data/blast
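makeblastdb's -out option accepts a path, so the database can also be written straight into the configured directory instead of being moved afterwards. This also avoids moving the FASTA file by accident if it happens to share the dbname prefix. build_db and db_installed are hypothetical helpers; db_installed assumes a nucleotide database leaves either a .nin index file or, for multi-volume databases, a .nal alias file.

```shell
# build_db FASTA NAME [DBDIR]  (hypothetical helper)
# Build a nucleotide database straight into DBDIR (default ~/data/blast)
# so it can later be used by NAME alone, with no separate mv step.
build_db() {
    local fasta="$1" name="$2" dbdir="${3:-$HOME/data/blast}"
    [ -r "$fasta" ] || { echo "cannot read $fasta" >&2; return 1; }
    mkdir -p "$dbdir" || return 1
    # -out takes a path, so the index files land in DBDIR directly
    makeblastdb -in "$fasta" -dbtype nucl -parse_seqids -out "$dbdir/$name"
}

# db_installed NAME [DBDIR] -> success if a nucleotide index (.nin) or a
# multi-volume alias file (.nal) for NAME exists in DBDIR
db_installed() {
    local name="$1" dbdir="${2:-$HOME/data/blast}"
    [ -e "$dbdir/$name.nin" ] || [ -e "$dbdir/$name.nal" ]
}
```

Usage: build_db db.fasta dbname && db_installed dbname && echo "dbname is ready"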

  

————-[Learn in practice, don't ask Google, read the manual] ————-
