InterProScan 5.14-53.0 installation (currently the latest version of InterProScan)
Quoted from Daily Life Letter, "InterProScan installation and use (final version)". Original official website: http://code.google.com/p/interproscan/wiki/Introduction
Hardware requirements: at least 2 cores and 4 GB of RAM in order to analyze 5-10 sequences at the same time.
Software requirements:
Linux, 32-bit or 64-bit (recommended)
Perl (default on most Linux distributions)
Oracle's Java JDK/JRE version 6u4 or higher (this also includes Java 7)
Environment variables to set:
JAVA_HOME should point to the location of the JVM
$JAVA_HOME/bin should be added to the PATH
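As a minimal sketch, the two variables could be set like this in a shell startup file. The JVM location below is only a placeholder assumption, not a path from this post; $JAVA_HOME/bin goes on PATH because that is where the shell looks for the java executable.

```shell
# Assumption: /usr/lib/jvm/java-1.7.0 is a placeholder; replace it with the
# real location of your JVM. An already exported JAVA_HOME is left untouched.
export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/java-1.7.0}"

# Prepend $JAVA_HOME/bin to PATH unless it is already listed there.
case ":$PATH:" in
    *":$JAVA_HOME/bin:"*) ;;
    *) PATH="$JAVA_HOME/bin:$PATH" ;;
esac
export PATH
```

Putting these lines in ~/.bash_profile (or the equivalent for your shell) makes the setting permanent for new logins.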
To view my configuration:
$ uname -a
Linux localhost.localdomain 2.6.18-238.el5 #1 SMP Sun Dec 14:22:44 EST x86_64 x86_64 x86_64 GNU/Linux
Here x86_64 means the system is 64-bit.
$ java -version
java version "1.6.0_35"
OpenJDK Runtime Environment (IcedTea6 1.13.7) (rhel-1.13.7.1.el5_11-x86_64)
OpenJDK 64-Bit Server VM (build 23.25-b01, mixed mode)
At present only Java 1.6 and 1.7 are supported.
If you are on a Red Hat-style system that ships with Java 1.4, you need to upgrade Java. This can be done with yum; installing with yum is covered in another post on my blog.
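The version check can also be scripted. The sketch below is only an illustration: parse_java_version and java_version_supported are hypothetical helper names, and the accepted versions simply mirror the 1.6/1.7 statement above.

```shell
# Hypothetical helper: read `java -version` output on stdin and print the
# quoted version string from the first line that contains one.
parse_java_version() {
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: succeed only for Java 1.6.x or 1.7.x version strings.
java_version_supported() {
    case "$1" in
        1.6|1.6.*|1.7|1.7.*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Since java -version writes to stderr, a typical call would be: java_version_supported "$(java -version 2>&1 | parse_java_version)" || echo "please upgrade Java"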
$ perl -v
This is perl, v5.8.8 built for x86_64-linux-thread-multi
Copyright 1987-2006, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
Installing InterProScan 5
1. Get the InterProScan software (64-bit)
mkdir interproscan
cd interproscan
wget ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.14-53.0/interproscan-5.14-53.0-64-bit.tar.gz
wget ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.14-53.0/interproscan-5.14-53.0-64-bit.tar.gz.md5
# Use the MD5 checksum to verify the integrity of the download; md5sum must report OK
md5sum -c interproscan-5.14-53.0-64-bit.tar.gz.md5
Extract the archive:
tar -pxvzf interproscan-5.14-53.0-64-bit.tar.gz
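To make sure a damaged download is never unpacked, the checksum test and the extraction can be chained. This is a sketch only: verify_and_extract is a hypothetical helper name, and it assumes the .md5 file sits next to the archive and names it without a directory prefix.

```shell
# Hypothetical helper: unpack ARCHIVE only if `md5sum -c` accepts ARCHIVE.md5.
# The body runs in a subshell so the caller's working directory is unchanged.
verify_and_extract() (
    archive=$1
    # md5sum -c resolves the file name listed in the .md5 file relative to
    # the current directory, so switch to the archive's directory first.
    cd "$(dirname "$archive")" || exit 1
    base=$(basename "$archive")
    md5sum -c "$base.md5" || exit 1
    tar -pxzf "$base"
)
```

For example: verify_and_extract interproscan-5.14-53.0-64-bit.tar.gz. The same helper would also work for the Panther data archive.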
2. Install the Panther models
Download the Panther models into the data/ subdirectory of the newly extracted directory:
cd [InterProScan5 home]/data/
wget ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/data/panther-data-8.1.tar.gz
wget ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/data/panther-data-8.1.tar.gz.md5
The file is several gigabytes in size, so check the MD5 once the download has finished:
md5sum -c panther-data-8.1.tar.gz.md5
# This must return *panther-data-8.1.tar.gz: OK*, which proves the download is intact; otherwise download it again
Extract:
tar -pxvzf panther-data-8.1.tar.gz
If you want to keep the models somewhere else, edit the [InterProScan5 home]/interproscan.properties file and change the following setting:
panther.models.dir.8.1=path_to/panther/8.1/model
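If you would rather script this edit than open the file by hand, something along these lines should work. It is a sketch under assumptions: set_property is a hypothetical helper, and /opt/panther/8.1/model in the usage line is a made-up target path.

```shell
# Hypothetical helper: set KEY=VALUE in a Java-style properties file.
# An existing "KEY=..." line is replaced; otherwise the pair is appended.
# Caveat: awk -v interprets backslash escapes, so avoid backslashes in VALUE.
set_property() {
    file=$1; key=$2; value=$3
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        index($0, k "=") == 1 { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps file permissions
    status=$?
    rm -f "$tmp"
    return $status
}
```

Usage: set_property interproscan.properties panther.models.dir.8.1 /opt/panther/8.1/model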
3. Using the pre-calculated match lookup web service
The pre-calculated match lookup web service holds pre-computed matches for over 30 million protein sequences, including all UniProtKB protein sequences. InterProScan 5 uses this service to speed up local runs. To use it you need the latest version of InterProScan, and the computer must be able to reach http://www.ebi.ac.uk over the Internet.
If your firewall blocks access to this site, you can either download a local copy of the InterProScan 5 lookup service (https://code.google.com/p/interproscan/wiki/LocalLookupService) or turn the feature off. To turn it off, add -dp on the command line or comment out the following line in interproscan.properties by putting a # in front of it:
precalculated.match.lookup.service.url=http://www.ebi.ac.uk/interpro/match-lookup
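Commenting the line out can be scripted as well. A sketch, assuming the property starts at the beginning of its line; comment_out_property is a hypothetical helper name.

```shell
# Hypothetical helper: put a "#" in front of the line that starts with "KEY=".
# Lines that are already commented out do not match, so it is safe to rerun.
comment_out_property() {
    file=$1; key=$2
    tmp=$(mktemp) || return 1
    awk -v k="$key" '
        index($0, k "=") == 1 { print "#" $0; next }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Usage: comment_out_property interproscan.properties precalculated.match.lookup.service.url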
How to use InterProScan
./interproscan.sh -i /path/to/sequences.fasta -o /san/ -goterms -iprlookup -pa -f XML
You can run the bundled example: ./interproscan.sh -i test_proteins.fasta -f TSV
This produces results in TSV format, containing matches from many databases: Gene3D, PIRSF, PRINTS, PANTHER, SUPERFAMILY, Pfam, TIGRFAM and others.
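For a quick overview of which databases produced hits, the TSV file can be summarised with awk. This sketch relies on the analysis name being the fourth tab-separated column, as in the official TSV layout (accession, MD5, length, analysis, ...); count_by_analysis is a hypothetical helper name.

```shell
# Hypothetical helper: count the rows of an InterProScan TSV file per analysis
# (column 4) and print "analysis<TAB>count" lines, sorted by analysis name.
count_by_analysis() {
    awk -F '\t' 'NF >= 4 { n[$4]++ } END { for (a in n) print a "\t" n[a] }' "$1" |
        LC_ALL=C sort
}
```

Usage: count_by_analysis test_proteins.fasta.tsv (use whatever name your output file actually has).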
If this does not work, see the troubleshooting advice on this page:
https://code.google.com/p/interproscan/wiki/FAQ#3.What_should_I_do_if_one_of_the_binaries_included_with_InterProScan_5_does_not_work_on_my_system?
cd /interproscan
Run ./interproscan.sh without any arguments at the terminal and it prints the usage information.
-appl Run only the named analyses; without this option, all analyses are run. You name the database, and its version number can be omitted:
./interproscan.sh -appl PfamA -i /path/to/sequences.fasta
If you wish to run two or more specific analyses, you can include multiple -appl arguments:
./interproscan.sh -appl PfamA-27.0 -appl PRINTS-42.0 -i /path/to/sequences.fasta
Or you can use a single -appl option with a comma-separated list of analyses:
./interproscan.sh -appl PfamA,PRINTS -i /path/to/sequences.fasta
A list of all available analyses is in the section "Included analyses".
-b Base output filename. Specifies the output file path, with a similar effect to -o; if omitted, a default name and path are used. A file extension is added automatically to the generated file name.
-d Output directory; mutually exclusive with -b and -o.
-dp Disables the precalculated match lookup service, which is on by default. The service uses the MD5 of each submitted sequence to check quickly whether it has already been annotated; if it has, the existing results are returned directly, which saves time.
-f Output file format. The supported formats are TSV, XML, GFF3, HTML and SVG. The default formats for protein sequences are TSV, XML and GFF3. For nucleic acid sequences only GFF3 and XML used to be available; now all of them can be used.
./interproscan.sh -f XML -f HTML -i /path/to/sequences.fasta -b /path/to/output_file
Or
./interproscan.sh -f XML,HTML -i /path/to/sequences.fasta -b /path/to/output_file
Differences between the output formats: https://code.google.com/p/interproscan/wiki/OutputFormats
-i Input file in FASTA format.
-goterms Turns on GO annotation; the -iprlookup option has to be given as well.
-iprlookup Turns on InterPro annotation.
-ms Minimum size of a nucleic acid ORF; the smaller the value, the longer the run takes.
-o Output file; cannot be used together with -b or -d. If you set this, you must also set -f.
-pa Turns on pathway annotation.
-T Directory for temporary files; the default is /tmp, and this option lets you choose another location.
-t Type of the input sequences. The default is protein; it can also be DNA or RNA.
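The combination rules in this option list are easy to get wrong in wrapper scripts, so here is a small pre-flight check as a sketch. check_iprscan_args is a hypothetical helper; it encodes only the rules stated above and does not replace InterProScan's own argument checking.

```shell
# Hypothetical helper: check an interproscan.sh argument list against the
# rules described above. Returns 0 when the combination looks valid.
check_iprscan_args() {
    n_out=0; has_o=0; has_f=0; has_go=0; has_ipr=0
    for arg in "$@"; do
        case "$arg" in
            -b|-d) n_out=$((n_out + 1)) ;;
            -o) n_out=$((n_out + 1)); has_o=1 ;;
            -f) has_f=1 ;;
            -goterms) has_go=1 ;;
            -iprlookup) has_ipr=1 ;;
        esac
    done
    if [ "$n_out" -gt 1 ]; then
        echo "use only one of -b, -d and -o" >&2
        return 1
    fi
    if [ "$has_o" -eq 1 ] && [ "$has_f" -eq 0 ]; then
        echo "-o requires -f" >&2
        return 1
    fi
    if [ "$has_go" -eq 1 ] && [ "$has_ipr" -eq 0 ]; then
        echo "-goterms should be combined with -iprlookup" >&2
        return 1
    fi
    return 0
}
```

Usage: check_iprscan_args -i in.fasta -f XML -o out.xml && ./interproscan.sh -i in.fasta -f XML -o out.xml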
The databases involved
The following can be used directly:
TIGRFAM-xx.x: TIGRFAMs is a library of protein families based on hidden Markov models.
ProDom-xxxx.x: ProDom is a set of protein domain families generated automatically from the UniProt Knowledge Database.
PANTHER-x.x: PANTHER (Protein ANalysis THrough Evolutionary Relationships) classifies genes by their function, using published experimental evidence and evolutionary relationships to predict the function of genes that lack direct experimental evidence.
SMART-x.x: SMART identifies and analyses domain architectures based on hidden Markov models.
PrositeProfiles-xx.xx: PROSITE consists of documentation entries that describe protein domains, families and functional sites, together with the patterns and profiles used to identify them.
PrositePatterns-xx.x.xx: as above.
SUPERFAMILY-x.xx: SUPERFAMILY is a database of structural and functional annotation for proteins and genomes.
PRINTS-xx.x: A fingerprint is a group of conserved motifs used to characterise a protein family.
Gene3D-x.x.x: Structural assignment for whole genes and genomes using the CATH domain structure database.
PIRSF-x.xx: PIRSF provides a non-overlapping, hierarchical classification of UniProtKB sequences that reflects their evolutionary relationships.
PfamA-xx.x: A large collection of protein families, each represented by multiple sequence alignments and hidden Markov models.
Hamap-xxxxxx.xx: High-quality Automated and Manual Annotation of Microbial Proteomes.
Coils-x.x: Prediction of coiled-coil regions in proteins.
Deactivated analyses:
SignalP-GRAM_NEGATIVE-x.x: Analysis SignalP-GRAM_NEGATIVE-x.x is deactivated, because the following parameters are not set in the interproscan.properties file: binary.signalp.x.x.path
SignalP-GRAM_POSITIVE-x.x: Analysis SignalP-GRAM_POSITIVE-x.x is deactivated, because the following parameters are not set in the interproscan.properties file: binary.signalp.x.x.path
SignalP-EUK-x.x: Analysis SignalP-EUK-x.x is deactivated, because the following parameters are not set in the interproscan.properties file: binary.signalp.x.x.path
Phobius-x.xx: Analysis Phobius-x.xx is deactivated, because the following parameters are not set in the interproscan.properties file: binary.phobius.pl.path.x.xx
TMHMM-x.xc: Analysis TMHMM-x.xc is deactivated, because the following parameters are not set in the interproscan.properties file: binary.tmhmm.path
Scanning nucleic acid sequences
EMBOSS getorf is the gene prediction tool bundled with InterProScan. If you want to use a locally installed copy of it instead, you must modify the interproscan.sh script:
# set environment variables for getorf
export EMBOSS_ACDROOT=bin/nucleotide
export EMBOSS_DATA=bin/nucleotide
If your input is a nucleic acid sequence, add the -t option when you run the command:
./interproscan.sh -t n -i /path/to/nucleic_acid_sequences.fasta
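When a pipeline handles both kinds of input, the sequence type can be guessed from the file itself. The sketch below is a rough heuristic, not something InterProScan provides: guess_seq_type is a hypothetical helper, and the 10% threshold is an arbitrary assumption.

```shell
# Hypothetical helper: print "n" when fewer than 10% of the residues in a
# FASTA file are letters other than A, C, G, T, U or N; otherwise print "p".
guess_seq_type() {
    awk '
        /^>/ { next }                    # skip FASTA header lines
        {
            seq = toupper($0)
            gsub(/[^A-Z]/, "", seq)      # keep letters only
            total += length(seq)
            gsub(/[ACGTUN]/, "", seq)    # what remains is non-nucleotide
            other += length(seq)
        }
        END {
            if (total > 0 && other / total < 0.1) print "n"
            else print "p"
        }
    ' "$1"
}
```

Usage: ./interproscan.sh -t "$(guess_seq_type input.fasta)" -i input.fasta. Very short or unusual sequences can be misclassified, so check the guess for anything important.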
Format conversion
XML output can be converted to other formats:
./interproscan.sh -mode convert -f tsv,gff3,svg -i /path/to/impact.xml -o /path/to/output_file_basename
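To convert a whole directory of XML results, the command above can be wrapped in a loop. A sketch under assumptions: convert_xml_dir is a hypothetical helper, the IPRSCAN variable is my own convention for pointing at interproscan.sh, and each output basename is simply the XML path without its .xml suffix.

```shell
# Hypothetical helper: run the convert mode on every *.xml file in a directory.
# IPRSCAN may be set to the location of interproscan.sh (default: ./interproscan.sh).
convert_xml_dir() {
    iprscan=${IPRSCAN:-./interproscan.sh}
    for xml in "$1"/*.xml; do
        [ -e "$xml" ] || continue        # the glob matched nothing
        "$iprscan" -mode convert -f tsv,gff3,svg -i "$xml" -o "${xml%.xml}" || return 1
    done
}
```

Usage: IPRSCAN=/path/to/interproscan.sh convert_xml_dir /path/to/results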