InterProScan 5.14-53.0 Installation

Source: Internet
Author: User

This post covers installing InterProScan 5.14-53.0, currently the latest version of InterProScan.

Quoted from "Daily Life Letter: interproscan installation and use (final version)". Original official site: http://code.google.com/p/interproscan/wiki/Introduction

Hardware requirements: at least 2 cores and 4 GB of RAM in order to analyze 5-10 sequences at the same time.

Software requirements:

Linux, 32-bit or 64-bit (recommended).

Perl (installed by default on most Linux distributions)

Oracle's Java JDK/JRE version 6u4 or higher (which also includes Java 7)

Environment variables to set:

JAVA_HOME should point to the location of the JVM

$JAVA_HOME/bin should be added to the PATH

To view my configuration:

$ uname -a

Linux localhost.localdomain 2.6.18-238.el5 #1 SMP Sun Dec 14:22:44 EST x86_64 x86_64 x86_64 GNU/Linux

Here x86_64 means the system is 64-bit.

$ java -version

java version "1.6.0_35"
OpenJDK Runtime Environment (IcedTea6 1.13.7) (rhel-1.13.7.1.el5_11-x86_64)
OpenJDK 64-Bit Server VM (build 23.25-b01, mixed mode)

Currently only Java 1.6 and 1.7 are supported.

If you are on a Red Hat-style system that ships with Java 1.4, you need to upgrade Java. This can be done with yum; installing yum is covered in another of my blog posts.
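The version check above can also be scripted. The following is only a rough sketch: the function names java_supported and parse_java_version are made up for this example, and the accepted versions (1.6 and 1.7) are simply the ones named in this post.

```shell
#!/bin/sh
# Hypothetical helpers, not part of InterProScan.

# Reads `java -version` output on stdin and prints the quoted version
# number from the first line, e.g. 'java version "1.6.0_35"' -> 1.6.0_35
parse_java_version() {
    sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}

# Succeeds (exit status 0) only for Java 1.6 or 1.7, the versions this
# post says are supported.
java_supported() {
    case "$1" in
        1.6.*|1.7.*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Usage (java -version writes to stderr, hence the 2>&1):
#   v=$(java -version 2>&1 | parse_java_version)
#   if java_supported "$v"; then echo "Java $v is fine"; else echo "Java $v needs upgrading"; fi
```

The functions only inspect the version string, so they can be tried out on a machine without Java installed.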

$ perl -v

This is perl, v5.8.8 built for x86_64-linux-thread-multi

Copyright 1987-2006, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

Installing InterProScan 5

1. Get the InterProScan software (x64)

mkdir interproscan
cd interproscan

wget ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.14-53.0/interproscan-5.14-53.0-64-bit.tar.gz
wget ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.14-53.0/interproscan-5.14-53.0-64-bit.tar.gz.md5

# Use MD5 to check the integrity of the downloaded file; a result of OK means the download is complete

md5sum -c interproscan-5.14-53.0-64-bit.tar.gz.md5

Extract the archive:

tar -pxvzf interproscan-5.14-53.0-64-bit.tar.gz
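The check-then-extract steps can be combined so that a corrupted download is never unpacked. This is a minimal sketch, assuming the .md5 file sits next to the archive in the current directory; the function name verify_and_extract is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: verifies ARCHIVE against ARCHIVE.md5 and extracts
# it only if md5sum reports OK. Run it from the directory that holds
# both files, because md5sum -c resolves file names relative to it.
verify_and_extract() {
    archive=$1
    if md5sum -c "$archive.md5"; then
        tar -pxzf "$archive"
    else
        echo "MD5 check failed for $archive; download it again" >&2
        return 1
    fi
}

# Usage:
#   verify_and_extract interproscan-5.14-53.0-64-bit.tar.gz
```

The same function works for any other archive that is distributed with a matching .md5 file.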

2. Install the PANTHER models

Download the PANTHER models into the data/ subdirectory of the newly extracted directory:

cd [InterProScan5 home]/data/

wget ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/data/panther-data-8.1.tar.gz

wget ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/data/panther-data-8.1.tar.gz.md5

The file is a few gigabytes in size, so check the MD5 once the download has finished.

md5sum -c panther-data-8.1.tar.gz.md5

# This must return *panther-data-8.1.tar.gz: OK*, which shows the download is intact; otherwise download it again

Extract the archive:

tar -pxvzf panther-data-8.1.tar.gz

If you want to keep the models in a different location, edit the [InterProScan5 home]/interproscan.properties file and change the following setting:

panther.models.dir.8.1=path_to/panther/8.1/model
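Editing the properties file by hand is fine, but the change can also be scripted. The sketch below assumes the file uses plain key=value lines and that the value contains no "|" or "&" characters; the function name set_property and the /opt path in the usage line are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: sets KEY=VALUE in a key=value properties FILE,
# replacing the existing line for KEY or appending one if it is absent.
set_property() {
    file=$1 key=$2 value=$3
    # Escape the dots in the key so they match literally in the regex
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^$pattern=" "$file"; then
        # '|' is the sed delimiter so that '/' in paths needs no escaping
        sed "s|^$pattern=.*|$key=$value|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Usage (the target path is only an example):
#   set_property interproscan.properties panther.models.dir.8.1 /opt/panther/8.1/model
```

Keeping a backup copy of interproscan.properties before running anything like this is a good idea.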

3. Using the pre-calculated match lookup service

The pre-calculated match lookup service is a web service that holds matches for over 30 million protein sequences, including all UniProtKB protein sequences. InterProScan 5 uses this service to speed up local runs. To use it, the machine running this version of InterProScan needs Internet access to http://www.ebi.ac.uk.

If your firewall blocks access to this site, you can either download a local copy of the InterProScan 5 lookup service (https://code.google.com/p/interproscan/wiki/LocalLookupService) or turn the feature off. To turn it off, add -dp on the command line, or comment out the following line in interproscan.properties by putting a # in front of it:

precalculated.match.lookup.service.url=http://www.ebi.ac.uk/interpro/match-lookup
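Commenting out the property can be done with a one-line sed. This is only a sketch under the assumption that the property starts at the beginning of its line; the function name comment_out_property is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: puts a '#' in front of the line that sets KEY in
# FILE. Lines that are already commented out are left alone, because
# the pattern is anchored at the start of the line.
comment_out_property() {
    file=$1
    # Escape the dots in the key so they match literally in the regex
    pattern=$(printf '%s' "$2" | sed 's/\./\\./g')
    sed "s|^\($pattern=\)|#\1|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Usage:
#   comment_out_property interproscan.properties precalculated.match.lookup.service.url
```

Running it a second time changes nothing, so it is safe to use in a setup script.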

How to use InterProScan

./interproscan.sh -i /path/to/sequences.fasta -o /san/ -goterms -iprlookup -pa -f XML

You can run the example it provides: ./interproscan.sh -i test_proteins.fasta -f TSV

This produces results in TSV format, containing matches from many databases such as Gene3D, PIRSF, PRINTS, PANTHER, SUPERFAMILY, Pfam and TIGRFAM.

If this does not work, please refer to the solution provided on this page:

https://code.google.com/p/interproscan/wiki/FAQ#3.What_should_I_do_if_one_of_the_binaries_included_with_InterProScan_5_does_not_work_on_my_system?

cd /interproscan

If you run ./interproscan.sh at the terminal with no arguments, it prints the usage information.

-appl: Run only the specified analyses; without this option, all analyses are run and all results are reported.

It specifies the database; the database version may be omitted.

./interproscan.sh -appl PfamA -i /path/to/sequences.fasta

To run two or more specific analyses, you can include multiple -appl arguments:

./interproscan.sh -appl PfamA-27.0 -appl PRINTS-42.0 -i /path/to/sequences.fasta

Or you can use a single -appl option with a comma-separated list of analyses:

./interproscan.sh -appl PfamA,PRINTS -i /path/to/sequences.fasta

A list of all available analyses is given in the section "Included analyses".
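When the list of analyses comes from a script, the comma-separated form can be built with a small helper. A minimal sketch; the function name join_analyses is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: joins its arguments with commas so the result
# can be passed to a single -appl option.
join_analyses() {
    old_ifs=$IFS
    IFS=,
    joined="$*"      # "$*" joins the arguments with the first character of IFS
    IFS=$old_ifs
    printf '%s\n' "$joined"
}

# Usage:
#   ./interproscan.sh -appl "$(join_analyses PfamA PRINTS)" -i /path/to/sequences.fasta
```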

-b: Base output filename. Specifies the output file path, much like -o; if it is omitted, a default name and path are used. The file extension is added to the generated file name automatically.

-d: Output directory; mutually exclusive with -b and -o.

-dp: Turns off the precalculated match lookup service, which is on by default. The service uses the MD5 of each submitted sequence to check quickly whether it has already been annotated; if it has, the results are returned directly, which saves time.

-f: Format of the output file. The supported formats are TSV, XML, GFF3, HTML and SVG. The default formats for protein sequences are TSV, XML and GFF3. For nucleic acid sequences only GFF3 and XML used to be available; now all formats can be used.

./interproscan.sh -f xml -f html -i /path/to/sequences.fasta -b /path/to/output_file

Or

./interproscan.sh -f XML,HTML -i /path/to/sequences.fasta -b /path/to/output_file

The differences between the output formats are described at: https://code.google.com/p/interproscan/wiki/OutputFormats

-i: Input file in FASTA format.

-goterms: Turns on GO annotation; the -iprlookup option must be given as well.

-iprlookup: Turns on InterPro annotation.

-ms: Minimum size of a nucleic acid ORF. If it is set small, the run takes much longer.

-o: Output file. It cannot be used together with -b or -d, and if you set it you must also set -f.

-pa: Turns on lookup of pathway annotation.

-T: Location for temporary files; the default is /tmp.

-t: Type of the input sequences. The default is protein; the alternative is nucleic acid (DNA or RNA).
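The option rules listed above are easy to get wrong in a long command line, so a wrapper script can check them before starting a run. The sketch below encodes only the rules as described in this post (one of -b, -d and -o at most; -o needs -f; -goterms needs -iprlookup); InterProScan's own validation may differ, and the function name check_options is made up for this example.

```shell
#!/bin/sh
# Hypothetical pre-flight check for an interproscan.sh argument list.
# Returns 0 if the rules described in this post are satisfied.
check_options() {
    has_b=0 has_d=0 has_o=0 has_f=0 has_go=0 has_ipr=0
    for arg in "$@"; do
        case "$arg" in
            -b)         has_b=1 ;;
            -d)         has_d=1 ;;
            -o)         has_o=1 ;;
            -f)         has_f=1 ;;
            -goterms)   has_go=1 ;;
            -iprlookup) has_ipr=1 ;;
        esac
    done
    if [ $((has_b + has_d + has_o)) -gt 1 ]; then
        echo "use only one of -b, -d and -o" >&2
        return 1
    fi
    if [ "$has_o" -eq 1 ] && [ "$has_f" -eq 0 ]; then
        echo "-o requires -f" >&2
        return 1
    fi
    if [ "$has_go" -eq 1 ] && [ "$has_ipr" -eq 0 ]; then
        echo "-goterms requires -iprlookup" >&2
        return 1
    fi
    return 0
}

# Usage:
#   check_options -i in.fasta -goterms -iprlookup -b out -f xml \
#       && ./interproscan.sh -i in.fasta -goterms -iprlookup -b out -f xml
```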

Databases involved

These analyses can be used directly:

TIGRFAM-xx.x: TIGRFAMs, a library of protein families based on hidden Markov models.

ProDom-xxxx.x: ProDom is a set of protein domain families generated automatically from the UniProt Knowledge Database.

PANTHER-x.x: PANTHER (Protein ANalysis THrough Evolutionary Relationships) classifies genes by function, using published experimental evidence and evolutionary relationships to predict the function of genes that have no direct experimental evidence.

SMART-x.x: SMART can be used to identify and analyze domain architectures based on hidden Markov models.

PrositeProfiles-xx.xx: PROSITE consists of documentation entries describing protein domains, families and functional sites, together with the patterns and profiles used to identify them.

PrositePatterns-xx.x.xx: As above.

SUPERFAMILY-x.xx: SUPERFAMILY is a database of structural and functional annotation for proteins and genomes.

PRINTS-xx.x: A fingerprint is a group of conserved motifs used to characterize a protein family.

Gene3D-x.x.x: Structural assignment for whole genes and genomes using the CATH domain structure database.

PIRSF-x.xx: PIRSF is used as a guide to classify UniProtKB sequences comprehensively and without overlap, in a hierarchy that reflects their evolutionary relationships.

PfamA-xx.x: A large collection of protein families, each represented by multiple sequence alignments and hidden Markov models.

HAMAP-xxxxxx.xx: High-quality Automated and Manual Annotation of Microbial Proteomes.

Coils-x.x: Prediction of coiled-coil regions in proteins.

Deactivated analyses:

SignalP-GRAM_NEGATIVE-x.x: Analysis SignalP-GRAM_NEGATIVE-x.x is deactivated, because the following parameters are not set in the interproscan.properties file: binary.signalp.x.x.path

SignalP-GRAM_POSITIVE-x.x: Analysis SignalP-GRAM_POSITIVE-x.x is deactivated, because the following parameters are not set in the interproscan.properties file: binary.signalp.x.x.path

SignalP-EUK-x.x: Analysis SignalP-EUK-x.x is deactivated, because the following parameters are not set in the interproscan.properties file: binary.signalp.x.x.path

Phobius-x.xx: Analysis Phobius-x.xx is deactivated, because the following parameters are not set in the interproscan.properties file: binary.phobius.pl.path.x.xx

TMHMM-x.xc: Analysis TMHMM-x.xc is deactivated, because the following parameters are not set in the interproscan.properties file: binary.tmhmm.path
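To see in advance which of these paths are missing, the properties file can be scanned for binary.* keys with an empty value. This is a rough sketch that assumes an unset parameter appears as a key followed by "=" and nothing else; if the key is absent or commented out in your copy of the file, it will not be reported. The function name unset_binary_paths is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: prints every key in FILE that starts with
# "binary." and has an empty value (nothing but whitespace after '=').
unset_binary_paths() {
    sed -n 's/^\(binary\.[^=]*\)=[[:space:]]*$/\1/p' "$1"
}

# Usage:
#   unset_binary_paths interproscan.properties
```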

Scanning nucleic acid sequences

EMBOSS getorf is the ORF prediction tool bundled with InterProScan. If you want to use a copy installed locally instead, you must modify the interproscan.sh script:

# Set environment variables for getorf

export EMBOSS_ACDROOT=bin/nucleotide

export EMBOSS_DATA=bin/nucleotide

If the input is a nucleic acid sequence, add the -t option when you run the command:

./interproscan.sh -t n -i /path/to/nucleic_acid_sequences.fasta

Format conversion

XML output can be converted to other formats:

./interproscan.sh -mode convert -f tsv,gff3,svg -i /path/to/impact.xml -o /path/to/output_file_basename
