Downloading NCBI data from the Linux command line with Entrez Direct

Source: Internet
Author: User

I. Installing the software

1. Download the software

curl -O ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/edirect.zip

(If you are not familiar with downloading files with curl, see http://www.cnblogs.com/duhuo/p/5695256.html.)

2. Unzip

unzip edirect.zip

3. Add edirect to the PATH environment variable

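echo 'export PATH=/home/lmt/desktop/edirect:$PATH' >> ~/.zshrc

(Depending on your shell, the profile may be ~/.bashrc instead. Replace /home/lmt/desktop/edirect with the directory where you unzipped edirect, then open a new terminal or run source ~/.zshrc so the change takes effect.)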
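Because the echo command above appends a new line every time it runs, running it twice leaves a duplicate entry in the profile. Below is a minimal sketch of a guarded version; add_edirect_to_path is a hypothetical helper name, and the directory and profile paths are placeholders you would replace with your own.

```shell
#!/bin/sh
# Append an "export PATH=..." line for edirect to a shell profile, but only once.
# add_edirect_to_path is a hypothetical helper; both arguments are placeholders.
add_edirect_to_path() {
  edirect_dir=$1   # e.g. /home/lmt/desktop/edirect (the author's location)
  profile=$2       # e.g. ~/.zshrc or ~/.bashrc
  # \$PATH keeps a literal $PATH in the profile, expanded when the shell starts
  line="export PATH=$edirect_dir:\$PATH"
  # -F fixed string, -x whole-line match, -q quiet: skip if the line is already there
  if ! grep -Fxq "$line" "$profile" 2>/dev/null; then
    printf '%s\n' "$line" >> "$profile"
  fi
}

# Usage (hypothetical paths):
# add_edirect_to_path "$HOME/edirect" "$HOME/.bashrc"
# . "$HOME/.bashrc"    # reload the profile in the current shell
```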

II. Entrez Direct functions

1. esearch: searches an Entrez database with the given query; terms can be restricted to specific indexed fields

2. efilter: filters (narrows) the results of a previous search

3. efetch: downloads the matching records in the specified format

...
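These programs are usually chained with pipes, each one passing its result set to the next. The sketch below wraps that chain in a shell function. fetch_records is a hypothetical name, and the filter step assumes efilter's -query option for narrowing the previous search with extra terms; running it needs network access and edirect on the PATH.

```shell
#!/bin/sh
# Hypothetical wrapper around the esearch | efilter | efetch pipeline.
#   $1 database (e.g. nucleotide)      $2 search query
#   $3 extra terms passed to efilter   $4 efetch output format (e.g. fasta)
fetch_records() {
  db=$1
  query=$2
  filter=$3
  format=$4
  # esearch finds the records, efilter narrows the result set,
  # efetch downloads whatever is left in the requested format
  esearch -db "$db" -query "$query" |
    efilter -query "$filter" |
    efetch -format "$format"
}

# Example (needs network access and edirect on PATH):
# fetch_records nucleotide 'CHN-JS-2014' 'complete genome' fasta > genome.fasta
```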

III. Usage examples

Downloading nucleotide or protein sequences (FASTA format)

esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format fasta > 11.fasta # downloads the complete genome nucleotide sequence

>KP757892.1 Porcine deltacoronavirus isolate CHN-JS-2014, complete genome
ACATGGGGACTAAAGATAAAAATTATAGCATTAGTCTATAATTTTATCTCCCTAGCTTCGCTAGTTCTCTACCGACACCAATCCAGGTGCGTCTGCCACCAAGTTGGCTACCCTTTCTAGG
GGCGCTTTCGCGCTTGCTCACCATTAGATTACCTGGAAACCAGCCATTCAGGTTGGAGTTTCCCCAGGCTCTTTTGTGTGGGCATTAGC
...
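To check what a download such as 11.fasta actually contains, it can help to list each record with its length. The following awk-based sketch does that; fasta_lengths is a hypothetical helper (not part of Entrez Direct), and it assumes the record ID is the first word of each header line.

```shell
#!/bin/sh
# Print "ID<TAB>length" for every record in one or more FASTA files (or stdin).
# fasta_lengths is a hypothetical helper, not an Entrez Direct command.
fasta_lengths() {
  awk '
    /^>/ {
      if (id != "") print id "\t" len   # finish the previous record
      id = substr($1, 2)                # first word of the header, without ">"
      len = 0
      next
    }
    { gsub(/[ \t\r]/, ""); len += length($0) }   # ignore stray whitespace
    END { if (id != "") print id "\t" len }
  ' "$@"
}

# Example: fasta_lengths 11.fasta
```

For 11.fasta, the single KP757892.1 record should report the same length as the LOCUS line of the GenBank record (25420 bp).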

esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format gene_fasta > 22.fasta # downloads the nucleotide sequence of each gene, such as S/E/M, as a separate record

>lcl|KP757892.1_gene_3 [gene=E] [locus_tag=PDCOV-CHN-JS-2014_GP3] [location=22797..23048]
ATGGTAGTCGACGACTGGGCCGTTACCATCCCTGGACAATATATTATTGCTATACTAGTTGTCATCTGCATTGGTGTGGCACTACTTTTTATTAACACTTGCTTAGCTTGTGTTAA
ATTATTTTACAAGTGCTACCTAGGGGCAGCATACCTTGTTAGGCCTATTATAGTGTACTACTCCAAGCCGAACCCCGTACCTGAGGATGAGTTTGTAAAAGTACACCAATTTCCTAGAAAC
ACTCACTATGTCTGA
>lcl|KP757892.1_gene_4 [gene=M] [locus_tag=PDCOV-CHN-JS-2014_GP4] [location=23041..23694]
ATGTCTGACGCAGAAGAGTGGCAAATTATTGTTTTCATTGCGATCATATGGGCGCTTGGCGTCATCCTCCAAGGAGGCTATGCCACGCGTAATCGTGTGATCTATGTTATTAAACT
TATTCTGCTTTGGCTGCTCCAACCCTTCACCCTAGTGGTGACCATTTGGACCGCAGTTGACAGATCATCTAAGAAGGACGCAGTTTTCATTGTGTCCATAATTTTTGCCGTACTGACCTTC
ATATCCTGGGCCAAGTACTGGTATGACTCAATTCGCTTATTAATGAAAACCAGATCTGCATGGGCACTCTCACCTGAGAGTAGACTCCTTGCAGGGATTATGGATCCAATGGGTACATGGA
GGTGCATTCCCATCGACCACATGGCTCCAATTCTCACACCAGTCGTTAAGCATGGCAAGCTC
...
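The bracketed fields in these headers are easy to turn into a table. Below is a sketch, assuming simple start..end locations like the ones above; gene_table is a hypothetical helper name, and records with other location forms (for example complement(...) or joins) are simply skipped.

```shell
#!/bin/sh
# Print "gene<TAB>start<TAB>end<TAB>length" from gene_fasta-style headers.
# gene_table is a hypothetical helper; only [location=START..END] is handled.
gene_table() {
  awk '
    /^>/ {
      gene = ""; loc = ""
      # [gene=E] -> E  ("[gene=" is 6 characters, plus the closing "]")
      if (match($0, /\[gene=[^]]*\]/))
        gene = substr($0, RSTART + 6, RLENGTH - 7)
      # [location=22797..23048] -> 22797..23048  ("[location=" is 10 characters)
      if (match($0, /\[location=[0-9]+\.\.[0-9]+\]/))
        loc = substr($0, RSTART + 10, RLENGTH - 11)
      if (gene != "" && loc != "") {
        dots = index(loc, "..")
        start = substr(loc, 1, dots - 1)
        stop = substr(loc, dots + 2)
        print gene "\t" start "\t" stop "\t" (stop - start + 1)
      }
    }
  ' "$@"
}

# Example: gene_table 22.fasta
```

Applied to the two records shown above, it reports 252 nt for E and 654 nt for M. 252 nt is 83 codons plus the TGA stop codon, which matches the 83-residue envelope protein in the fasta_cds_aa output.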

esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format fasta_cds_aa > 33.fasta # downloads the protein sequence of each gene as a separate record (the search is run against the nucleotide database; when I tried the protein database instead, I got an error)

>lcl|KP757892.1_prot_AKC54443.1_3 [gene=E] [locus_tag=PDCOV-CHN-JS-2014_GP3] [protein=envelope protein] [protein_id=AKC54443.1] [location=22797..23048] [gbkey=CDS]
MVVDDWAVTIPGQYIIAILVVICIGVALLFINTCLACVKLFYKCYLGAAYLVRPIIVYYSKPNPVPEDEFVKVHQFPRNTHYV
>lcl|KP757892.1_prot_AKC54444.1_4 [gene=M] [locus_tag=PDCOV-CHN-JS-2014_GP4] [protein=membrane protein] [protein_id=AKC54444.1] [location=23041..23694] [gbkey=CDS]
MSDAEEWQIIVFIAIIWALGVILQGGYATRNRVIYVIKLILLWLLQPFTLVVTIWTAVDRSSKKDAVFIVSIIFAVLTFISWAKYWYDSIRLLMKTRSAWALSPESRLLAGIMDPMGTWRC
IPIDHMAPILTPVVKHGKLKLHGQELANGISVRNPPQDMVIVSPSDTFHYTFKKPVESNNDPEFAVLIYQGDRASNAGLHTITTSKAGDARLYKYM
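Files like 33.fasta keep every gene in one file. If you need one file per record (for example to feed each protein to a separate tool), a small awk sketch can split them; split_fasta is a hypothetical name, and the naming scheme (record ID with | and / replaced by _) is an arbitrary choice.

```shell
#!/bin/sh
# Split a multi-record FASTA file into one file per record inside a directory.
# split_fasta is a hypothetical helper; output files are named "<ID>.fasta".
split_fasta() {
  fasta_in=$1
  outdir=$2
  mkdir -p "$outdir"
  awk -v dir="$outdir" '
    /^>/ {
      if (out != "") close(out)     # finish the previous record file
      name = substr($1, 2)          # record ID without the leading ">"
      gsub(/[|\/]/, "_", name)      # keep the ID safe as a file name
      out = dir "/" name ".fasta"
    }
    out != "" { print > out }       # lines before the first header are skipped
  ' "$fasta_in"
}

# Example: split_fasta 33.fasta proteins/
```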

esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format fasta_cds_na > 44.fasta # downloads the coding sequence (nucleotides) of each gene, such as S/E/M, as a separate record; the sequences are the same as in 22.fasta, but the headers carry more information

>lcl|KP757892.1_cds_AKC54443.1_3 [gene=E] [locus_tag=PDCOV-CHN-JS-2014_GP3] [protein=envelope protein] [protein_id=AKC54443.1] [location=22797..23048] [gbkey=CDS]
ATGGTAGTCGACGACTGGGCCGTTACCATCCCTGGACAATATATTATTGCTATACTAGTTGTCATCTGCATTGGTGTGGCACTACTTTTTATTAACACTTGCTTAGCTTGTGTTAAATTAT
TTTACAAGTGCTACCTAGGGGCAGCATACCTTGTTAGGCCTATTATAGTGTACTACTCCAAGCCGAACCCCGTACCTGAGGATGAGTTTGTAAAAGTACACCAATTTCCTAGAAACACTCA
CTATGTCTGA
>lcl|KP757892.1_cds_AKC54444.1_4 [gene=M] [locus_tag=PDCOV-CHN-JS-2014_GP4] [protein=membrane protein] [protein_id=AKC54444.1] [location=23041..23694] [gbkey=CDS]
ATGTCTGACGCAGAAGAGTGGCAAATTATTGTTTTCATTGCGATCATATGGGCGCTTGGCGTCATCCTCCAAGGAGGCTATGCCACGCGTAATCGTGTGATCTATGTTATTAAACTTATTC
TGCTTTGGCTGCTCCAACCCTTCACCCTAGTGGTGACCATTTGGACCGCAGTTGACAGATCATCTAAGAAGGACGCAGTTTTCATTGTGTCCATAATTTTTGCCGTACTGACCTTCATATC
CTGGGCCAAGTACTGGTATGACTCAATTCGCTTATTAATGAAAACCAGATCTGCATGGGCACTCTCACCTGAGAGTAGACTCCTTGCAGGGATTATGGATCCAATGGGTACATGGAGGTGC
ATTCCCATCGACCACATGGCTCCAATTCTCACACCAGTCGTTAAGCATGGCAAGCTC
...

Downloading sequences (non-FASTA formats)

esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format gb > 55.gb # the downloaded file has the same layout as the record displayed on the NCBI website

LOCUS       KP757892               25420 bp    ss-RNA     linear   VRL ...-DEC-...
DEFINITION  Porcine deltacoronavirus isolate CHN-JS-2014, complete genome.
ACCESSION   KP757892
VERSION     KP757892.1
KEYWORDS    .
SOURCE      Porcine deltacoronavirus
  ORGANISM  Porcine deltacoronavirus
            Viruses; ssRNA viruses; ssRNA positive-strand viruses, no DNA
            stage; Nidovirales; Coronaviridae; Coronavirinae.
REFERENCE   1  (bases 1 to 25420)
  AUTHORS   Dong,N., Fang,L., Zeng,S., Sun,Q., Chen,H. and Xiao,S.
  TITLE     Porcine Deltacoronavirus in Mainland China
  JOURNAL   Emerging Infect. Dis. ..., 2254-2255 (...)
   PUBMED   26584185
REFERENCE   2  (bases 1 to 25420)
  AUTHORS   Dong,N., Fang,L., Zeng,S., Sun,Q. and Xiao,S.
  TITLE     Direct Submission
  JOURNAL   Submitted (...-FEB-...) State Key Laboratory of Agricultural
            Microbiology, Huazhong Agricultural University, 1 Shizishan
            Street, Wuhan, Hubei 430070, China
COMMENT     ##Assembly-Data-START##
            Sequencing Technology :: Sanger dideoxy sequencing
            ##Assembly-Data-END##
FEATURES             Location/Qualifiers
...
     gene            22797..23048
                     /gene="E"
                     /locus_tag="PDCOV-CHN-JS-2014_GP3"
     CDS             22797..23048
                     /gene="E"
                     /locus_tag="PDCOV-CHN-JS-2014_GP3"
                     /codon_start=1
                     /product="envelope protein"
                     /protein_id="AKC54443.1"
                     /translation="MVVDDWAVTIPGQYIIAILVVICIGVALLFINTCLACVKLFYKCYLGAAYLVRPIIVYYSKPNPVPEDEFVKVHQFPRNTHYV"
     gene            23041..23694
                     /gene="M"
...
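For scripting, the most useful parts of a GenBank record are usually the feature qualifiers. The sketch below pulls the gene name, protein_id and product for every CDS out of the file saved by efetch -format gb. cds_table is a hypothetical helper; it assumes the standard GenBank fixed-column layout (feature keys indented 5 spaces, qualifiers starting with / at column 22) and only reads qualifiers that fit on one line.

```shell
#!/bin/sh
# Print "gene<TAB>protein_id<TAB>product" for every CDS in a GenBank flat file.
# cds_table is a hypothetical helper; multi-line qualifier values are not joined.
cds_table() {
  awk '
    function flush() {
      if (in_cds) print gene "\t" pid "\t" product
      in_cds = 0; gene = ""; pid = ""; product = ""
    }
    # A feature key line: exactly 5 spaces, then the key (gene, CDS, ...)
    /^     [^ ]/ { flush(); if ($1 == "CDS") in_cds = 1; next }
    # The sequence section or the end of the record closes the last feature
    /^ORIGIN/ || /^\/\// { flush(); next }
    in_cds && /^ +\/gene="/       { gene = $0;    sub(/^ +\/gene="/, "", gene);       sub(/"$/, "", gene) }
    in_cds && /^ +\/protein_id="/ { pid = $0;     sub(/^ +\/protein_id="/, "", pid);  sub(/"$/, "", pid) }
    in_cds && /^ +\/product="/    { product = $0; sub(/^ +\/product="/, "", product); sub(/"$/, "", product) }
    END { flush() }
  ' "$@"
}

# Example: cds_table 55.gb
```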
