I. Installing the software
1. Software Download:
curl ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/edirect.zip -O (for how to download files with curl, see http://www.cnblogs.com/duhuo/p/5695256.html)
2. Unzip
unzip edirect.zip
3. Add the environment variable
echo 'export PATH=/home/lmt/desktop/edirect/:$PATH' >> ~/.zshrc (use the directory where you unzipped edirect; depending on your shell, the file may be ~/.bashrc instead)
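After reloading the shell configuration, it is worth confirming that the tools can actually be found. The sketch below is one possible check; the helper name check_edirect is made up for this example and is not part of Entrez Direct, and the three tool names are the ones described in this post.

```shell
# check_edirect: hypothetical helper (not part of Entrez Direct) that reports
# whether the three tools used in this post can be found on PATH.
check_edirect() {
    missing=0
    for tool in esearch efilter efetch; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Run the check; the extra message is printed only if a tool is missing.
check_edirect || echo "the edirect directory is not on PATH yet"
```

If a tool is reported as missing, open a new terminal (or source the profile file) and check that the exported directory really contains the edirect scripts.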
II. Entrez Direct functions
1. esearch: searches a database using the given indexed fields
2. efilter: filters the results of a previous search
3. efetch: downloads the required data in the specified format
...
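These commands are meant to be chained with pipes: esearch produces a result set, efilter narrows it, and efetch retrieves the records. A minimal sketch of that chain, assuming the efilter date options -mindate, -maxdate and -datetype behave as described in the NCBI documentation; the function name search_filter_fetch and its arguments are hypothetical, and actually running it requires network access.

```shell
# search_filter_fetch QUERY MINDATE MAXDATE OUTFILE
# Hypothetical wrapper: search the nucleotide database, keep only records
# published between MINDATE and MAXDATE, and save them in FASTA format.
search_filter_fetch() {
    esearch -db nucleotide -query "$1" |
        efilter -mindate "$2" -maxdate "$3" -datetype PDAT |
        efetch -format fasta > "$4"
}

# Example call (commented out because it contacts NCBI; the date range is
# only an illustration):
# search_filter_fetch 'CHN-JS-2014' 2014 2016 filtered.fasta
```

Wrapping the pipeline in a function keeps the query, the filter and the output file in one place, so each step can be changed without retyping the rest.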
III. Usage examples
Download nucleotide or protein sequences (FASTA format)
esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format fasta > 11.fasta # downloads the complete genome nucleotide sequence
>KP757892.1 Porcine deltacoronavirus isolate CHN-JS-2014, complete genome
ACATGGGGACTAAAGATAAAAATTATAGCATTAGTCTATAATTTTATCTCCCTAGCTTCGCTAGTTCTCTACCGACACCAATCCAGGTGCGTCTGCCACCAAGTTGGCTACCCTTTCTAGG
GGCGCTTTCGCGCTTGCTCACCATTAGATTACCTGGAAACCAGCCATTCAGGTTGGAGTTTCCCCAGGCTCTTTTGTGTGGGCATTAGC
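Before working with a downloaded file, a quick look at how many records it holds and how long they are can catch an empty or truncated download. The sketch below uses only awk; fasta_summary is a name chosen for this example, and 11.fasta is the output file of the command above.

```shell
# fasta_summary FILE
# Print the number of records and the total number of sequence characters.
fasta_summary() {
    awk '
        /^>/ { records++; next }                       # header line: one more record
        { gsub(/[ \t\r]/, ""); bases += length($0) }   # sequence line: add its length
        END { printf "records=%d bases=%d\n", records, bases }
    ' "$1"
}

# Summarise the genome downloaded above, if the file is present.
if [ -f 11.fasta ]; then
    fasta_summary 11.fasta
fi
```

For a single complete genome the record count should be 1, and the base count should match the length shown on the record's NCBI page.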
esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format gene_fasta > 22.fasta # downloads the nucleotide sequence of each gene region (e.g. S/E/M) as separate records
>lcl|KP757892.1_gene_3 [gene=E] [locus_tag=PDCOV-CHN-JS-2014_GP3] [location=22797..23048]
ATGGTAGTCGACGACTGGGCCGTTACCATCCCTGGACAATATATTATTGCTATACTAGTTGTCATCTGCATTGGTGTGGCACTACTTTTTATTAACACTTGCTTAGCTTGTGTTAA
ATTATTTTACAAGTGCTACCTAGGGGCAGCATACCTTGTTAGGCCTATTATAGTGTACTACTCCAAGCCGAACCCCGTACCTGAGGATGAGTTTGTAAAAGTACACCAATTTCCTAGAAAC
ACTCACTATGTCTGA
>lcl|KP757892.1_gene_4 [gene=M] [locus_tag=PDCOV-CHN-JS-2014_GP4] [location=23041..23694]
ATGTCTGACGCAGAAGAGTGGCAAATTATTGTTTTCATTGCGATCATATGGGCGCTTGGCGTCATCCTCCAAGGAGGCTATGCCACGCGTAATCGTGTGATCTATGTTATTAAACT
TATTCTGCTTTGGCTGCTCCAACCCTTCACCCTAGTGGTGACCATTTGGACCGCAGTTGACAGATCATCTAAGAAGGACGCAGTTTTCATTGTGTCCATAATTTTTGCCGTACTGACCTTC
ATATCCTGGGCCAAGTACTGGTATGACTCAATTCGCTTATTAATGAAAACCAGATCTGCATGGGCACTCTCACCTGAGAGTAGACTCCTTGCAGGGATTATGGATCCAATGGGTACATGGA
GGTGCATTCCCATCGACCACATGGCTCCAATTCTCACACCAGTCGTTAAGCATGGCAAGCTC
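Because every gene comes back as its own record, a single gene can be pulled out of the file by matching its header tag. A minimal sketch, assuming the headers carry a tag of the form [gene=E] as in the gene_fasta output; extract_gene is a hypothetical helper name.

```shell
# extract_gene FILE GENE
# Print the FASTA record whose header carries the tag [gene=GENE].
extract_gene() {
    awk -v tag="[gene=$2]" '
        /^>/ { keep = (index($0, tag) > 0) }   # new record: does the header match?
        keep                                   # print every line of a matching record
    ' "$1"
}

# Save the E gene from the file downloaded above, if the file is present.
if [ -f 22.fasta ]; then
    extract_gene 22.fasta E > E.fasta
fi
```

The match uses index() rather than a regular expression, so the square brackets in the tag are compared literally; the comparison is case-sensitive.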
esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format fasta_cds_aa > 33.fasta # downloads the protein sequence of each gene region as separate records (the search runs against the nucleotide database; searching the protein database instead gave an error)
>lcl|KP757892.1_prot_AKC54443.1_3 [gene=E] [locus_tag=PDCOV-CHN-JS-2014_GP3] [protein=envelope protein] [protein_id=AKC54443.1] [location=22797..23048] [gbkey=CDS]
MVVDDWAVTIPGQYIIAILVVICIGVALLFINTCLACVKLFYKCYLGAAYLVRPIIVYYSKPNPVPEDEFVKVHQFPRNTHYV
>lcl|KP757892.1_prot_AKC54444.1_4 [gene=M] [locus_tag=PDCOV-CHN-JS-2014_GP4] [protein=membrane protein] [protein_id=AKC54444.1] [location=23041..23694] [gbkey=CDS]
MSDAEEWQIIVFIAIIWALGVILQGGYATRNRVIYVIKLILLWLLQPFTLVVTIWTAVDRSSKKDAVFIVSIIFAVLTFISWAKYWYDSIRLLMKTRSAWALSPESRLLAGIMDPMGTWRC
IPIDHMAPILTPVVKHGKLKLHGQELANGISVRNPPQDMVIVSPSDTFHYTFKKPVESNNDPEFAVLIYQGDRASNAGLHTITTSKAGDARLYKYM
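The location field and the protein length are tied together by simple arithmetic: a CDS spanning start..end covers end - start + 1 bases, every three bases encode one residue, and the final stop codon is not translated. A small sketch of that check; the function name cds_aa_length is made up, and it assumes a single-interval CDS that ends with a stop codon.

```shell
# cds_aa_length START END
# Expected number of residues for a CDS located at START..END.
cds_aa_length() {
    bases=$(( $2 - $1 + 1 ))      # inclusive coordinates
    echo $(( bases / 3 - 1 ))     # codons minus the stop codon
}

cds_aa_length 22797 23048   # E gene: 252 bases, prints 83
cds_aa_length 23041 23694   # M gene: 654 bases, prints 217
```

Both values agree with the lengths of the E and M protein sequences shown above, which is a quick way to confirm that a record was downloaded completely.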
esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format fasta_cds_na > 44.fasta # downloads the nucleotide sequence of each gene region (e.g. S/E/M) as separate records; same as the 22.fasta result, but with more header information
>lcl|KP757892.1_cds_AKC54443.1_3 [gene=E] [locus_tag=PDCOV-CHN-JS-2014_GP3] [protein=envelope protein] [protein_id=AKC54443.1] [location=22797..23048] [gbkey=CDS]
ATGGTAGTCGACGACTGGGCCGTTACCATCCCTGGACAATATATTATTGCTATACTAGTTGTCATCTGCATTGGTGTGGCACTACTTTTTATTAACACTTGCTTAGCTTGTGTTAAATTAT
TTTACAAGTGCTACCTAGGGGCAGCATACCTTGTTAGGCCTATTATAGTGTACTACTCCAAGCCGAACCCCGTACCTGAGGATGAGTTTGTAAAAGTACACCAATTTCCTAGAAACACTCA
CTATGTCTGA
>lcl|KP757892.1_cds_AKC54444.1_4 [gene=M] [locus_tag=PDCOV-CHN-JS-2014_GP4] [protein=membrane protein] [protein_id=AKC54444.1] [location=23041..23694] [gbkey=CDS]
ATGTCTGACGCAGAAGAGTGGCAAATTATTGTTTTCATTGCGATCATATGGGCGCTTGGCGTCATCCTCCAAGGAGGCTATGCCACGCGTAATCGTGTGATCTATGTTATTAAACTTATTC
TGCTTTGGCTGCTCCAACCCTTCACCCTAGTGGTGACCATTTGGACCGCAGTTGACAGATCATCTAAGAAGGACGCAGTTTTCATTGTGTCCATAATTTTTGCCGTACTGACCTTCATATC
CTGGGCCAAGTACTGGTATGACTCAATTCGCTTATTAATGAAAACCAGATCTGCATGGGCACTCTCACCTGAGAGTAGACTCCTTGCAGGGATTATGGATCCAATGGGTACATGGAGGTGC
ATTCCCATCGACCACATGGCTCCAATTCTCACACCAGTCGTTAAGCATGGCAAGCTC
Download sequences (non-FASTA format)
esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format gb > 55.fasta # the downloaded format is the same as what the NCBI web page displays
LOCUS       KP757892   25420 bp   ss-RNA   linear   VRL --DEC- -
DEFINITION  Porcine deltacoronavirus isolate CHN-JS-2014, complete genome.
ACCESSION   KP757892
VERSION     KP757892.1
KEYWORDS    .
SOURCE      Porcine deltacoronavirus
  ORGANISM  Porcine deltacoronavirus
            Viruses; ssRNA viruses; ssRNA positive-strand viruses, no DNA stage; Nidovirales; Coronaviridae; Coronavirinae.
REFERENCE   1  (bases 1 to 25420)
  AUTHORS   Dong,N., Fang,L., Zeng,S., Sun,Q., Chen,H. and Xiao,S.
  TITLE     Porcine Deltacoronavirus in Mainland China
  JOURNAL   Emerging Infect. Dis. +( A),2254-2255( -)
  PUBMED    26584185
REFERENCE   2  (bases 1 to 25420)
  AUTHORS   Dong,N., Fang,L., Zeng,S., Sun,Q. and Xiao,S.
  TITLE     Direct Submission
  JOURNAL   Submitted ( .-FEB- -) State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, 1 Shizishan Street, Wuhan, Hubei 430070, China
COMMENT     ##Assembly-Data-START##
            Sequencing Technology :: Sanger dideoxy sequencing
            ##Assembly-Data-END##
FEATURES             Location/Qualifiers
.....
     gene            22797..23048
                     /gene="E"
                     /locus_tag="PDCOV-CHN-JS-2014_GP3"
     CDS             22797..23048
                     /gene="E"
                     /locus_tag="PDCOV-CHN-JS-2014_GP3"
                     /codon_start=1
                     /product="envelope protein"
                     /protein_id="AKC54443.1"
                     /translation="MVVDDWAVTIPGQYIIAILVVICIGVALLFINTCLACVKLFYKCYLGAAYLVRPIIVYYSKPNPVPEDEFVKVHQFPRNTHYV"
     gene            23041..23694
                     /gene="M"
.....
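A GenBank flat file stores each annotation as a /name="value" qualifier, so plain text tools are often enough to pull out a list of values. A minimal sketch, assuming each qualifier sits on its own line and its value does not wrap onto the next line (long /translation values usually do wrap, so they are not a good fit); list_qualifier is a hypothetical helper, and 55.fasta is the file saved by the efetch command above.

```shell
# list_qualifier FILE NAME
# Print the value of every /NAME="..." qualifier found in FILE, one per line.
list_qualifier() {
    sed -n "s/.*\/$2=\"\([^\"]*\)\".*/\1/p" "$1"
}

# List the protein accessions annotated in the downloaded record, if present.
if [ -f 55.fasta ]; then
    list_qualifier 55.fasta protein_id
fi
```

The same call with gene or locus_tag as the second argument lists the gene names or locus tags instead.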
Downloading NCBI data from the Linux command line with Entrez Direct