References: https://shengxin.ren/article/16
https://www.cnblogs.com/lmt921108/p/7442699.html
The SRA (Sequence Read Archive) database stores raw data from next-generation sequencing platforms, including 454, Illumina, SOLiD, Ion Torrent, Helicos, and Complete Genomics. In addition to the raw sequence data, SRA now also holds alignment information for raw reads mapped to a reference genome.
Based on the characteristics of the data, SRA records are divided into four categories:
Studies--research topics
Experiments--experimental design
Runs--sequencing result set
Samples--Sample information
The hierarchy of these data structures in SRA is: Studies -> Experiments -> Samples -> Runs.
A Study describes the purpose of the research; one study may contain multiple experiments.
An Experiment records the sample, DNA source, sequencing platform, data processing, and other information.
An experiment may contain one or more runs.
A Run represents the reads generated by one operation of the sequencing instrument.
Each type of record in the SRA database is distinguished by a different accession prefix; for example, the run accessions used below, such as SRR2172038, begin with SRR.
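As a rough illustration of how the prefixes separate record types, here is a minimal sketch. The prefix table is an assumption based on the commonly documented accession convention (S, E, or D for the submitting archive, followed by RP, RX, RS, or RR); it is not taken from this article, and classify_accession is a hypothetical helper name.

```bash
# Classify an SRA-style accession by its three-letter prefix.
# Assumed convention: first letter S/E/D marks the archive (NCBI/EBI/DDBJ),
# followed by RP = study, RX = experiment, RS = sample, RR = run.
classify_accession() {
    case "$1" in
        [SED]RP[0-9]*) echo "study" ;;
        [SED]RX[0-9]*) echo "experiment" ;;
        [SED]RS[0-9]*) echo "sample" ;;
        [SED]RR[0-9]*) echo "run" ;;
        *)             echo "unknown" ;;
    esac
}

classify_accession SRR2172038   # prints: run
```

Only run accessions (the RR prefix) are passed to prefetch in the download steps of this article.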
Usage:
Search for the disease related to your research and select a suitable data set
Click the first entry to open its details page
Study details page
Experiment details page
Run details page; select the runs you want to download
3. Download data
To download SRA data, first install the SRA Toolkit software package:
https://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software
Download the package that matches your environment.
The main options are:
CentOS 32/64
Ubuntu 32/64
MacOS 32/64
MS Windows 32/64
Take CentOS for example:
1. Download and install:
wget "http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-centos_linux64.tar.gz"
tar xzf sratoolkit.current-centos_linux64.tar.gz
2. Run the download
cd sratoolkit.2.5.7-centos_linux64/bin
./prefetch SRR2172038
When the download completes, an ncbi folder is created in your home directory.
cd ~/ncbi/public/sra
The SRR2172038.sra data file can be found there.
3. Convert to FASTQ
/sratoolkit.2.5.7-centos_linux64/bin/fastq-dump ./SRR2172038.sra
4. Convert to FASTA
/sratoolkit.2.5.7-centos_linux64/bin/fastq-dump --fasta ./SRR2172038.sra
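When several .sra files need converting, the two commands above can be wrapped in a small helper. The following is only a sketch: convert_sra, run, and the DRY_RUN variable are hypothetical names introduced here, and it assumes fastq-dump is reachable on your PATH. With DRY_RUN left at its default of 1, the commands are printed rather than executed.

```bash
# Print the command when DRY_RUN=1 (the default here); otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Convert each given .sra file to FASTQ or FASTA with fastq-dump.
# Usage: convert_sra fastq|fasta FILE.sra [FILE.sra ...]
convert_sra() {
    format="$1"; shift
    for sra in "$@"; do
        case "$format" in
            fastq) run fastq-dump "$sra" ;;
            fasta) run fastq-dump --fasta "$sra" ;;
            *)     echo "unknown format: $format" >&2; return 1 ;;
        esac
    done
}

convert_sra fasta ./SRR2172038.sra   # prints: fastq-dump --fasta ./SRR2172038.sra
```

Setting DRY_RUN=0 before calling convert_sra would run the conversions for real.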
Bulk download of SRA data
1. Create a new file named prefetch_bash.sh (a simple, crude name)
vi prefetch_bash.sh
2. Enter the following content:
#!/bin/bash
for id in $(seq 1 5) # remember this syntax
do
prefetch SRR35899${id}
done
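To see which accessions a loop like this will request before downloading anything, the accession names can be generated separately. This is a minimal sketch; list_accessions is a hypothetical helper name, and the prefix and range simply mirror the script above.

```bash
# Build run accessions by appending each number in a range to a prefix,
# mirroring the loop in prefetch_bash.sh (SRR35899 followed by 1..5).
list_accessions() {
    prefix="$1"; first="$2"; last="$3"
    for id in $(seq "$first" "$last"); do
        echo "${prefix}${id}"
    done
}

# Print the prefetch commands instead of running them.
list_accessions SRR35899 1 5 | while read -r acc; do
    echo "prefetch $acc"
done
```

Replacing the echo with the actual prefetch call turns the preview into the download loop.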
3. Give the file executable permission
chmod 755 prefetch_bash.sh
4. Add the script's directory to the PATH environment variable, or move the script to /usr/bin
To add it to PATH:
vi ~/.bashrc
export PATH=/home/lmt/biosoft/data:$PATH
After saving, run source ~/.bashrc for the change to take effect.
Or move the script:
mv ./prefetch_bash.sh /usr/bin
After that, the script can be run directly.
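If you take the PATH route, it can help to confirm that the directory was actually picked up after sourcing ~/.bashrc. A small sketch follows; in_path is a hypothetical helper name, and the directory is the one used in the export line above.

```bash
# Succeed if the given directory is one of the colon-separated PATH entries.
in_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if in_path /home/lmt/biosoft/data; then
    echo "on PATH"
else
    echo "not on PATH"
fi
```

Wrapping PATH in colons on both sides lets the pattern match only whole entries, so a directory that is merely a prefix of another entry is not reported as present.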
5. Use prefetch_bash.sh to download the required SRR files in bulk
In the terminal, enter: prefetch_bash.sh
By default, the downloaded SRR data is stored in: /home/lmt/ncbi/public/sra
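After a bulk download it is worth checking that every requested run actually arrived. The sketch below assumes the layout described above, with one ACCESSION.sra file per run in the download directory (other toolkit versions may store files differently); missing_runs is a hypothetical helper name.

```bash
# Print every accession whose .sra file is absent from the given directory.
# Usage: missing_runs DIR ACCESSION [ACCESSION ...]
missing_runs() {
    dir="$1"; shift
    for acc in "$@"; do
        [ -f "$dir/$acc.sra" ] || echo "$acc"
    done
}

# Check two of the runs requested by prefetch_bash.sh.
missing_runs "$HOME/ncbi/public/sra" SRR358991 SRR358992
```

Any accession printed here can be passed to prefetch again.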
NCBI SRA Database Usage