NCBI SRA Database Usage

Source: Internet
Author: User
Tags: prefetch

Ext.: https://shengxin.ren/article/16

https://www.cnblogs.com/lmt921108/p/7442699.html

The SRA (Sequence Read Archive) database stores raw data from second-generation sequencing platforms, including 454, Illumina, SOLiD, Ion Torrent, Helicos and Complete Genomics. In addition to raw sequence data, SRA now also stores alignment information for reads mapped to a reference genome.

SRA data is divided into four categories according to what each record describes:

    • Studies--research topics

    • Experiments--experimental design

    • Runs--sequencing result set

    • Samples--Sample information

The hierarchical relationship of the data structures in SRA is: Studies -> Experiments -> Samples -> Runs.

    • A Study describes the purpose of the research; one study may contain multiple experiments.

    • An Experiment records the sample, DNA source, sequencing platform, data processing and other information.

    • An Experiment may contain one or more Runs.

    • A Run represents the reads generated by one run of the sequencing instrument.

SRA accessions are distinguished by their prefixes:

      • ERP or SRP denotes Studies;

      • SRS denotes Samples;

      • SRX denotes Experiments;

      • SRR denotes Runs.
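The prefix rules above can be sketched as a small shell function. This is only an illustration: the function name sra_record_type is hypothetical, and it covers just the prefixes listed here, so any other prefix is reported as "unknown".

```shell
#!/bin/bash
# Map an SRA accession to its record type, using only the prefixes listed above.
# sra_record_type is a hypothetical helper name, not part of the SRA Toolkit.
sra_record_type() {
    case "$1" in
        SRP*|ERP*) echo "study" ;;
        SRS*)      echo "sample" ;;
        SRX*)      echo "experiment" ;;
        SRR*)      echo "run" ;;
        *)         echo "unknown" ;;
    esac
}

sra_record_type SRR2172038   # prints: run
```

A check like this is handy before downloading, since the prefetch commands later in this article are given run (SRR) accessions.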

Use:

  Search for the disease or topic you are studying and select a suitable data set.

Click the first result to open its details page.

Study details page

Experiment details page

Run details page: select the runs you want to download.

3. Download data

To download SRA data, first install the SRA Toolkit software package:

https://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software

Download the package that matches your environment.

Packages are mainly available for:

    • CentOS 32/64

    • Ubuntu 32/64

    • MacOS 32/64

    • MS Windows 32/64

Take CentOS for example:

1. Download and install:

wget "http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-centos_linux64.tar.gz"

tar xzf sratoolkit.current-centos_linux64.tar.gz

2. Run the download

cd sratoolkit.2.5.7-centos_linux64/bin

./prefetch SRR2172038

When the download is complete, an ncbi folder is created in your home directory.

cd ~/ncbi/public/sra

Check that the SRR2172038.sra file is there.
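The check can also be scripted. The sketch below assumes the default location ~/ncbi/public/sra described in this article; the function name find_sra and its optional second argument are hypothetical, added so that a different directory can be passed if your toolkit version stores files elsewhere.

```shell
#!/bin/bash
# Print the path of a downloaded .sra file, or return non-zero if it is missing or empty.
# Usage: find_sra ACCESSION [BASE_DIR]
# find_sra is a hypothetical helper; BASE_DIR defaults to ~/ncbi/public/sra.
find_sra() {
    local acc="$1"
    local base="${2:-$HOME/ncbi/public/sra}"
    local candidate="$base/$acc.sra"
    if [ -s "$candidate" ]; then      # -s: file exists and is not empty
        echo "$candidate"
        return 0
    fi
    return 1
}

if path=$(find_sra SRR2172038); then
    echo "found: $path"
else
    echo "SRR2172038.sra not found yet"
fi
```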

3. Convert to FASTQ

/sratoolkit.2.5.7-centos_linux64/bin/fastq-dump ./SRR2172038.sra

4. Convert to FASTA

/sratoolkit.2.5.7-centos_linux64/bin/fastq-dump --fasta ./SRR2172038.sra
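When several .sra files have been downloaded, the conversion can be wrapped in a loop. This is a minimal sketch, assuming fastq-dump is on your PATH; the function name convert_all and the DRY_RUN switch are hypothetical. The options --split-files (one file per read of a pair), --gzip and --outdir are optional extras not used in the commands above.

```shell
#!/bin/bash
# Convert every .sra file in a directory to FASTQ.
# Usage: convert_all SRA_DIR OUT_DIR
# Set DRY_RUN=1 to print the commands instead of running them.
# convert_all and DRY_RUN are hypothetical names used for illustration.
convert_all() {
    local sra_dir="$1" out_dir="$2" f
    for f in "$sra_dir"/*.sra; do
        [ -e "$f" ] || continue       # the glob matched no .sra files
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "fastq-dump --split-files --gzip --outdir $out_dir $f"
        else
            fastq-dump --split-files --gzip --outdir "$out_dir" "$f"
        fi
    done
}

# Example: preview the commands for the default download directory.
DRY_RUN=1 convert_all "$HOME/ncbi/public/sra" "$HOME/fastq"
```

Running with DRY_RUN=1 first is a cheap way to confirm which files will be converted before starting a long job.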

Bulk download of SRA data

1. Create a new file named prefetch_bash.sh (a simple, blunt name)

vi prefetch_bash.sh

2. Write the script:

#!/bin/bash

for id in $(seq 1 5)   # remember this syntax

do

prefetch SRR35899${id}

done
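To see exactly which accessions such a loop requests, the same expansion can be printed without calling prefetch. In this sketch the function name list_accessions is hypothetical; the prefix and range are the ones used in the script.

```shell
#!/bin/bash
# Print the accession IDs produced by appending each number in a range to a prefix.
# list_accessions is a hypothetical helper name.
list_accessions() {
    local prefix="$1" first="$2" last="$3" id
    for id in $(seq "$first" "$last"); do
        echo "${prefix}${id}"
    done
}

list_accessions SRR35899 1 5
# SRR358991
# SRR358992
# SRR358993
# SRR358994
# SRR358995

# To download instead of print (requires prefetch on PATH):
# list_accessions SRR35899 1 5 | while read -r acc; do prefetch "$acc"; done
```

Note that the number is simply appended, not zero-padded: a range such as 9 to 10 would yield SRR358999 and SRR3589910, which differ in length, so check the printed list against the runs you actually want.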

3. Make the file executable

chmod 755 prefetch_bash.sh

4. Add the script's directory to the PATH environment variable, or move the script to /usr/bin

To add the directory to PATH:

vi ~/.bashrc

export PATH=/home/lmt/biosoft/data:$PATH

Run source ~/.bashrc after saving.

Or move the script:

mv ./prefetch_bash.sh /usr/bin so you can call it directly
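If ~/.bashrc is sourced repeatedly, a plain export line adds the same directory to PATH again each time. A minimal sketch of a guard follows; the function name path_prepend is hypothetical, and the directory is the example one used above.

```shell
#!/bin/bash
# Prepend a directory to PATH only if it is not already listed.
# path_prepend is a hypothetical helper name.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                  # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

path_prepend /home/lmt/biosoft/data
export PATH

# Confirm that the shell can now find the script.
command -v prefetch_bash.sh || echo "prefetch_bash.sh is not on PATH yet"
```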

5. Use prefetch_bash.sh to download the required SRR files in bulk

In the terminal, enter: prefetch_bash.sh

The downloaded SRR data is stored by default in: /home/lmt/ncbi/public/sra
