Introduction to some software functions in NGS

Last Update:2018-10-27 Source: Internet

Author: User

1. Bowtie

Bowtie is a short-sequence alignment tool (BLAST is also a sequence alignment tool). It is fast and easy to use.

The input can be a FASTQ or FASTA file.

The alignment results are written to a file in SAM format.
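
Since the input may be either FASTQ or FASTA, it can help to check which one a file is before choosing aligner options. The following is a minimal sketch, not part of Bowtie: detect_seq_format is a hypothetical helper name, and it assumes an uncompressed file whose first line is a record header.

```shell
# Hypothetical helper (not part of Bowtie): guess whether a read file is
# FASTA or FASTQ from the first character of its first line.
# Assumption: the file is uncompressed; FASTA headers start with '>',
# FASTQ headers start with '@'.
detect_seq_format() {
    first_char=$(head -c 1 "$1")
    case "$first_char" in
        '>') echo "fasta" ;;
        '@') echo "fastq" ;;
        *)   echo "unknown" ;;
    esac
}
```

For example, detect_seq_format reads.fq prints fastq for a well-formed FASTQ file. Bowtie itself treats query files as FASTQ by default (-q) and expects -f for FASTA input.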

2. BWA

From: https://www.jianshu.com/p/1552cc6ac3be

BWA is software that aligns DNA sequences to a reference genome. It contains three algorithms:

BWA-backtrack: suitable for aligning sequences no longer than 100 bp;

BWA-SW: for sequences from 70 bp to 1 Mbp;

BWA-MEM: also for sequences from 70 bp to 1 Mbp; on high-quality sequencing data it is faster and more accurate.
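
The choice between the algorithms can be sketched as a small helper. This is only an illustration under a stated assumption: it uses the 70 bp lower bound quoted above for BWA-SW and BWA-MEM, sends anything shorter to the backtrack algorithm (bwa aln), and otherwise follows the general advice to try mem first. The function name suggest_bwa_command is hypothetical.

```shell
# Hypothetical helper: suggest a BWA subcommand from the read length in bp.
# Assumption: reads shorter than 70 bp fall outside the BWA-SW/BWA-MEM
# range, so they go to the backtrack algorithm (bwa aln); everything else
# goes to bwa mem.
suggest_bwa_command() {
    read_len=$1
    if [ "$read_len" -lt 70 ]; then
        echo "aln"
    else
        echo "mem"
    fi
}
```

With this sketch, suggest_bwa_command 36 prints aln and suggest_bwa_command 150 prints mem.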

Use whereis bwa to find its installation path:

$ whereis bwa
bwa: /usr/bin/bwa /usr/share/bwa /usr/share/man/man1/bwa.1.gz

Run bwa with no arguments to get the following help:

Usage:   bwa <command> [options]

Command: index         index sequences in the FASTA format
         mem           BWA-MEM algorithm
         fastmap       identify super-maximal exact matches
         pemerge       merge overlapping paired ends (EXPERIMENTAL)
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries
         shm           manage indices in shared memory
         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ

Note: To use BWA, you need to first index the genome with `bwa index'.
      There are three alignment algorithms in BWA: `mem', `bwasw', and
      `aln/samse/sampe'. If you are not sure which to use, try `bwa mem'
      first. Please `man ./bwa.1' for the manual.

Steps:

1. Index the reference genome:

bwa index -a bwtsw hg19.fasta

Here the bwtsw algorithm is used to build the index.

Indexing generates files with the following extensions: .bwt, .pac, .ann, .amb, and .sa:

$ ls
GRCm38_68.fa  GRCm38_68.fa.amb  GRCm38_68.fa.ann  GRCm38_68.fa.bwt  GRCm38_68.fa.fai  GRCm38_68.fa.pac  GRCm38_68.fa.sa
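
Before starting an alignment it can be worth confirming that the index is complete. The sketch below checks for the five index files listed above next to the reference FASTA. The function name bwa_index_complete is hypothetical, and it assumes the index files keep the default names that bwa index derives from the FASTA path. (The .fai file in the listing is a samtools faidx index, not a product of bwa index, so it is not checked.)

```shell
# Hypothetical helper: succeed only if every BWA index file exists and is
# non-empty next to the reference FASTA given as $1.
# Assumption: the index uses the default prefix (the FASTA path itself).
bwa_index_complete() {
    for ext in amb ann bwt pac sa; do
        if [ ! -s "$1.$ext" ]; then
            echo "missing index file: $1.$ext" >&2
            return 1
        fi
    done
    return 0
}
```

A typical use would be: bwa_index_complete hg19.fasta || bwa index -a bwtsw hg19.fasta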

2. Align with the BWA-MEM algorithm:

bwa mem -t 4 hg19.fasta read1.fq read2.fq > aln-pe.sam

I used this command:

bwa mem -t 4 ../hg19/hg19.fasta ERR580012_1.fastq.gz ERR580012_2.fastq.gz > aln-pe.sam

This uses the mem algorithm. -t sets the number of threads; using more threads reduces the running time. It is followed by the FASTA file of the reference genome and then the read files. One other parameter:

-p: ignore the second input file. By default, a single input file is treated as single-end sequencing data, and two input files are treated as paired-end data. With -p, the first file is read as interleaved paired-end data (each pair of consecutive reads forms a read pair), and the second input file, if given, is ignored;

The final result is redirected into a SAM file.
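
When two read files are passed to bwa mem, the i-th read of the first file is paired with the i-th read of the second, so both files should contain the same number of reads. The following is a rough sanity check, not something BWA provides: count_fastq_reads and pairs_match are hypothetical names, and the sketch assumes every FASTQ record occupies exactly four lines and that compressed files end in .gz.

```shell
# Hypothetical helper: count FASTQ records in a plain or gzip-compressed file.
# Assumption: each record is exactly 4 lines (no wrapped sequence lines).
count_fastq_reads() {
    case "$1" in
        *.gz) lines=$(gzip -dc "$1" | wc -l) ;;
        *)    lines=$(wc -l < "$1") ;;
    esac
    echo $((lines / 4))
}

# Succeed only if both mate files hold the same number of reads.
pairs_match() {
    [ "$(count_fastq_reads "$1")" -eq "$(count_fastq_reads "$2")" ]
}
```

For the files used above, the check would be: pairs_match ERR580012_1.fastq.gz ERR580012_2.fastq.gz && echo "read counts agree"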

So what are single-end and paired-end sequencing?

From: https://www.cnblogs.com/Formulate0303/p/7843082.html

1. Single-end sequencing (single-read): the DNA sample is first fragmented into segments, a primer sequence is ligated to one end of each DNA fragment, and an adapter is then added to the end. The fragments are fixed on the flowcell to generate DNA clusters, and the sequencer reads each fragment from one end only.

2. Paired-end sequencing adds a sequencing primer binding site to both ends of the DNA library being constructed. After the first round of sequencing is complete, the template strand from the first round is removed, and the paired-end module guides the regeneration and amplification of the complementary strand in its original position until there is enough template for the second round. The complementary strand is then sequenced by synthesis in the second round.

// In fact, this second point is not quite clear. [1]

3. Compress the SAM file into BAM format:

samtools view -bS aln-pe_reorder.sam -o aln-pe.bam

Run samtools with no arguments to see its help:

Usage:   samtools <command> [options]

Command: view        SAM<->BAM conversion
         sort        sort alignment file
         mpileup     multi-way pileup
         depth       compute the depth
         faidx       index/extract FASTA
         tview       text alignment viewer
         index       index alignment
         idxstats    BAM index stats (r595 or later)
         fixmate     fix mate information
         flagstat    simple stats
         calmd       recalculate MD/NM tags and '=' bases
         merge       merge sorted alignments
         rmdup       remove PCR duplicates
         reheader    replace BAM header
         cat         concatenate BAMs
         bedcov      read depth per BED region
         targetcut   cut fosmid regions (for fosmid pool only)
         phase       phase heterozygotes
         bamshuf     shuffle and group alignments by name

-b: write the output in BAM format. -S: in the samtools version shown here, the input is assumed to be a BAM file by default, so add this option when the input is a SAM file; otherwise an error is reported. (samtools 1.x detects the input format automatically and ignores -S.) -o: the output file name.

The final result is a BAM file, where B stands for binary; the binary format is faster to process.
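
One quick way to see the difference between the two formats: a SAM file is plain text, while a BAM file is stored in BGZF, a gzip-compatible compressed format, so it begins with the gzip magic bytes 1f 8b. The sketch below tests for those two bytes. The function name starts_with_gzip_magic is hypothetical, and the check cannot tell a BAM file from any other gzip file; it only confirms that a conversion did not leave plain text behind.

```shell
# Hypothetical helper: succeed if the file starts with the gzip magic bytes
# 1f 8b. A BGZF-compressed BAM file does; a plain-text SAM file does not.
# Note: any gzip file passes, so this is a sanity check, not validation.
starts_with_gzip_magic() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}
```

For example: starts_with_gzip_magic aln-pe.bam && echo "looks compressed"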

Run the following command to view the file header:

samtools view -H ESCell#8.sam
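
The header lists, among other things, one @SQ line per reference sequence, with the sequence name in the SN tag and its length in the LN tag. As an illustration of reading the header without samtools, the sketch below extracts those two fields from a SAM file using awk. The function name list_sam_references is hypothetical.

```shell
# Hypothetical helper: print "name<TAB>length" for every @SQ header line
# of the SAM file given as $1 (use "-" to read from standard input).
list_sam_references() {
    awk -F '\t' '
        $1 == "@SQ" {
            name = ""; len = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len  = substr($i, 4)
            }
            print name "\t" len
        }
        # Header lines all start with "@"; stop at the first alignment line.
        $1 !~ /^@/ { exit }
    ' "$1"
}
```

It can also read the output of the command above through a pipe: samtools view -H aln-pe.bam | list_sam_references -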

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
