Combined with GATK and Samtools and Picardtools call SNP

Last Update:2016-11-14 Source: Internet

Author: User

Tags perl script

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Just started to learn bioinformatics, the teacher gave a SNP as a marker to draw a genetic map of the subject, studied for some time, began to call SNP with Bwa+samtools, the elder sister used this set before, she suggested I use another method to do, so prepare to learn to use GATK to do SNP calling.

Call SNP first to have a more accurate reference genome, then there are samples, my sample is a hybrid produced F2, below make some of their own use of process and experience,

This process is different from BWA and Samtool, the first need to filter the fasta of sample, I found the following mainly with NGSQC Toolkit under the Perl script filter, the command is:

$ illuqc_prll.pl-pe READ1.FQ READS2.FQ 2 5-c 8-o QC

This is the command that added the environment variable.

Then use BOWTIE2 to index the reference genome:

$ bowtie2-build genome.fasta Genome

Genome is the index file prefix, this index file generation directory defaults to the directory where the current directory is located rather than the ref genome

Then mapping the sample to ref genome on the following command:

Bowtie2--rg-id Sample--rg "Pl:illumina"--rg "Sm:samplename"-X genome-1 illuqc_filtered_files/sampler1.fastq.gz_ Filtered-2 illuqc_filtered_files/sampler2.fastq.gz_filtered-p 8-s Sample.sam

--rg-id is the header file that sets the Read group ID to-RG plus "@RG" on the sample, similar to the-R parameter in BWA Mem

The next step is to convert the resulting Sam file into a binary BAM file, which is not available to Samtools, as follows:

$ samtools view-bs Sample.sam > Sample.bam

Samtools view is dedicated to handling BAM files and Sam files, the-bs command is the input file format for the SAM output file format for BAM.

The next step is to mark out a large number of duplicate fragments produced by PCR amplification:

This step was used in the Sortsam.jar of Picardhome.

The command is as follows:

$ java-xmx10g-jar Picardhome/sortsam.jar Input=sample.bam Output=sample.sorted.bam SORT_ORDER=coordinate

Sort

$ java-xmx10g-jar Picardhome/marduplicates.jar Input=sample.sorted.bam Output=sample.dd.bam METRICS_FILE= Sample.dd.metrics max_file_handles_for_read_ends_map=1000

Index to Genome.fasta again

$ samtools Faidx Genome.fasta

$ Java-xmx10g-jar Picardhome/createsequencedictionary.jar R=genome.fasta o=genome.dict

$ samtools Index Sample.dd.bam

Here is a need to note that the Genome.fasta prefix genome must be the same as the genome.dict prefix genome, no one next error

Next, compare the Indel (realignment) around the area:

Started gatk the software with a knife.

Java-xmx10g-jar genomeanalysistk.jar-r genome.fasta-t realignertargetcreator-i sample.dd.bam-o sample.realn.int Ervals

Java-xmx10g-jar $GATKHome/genomeanalysistk.jar \ r genome.fasta-t indelrealigner \-targetintervals sample.realn.inte rvals \ I sample.dd.bam-o sample.realn.bam

Next is the first call SNP:

$ java-xmx10g-jar $GATKHome/genomeanalysistk.jar \ r genome.fasta-t unifiedgenotyper \ I sample.realn.bam-o sample.g ATK.RAW1.VCF \--read_filter badcigar-glm BOTH \-stand_call_conf 30.0-stand_emit_conf 0

$ samtools mpileup-q 20-q 25-ug-t ad-t dp-f genome.fasta Sample.realn.bam | Bcftool Call-c-V indels-v-> SAMPLE.SAMTOOLS.RAW1.VCF

Combined with GATK and Samtools and Picardtools call SNP

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More