Just started to learn bioinformatics, the teacher gave a SNP as a marker to draw a genetic map of the subject, studied for some time, began to call SNP with Bwa+samtools, the elder sister used this set before, she suggested I use another method to do, so prepare to learn to use GATK to do SNP calling.
Call SNP first to have a more accurate reference genome, then there are samples, my sample is a hybrid produced F2, below make some of their own use of process and experience,
This process is different from BWA and Samtool, the first need to filter the fasta of sample, I found the following mainly with NGSQC Toolkit under the Perl script filter, the command is:
$ illuqc_prll.pl-pe READ1.FQ READS2.FQ 2 5-c 8-o QC
This is the command that added the environment variable.
Then use BOWTIE2 to index the reference genome:
$ bowtie2-build genome.fasta Genome
Genome is the index file prefix, this index file generation directory defaults to the directory where the current directory is located rather than the ref genome
Then mapping the sample to ref genome on the following command:
Bowtie2--rg-id Sample--rg "Pl:illumina"--rg "Sm:samplename"-X genome-1 illuqc_filtered_files/sampler1.fastq.gz_ Filtered-2 illuqc_filtered_files/sampler2.fastq.gz_filtered-p 8-s Sample.sam
--rg-id is the header file that sets the Read group ID to-RG plus "@RG" on the sample, similar to the-R parameter in BWA Mem
The next step is to convert the resulting Sam file into a binary BAM file, which is not available to Samtools, as follows:
$ samtools view-bs Sample.sam > Sample.bam
Samtools view is dedicated to handling BAM files and Sam files, the-bs command is the input file format for the SAM output file format for BAM.
The next step is to mark out a large number of duplicate fragments produced by PCR amplification:
This step was used in the Sortsam.jar of Picardhome.
The command is as follows:
$ java-xmx10g-jar Picardhome/sortsam.jar Input=sample.bam Output=sample.sorted.bam SORT_ORDER=coordinate
Sort
$ java-xmx10g-jar Picardhome/marduplicates.jar Input=sample.sorted.bam Output=sample.dd.bam METRICS_FILE= Sample.dd.metrics max_file_handles_for_read_ends_map=1000
Index to Genome.fasta again
$ samtools Faidx Genome.fasta
$ Java-xmx10g-jar Picardhome/createsequencedictionary.jar R=genome.fasta o=genome.dict
$ samtools Index Sample.dd.bam
Here is a need to note that the Genome.fasta prefix genome must be the same as the genome.dict prefix genome, no one next error
Next, compare the Indel (realignment) around the area:
Started gatk the software with a knife.
Java-xmx10g-jar genomeanalysistk.jar-r genome.fasta-t realignertargetcreator-i sample.dd.bam-o sample.realn.int Ervals
Java-xmx10g-jar $GATKHome/genomeanalysistk.jar \ r genome.fasta-t indelrealigner \-targetintervals sample.realn.inte rvals \ I sample.dd.bam-o sample.realn.bam
Next is the first call SNP:
$ java-xmx10g-jar $GATKHome/genomeanalysistk.jar \ r genome.fasta-t unifiedgenotyper \ I sample.realn.bam-o sample.g ATK.RAW1.VCF \--read_filter badcigar-glm BOTH \-stand_call_conf 30.0-stand_emit_conf 0
$ samtools mpileup-q 20-q 25-ug-t ad-t dp-f genome.fasta Sample.realn.bam | Bcftool Call-c-V indels-v-> SAMPLE.SAMTOOLS.RAW1.VCF
Combined with GATK and Samtools and Picardtools call SNP