Recently Tanger had me read an article on soybean genome re-sequencing in which the SNPs were called with SOAP2 and related software. That made me want to work through it myself and write up an SNP-calling workflow. This is only a first attempt, so please don't laugh if I get something wrong.
SOAP is a suite of programs; the full name is Short Oligonucleotide Analysis Package, that is, a package for analyzing short oligonucleotide sequences. SOAP has many functions and I am still feeling my way through them. To begin with I used two of the programs, SOAPsnp and SOAPaligner, for SNP calling and mapping respectively. Installation is very simple: download the latest release that matches your system, unpack it, and it is ready to use. I recommend adding the programs to your PATH environment variable, otherwise they are a nuisance to call.
Now for the usage. One difference between SOAP2 and GATK is that SOAP splits many of its functions into separate programs; I guess the developers did this to improve stability and accuracy. The first step is to index the reference genome. This uses 2bwt-builder, a tool that ships with SOAPaligner and exists specifically for indexing the reference sequence.
2bwt-builder Ref.fasta
When the run finishes, a number of index files have been generated.
I'm not sure why indexing takes so long with this method; perhaps it is because the soybean genome is fairly large.
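Since building the index is slow, it is worth not rebuilding it by accident. Below is only a minimal sketch of how that might be done. The helper name build_index_once is made up for illustration, and the check assumes that 2bwt-builder writes its output next to the reference as files named like Ref.fasta.index.*; look at what your version actually produces and adjust the pattern if it differs.

```shell
#!/bin/sh
# Sketch: build the SOAP2 index only if one does not seem to exist yet.
# ASSUMPTION: 2bwt-builder writes files named "<reference>.index.*" next to
# the reference file. build_index_once is a hypothetical helper, not part of SOAP.

build_index_once() {
    ref="$1"
    if [ ! -f "$ref" ]; then
        echo "reference not found: $ref" >&2
        return 1
    fi
    # ls fails when the glob matches nothing, i.e. when no index files exist
    if ls "${ref}".index.* > /dev/null 2>&1; then
        echo "index for $ref already present, skipping"
        return 0
    fi
    # Record wall-clock time so it is easy to see how long indexing takes
    start=$(date +%s)
    2bwt-builder "$ref" || return 1
    end=$(date +%s)
    echo "indexing $ref took $((end - start)) s"
}

# Example call (commented out so that sourcing this file runs nothing):
# build_index_once Ref.fasta
```

Calling it a second time on the same reference should then return immediately instead of re-indexing.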
My data were downloaded from NCBI in SRA format, so they have to be converted to FASTQ first. This needs the sratoolkit from NCBI, which also just needs unpacking, with no installation. There was one problem: I downloaded the files on a Windows machine and then uploaded them to the server, so the file names had no .sra suffix and the tool reported an error. You therefore need to write a script, or do a regular-expression rename, to fix all the names, and then process the files with fastq-dump from sratoolkit.
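The rename and conversion could be sketched roughly like this. It is only an illustration under some assumptions: the helper names add_sra_suffix and convert_all are invented here, the downloaded runs are assumed to sit in one directory and to be named by their accession (SRR, ERR or DRR followed by digits) with no extension, and --split-files is used on the assumption that the reads are paired-end. Drop that option for single-end data.

```shell
#!/bin/sh
# Sketch: give suffix-less SRA files a .sra extension, then convert to FASTQ.
# ASSUMPTIONS: files are named by accession (SRR*/ERR*/DRR*) without any
# extension; add_sra_suffix and convert_all are hypothetical helper names.

add_sra_suffix() {
    dir="$1"
    for f in "$dir"/[SED]RR*; do
        [ -f "$f" ] || continue      # glob matched nothing, or not a regular file
        base=${f##*/}                # file name without the directory part
        case "$base" in
            *.*) continue ;;         # already has an extension, leave it alone
        esac
        mv -- "$f" "$f.sra"
    done
}

convert_all() {
    dir="$1"
    for f in "$dir"/*.sra; do
        [ -f "$f" ] || continue
        # --split-files writes the two mates of a pair to separate FASTQ files;
        # --outdir puts the output next to the input
        fastq-dump --split-files --outdir "$dir" "$f" || return 1
    done
}

# Example calls (commented out so that sourcing this file runs nothing):
# add_sra_suffix /path/to/sra_files
# convert_all    /path/to/sra_files
```

Files that already carry an extension are skipped, so running the rename twice should be harmless.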
Call SNP protocol by SOAP2