Project data:
- Kongyu_131_pcrfree_. CCAAT_L006_R1_001.fastq.gz (100X) (19G)
  Kongyu_131_pcrfree_. CCAAT_L006_R2_001.fastq.gz (100X) (20G)
- Y255_pcrfree_. TCCGC_L005_R1_001.fastq.gz (30X) (5.4G)
  Y255_pcrfree_. TCCGC_L005_R2_001.fastq.gz (30X) (6.0G)
- All.chrs.con.fasta (364M)
Tools:
Strategy:
- Align the second-generation (short-read) sequencing reads to the reference genome with BWA, divide the genome into windows, assemble the reads locally within each window, and then merge the window assemblies.
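The windowing step of this strategy can be sketched as follows. This is only a minimal illustration, not the project's actual script: the function names `iter_windows` and `region_strings`, the optional `overlap` parameter, and the chromosome name `chr1` are assumptions made for the example. The 100 kb default follows the window size mentioned in these notes.

```python
from typing import Iterator, List, Tuple


def iter_windows(chrom_length: int, window: int = 100_000,
                 overlap: int = 0) -> Iterator[Tuple[int, int]]:
    """Yield 1-based inclusive (start, end) windows that cover a chromosome.

    `overlap` (an assumption, not stated in the notes) makes neighbouring
    windows share bases, which may help when merging window assemblies.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    start = 1
    while start <= chrom_length:
        end = min(start + window - 1, chrom_length)
        yield start, end
        if end == chrom_length:
            break  # the last window reached the chromosome end
        start += step


def region_strings(chrom: str, chrom_length: int, window: int = 100_000,
                   overlap: int = 0) -> List[str]:
    """Format windows as 'chrom:start-end' region strings."""
    return [f"{chrom}:{s}-{e}"
            for s, e in iter_windows(chrom_length, window, overlap)]


# Hypothetical 250 kb chromosome named "chr1"
print(region_strings("chr1", 250_000))
# ['chr1:1-100000', 'chr1:100001-200000', 'chr1:200001-250000']
```

Each region string identifies one window whose aligned reads would be extracted and assembled separately before merging.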
Prerequisites:
- Fluency in writing Perl and shell scripts
- Proficiency in submitting jobs with PBS
- How to use BWA
- How to use IGV
- How to use SOAPdenovo
Problems with local assembly:
Two groups have already attempted this and neither produced a result. Local assembly can rarely assemble a complete 100 kb window, because second-generation reads are too short and the genome contains too many repeats. Repeats interrupt the connections between contigs, so a window ends up as many fragments with no way to keep joining them, and both attempts stalled halfway.
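Why repeats interrupt the connections can be seen in a toy k-mer (de Bruijn) graph. The sketch below is only an illustration under simplifying assumptions (error-free reads, one strand, a made-up sequence); the function name `branching_nodes` is hypothetical and this is not how any particular assembler is implemented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def branching_nodes(reads: Iterable[str], k: int) -> Dict[str, List[str]]:
    """Return the (k-1)-mers that have more than one distinct successor.

    Each k-mer in a read is an edge from its (k-1)-base prefix to its
    (k-1)-base suffix. A node with several successors is a point where the
    path is ambiguous, so a contig has to stop there.
    """
    successors = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            successors[kmer[:-1]].add(kmer[1:])
    return {node: sorted(nxt)
            for node, nxt in successors.items() if len(nxt) > 1}


# Made-up sequence in which the repeat "ACGT" occurs twice,
# followed by a different base each time.
with_repeat = "TTACGTCCGGACGTAAG"
print(branching_nodes([with_repeat], k=5))
# {'ACGT': ['CGTA', 'CGTC']}
```

When the repeat is longer than what the reads can span, the graph branches at the repeat and the assembly breaks into separate fragments on either side.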
In such cases, the many fragments have to be joined into a complete sequence through downstream analysis.
See To Fat's article: working entirely without a reference genome, it used de novo assembly and a variety of methods to assemble fragmented sequences into a complete genome.
The boss does not know much about this; the boss's biggest contribution is pushing the work along.
Project one: Genome assembly using second-generation data (local assembly and global assembly)