Principles of de novo genome assembly

Source: Internet
Author: User

FALCON assembly process
    1. Overlap detection among the raw subreads, for error correction
    2. Pre-assembly and error correction
    3. Overlap detection among the error-corrected reads
    4. Overlap filtering
    5. Building a graph from the overlaps
    6. Building contigs from the graph
A few questions first: what are subreads? Why is error correction needed? What is the principle behind the correction? What do length_cutoff and length_cutoff_pr mean, and why are two separate parameters set?

Subreads are the reads obtained by processing the raw output of the sequencer into a form that downstream software can handle conveniently.

Third-generation sequencing is single-molecule sequencing: the reads are long, but the error rate is high. An individual read is only about 85% accurate, so the reads must be corrected.

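If the sequencing depth is sufficient, overlaps among the reads can be computed and the errors corrected statistically: errors fall at different positions in different reads, so the base supported by most of the overlapping reads at a given position is most likely the true one.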
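The idea of correcting by depth can be illustrated with a minimal sketch. It assumes the overlapping reads have already been aligned to the same coordinates and padded to equal length, and it uses a plain column-wise majority vote; this is a simplification for illustration, not the consensus algorithm FALCON itself implements. The function name consensus and the sample reads are hypothetical.

```python
from collections import Counter


def consensus(aligned_reads):
    """Column-wise majority vote over equal-length, pre-aligned reads.

    '-' marks a gap; a column whose majority symbol is a gap is dropped.
    """
    result = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":
            result.append(base)
    return "".join(result)


# Five noisy copies of the same region; each error sits at a different position.
reads = [
    "ACGTTAGC",
    "ACGTAAGC",  # error at position 4
    "ACCTTAGC",  # error at position 2
    "ACGTTAGC",
    "ACGTTTGC",  # error at position 5
]
corrected = consensus(reads)
print(corrected)  # ACGTTAGC
```

With only one or two reads covering a position there is no majority to rely on, which is why the correction depends on sufficient depth.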

The cutoff discards reads that are too short (for example, reads below 10 kb), because very short reads contribute little and only add computation. During error correction not too many reads should be discarded, so that cutoff is kept small. After pre-assembly, the information in the short reads has already been used up and they can be discarded, so that cutoff can be set somewhat larger to reduce the amount of computation.
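A minimal sketch of the two-stage length filter described above. The names correction_cutoff and assembly_cutoff stand in for the two cutoff parameters, the values are illustrative only, and the correction step itself is omitted (the sketch pretends it leaves read lengths unchanged).

```python
def filter_by_length(reads, cutoff):
    """Keep only reads whose length is at least `cutoff` bases."""
    return {name: seq for name, seq in reads.items() if len(seq) >= cutoff}


# Toy reads (hypothetical data): lengths 12000, 4000 and 8000.
raw_reads = {"r1": "A" * 12000, "r2": "C" * 4000, "r3": "G" * 8000}

correction_cutoff = 5000   # small: keep most reads for error correction
assembly_cutoff = 10000    # larger: short reads are no longer needed

for_correction = filter_by_length(raw_reads, correction_cutoff)
for_assembly = filter_by_length(for_correction, assembly_cutoff)

print(sorted(for_correction))  # ['r1', 'r3']
print(sorted(for_assembly))    # ['r1']
```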

Why must the overlaps be computed again after error correction? What do the parameters -e.96 and -e.70 mean?

Because error correction changes the reads substantially, the overlaps must be recomputed. -e is the consistency (identity) threshold, that is, how precisely two reads must agree within an overlap. Before pre-assembly the error rate is high, so a lower consistency of 0.70 is tolerated; at assembly time the reads have been corrected, so a higher consistency is required and 0.96 is used.
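The effect of the two thresholds can be sketched as follows. The functions identity and accept are hypothetical helpers that compare two already aligned, equal-length overlap segments; a real overlapper computes this from an alignment with insertions and deletions. As a rough intuition (an assumption of this sketch), if two reads are each about 85% accurate and their errors are independent, they agree at roughly 0.85 x 0.85 of the positions, which is close to the 0.70 threshold used on raw reads.

```python
def identity(a, b):
    """Fraction of matching positions between two aligned, equal-length segments."""
    if len(a) != len(b) or not a:
        raise ValueError("segments must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def accept(a, b, min_identity):
    """True if the overlap reaches the required identity (the role played by -e)."""
    return identity(a, b) >= min_identity


# Rough expectation for two raw reads with independent errors (assumption).
expected_raw_identity = 0.85 * 0.85  # about 0.72

seg_a = "ACGTACGTAC"
seg_b = "ACGAACGTTC"  # differs from seg_a at 2 of 10 positions

print(identity(seg_a, seg_b))      # 0.8
print(accept(seg_a, seg_b, 0.70))  # True: acceptable for raw reads
print(accept(seg_a, seg_b, 0.96))  # False: too noisy for corrected reads
```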

Why filter the overlaps?

To cut unnecessary work and reduce the amount of computation: only the best n overlaps are kept for assembly, and overlaps caused by repetitive sequence are filtered out.
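A minimal sketch of such a filter, under stated assumptions: overlaps are given as (read_a, read_b, overlap_length) tuples, "best" means longest, and a read with more than max_overlaps overlaps is treated as repetitive and dropped. The function filter_overlaps and its parameters are hypothetical and only illustrate the idea; they are not FALCON's actual filter or its options.

```python
from collections import defaultdict


def filter_overlaps(overlaps, n, max_overlaps):
    """Drop reads with suspiciously many overlaps, then keep the n longest per read."""
    by_read = defaultdict(list)
    for overlap in overlaps:
        by_read[overlap[0]].append(overlap)

    kept = []
    for read in sorted(by_read):
        candidates = by_read[read]
        if len(candidates) > max_overlaps:
            continue  # likely a repeat: it overlaps far too many reads
        ranked = sorted(candidates, key=lambda o: o[2], reverse=True)
        kept.extend(ranked[:n])
    return kept


overlaps = [
    ("r1", "r2", 900), ("r1", "r3", 500), ("r1", "r4", 700),
    ("r2", "r3", 800),
    ("rep", "r1", 300), ("rep", "r2", 300), ("rep", "r3", 300), ("rep", "r4", 300),
]
kept = filter_overlaps(overlaps, n=2, max_overlaps=3)
print(kept)  # [('r1', 'r2', 900), ('r1', 'r4', 700), ('r2', 'r3', 800)]
```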

What do building the graph and building contigs mean, basically?

The reads are joined one after another according to their overlaps, from front to back. Because of repetitive sequences, the result is inevitably a graph rather than a single chain (the graph can take various forms).

Then, according to certain rules, unnecessary edges are removed from the graph and an optimal path is chosen, which gives the final contigs.
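The two steps can be sketched on a toy example. The sketch assumes error-free reads from one strand, finds exact suffix-prefix overlaps, removes redundant (transitive) edges as one example of "unnecessary edges", and then follows the single remaining path. All function names and the sample sequence are hypothetical, and real assemblers use considerably more elaborate rules for resolving the graph.

```python
def overlap_len(a, b, min_len):
    """Longest proper suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def build_graph(reads, min_len):
    """Directed graph: edge a -> b weighted by the overlap length."""
    graph = {name: {} for name in reads}
    for a in reads:
        for b in reads:
            if a != b:
                k = overlap_len(reads[a], reads[b], min_len)
                if k:
                    graph[a][b] = k
    return graph


def transitive_reduction(graph):
    """Remove edge a -> c whenever a -> b and b -> c already exist."""
    reduced = {a: dict(nbrs) for a, nbrs in graph.items()}
    for a in graph:
        for b in graph[a]:
            for c in graph[b]:
                if c in graph[a]:
                    reduced[a].pop(c, None)
    return reduced


def walk_contig(reads, graph):
    """Follow the unambiguous path from a read that has no incoming edge."""
    has_incoming = {b for nbrs in graph.values() for b in nbrs}
    starts = [r for r in graph if r not in has_incoming]
    if not starts:
        raise ValueError("no starting read found (graph is cyclic)")
    node = starts[0]
    contig, visited = reads[node], {node}
    while len(graph[node]) == 1:
        nxt, k = next(iter(graph[node].items()))
        if nxt in visited:
            break
        contig += reads[nxt][k:]  # append only the part not already covered
        visited.add(nxt)
        node = nxt
    return contig


genome = "ATGCGTACGTTAGCCTAGGA"
reads = {"r1": genome[0:12], "r2": genome[4:16], "r3": genome[8:20]}

graph = build_graph(reads, min_len=3)
reduced = transitive_reduction(graph)
contig = walk_contig(reads, reduced)
print(graph)    # {'r1': {'r2': 8, 'r3': 4}, 'r2': {'r3': 8}, 'r3': {}}
print(reduced)  # {'r1': {'r2': 8}, 'r2': {'r3': 8}, 'r3': {}}
print(contig == genome)  # True
```

The edge r1 -> r3 carries no information that r1 -> r2 -> r3 does not already provide, so removing it leaves a single path from which the contig is read off.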
