Learning about AMAS, a read mapper that improves mapping speed

Source: Internet
Author: User
 

 

Recently I read a paper about read mapping for NGS (next-generation sequencing) data, titled "AMAS: optimizing the partition and filtration of adaptive seeds to speed up read mapping". The paper introduces AMAS, a read mapper that increases mapping speed. In general, AMAS performs very well in mapping speed, memory usage, and stability. Its core is the adaptive seed, the same idea used by GEM (GEnome Multitool). The paper then compares AMAS with other mainstream read mappers and finally shows how its own algorithm behaves under different parameter settings.

This article is only a brief introduction. For an in-depth study, read the original paper carefully.

Features of AMAS:

Read mapping is a key task in NGS data analysis. Mainstream read mappers currently include SOAP2, mrsFAST, mrFAST, RazerS 3, GEM, Masai, and Hobbes 1 and 2. AMAS differs from these tools in the following ways:

1. When the number of matches of the current seed falls below a predefined frequency threshold "F", its matches are added to the candidate space and a new seed is started; this repeats until the required seeds are obtained or the end of the read is reached. For the entire simulated dataset, the total number of candidate positions is reduced by a factor of 6.61.

2. AMAS precomputes the adaptive seeds of the reference genome at the indexing step and stores their information in a radix tree, so that every future mapping task can reuse them.

3. The last seed of a read's adaptive partition may be too short, so its frequency may exceed both the frequency of the other seeds and the threshold "F". Most of its hits are false positives, but removing them all could reduce mapping sensitivity. AMAS therefore filters out only the seeds whose numbers of matches are much higher than expected. This filtering step affects only a small fraction of reads, yet it reduces the total candidate space by more than 50%.

4. AMAS stores candidate positions in a binary search tree, so they can be sorted quickly and positions reported by multiple seeds can be identified. Because adaptive seeds are not recomputed during the mapping step, mapping time is further reduced.
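The four ideas above can be illustrated together in a toy Python sketch. It is not AMAS's implementation: the trie, the function names `build_seed_index`, `adaptive_seeds` and `candidate_positions`, the `max_len` cap, and the filtering cut-off `filter_factor * F` are hypothetical choices made for illustration. Python's standard library has no binary search tree, so a list kept sorted with `bisect.insort` stands in for one. The index step stores every reference substring (up to `max_len`) in a trie and prunes below nodes that occur fewer than F times; the mapping step then reads adaptive seeds straight off the trie instead of recomputing them.

```python
from bisect import insort


class Node:
    """Trie node: children keyed by base, plus every reference start of this node's string."""
    __slots__ = ("children", "positions")

    def __init__(self):
        self.children = {}
        self.positions = []


def build_seed_index(ref, F, max_len):
    """Index step (hypothetical): insert all substrings of length <= max_len, then prune
    below nodes occurring fewer than F times, because an adaptive seed ends there."""
    root = Node()
    for p in range(len(ref)):
        node = root
        for base in ref[p:p + max_len]:
            node = node.children.setdefault(base, Node())
            node.positions.append(p)
    stack = [root]
    while stack:
        node = stack.pop()
        if node is not root and len(node.positions) < F:
            node.children.clear()  # no need to store anything deeper
        stack.extend(node.children.values())
    return root


def adaptive_seeds(read, root, F):
    """Mapping step: walk the read through the trie left to right.
    Returns (offset_in_read, seed, reference_hits) triples."""
    seeds, i = [], 0
    while i < len(read):
        node, j = root, i
        while j < len(read):
            child = node.children.get(read[j])
            j += 1
            if child is None:  # extension never occurs in the reference: 0 hits
                node = None
                break
            node = child
            if len(node.positions) < F or not node.children:
                break  # count fell below F, or the trie ends (max_len cap)
        hits = node.positions if node is not None else []
        seeds.append((i, read[i:j], hits))
        i = j
    return seeds


def candidate_positions(seeds, F, filter_factor=3):
    """Drop seeds with far more hits than F (hypothetical cut-off filter_factor * F),
    then merge the remaining hits into sorted candidates with their seed support."""
    support = {}  # candidate read start on the reference -> number of supporting seeds
    ordered = []  # kept sorted with insort, standing in for a binary search tree
    for offset, _seed, hits in seeds:
        if len(hits) > filter_factor * F:
            continue  # typically the short, highly repetitive last seed
        for pos in hits:
            cand = pos - offset
            if cand not in support:
                insort(ordered, cand)
                support[cand] = 0
            support[cand] += 1
    return [(c, support[c]) for c in ordered]


ref = "ACGTACGTTGCAACGTAGGCTTACGATCGA"
read = "ACGTTGCAACGTAGG"  # equals ref[4:19]
F = 2
root = build_seed_index(ref, F, max_len=12)
seeds = adaptive_seeds(read, root, F)
print(seeds)
print(candidate_positions(seeds, F))  # [(4, 3)]
```

On this toy input the first three seeds each have a single hit, and all three agree on start position 4. The short last seed `G` has 8 hits and is removed by the filter; keeping it would add seven false-positive candidates.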

Comparison with other mappers:

Table 1 reports results on a simulated dataset of 100K reads of length … bp. The additional maximum-seed-length constraint introduced in AMAS ensures full sensitivity (100%) for all mapping locations. GEM outperforms the other mappers in running time and sensitivity, demonstrating the advantages of adaptive seeds.

Table 2 shows the mappers run on a real dataset from the 1000 Genomes Project. AMAS performs very well in both single-thread and eight-thread modes, which clearly shows the benefit of the optimized partitioning and filtering methods it implements.

Processing large-scale datasets:

On large-scale datasets, AMAS outperforms GEM and mrsFAST in running time, mapping rate, and sensitivity. AMAS and GEM use relatively more memory than the other mappers, which remains an area where adaptive-seed methods can improve. When processing large datasets with 8 threads, GEM sometimes runs into errors: its memory usage increases sharply, and it may even crash.

AMAS uses the parameter "FSE" to control the minimum number of seeds. If it is set too high, the advantage of adaptive seeds is suppressed and a large number of candidates is generated, which lengthens the running time. AMAS takes the least mapping time with "FSE = 2", so "FSE = 2" is its default value.

"F" is the frequency threshold of the adaptive seeds. Lowering "F" reduces the number of candidate hits and therefore the mapping time. However, a low "F" also makes seeds too long and leaves too few seeds in each read partition, which reduces mapping sensitivity. Lowering "F" also enlarges the index tree of adaptive seeds, so more memory is consumed.
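The trade-off for "F" can be seen on a toy example. The sketch below uses a naive string scan instead of an index, and the names `count_occurrences`, `adaptive_partition` and `summarize` are hypothetical; it only shows how a greedy left-to-right adaptive partition reacts to F, not how AMAS itself is tuned.

```python
def count_occurrences(ref, seed):
    """Number of (possibly overlapping) occurrences of seed in ref; naive scan for illustration."""
    count, start = 0, ref.find(seed)
    while start != -1:
        count += 1
        start = ref.find(seed, start + 1)
    return count


def adaptive_partition(read, ref, F):
    """Greedy adaptive seeds: extend each seed until it occurs fewer than F times."""
    seeds, i = [], 0
    while i < len(read):
        j = i + 1
        while count_occurrences(ref, read[i:j]) >= F and j < len(read):
            j += 1
        seeds.append(read[i:j])
        i = j
    return seeds


def summarize(read, ref, F):
    """(number of seeds, mean seed length, total candidate hits) for one value of F."""
    seeds = adaptive_partition(read, ref, F)
    hits = sum(count_occurrences(ref, s) for s in seeds)
    return len(seeds), round(len(read) / len(seeds), 2), hits


ref = "ACGTACGTTGCAACGTAGGCTTACGATCGA"
read = "ACGTTGCAACGTAGG"
for F in (2, 4, 8):
    print(F, summarize(read, ref, F))
# 2 (4, 3.75, 11)
# 4 (6, 2.5, 17)
# 8 (9, 1.67, 36)
```

Lowering F from 8 to 2 cuts the total candidate hits from 36 to 11, while the read is split into fewer, longer seeds. With fewer seeds per read, fewer sequencing errors can be tolerated before every seed is hit by one, which is why sensitivity drops.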

Summary:

First, AMAS computes all possible adaptive seeds in advance at the indexing step, avoiding repeated, unnecessary adaptive-seed computation at the mapping step. Second, precise filtering of the last seed and of extra, highly repetitive seeds further tightens the candidate space, effectively reducing the time required for seed extension. AMAS also lets you control the minimum number of adaptive seeds per read partition to reach a good balance between running time and sensitivity. In a multithreaded environment, AMAS processes data well and keeps memory usage stable regardless of data size.

However, the advantages of adaptive seeds have not been fully explored: besides AMAS, GEM is currently the only full mapper that uses adaptive seeds. The technique still needs further study.
