[Leetcode] Repeated DNA sequences

Source: Internet
Author: User

Repeated DNA sequences 

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return: ["AAAAACCCCC", "CCCCCAAAAA"].
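Before optimizing, it can help to have a reference that follows the definition literally. The sketch below is a hypothetical helper (the name `bruteForceRepeats` is chosen here for illustration, it is not part of the LeetCode problem). It assumes the input contains only A, C, G, and T, and it compares every window against every other window, so it is roughly quadratic and only suitable for checking faster solutions on small inputs.

```cpp
#include <string>
#include <vector>
using namespace std;

// Brute-force reference: a 10-letter window is reported if the same
// window also appears at another position. Roughly O(n^2) comparisons.
vector<string> bruteForceRepeats(const string& s) {
    const int k = 10;
    vector<string> result;
    int n = static_cast<int>(s.size());
    for (int i = 0; i + k <= n; i++) {
        string w = s.substr(i, k);
        // If w already occurred earlier, it was handled at its first occurrence.
        bool seenBefore = false;
        for (int j = 0; j < i; j++) {
            if (s.compare(j, k, w) == 0) { seenBefore = true; break; }
        }
        if (seenBefore) continue;
        // Report w once if it occurs again later.
        for (int j = i + 1; j + k <= n; j++) {
            if (s.compare(j, k, w) == 0) { result.push_back(w); break; }
        }
    }
    return result;
}
```

On the example string it returns "AAAAACCCCC" and "CCCCCAAAAA", in order of first occurrence.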

Problem Solving Ideas:

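1. Use a map to store each scanned 10-letter substring together with its count. With std::map, each lookup costs O(log n) string comparisons, so the total time is O(n log n); a hash map would bring it to expected O(n). The code is as follows: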

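#include <map>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        map<string, int> count;  // substring -> occurrences seen so far
        vector<string> result;
        int len = s.length();
        // i + 10 <= len, so the last 10-letter window is checked too
        for (int i = 0; i + 10 <= len; i++) {
            string str = s.substr(i, 10);
            map<string, int>::iterator it = count.find(str);
            if (it == count.end()) {
                count[str] = 1;             // first occurrence
            } else {
                if (it->second == 1) {
                    result.push_back(str);  // second occurrence: report once
                }
                it->second++;
            }
        }
        return result;
    }
};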
However, the submission was rejected with a memory-limit error. Storing every 10-letter substring as a separate string key is costly in memory.
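A common variant of approach 1, sketched here as a swapped-in technique rather than something tested on LeetCode, replaces `map` with two `unordered_set`s: one for windows seen so far and one for windows already reported. Hash lookups make the scan expected O(n), although each window is still stored as a 10-character string, so this alone may not fix a memory problem. The function name `findRepeatedWithHashSet` is hypothetical.

```cpp
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

// Hash-set variant of approach 1 (not the original code).
vector<string> findRepeatedWithHashSet(const string& s) {
    unordered_set<string> seen;      // windows encountered at least once
    unordered_set<string> reported;  // windows already added to the result
    vector<string> result;
    for (size_t i = 0; i + 10 <= s.size(); i++) {
        string window = s.substr(i, 10);
        // insert(...).second is false if the element was already present:
        // report only when the window was seen before and not yet reported.
        if (!seen.insert(window).second && reported.insert(window).second) {
            result.push_back(window);
        }
    }
    return result;
}
```

Each repeated window is added once, at its second occurrence.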

2. There are only four nucleotides (A, C, G, T), so each one can be encoded in two bits. A 10-letter sequence therefore needs only 20 bits, and since an int has 32 bits, a single int can represent any 10-letter sequence. When the window slides by one character, the two highest bits of the 20-bit code (the oldest character) must be cleared before the new character is shifted in. The code is as follows:
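To make the 2-bit packing concrete, here is a small sketch using the same mapping as the author's code (A=0, C=1, G=2, T=3). The helper names `code2`, `encodeWindow`, and `rollingCodes` are hypothetical, and the input is assumed to be uppercase A/C/G/T only. It computes each window's code with the rolling update, which can be checked against encoding the window from scratch.

```cpp
#include <string>
#include <vector>
using namespace std;

// 2-bit code for one nucleotide: A=0, C=1, G=2, T=3 (assumes valid input).
int code2(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // 'T'
    }
}

// Pack s[start, start+10) into the low 20 bits, first character in the high bits.
int encodeWindow(const string& s, size_t start) {
    int code = 0;
    for (size_t i = start; i < start + 10; i++) {
        code = (code << 2) | code2(s[i]);
    }
    return code;
}

// Codes of all 10-letter windows via the rolling update:
// keep the low 18 bits (drop the oldest char), shift left by 2, OR in the new char.
vector<int> rollingCodes(const string& s) {
    const int mask = 0x3FFFF;  // low 18 bits set
    vector<int> codes;
    if (s.size() < 10) return codes;
    int code = encodeWindow(s, 0);
    codes.push_back(code);
    for (size_t i = 10; i < s.size(); i++) {
        code = ((code & mask) << 2) | code2(s[i]);
        codes.push_back(code);
    }
    return codes;
}
```

For example, `encodeWindow("AAAAACCCCC", 0)` is 341 (binary 01 01 01 01 01 in the low 10 bits), and on the example string the codes at positions 0 and 10 are equal, which is exactly the repeat the problem asks for.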

#include <map>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        const int substrlen = 10;
        const int mask = 0x3FFFF;  // low 18 bits: the last 9 characters
        map<int, int> count;       // 20-bit code -> occurrences
        map<char, int> ccode;      // 2-bit code per nucleotide
        ccode['A'] = 0;
        ccode['C'] = 1;
        ccode['G'] = 2;
        ccode['T'] = 3;
        vector<string> result;
        int len = s.length();
        int code = 0;
        if (len > substrlen) {
            // Encode the first window.
            for (int i = 0; i < substrlen; i++) {
                code <<= 2;
                code |= ccode[s[i]];
            }
            count[code] = 1;
        }
        for (int i = substrlen; i < len; i++) {
            code &= mask;  // clear the two highest bits (the oldest character)
            code <<= 2;
            code |= ccode[s[i]];
            count[code]++;
            if (count[code] == 2) {
                result.push_back(s.substr(i - substrlen + 1, substrlen));
            }
        }
        return result;
    }
};
On LeetCode, the code above took 269 ms, while many other submissions are much faster. Does anyone know a better way to do it?
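One direction that is often suggested, given here as an untested sketch rather than a measured result, is to drop `map` entirely. A 20-bit code has only 2^20 possible values, so a flat array of 2^20 small counters can replace the balanced-tree lookups, and a `switch` can replace the `map<char, int>` character table. The function name `findRepeatedFlat` is hypothetical, and the input is assumed to contain only A, C, G, and T.

```cpp
#include <string>
#include <vector>
using namespace std;

vector<string> findRepeatedFlat(const string& s) {
    const int len = 10;
    const int mask = (1 << 20) - 1;          // keep only the last 20 bits (10 chars)
    vector<unsigned char> seen(1 << 20, 0);  // per-code count, capped at 2
    vector<string> result;
    int code = 0;
    for (int i = 0; i < static_cast<int>(s.size()); i++) {
        int c;
        switch (s[i]) {
            case 'A': c = 0; break;
            case 'C': c = 1; break;
            case 'G': c = 2; break;
            default:  c = 3; break;         // 'T'
        }
        code = ((code << 2) | c) & mask;    // shift in the new char, drop the oldest
        if (i < len - 1) continue;          // the first full window ends at index 9
        if (seen[code] == 1) {
            result.push_back(s.substr(i - len + 1, len));  // second occurrence
        }
        if (seen[code] < 2) seen[code]++;
    }
    return result;
}
```

The trade-off is a fixed 1 MB counter table per call regardless of input length; its actual runtime on LeetCode is not measured here.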
