Repeated DNA sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it's sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return: ["AAAAACCCCC", "CCCCCAAAAA"].
Problem Solving Ideas:
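1. Use a map to store each scanned substring and count its occurrences. The string is scanned in a single pass, so the loop performs O(n) map operations (each std::map operation itself costs O(log n) comparisons). The code is as follows: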
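    #include <map>
    #include <string>
    #include <vector>
    using namespace std;

    class Solution {
    public:
        vector<string> findRepeatedDnaSequences(string s) {
            map<string, int> count;
            vector<string> result;
            int len = s.length();
            // i <= len - 10 so that the last 10-letter window is also examined.
            for (int i = 0; i <= len - 10; i++) {
                string str = s.substr(i, 10);
                map<string, int>::iterator it = count.find(str);
                if (it == count.end()) {
                    count[str] = 1;             // first occurrence
                } else {
                    if (it->second == 1) {
                        result.push_back(str);  // report only on the second occurrence
                    }
                    it->second++;
                }
            }
            return result;
        }
    };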
However, this solution is rejected with a memory limit exceeded error.
2. Encode A, C, G, and T separately. There are only four kinds of nucleotide, so two bits are enough for each one, and a 10-character substring needs only 20 bits. The int type has 32 bits, so a single int can hold any 10-character substring. Each time a new character is shifted in, the two highest bits of the 20-bit window (the oldest character) must first be cleared to 0. The code is as follows:
    #include <map>
    #include <string>
    #include <vector>
    using namespace std;

    class Solution {
    public:
        vector<string> findRepeatedDnaSequences(string s) {
            const int substrlen = 10;
            const int mask = 0x3FFFF;  // low 18 bits: the 9 characters that stay in the window
            map<int, int> count;
            map<char, int> ccode;
            ccode['A'] = 0;
            ccode['C'] = 1;
            ccode['G'] = 2;
            ccode['T'] = 3;
            vector<string> result;
            int len = s.length();
            int code = 0;
            if (len >= substrlen) {
                // Encode the first 10-character window.
                string str = s.substr(0, substrlen);
                for (int i = 0; i < substrlen; i++) {
                    code <<= 2;
                    code |= ccode[str[i]];
                }
                count[code] = 1;
            }
            for (int i = substrlen; i < len; i++) {
                code &= mask;  // clear the highest two bits (the oldest character)
                code <<= 2;
                code |= ccode[s[i]];
                count[code]++;
                if (count[code] == 2) {
                    result.push_back(s.substr(i - substrlen + 1, substrlen));
                }
            }
            return result;
        }
    };

On LeetCode, the code above took 269 ms to run, and many other people's submissions are much faster than mine. I wonder if anyone has a better way to do it.
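One possible direction, offered only as a sketch and not timed on LeetCode: since every 10-character window fits in 20 bits, there are at most 2^20 distinct codes, so the two maps could be replaced by a switch statement and a flat array of small counters indexed by the code. The names nucleotideCode and findRepeatedDnaFlat are hypothetical, and the sketch assumes the input contains only uppercase A, C, G, and T.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Map a nucleotide to its 2-bit code (A=0, C=1, G=2, T=3), as in the text.
// Assumption: the input contains only uppercase A, C, G, T.
static inline std::uint32_t nucleotideCode(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // 'T'
    }
}

// Hypothetical helper: returns every 10-letter substring that occurs more
// than once, in the order in which each second occurrence is encountered.
std::vector<std::string> findRepeatedDnaFlat(const std::string& s) {
    const std::size_t kLen = 10;
    const std::uint32_t kMask = (1u << 20) - 1;  // low 20 bits = 10 characters
    std::vector<std::string> result;
    if (s.size() < kLen) return result;

    // One small counter per possible 20-bit code (2^20 bytes, about 1 MiB).
    std::vector<std::uint8_t> seen(1u << 20, 0);
    std::uint32_t code = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Shift in the new character; the mask drops the character
        // that has just left the 10-character window.
        code = ((code << 2) | nucleotideCode(s[i])) & kMask;
        if (i + 1 < kLen) continue;  // window not yet full
        std::uint8_t& n = seen[code];
        if (n == 1) result.push_back(s.substr(i + 1 - kLen, kLen));
        if (n < 2) ++n;  // saturate at 2 so the counter cannot overflow
    }
    return result;
}
```

Calling findRepeatedDnaFlat("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT") yields "AAAAACCCCC" and "CCCCCAAAAA", matching the example in the problem statement. The trade-off is a fixed 1 MiB table regardless of input length, in exchange for constant-time lookups with no tree nodes to allocate.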