All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return ["AAAAACCCCC", "CCCCCAAAAA"].
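For comparison, the most direct approach is to count every 10-letter substring in a hash map keyed by the substring itself. The C++ sketch below shows that baseline; the function name `findRepeatedNaive` is hypothetical, and it reports each sequence on its second occurrence instead of collecting results in a set. Every key is a full 10-character string, which is the memory cost that motivates a more compact encoding.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Baseline sketch (not the post's method): count each 10-letter window
// in a hash map keyed by the window's text.
std::vector<std::string> findRepeatedNaive(const std::string& s) {
    std::vector<std::string> res;
    std::unordered_map<std::string, int> count;
    if (s.size() < 10) return res;
    for (std::size_t i = 0; i + 10 <= s.size(); ++i) {
        std::string window = s.substr(i, 10);
        // Report a window exactly once: when it is seen for the second time.
        if (++count[window] == 2) res.push_back(window);
    }
    return res;
}
```

On the example input this returns {"AAAAACCCCC", "CCCCCAAAAA"}, in order of each sequence's second occurrence.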
Storing the substrings in a map exceeds the memory limit, so a bitmap-style table is used instead. Because there are only 4 letters, each letter can be encoded in two bits, so a 10-letter sequence takes 20 bits, and an array of size 2^20 is enough to index every possible sequence.
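To make the 2-bit encoding concrete, here is a small sketch; the helper names `code2`, `encode10` and `roll` are hypothetical, not from the post. It packs a window so that the first letter lands in the highest two of the 20 bits, and it slides the window one letter to the right by shifting left two bits and masking with 0xFFFFF (2^20 - 1), which drops the oldest letter. It assumes the input contains only A, C, G and T.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Map each nucleotide to a 2-bit code: A=00, C=01, G=10, T=11.
inline std::uint32_t code2(char ch) {
    switch (ch) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // 'T' (input assumed to contain only A/C/G/T)
    }
}

// Pack the 10 letters starting at `start` into the low 20 bits,
// first letter in the highest two bits. The result is always < 2^20.
std::uint32_t encode10(const std::string& s, std::size_t start) {
    std::uint32_t v = 0;
    for (std::size_t k = 0; k < 10; ++k) {
        v = (v << 2) | code2(s[start + k]);
    }
    return v;
}

// Slide the window: shift out the oldest letter (masked away by 0xFFFFF)
// and append the code of the next letter.
std::uint32_t roll(std::uint32_t v, char next) {
    return ((v << 2) & 0xFFFFF) | code2(next);
}
```

For instance, "AAAAACCCCC" encodes to 0b0101010101 = 341, and "TTTTTTTTTT" encodes to 0xFFFFF, the largest index in an array of size 2^20.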
#include <set>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    // Map each nucleotide to a 2-bit code.
    int getVal(char ch) {
        if (ch == 'A') return 0;
        if (ch == 'C') return 1;
        if (ch == 'G') return 2;
        return 3;  // 'T'
    }
    vector<string> findRepeatedDnaSequences(string s) {
        set<string> st;      // repeated sequences, deduplicated
        vector<string> res;
        if (s.length() < 10) return res;
        // One counter per possible 20-bit code (1024*1024 entries);
        // a vector keeps this 4 MB table on the heap instead of the stack.
        vector<int> mp(1 << 20, 0);
        unsigned int val = 0;
        // Encode the first 9 letters.
        for (int i = 0; i < 9; ++i) {
            val <<= 2;
            val |= getVal(s[i]);
        }
        for (int i = 9; i < (int)s.length(); ++i) {
            // Shift in the new letter and keep only the low 20 bits,
            // which drops the letter that left the window.
            val = ((val << 2) & 0xFFFFF) | getVal(s[i]);
            ++mp[val];
            if (mp[val] > 1) {
                st.insert(s.substr(i - 9, 10));
            }
        }
        for (set<string>::iterator it = st.begin(); it != st.end(); ++it) {
            res.push_back(*it);
        }
        return res;
    }
};
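The solution above keeps one int counter per code. A literal bitmap variant, sketched below under the assumption of C++14 (for std::make_unique), uses two std::bitset objects of 2^20 bits (128 KiB each): one marks codes already seen, the other marks codes already reported, so no std::set is needed. The name `findRepeatedBitmap` is hypothetical.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

constexpr std::size_t kCodes = std::size_t{1} << 20;  // 4^10 possible windows

// Bitmap sketch: one bit per 20-bit code instead of an int counter.
std::vector<std::string> findRepeatedBitmap(const std::string& s) {
    std::vector<std::string> res;
    if (s.size() < 10) return res;
    // Heap-allocate the bitsets; value-initialization sets all bits to 0.
    auto seen = std::make_unique<std::bitset<kCodes>>();
    auto added = std::make_unique<std::bitset<kCodes>>();
    std::uint32_t val = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        std::uint32_t c = (s[i] == 'A') ? 0 : (s[i] == 'C') ? 1 : (s[i] == 'G') ? 2 : 3;
        val = ((val << 2) & 0xFFFFF) | c;  // keep only the last 10 letters
        if (i < 9) continue;                // window not full yet
        if (seen->test(val)) {
            if (!added->test(val)) {        // report each sequence once
                added->set(val);
                res.push_back(s.substr(i - 9, 10));
            }
        } else {
            seen->set(val);
        }
    }
    return res;
}
```

Unlike the set-based version, results come out in order of each sequence's second occurrence rather than sorted; if sorted output matters, sort `res` before returning.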
[LeetCode] Repeated DNA Sequences