All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return: ["AAAAACCCCC", "CCCCCAAAAA"].
Tags: Hash Table, Bit Manipulation
The C++ Standard Template Library is easy to forget. This problem amounts to counting occurrences in a large table with a hash map, but using unordered_map<string, int> directly exceeds the memory limit:
class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        unordered_map<string, int> mp;  // substring -> number of times seen
        int len = s.length(), curIdx = 0;
        string curStr;
        vector<string> ret;
        while (curIdx + 10 <= len) {
            curStr = s.substr(curIdx, 10);
            // Report a sequence only on its second occurrence, so a sequence
            // that appears three or more times is not added twice.
            if (++mp[curStr] == 2)
                ret.push_back(curStr);
            curIdx++;
        }
        return ret;
    }
};
One fix is to change the map to unordered_map<int, int> by treating each 10-letter sequence as a base-4 number (A = 0, C = 1, G = 2, T = 3), which fits in 20 bits. Memory can be reduced further with a bitset of 4^10 = 1048576 bits. Finally, duplicates in the result have to be considered: with unordered_map, the stored value can record whether a sequence has already been added to the returned vector; with a bitset, the repeated sequences can be collected in a temporary set<string>, from which the returned vector is built at the end.
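As a rough sketch of the first option described above (an integer-keyed map), the following assumes the input contains only A, C, G, and T. The names nucleotideCode and findRepeatedByIntKey are hypothetical, chosen for illustration, and are not part of the original solution. The map value counts occurrences, so a sequence is reported only when its count reaches 2.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: base-4 digit for a nucleotide.
// Assumes the input contains only 'A', 'C', 'G', 'T'.
int nucleotideCode(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // 'T'
    }
}

// Hypothetical function: finds 10-letter sequences that occur more than once,
// keying the hash map on the base-4 value of each window instead of a string.
std::vector<std::string> findRepeatedByIntKey(const std::string& s) {
    std::unordered_map<int, int> seen;  // window key -> occurrence count
    std::vector<std::string> result;
    int key = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Keep the lowest 9 base-4 digits (4^9 = 262144), then append the new one.
        key = (key % 262144) * 4 + nucleotideCode(s[i]);
        if (i >= 9) {
            // A full 10-letter window ends at index i.
            if (++seen[key] == 2) {
                result.push_back(s.substr(i - 9, 10));
            }
        }
    }
    return result;
}
```

For the example string this sketch returns AAAAACCCCC and CCCCCAAAAA, in the order in which each repeat is first detected. Strings shorter than 10 letters yield an empty vector.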
#include <iostream>
#include <string>
#include <vector>
#include <bitset>
#include <set>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        // Fewer than 10 letters: no window exists, so nothing can repeat.
        if (s.length() < 10) return vector<string>();
        bitset<1048576> bst;  // one bit per possible 10-letter sequence (4^10)
        set<string> ret;      // collects each repeated sequence once
        int sum = 0;
        // Encode the first window as a 10-digit base-4 number.
        for (int i = 0; i < 10; i++)
            sum = sum * 4 + helpFun(s[i]);
        bst.set(sum);
        for (int i = 10; i < (int)s.length(); i++) {
            sum %= 262144;  // drop the leading digit (4^9 = 262144)
            sum = sum * 4 + helpFun(s[i]);
            if (bst[sum])
                ret.insert(s.substr(i - 9, 10));
            else
                bst.set(sum);
        }
        return vector<string>(ret.begin(), ret.end());
    }

    // Map a nucleotide to its base-4 digit.
    int helpFun(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
        }
        return 0;  // not reached for valid input
    }
};

int main() {
    string s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT";
    Solution sol;
    vector<string> ret = sol.findRepeatedDnaSequences(s);
    for (int i = 0; i < (int)ret.size(); i++)
        cout << ret[i] << endl;
    return 0;
}
[LeetCode] Repeated DNA Sequences (hash map)