All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
Example:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC", "CCCCCAAAAA"]
Solution 1: hash table + hash set
Solution 2: hash set
Solution 3: hash table + bit manipulation
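The bit-manipulation idea behind Solution 3 can be sketched as a sliding window: each nucleotide takes 2 bits, so a 10-letter window fits in 20 bits, and moving one position to the right is a shift, an OR, and a mask. The following is a minimal illustration only, not one of the original solutions; the function name `find_repeated_dna_sequences`, the parameter `k`, and the `code` table are assumptions made for this example.

```python
def find_repeated_dna_sequences(s, k=10):
    """Return every k-letter substring that occurs more than once in s."""
    # Assumed encoding for this sketch: 2 bits per nucleotide.
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    # Mask that keeps only the lowest 2*k bits (20 bits when k == 10).
    mask = (1 << (2 * k)) - 1

    seen = set()       # encoded windows seen at least once
    repeated = set()   # encoded windows already reported
    result = []
    window = 0

    for i, ch in enumerate(s):
        # Shift the new nucleotide in and drop the one that left the window.
        window = ((window << 2) | code[ch]) & mask
        if i < k - 1:
            continue   # the window does not yet hold k letters
        if window in seen:
            if window not in repeated:
                repeated.add(window)
                result.append(s[i - k + 1:i + 1])
        else:
            seen.add(window)
    return result


print(find_repeated_dna_sequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"))
```

Storing 20-bit integers instead of 10-character strings keeps the sets small, and each step costs constant time instead of re-encoding all ten letters.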
Java:
// Needs: import java.util.*;
public List<String> findRepeatedDnaSequences(String s) {
    Set<String> result = new HashSet<>();
    if (s == null || s.length() < 2) return new ArrayList<>();
    Set<String> temp = new HashSet<>();
    for (int i = 0; i < s.length() - 9; i++) {
        String x = s.substring(i, i + 10);
        if (temp.contains(x)) {
            // Seen before: record it (the set removes duplicates).
            result.add(x);
        } else {
            temp.add(x);
        }
    }
    return new ArrayList<>(result);
}
Java:
// Needs: import java.util.*;
public List<String> findRepeatedDnaSequences(String s) {
    Set<String> seen = new HashSet<>(), repeated = new HashSet<>();
    for (int i = 0; i + 9 < s.length(); i++) {
        String ten = s.substring(i, i + 10);
        // Set.add returns false when the element is already present.
        if (!seen.add(ten)) repeated.add(ten);
    }
    return new ArrayList<>(repeated);
}
Java: hash table + bit manipulation
// Needs: import java.util.*;
public List<String> findRepeatedDnaSequences(String s) {
    Set<Integer> words = new HashSet<>();
    Set<Integer> doubleWords = new HashSet<>();
    List<String> rv = new ArrayList<>();
    // Map each nucleotide to a 2-bit code.
    char[] map = new char[26];
    map['A' - 'A'] = 0;
    map['C' - 'A'] = 1;
    map['G' - 'A'] = 2;
    map['T' - 'A'] = 3;
    for (int i = 0; i < s.length() - 9; i++) {
        // Pack the 10-letter window into 20 bits.
        int v = 0;
        for (int j = i; j < i + 10; j++) {
            v <<= 2;
            v |= map[s.charAt(j) - 'A'];
        }
        // Report a window only the first time it is seen again.
        if (!words.add(v) && doubleWords.add(v)) {
            rv.add(s.substring(i, i + 10));
        }
    }
    return rv;
}
Python:
class Solution(object):
    def findRepeatedDnaSequences(self, s):
        """
        :type s: str
        :rtype: List[str]
        """
        seen, rolling_hash, res = {}, 0, []
        for i in range(len(s)):
            # Keep the last 10 characters, 3 bits each (30 bits in total).
            rolling_hash = ((rolling_hash << 3) & 0x3FFFFFFF) | (ord(s[i]) & 7)
            if rolling_hash not in seen:
                seen[rolling_hash] = True
            elif seen[rolling_hash]:
                # Second occurrence: report once, then mark as reported.
                res.append(s[i - 9: i + 1])
                seen[rolling_hash] = False
        return res
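The rolling hash above relies on the low 3 bits of the ASCII codes of A, C, G, and T all being different and non-zero, so `ord(ch) & 7` works as a compact code without a lookup table. The short check below is only an illustration; the name `low3` is made up for this example.

```python
# Low 3 bits of each nucleotide's ASCII code.
low3 = {ch: ord(ch) & 7 for ch in "ACGT"}
print(low3)

# The four codes must be distinct for the hash to identify a window uniquely.
assert len(set(low3.values())) == 4

# Ten characters at 3 bits each need 30 bits, which is what the mask keeps.
assert 0x3FFFFFFF == (1 << 30) - 1
```

Because none of the codes is zero, a window that is still being filled (fewer than 10 characters) cannot produce the same value as a full 10-character window.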
Python:
import collections

# Method of the Solution class above.
def findRepeatedDnaSequences2(self, s):
    """
    :type s: str
    :rtype: List[str]
    """
    l = []
    if len(s) < 10:
        return []
    # Collect every 10-letter window, then count the occurrences.
    for i in range(len(s) - 9):
        l.append(s[i:i + 10])
    return [k for k, v in collections.Counter(l).items() if v > 1]
C++:
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        unordered_set<int> seen;
        unordered_set<int> dup;
        vector<string> result;
        // Map each nucleotide to a 2-bit code.
        vector<char> m(26);
        m['A' - 'A'] = 0;
        m['C' - 'A'] = 1;
        m['G' - 'A'] = 2;
        m['T' - 'A'] = 3;
        for (int i = 0; i + 10 <= s.size(); ++i) {
            string substr = s.substr(i, 10);
            int v = 0;
            for (int j = i; j < i + 10; ++j) {
                // 20 bits < 32-bit int
                v <<= 2;
                v |= m[s[j] - 'A'];
            }
            if (seen.count(v) == 0) {        // not seen
                seen.insert(v);
            } else if (dup.count(v) == 0) {  // seen but not dup
                dup.insert(v);
                result.push_back(substr);
            }                                // otherwise already recorded as dup
        }
        return result;
    }
};
[LeetCode] 187. Repeated DNA Sequences