Topic:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return: ["AAAAACCCCC", "CCCCCAAAAA"].
Ideas:
The obvious approach is to put every 10-letter substring into a HashMap and look for the ones that show up again. But isn't that a little naive?
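For comparison, here is a minimal sketch of that naive approach. It uses two HashSets of substrings rather than a HashMap, which is my own simplification, and the class and method names (`NaiveRepeatedDna`, `findRepeated`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NaiveRepeatedDna {
    // Hypothetical helper: returns every 10-letter substring that occurs more than once.
    static List<String> findRepeated(String s) {
        Set<String> seen = new HashSet<>();      // substrings seen at least once
        Set<String> reported = new HashSet<>();  // substrings already added to the result
        List<String> res = new ArrayList<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String sub = s.substring(i, i + 10);
            // add() returns false when the element was already in the set
            if (!seen.add(sub) && reported.add(sub)) {
                res.add(sub);
            }
        }
        return res;
    }
}
```

This works, but every window is stored as a separate 10-character String, which is what the bit mask idea avoids.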
After looking around, I found that a bit mask can do the job, and it suddenly clicked. Bit manipulation often makes code very concise.
Let's go straight to the code. There is nothing especially clever in it: each letter is simply encoded in two bits.
A:00
C:01
G:10
T:11
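To make the table concrete, here is a small sketch that packs a string of nucleotides into an int, two bits per letter. The names `DnaEncoding`, `code`, and `encode` are my own and do not appear in the solution; a 10-letter window needs 10 x 2 = 20 bits, so it fits comfortably in a 32-bit int.

```java
class DnaEncoding {
    // Two-bit code for one nucleotide, following the table above.
    static int code(char c) {
        switch (c) {
            case 'A': return 0; // 00
            case 'C': return 1; // 01
            case 'G': return 2; // 10
            case 'T': return 3; // 11
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Packs the letters left to right; assumes at most 16 letters so the result fits in an int.
    static int encode(String window) {
        int mask = 0;
        for (int i = 0; i < window.length(); i++) {
            mask = (mask << 2) | code(window.charAt(i));
        }
        return mask;
    }
}
```

For example, "ACGT" becomes the bits 00 01 10 11, which is 27, and two windows are equal exactly when their encoded ints are equal.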
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<String>();
        // A string of 10 letters or fewer cannot contain a repeated 10-letter window.
        if (s.length() <= 10) {
            return res;
        }
        // Each nucleotide is encoded in two bits.
        Map<Character, Integer> map = new HashMap<Character, Integer>();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);

        int mask = 0;

        // Encoded windows seen so far, and encoded windows already added to the result.
        Set<Integer> subSeq = new HashSet<Integer>();
        Set<Integer> addedSubSeq = new HashSet<Integer>();

        for (int i = 0; i < s.length(); i++) {
            if (i < 9) {
                // Still filling the first window: just append the letter.
                mask = mask << 2;
                mask += map.get(s.charAt(i));
            } else {
                mask = mask << 2;
                mask += map.get(s.charAt(i));

                // We only need the low 20 bits (10 letters x 2 bits), so clear the top 12.
                mask = mask << 12;
                mask = mask >>> 12;

                if (subSeq.contains(mask) && !addedSubSeq.contains(mask)) {
                    res.add(s.substring(i - 9, i + 1));
                    addedSubSeq.add(mask);
                } else {
                    subSeq.add(mask);
                }
            }
        }
        return res;
    }
}
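The one step in the code that may look odd is the pair of shifts by 12. As far as I can tell, it is just a way of keeping the low 20 bits of a 32-bit int, so the oldest letter falls out of the window. The sketch below puts it next to the more common AND with 0xFFFFF; the class and method names are hypothetical and only for illustration.

```java
class RollingWindow {
    static final int LOW_20_BITS = (1 << 20) - 1; // 0xFFFFF, room for 10 letters x 2 bits

    // Same steps as the solution: append the 2-bit code, then clear the top 12 bits with two shifts.
    static int rollWithShifts(int mask, int code) {
        mask = (mask << 2) + code;
        mask = mask << 12;   // pushes the top 12 bits out of the int
        return mask >>> 12;  // unsigned shift back, filling with zeros
    }

    // Alternative spelling: append the code and AND with a 20-bit mask.
    static int rollWithAnd(int mask, int code) {
        return ((mask << 2) | code) & LOW_20_BITS;
    }
}
```

Both methods should give the same result for any window that already fits in 20 bits, so which one to use is a matter of taste.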
Repeated DNA sequences