Description:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return: ["AAAAACCCCC", "CCCCCAAAAA"].
Ideas:
1. Brute force is clearly one way to solve this, but it is not a recommended approach.
2. First look at the ASCII codes of the letters "A", "C", "G", "T": 65, 67, 71, 84, which in binary are 1000001, 1000011, 1000111, 1010100. The low three bits (001, 011, 111, 100) all differ, so they are enough to tell the four letters apart. At 3 bits per letter, 10 letters need only 30 bits, which fit in an int. Characters 0 through 9 of the substring occupy bits 29 down to 0 of the int. This 30-bit value is used as the key for the substring and stored in a hash table to determine whether the substring has already occurred.
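The encoding in idea 2 can be checked with a small sketch. The class name `DnaKeySketch` and the helpers `code` and `encode` are hypothetical names chosen for illustration; only the bit operations come from the description above.

```java
class DnaKeySketch {
    // Low three bits of the ASCII code: A -> 1, C -> 3, G -> 7, T -> 4.
    static int code(char c) {
        return c & 0x7;
    }

    // Packs the letters into an int, 3 bits each, keeping only the low 30 bits.
    // When more than 10 letters are fed in, the oldest ones fall off the left.
    static int encode(String letters) {
        int key = 0;
        for (int i = 0; i < letters.length(); i++) {
            key = ((key << 3) | code(letters.charAt(i))) & 0x3fffffff;
        }
        return key;
    }

    public static void main(String[] args) {
        for (char c : "ACGT".toCharArray()) {
            System.out.println(c + " -> " + code(c));
        }
        // The leading 'G' is pushed out by the mask, so both keys match.
        System.out.println(encode("GAAAAACCCCC") == encode("AAAAACCCCC"));
    }
}
```

Because the four codes are distinct and each window uses exactly 30 bits, two 10-letter windows get the same key only when they are the same string, so the key can stand in for the substring in the hash table.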
Code:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public List<String> findRepeatedDnaSequences(String s) {
    List<String> list = new ArrayList<String>();
    int strLen = s.length();
    if (strLen <= 10) return list;
    HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
    int key = 0;
    for (int i = 0; i < strLen; i++) {
        // key << 3 shifts the key left by 3 bits to make room for the new character.
        // s.charAt(i) & 0x7 takes the low 3 bits, which identify the character s.charAt(i).
        // & 0x3fffffff keeps the low 30 bits, erasing the leftmost (oldest) character after the shift.
        key = ((key << 3) | (s.charAt(i) & 0x7)) & 0x3fffffff;
        if (i < 9) continue;
        if (map.get(key) == null) {
            // No string represented by this integer has been seen yet, so add it to the map.
            map.put(key, 1);
        } else if (map.get(key) == 1) {
            // Seen once before: a duplicate string exists, so add it to the result list.
            list.add(s.substring(i - 9, i + 1));
            map.put(key, 2); // Prevents adding the same string more than once.
        }
    }
    return list;
}
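For comparison, here is a minimal sketch of a straightforward baseline that stores the 10-letter substrings themselves in a set rather than their 30-bit keys. Whether this is what idea 1 had in mind is an assumption, and the names `SubstringSetSketch` and `findRepeated` are hypothetical. It is useful mainly as a cross-check for the bit-packed version on small inputs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SubstringSetSketch {
    static List<String> findRepeated(String s) {
        Set<String> seen = new HashSet<>();
        // LinkedHashSet keeps the order in which repeats are first detected.
        Set<String> repeated = new LinkedHashSet<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // add() returns false when the window was already in the set.
            if (!seen.add(window)) {
                repeated.add(window);
            }
        }
        return new ArrayList<>(repeated);
    }

    public static void main(String[] args) {
        // Prints [AAAAACCCCC, CCCCCAAAAA] for the example in the description.
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

This version keeps a 10-character string per distinct window, while the code above keeps one Integer key per window, which is the saving the bit encoding is meant to provide.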
leetcode_repeated DNA Sequences