Repeated DNA sequences
Problem:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
Ideas:
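Use bit operations to compute a hash: each nucleotide is encoded in 2 bits, so a 10-letter sequence fits in a 20-bit integer that can be stored in a set.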
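To make the idea concrete, here is a minimal sketch of the 2-bit encoding, assuming the mapping A=0, C=1, G=2, T=3 (the same mapping my code uses). The class name `DnaEncodingSketch` and the method names are hypothetical, chosen only for this illustration.

```java
// Hypothetical helper class for illustration; the names are not part of the solution.
class DnaEncodingSketch {
    // Map each nucleotide to a 2-bit code: A=00, C=01, G=10, T=11.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3; // assumes any other character is 'T'
        }
    }

    // Pack a sequence into an int, 2 bits per letter (an int holds up to 16 letters).
    static int encode(String s) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift the previous letters left by 2 bits, then OR in the new code.
            hash = (hash << 2) | code(s.charAt(i));
        }
        return hash;
    }

    public static void main(String[] args) {
        // "ACGT" -> 00 01 10 11 -> binary 11011 -> decimal 27
        System.out.println(encode("ACGT"));
        System.out.println(Integer.toBinaryString(encode("ACGT")));
    }
}
```

Because different 10-letter sequences always produce different 20-bit values, comparing the integers is equivalent to comparing the substrings.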
My Code:
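import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> rst = new ArrayList<String>();
        if (s == null || s.length() < 10) return rst;
        int len = s.length();
        // Hashes of every 10-letter window seen so far.
        Set<Integer> set = new HashSet<Integer>();
        for (int i = 0; i <= len - 10; i++) {
            String substr = s.substring(i, i + 10);
            Integer key = getHash(substr);
            if (set.contains(key)) {
                // Seen before: report the sequence, but only once.
                if (!rst.contains(substr)) rst.add(substr);
            } else {
                set.add(key);
            }
        }
        return rst;
    }

    // Map a nucleotide to a 2-bit code: A=0, C=1, G=2, T=3.
    public int getCode(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default: return 3;
        }
    }

    // Pack the sequence into an integer, 2 bits per letter.
    public Integer getHash(String s) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = hash << 2 | getCode(s.charAt(i));
        }
        return hash;
    }
}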
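One possible variant, not what my code above does: instead of rehashing every 10-letter substring from scratch, the hash can be rolled forward one letter at a time by shifting in 2 new bits and masking off everything above the low 20 bits. The sketch below assumes the same A=0, C=1, G=2, T=3 mapping; the names `RollingDnaSketch`, `WINDOW`, `MASK`, and `findRepeated` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical rolling-hash variant for illustration only.
class RollingDnaSketch {
    static final int WINDOW = 10;
    // Low 20 bits set: keeps exactly the last 10 letters (2 bits each).
    static final int MASK = (1 << (2 * WINDOW)) - 1;

    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3; // assumes any other character is 'T'
        }
    }

    static List<String> findRepeated(String s) {
        // LinkedHashSet drops duplicates and keeps first-detected order.
        Set<String> repeated = new LinkedHashSet<>();
        if (s == null || s.length() < WINDOW) return new ArrayList<>(repeated);

        Set<Integer> seen = new HashSet<>();
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new letter and drop the letter that left the window.
            hash = ((hash << 2) | code(s.charAt(i))) & MASK;
            // A full window ends at index i once i >= WINDOW - 1.
            // Set.add returns false when the hash was already present.
            if (i >= WINDOW - 1 && !seen.add(hash)) {
                repeated.add(s.substring(i - WINDOW + 1, i + 1));
            }
        }
        return new ArrayList<>(repeated);
    }
}
```

With this approach each window costs constant work for the hash update, and the set of results replaces the linear `rst.contains` check.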
What I learned:
- I rarely use bit operations in everyday coding, so bit-manipulation problems feel unfamiliar every time I see one. I need to practice this area more.
- Common bit operations
- AND extracts an individual bit of a number: n & (0x00000001 << i)
- OR builds a new number. Reverse build: rst |= (0x80000000 >>> i). Forward build: hash = hash << 2 | getCode(s.charAt(i));
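The two patterns above can be tried out in a small sketch. It assumes that the "reverse build" line refers to reversing the bit order of a 32-bit integer, which is one common use of that expression; the class and method names are hypothetical.

```java
// Hypothetical demo class for illustration only.
class BitOpsSketch {
    // AND with a one-bit mask: true if bit i of n is set (bit 0 = least significant).
    static boolean bitIsSet(int n, int i) {
        return (n & (0x00000001 << i)) != 0;
    }

    // OR with a mask shifted down from the top bit:
    // bit i of n becomes bit (31 - i) of the result, so the bit order is reversed.
    static int reverseBits(int n) {
        int rst = 0;
        for (int i = 0; i < 32; i++) {
            if (bitIsSet(n, i)) {
                // >>> is the unsigned shift, so no sign bits are dragged along.
                rst |= (0x80000000 >>> i);
            }
        }
        return rst;
    }

    public static void main(String[] args) {
        System.out.println(bitIsSet(5, 0)); // 5 is binary 101, so bit 0 is set
        System.out.println(bitIsSet(5, 1)); // bit 1 is clear
        System.out.println(Integer.toHexString(reverseBits(1))); // 80000000
    }
}
```

The standard library method `Integer.reverse` performs the same reversal, so it is a convenient way to check a hand-written version.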