All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return:
["AAAAACCCCC", "CCCCCAAAAA"].
Approach 1:
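Use a HashMap directly: count every 10-letter substring and report it the second time it appears, so that a sequence occurring three or more times is still listed only once. The Java implementation is as follows:
import java.util.*;

public static List<String> findRepeatedDnaSequences(String s) {
    List<String> list = new ArrayList<String>();
    HashMap<String, Integer> hm = new HashMap<String, Integer>();
    for (int i = 0; i <= s.length() - 10; i++) {
        String sub = s.substring(i, i + 10);
        int seen = hm.getOrDefault(sub, 0);
        if (seen == 1)              // second occurrence: report it exactly once
            list.add(sub);
        hm.put(sub, seen + 1);
    }
    return list;
}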
Result: Memory Limit Exceeded.
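For comparison, here is a minimal sketch of the same string-keyed idea using two HashSets instead of a counter map. This is a swapped-in variant, not the author's submission, and the class name RepeatedDnaSets is hypothetical. It still stores one String per distinct window, so it should not be expected to avoid the memory limit; it only shows that HashSet.add, which returns false for an element already present, is enough to report each repeat once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical variant of approach 1: two sets instead of a counter map.
class RepeatedDnaSets {
    static List<String> findRepeatedDnaSequences(String s) {
        Set<String> seen = new HashSet<>();      // every 10-letter window met so far
        Set<String> repeated = new HashSet<>();  // windows met at least twice
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // add() returns false when the window was already in the set
            if (!seen.add(window)) {
                repeated.add(window);            // a set, so each repeat is kept once
            }
        }
        return new ArrayList<>(repeated);        // order is unspecified
    }
}
```

Calling RepeatedDnaSets.findRepeatedDnaSequences on the example string yields AAAAACCCCC and CCCCCAAAAA, in no guaranteed order.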
Approach 2:
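Simulate a hash: map A, C, G, and T to 0, 1, 2, and 3, so each letter takes 2 bits and every 10-letter window becomes a 20-bit hash code. Keep a counter for each hash code in an int array, and output the window when its counter is already 1 (that is, on its second occurrence). The Java implementation is as follows: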
import java.util.*;

static int getValue(char ch) {
    if (ch == 'A') return 0;
    else if (ch == 'C') return 1;
    else if (ch == 'G') return 2;
    else return 3;                      // 'T'
}

public static List<String> findRepeatedDnaSequences(String s) {
    List<String> list = new ArrayList<String>();
    if (s.length() <= 10) return list;
    int[] count = new int[1 << 20];     // one slot per possible 20-bit code
    int hash = 0;
    for (int i = 0; i < 9; i++)         // pre-load the first 9 letters
        hash = (hash << 2) | getValue(s.charAt(i));
    for (int i = 9; i < s.length(); i++) {
        // shift in letter i and keep only the low 20 bits (the last 10 letters)
        hash = ((1 << 20) - 1) & ((hash << 2) | getValue(s.charAt(i)));
        if (count[hash] == 1)           // second occurrence: report it once
            list.add(s.substring(i - 9, i + 1));
        count[hash]++;
    }
    return list;
}
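To make the 2-bit encoding concrete, the sketch below computes a window's code from scratch and also by sliding one letter at a time, using the same A=0, C=1, G=2, T=3 mapping as getValue. The class DnaEncoding and its methods encode and roll are hypothetical names for illustration, and the code assumes the input contains only the letters A, C, G, and T. The largest possible code, for TTTTTTTTTT, is (1 << 20) - 1 = 1048575, which is why a counter array indexed by the code needs 1 << 20 slots.

```java
// Hypothetical helper illustrating the 2-bit rolling encoding.
class DnaEncoding {
    // A=0, C=1, G=2, T=3 (assumes no other letters appear)
    static int code(char ch) {
        switch (ch) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3;
        }
    }

    // Encode a whole window from scratch: 2 bits per letter,
    // the first letter ends up in the highest bits.
    static int encode(String window) {
        int hash = 0;
        for (int i = 0; i < window.length(); i++) {
            hash = (hash << 2) | code(window.charAt(i));
        }
        return hash;
    }

    // Slide a 10-letter window one position right: append the new letter
    // and mask off everything above bit 19, which drops the oldest letter.
    static int roll(int hash, char next) {
        return ((1 << 20) - 1) & ((hash << 2) | code(next));
    }
}
```

For example, encode("AAAAACCCCC") is 341 and encode("CCCCCAAAAA") is 349184, and roll(encode("AAAAACCCCC"), 'A') equals encode("AAAACCCCCA"), so updating the code in O(1) per step gives the same value as re-encoding the window.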
Java for LeetCode 187: Repeated DNA Sequences