Repeated DNA sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return: ["AAAAACCCCC", "CCCCCAAAAA"].
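The expected output can be checked by hand with a brute-force count. The sketch below is only a sanity check, not the solution; the helper name count_occurrences is made up for illustration. It counts how many times a given pattern appears in s, with overlapping matches allowed.

```c
#include <stddef.h>
#include <string.h>

/* Count occurrences of pat in s, overlapping matches included.
   Hypothetical helper for checking the example, O(n * m) time. */
static int count_occurrences(const char *s, const char *pat) {
    size_t n = strlen(s);
    size_t m = strlen(pat);
    int count = 0;
    for (size_t i = 0; i + m <= n; ++i) {
        if (memcmp(s + i, pat, m) == 0)
            ++count;
    }
    return count;
}
```

For the example string, "AAAAACCCCC" starts at offsets 0 and 10 and "CCCCCAAAAA" starts at offsets 5 and 16, so both occur twice, while a sequence such as "AAAAAGGGTT" occurs only once and is not reported.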
Using a bitmap keeps memory usage low. The code is as follows:
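#include <stdlib.h>
#include <string.h>

/* 10 letters x 2 bits each = a 20-bit key, so there are 1024*1024
   possible keys; each bitmap stores one bit per key. */
unsigned int map_exist[1024 * 1024 / 32];    /* sequences already added to the result */
unsigned int map_pattern[1024 * 1024 / 32];  /* sequences seen at least once */

#define set(map, x)  ((map)[(x) >> 5] |= (1u << ((x) & 0x1F)))
#define test(map, x) ((map)[(x) >> 5] & (1u << ((x) & 0x1F)))

int dnamap[26];

char **findRepeatedDnaSequences(char *s, int *returnSize) {
    *returnSize = 0;
    if (s == NULL) return NULL;
    int len = strlen(s);
    if (len <= 10) return NULL;

    memset(map_exist, 0, sizeof(map_exist));
    memset(map_pattern, 0, sizeof(map_pattern));

    /* 2-bit code for each nucleotide */
    dnamap['A' - 'A'] = 0;
    dnamap['C' - 'A'] = 1;
    dnamap['G' - 'A'] = 2;
    dnamap['T' - 'A'] = 3;

    char **ret = malloc(sizeof(char *));
    int curr = 0;
    int size = 1;
    unsigned int key = 0;
    int i = 0;

    /* Preload the first 9 letters; the loop below adds the 10th. */
    while (i < 9)
        key = (key << 2) | dnamap[s[i++] - 'A'];

    while (i < len) {
        /* Shift in the next letter and keep only the low 20 bits. */
        key = ((key << 2) & 0xFFFFF) | dnamap[s[i++] - 'A'];
        if (test(map_pattern, key)) {
            if (!test(map_exist, key)) {
                set(map_exist, key);
                if (curr == size) {
                    size *= 2;
                    ret = realloc(ret, sizeof(char *) * size);
                }
                ret[curr] = malloc(sizeof(char) * 11);
                memcpy(ret[curr], &s[i - 10], 10);
                ret[curr][10] = '\0';
                ++curr;
            }
        } else {
            set(map_pattern, key);
        }
    }

    if (curr == 0) {
        free(ret);
        return NULL;
    }
    ret = realloc(ret, sizeof(char *) * curr);
    *returnSize = curr;
    return ret;
}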
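To see why the bitmap is small: each nucleotide fits in 2 bits, so a 10-letter window fits in a 20-bit key, and one bit per possible key needs 2^20 bits, or 128 KiB, per bitmap. The following is a minimal sketch of just the encoding and the set/test operations in isolation. The names base_code, encode_window, bitmap_set, and bitmap_test are made up for illustration, and the sketch assumes the input contains only the letters A, C, G, and T.

```c
#include <stdint.h>

#define WINDOW 10                              /* letters per sequence */
#define KEY_BITS (2 * WINDOW)                  /* 2 bits per letter = 20 bits */
#define BITMAP_WORDS ((1u << KEY_BITS) / 32)   /* 32768 words = 128 KiB */

/* Map one nucleotide to its 2-bit code.
   Assumption: anything other than C, G, or T is treated as 'A'. */
static uint32_t base_code(char c) {
    switch (c) {
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return 0;
    }
}

/* Pack the 10 letters starting at s into a 20-bit key. */
static uint32_t encode_window(const char *s) {
    uint32_t key = 0;
    for (int i = 0; i < WINDOW; ++i)
        key = (key << 2) | base_code(s[i]);
    return key;
}

/* Mark key as present: word index is key / 32, bit index is key % 32. */
static void bitmap_set(uint32_t *map, uint32_t key) {
    map[key >> 5] |= 1u << (key & 0x1F);
}

/* Return 1 if key is marked, 0 otherwise. */
static int bitmap_test(const uint32_t *map, uint32_t key) {
    return (int)((map[key >> 5] >> (key & 0x1F)) & 1u);
}
```

Two such bitmaps, one for "seen once" and one for "already reported", cost 256 KiB in total no matter how long the input string is.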
The algorithm takes around 6 ms to run, which is very fast.
LeetCode: Repeated DNA Sequences (bitmap algorithm reduces memory)