Description
It's well known that a DNA sequence is a sequence that contains only A, C, T and G, and it's very useful to analyze a segment of a DNA sequence. For example, if an animal's DNA sequence contains the segment ATC, it may mean that the animal has a genetic disease. So far scientists have found several such segments; the problem is how many kinds of DNA sequences of a species don't contain those segments.
Suppose that the DNA sequences of a species are sequences that consist of A, C, T and G, and that the length of the sequences is a given integer n.
Input
The first line contains m (0 <= m <= 10) and n (1 <= n <= 2000000000). Here, m is the number of genetic disease segments, and n is the length of the sequences.
Each of the next m lines contains one genetic disease segment; the length of each segment is at most 10.
Output
An integer, the number of DNA sequences, mod 100000.
Sample Input
4 3
AT
AC
AG
AA
Sample Output
36
The main idea: given m strings, count how many strings of length n do not contain any of these m strings. (The strings consist of A, C, G, T; m <= 10, n <= 2000000000, and each pattern has length at most 10.)

Analysis: we need the number of strings that contain none of the patterns, which clearly calls for an AC automaton plus dynamic programming.

AC automaton + dynamic programming (O(4 * n * S), where S is the number of trie nodes): first, read the patterns and build a trie, then build the failure pointers on the trie with a BFS, then run the dynamic programming on the trie. f[i][j] is the number of strings of length i whose last character lands on trie node j. Note that a node where a pattern ends (a leaf of the trie, or any node whose failure chain reaches such a node) must never be entered. The sum of f[n][j] over all nodes j is the answer.

Matrix multiplication with fast exponentiation (O(S^3 * log n)): because n is very large, the dynamic programming above will certainly exceed the time limit. But looking at the data range, m and the pattern lengths are very small; in the worst case the trie has about 100 nodes (10 patterns of 10 characters each, plus the root), so arrays of size about 100 are enough. Only 100? Even a 100 * 100 table is tiny. We should make good use of this small 100.

100 nodes are equivalent to 100 states with transitions between them. In the dynamic programming, each node enumerates the 4 nodes it points to, so each transfer costs only O(4). If we instead use a matrix to store the connectivity between every pair of states (that is, whether one can be reached from the other), each transfer costs O(100), which is clearly slower. So what is the use?

The original transition is f[i + 1][trie[j].to[k]] += f[i][j] (k from 0 to 3). With the matrix, the transition becomes f[i + 1][k] += f[i][j] * mat[j][k] (k ranges over all states, and mat[j][k] is the number of characters that lead from state j to state k). Looked at the other way round, f[i][j] = SUM(f[i - 1][k] * mat[k][j]). This is a linear recurrence, and a linear recurrence can be evaluated quickly with matrix exponentiation.
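The plain dynamic programming described above (before the matrix speed-up) can be sketched as follows. This is only an illustration for small n, useful as a cross-check; the names countByDp, baseIndex, to, fail and bad are made up for this sketch and do not appear in the original solution.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: map a base to 0..3 (A, C, T, anything else = G).
static int baseIndex(char c) { return c == 'A' ? 0 : c == 'C' ? 1 : c == 'T' ? 2 : 3; }

// Counts strings of length n over {A, C, G, T} that avoid every pattern,
// mod 100000, stepping one character at a time: O(4 * n * nodes).
int countByDp(const std::vector<std::string>& patterns, int n) {
    assert(n >= 0);
    const int MOD = 100000;
    std::vector<std::vector<int> > to(1, std::vector<int>(4, 0)); // trie edges
    std::vector<int> fail(1, 0), bad(1, 0);

    // Build the trie; bad[x] marks nodes where a pattern ends.
    for (const std::string& p : patterns) {
        int cur = 0;
        for (char c : p) {
            int k = baseIndex(c);
            if (!to[cur][k]) {
                to[cur][k] = (int)to.size();
                to.push_back(std::vector<int>(4, 0));
                fail.push_back(0);
                bad.push_back(0);
            }
            cur = to[cur][k];
        }
        bad[cur] = 1;
    }

    // BFS: set failure pointers, inherit "bad", fill in missing edges.
    std::vector<int> order(1, 0);
    for (std::size_t head = 0; head < order.size(); ++head) {
        int u = order[head];
        for (int k = 0; k < 4; ++k) {
            int v = to[u][k];
            if (v) {
                fail[v] = u ? to[fail[u]][k] : 0;
                bad[v] |= bad[fail[v]];
                order.push_back(v);
            } else {
                to[u][k] = to[fail[u]][k];
            }
        }
    }

    // f[j]: number of valid strings of the current length ending at node j.
    std::vector<int> f(to.size(), 0), g;
    f[0] = 1;
    for (int i = 0; i < n; ++i) {
        g.assign(to.size(), 0);
        for (std::size_t j = 0; j < to.size(); ++j) {
            if (!f[j]) continue;
            for (int k = 0; k < 4; ++k) {
                int nxt = to[j][k];
                if (!bad[nxt]) g[nxt] = (g[nxt] + f[j]) % MOD;
            }
        }
        f.swap(g);
    }
    int total = 0;
    for (int x : f) total = (total + x) % MOD;
    return total;
}
```

On the sample (patterns AT, AC, AG, AA with n = 3) this gives 36: an A may only appear in the last position, so there are 3 * 3 * 4 valid strings.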
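The rest of the work is simple: construct the matrix mat, raise it to the n-th power with fast exponentiation, and apply the result to the initial state (all weight on the root), which amounts to summing row 0 of the final matrix.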
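As a minimal sketch of the fast-power step on its own, assuming the transition matrix over the allowed states has already been built and state 0 is the root: the names Mat, multiply, power and countFromMatrix are illustrative and not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<long long> > Mat;
const long long MOD = 100000;

// Square matrix product mod MOD. Entries stay below MOD, so each
// product is below 10^10 and fits comfortably in long long.
Mat multiply(const Mat& a, const Mat& b) {
    std::size_t n = a.size();
    Mat c(n, std::vector<long long>(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i][j] = (c[i][j] + a[i][k] * b[k][j]) % MOD;
    return c;
}

// Binary exponentiation: O(log e) matrix products.
Mat power(Mat base, long long e) {
    std::size_t n = base.size();
    Mat result(n, std::vector<long long>(n, 0));
    for (std::size_t i = 0; i < n; ++i) result[i][i] = 1; // identity
    while (e > 0) {
        if (e & 1) result = multiply(result, base);
        base = multiply(base, base);
        e >>= 1;
    }
    return result;
}

// Number of valid strings of length n: sum of row 0 of mat^n,
// i.e. all walks of n steps that start at the root state.
long long countFromMatrix(const Mat& mat, long long n) {
    assert(!mat.empty());
    Mat p = power(mat, n);
    long long total = 0;
    for (long long x : p[0]) total = (total + x) % MOD;
    return total;
}
```

For example, with the single pattern AA the allowed states are the root and the node for "A". From the root, 3 characters stay at the root and 1 moves to "A"; from "A", 3 characters return to the root and the fourth is forbidden. That gives the matrix {{3, 1}, {3, 0}}, and for n = 3 the count is 57, which matches 64 - 7 (the 7 strings of length 3 that contain AA).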
Code:
#include <cstdio>
#include <cstring>

struct Matrix {
    int a[101][101];
} mat, ti;

// Note: in this code n is the number of patterns and m is the sequence length
// (the reverse of the naming in the problem statement).
// f: failure pointers, t: trie edges, v: 1 if the node is forbidden, tn: last node index.
int n, m, len, last, tn, ans, f[101], t[101][4], v[101];
char str[11];

// BFS over the trie: build failure pointers, propagate the forbidden flag,
// and redirect missing edges so that t[][] becomes a full transition table.
void bfs()
{
    int q[101], hd, tl;
    for (q[hd = tl = 0] = 0; hd <= tl; hd++)
        for (int i = 0; i < 4; i++) {
            int u = q[hd], c = t[u][i];
            if (c) {
                if (u) f[c] = t[f[u]][i];   // children of the root keep f = 0
                v[c] |= v[f[c]];
                q[++tl] = c;
            } else {
                t[u][i] = t[f[u]][i];
            }
        }
}

// Matrix product modulo 100000 over states 0..tn.
inline Matrix times(Matrix m1, Matrix m2)
{
    Matrix ret;
    memset(ret.a, 0, sizeof(ret.a));
    for (int i = 0; i <= tn; i++)
        for (int j = 0; j <= tn; j++)
            for (int k = 0; k <= tn; k++)
                ret.a[i][j] = (ret.a[i][j] + (long long) m1.a[i][k] * m2.a[k][j]) % 100000;
    return ret;
}

inline int gc(char ch)
{
    return ch == 'A' ? 0 : ch == 'C' ? 1 : ch == 'T' ? 2 : 3;
}

int main()
{
    scanf("%d%d", &n, &m);
    for (int i = 0; i < n; i++) {
        scanf("%s", str);
        len = strlen(str);
        last = 0;
        for (int j = 0; j < len; j++) {      // insert the pattern into the trie
            int ch = gc(str[j]);
            if (!t[last][ch]) t[last][ch] = ++tn;
            last = t[last][ch];
        }
        v[last] = 1;                         // a pattern ends here
    }
    bfs();
    // mat[i][j] = number of characters leading from allowed state i to allowed state j
    for (int i = 0; i <= tn; i++)
        for (int j = 0; j < 4; j++)
            if (!v[t[i][j]] && !v[i]) mat.a[i][t[i][j]]++;
    for (int i = 0; i <= tn; i++) ti.a[i][i] = 1;   // identity matrix
    for (; m; m >>= 1) {                            // ti = mat^m by repeated squaring
        if (m & 1) ti = times(ti, mat);
        mat = times(mat, mat);
    }
    for (int i = 0; i <= tn; i++) ans += ti.a[0][i];
    printf("%d", ans % 100000);
}
"POJ 2778" DNA Sequence