Objective:
The KMP algorithm is a string-matching algorithm discovered by Knuth, Morris, and Pratt (hence the name KMP). The key idea of KMP is to use the information gained from a failed match to reduce the number of comparisons between the pattern string and the main (text) string, and so achieve fast matching. The common practice is to implement a next() function, which encodes the local matching information of the pattern string. Because the next function is not easy to understand, this article is still based on trading space for time but uses a different implementation, in the hope that it is easier for the reader to understand!
Test data
Aseeesatba esatas330kdwejjl_8 jjl_faw4etoesting TIOAABACB Abac
Test results
49-10
(Note: if the match succeeds, the start index of the matching text substring is returned; otherwise -1 is returned.)
1. Brute-force search, implementation one
// Brute-force substring search, version one: O(M*N)
private static int search0(String text, String pat) {
    int i, j, N = text.length(), M = pat.length();
    for (i = 0; i <= N - M; i++) {
        for (j = 0; j < M; j++) {
            if (text.charAt(i + j) != pat.charAt(j))
                break;
        }
        if (M == j)
            return i;
    }
    return -1;
}
The function takes the text string text and the pattern string pat. Here i marks the start of the candidate text substring and i+j marks the text character currently being compared. If text contains a substring matching pat, the function returns the start index of that substring; otherwise it returns -1. Time complexity: O(M*N).
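To make the O(M*N) bound concrete, here is a minimal sketch that runs the same scan as search0 but counts character comparisons. The class name BruteForceCount, the method countComparisons, and the input strings are illustrative assumptions, not part of the original code.

```java
// Hypothetical demo class; names and inputs are illustrative only.
class BruteForceCount {
    // Same scan as search0, but returns the number of character comparisons made.
    static long countComparisons(String text, String pat) {
        int N = text.length(), M = pat.length();
        long comparisons = 0;
        for (int i = 0; i <= N - M; i++) {
            int j;
            for (j = 0; j < M; j++) {
                comparisons++;
                if (text.charAt(i + j) != pat.charAt(j)) break;
            }
            if (j == M) break; // match found, stop as search0 would
        }
        return comparisons;
    }

    public static void main(String[] args) {
        // Worst case: every alignment fails only on the last pattern character.
        String text = "AAAAAAAAAA"; // N = 10
        String pat = "AAAB";        // M = 4
        System.out.println(countComparisons(text, pat)); // prints 28 = (N - M + 1) * M
    }
}
```

With N = 10 and M = 4 there are 7 alignments and each one costs 4 comparisons, which is where the product of the two lengths comes from.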
2. Brute-force search, implementation two
// Brute-force substring search, version two: O(M*N)
public static int search(String text, String pat) {
    int i, j;
    int N = text.length(), M = pat.length();
    for (i = 0, j = 0; i < N && j < M; i++) {
        if (text.charAt(i) == pat.charAt(j))
            j++;
        else {
            // Mismatch: back i up to the start of this attempt and restart the pattern
            i -= j;
            j = 0;
        }
    }
    return (j == M) ? (i - M) : -1;
}
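This is the same brute-force algorithm, except that it works by repeatedly backing up the index i in the text string: on a mismatch, i is moved back by j positions and j is reset to 0. If text contains a substring matching pat, the function returns the start index of that substring; otherwise it returns -1. Time complexity: O(M*N).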
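The backing up of i can be observed directly. The sketch below repeats the loop of the second implementation and also counts how many times i really moves left. The class name BackupDemo, the method searchCountingBackups, and the sample text are assumptions made for illustration.

```java
// Hypothetical demo class; the names below are not from the article.
class BackupDemo {
    // Runs the second brute-force search and counts how often i is moved back.
    // Returns {matchIndex, backups}; matchIndex is -1 when pat does not occur.
    static int[] searchCountingBackups(String text, String pat) {
        int N = text.length(), M = pat.length();
        int i, j, backups = 0;
        for (i = 0, j = 0; i < N && j < M; i++) {
            if (text.charAt(i) == pat.charAt(j)) {
                j++;
            } else {
                if (j > 0) backups++; // i really moves left only when j > 0
                i -= j;
                j = 0;
            }
        }
        return new int[] { j == M ? i - M : -1, backups };
    }

    public static void main(String[] args) {
        int[] r = searchCountingBackups("AABAABACB", "AABACB");
        System.out.println("index=" + r[0] + " backups=" + r[1]); // index=3 backups=2
    }
}
```

Each backup re-reads text characters that were already compared, which is the work the KMP approach avoids.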
3. KMP algorithm (trading space for time)
To improve the algorithm's time complexity, we store some information in advance, which introduces an additional array dfa[][].
The second brute-force implementation above gives us the inspiration: by keeping track of j, we can ensure that i only ever moves to the right and never has to back up to the left. Here dfa[text.charAt(i)][j] gives, once the current text character charAt(i) has been read with j pattern characters already matched, the position in the pattern string (between 0 and j+1) that the next text character charAt(i+1) should be compared against.
Here we use a deterministic finite automaton (DFA) to initialize the values of dfa[][]. Taking the pattern string "AABACB" as an example, the DFA state diagram for pat is as follows:
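The transitions of this DFA can also be written out as a table. The sketch below builds dfa[][] for "AABACB" and prints the rows for the three characters that occur in the pattern. The class name DfaTableDemo and the helpers buildDfa and row are illustrative assumptions, and the alphabet size of 256 is assumed as well.

```java
// Hypothetical demo class; helper names and R = 256 are assumptions.
class DfaTableDemo {
    // Build the KMP DFA for pat over an alphabet of R character codes.
    static int[][] buildDfa(String pat, int R) {
        int M = pat.length();
        int[][] dfa = new int[R][M];
        dfa[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < M; j++) {
            for (int c = 0; c < R; c++) {
                dfa[c][j] = dfa[c][x];     // mismatch: behave like restart state x
            }
            dfa[pat.charAt(j)][j] = j + 1; // match: advance to the next state
            x = dfa[pat.charAt(j)][x];     // update the restart state
        }
        return dfa;
    }

    // One table row as text: the character, then its transition for each state j.
    static String row(int[][] dfa, char c) {
        StringBuilder sb = new StringBuilder().append(c).append(':');
        for (int j = 0; j < dfa[c].length; j++) {
            sb.append(' ').append(dfa[c][j]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[][] dfa = buildDfa("AABACB", 256);
        for (char c : new char[] { 'A', 'B', 'C' }) {
            System.out.println(row(dfa, c));
        }
        // Prints:
        // A: 1 2 2 4 2 1
        // B: 0 0 3 0 0 6
        // C: 0 0 0 0 5 0
    }
}
```

Reading a column gives the next state for each character: for example, in state 4 (after "AABA") another 'A' leads to state 2, because the last two characters "AA" are still a prefix of the pattern.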
The corresponding code is as follows:
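// Build dfa[][]
dfa[pat.charAt(0)][0] = 1;
// j starts at 1: column 0 is already set, and x must lag one character behind j
for (int x = 0, j = 1; j < M; j++) {
    for (int c = 0; c < R; c++) {
        dfa[c][j] = dfa[c][x];     // mismatch: copy the transitions of restart state x
    }
    dfa[pat.charAt(j)][j] = j + 1; // match: advance to the next state
    x = dfa[pat.charAt(j)][x];     // update the restart state
}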
Here x is the restart state: the state the DFA would be in after reading pat[1..j-1]. The time complexity of building dfa[][] is O(M*R), where M is the pattern length and R is the alphabet size.
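To see how x evolves, the following sketch records the value of x used while each column j is filled. The class name RestartStateDemo and the method restartStates are hypothetical, and the alphabet size of 256 is an assumption.

```java
import java.util.Arrays;

// Hypothetical demo class; names and R = 256 are assumptions.
class RestartStateDemo {
    // Returns xs, where xs[j] is the restart state x used while filling column j.
    // xs[0] is left at 0 because column 0 is initialized directly.
    static int[] restartStates(String pat, int R) {
        int M = pat.length();
        int[][] dfa = new int[R][M];
        int[] xs = new int[M];
        dfa[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < M; j++) {
            xs[j] = x; // state reached after reading pat[1..j-1]
            for (int c = 0; c < R; c++) {
                dfa[c][j] = dfa[c][x];
            }
            dfa[pat.charAt(j)][j] = j + 1;
            x = dfa[pat.charAt(j)][x];
        }
        return xs;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(restartStates("AABACB", 256)));
        // prints [0, 0, 1, 0, 1, 0]
    }
}
```

For "AABACB", column 2 is filled with x = 1 because reading pat[1..1] = "A" from state 0 leads to state 1, and column 3 is filled with x = 0 because reading "AB" leads back to state 0.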
------------------------------------------------
Full Java code
package ch05.string.substring;

import java.io.File;
import java.util.Scanner;

public class KMP {

    private int R = 256; // alphabet size: character codes 0..255
    private String pat;
    private int[][] dfa;

    public KMP(String pat) {
        this.pat = pat;
        int M = pat.length();
        dfa = new int[R][M];

        // Build dfa[][]
        dfa[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < M; j++) {
            for (int c = 0; c < R; c++) {
                dfa[c][j] = dfa[c][x];     // mismatch: copy from restart state x
            }
            dfa[pat.charAt(j)][j] = j + 1; // match: advance
            x = dfa[pat.charAt(j)][x];     // update the restart state
        }
    }

    public int search(String text) {
        int i, j;
        int N = text.length(), M = pat.length();
        for (i = 0, j = 0; i < N && j < M; i++) {
            j = dfa[text.charAt(i)][j];
        }
        return j == M ? (i - M) : -1;
    }

    public static void main(String[] args) throws Exception {
        // Read data from a file
        Scanner input = new Scanner(new File("datain.txt"));
        while (input.hasNext()) {
            String text = input.next();
            KMP kmp = new KMP(input.next());
            int ans = kmp.search(text);
            // Output the answer
            System.out.println(ans);
        }
    }
}
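The search loop can be traced to confirm that i never moves left. The sketch below is a self-contained variant that records the DFA state after each text character. The class name DfaTraceDemo, the helpers buildDfa and trace, and the sample text are assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class; helper names and the sample text are illustrative.
class DfaTraceDemo {
    // Build the KMP DFA for pat over an alphabet of R character codes.
    static int[][] buildDfa(String pat, int R) {
        int M = pat.length();
        int[][] dfa = new int[R][M];
        dfa[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < M; j++) {
            for (int c = 0; c < R; c++) {
                dfa[c][j] = dfa[c][x];
            }
            dfa[pat.charAt(j)][j] = j + 1;
            x = dfa[pat.charAt(j)][x];
        }
        return dfa;
    }

    // Returns the DFA state after each text character, stopping once pat is matched.
    static int[] trace(String text, String pat) {
        int[][] dfa = buildDfa(pat, 256);
        int N = text.length(), M = pat.length();
        int[] states = new int[N];
        int i, j;
        for (i = 0, j = 0; i < N && j < M; i++) {
            j = dfa[text.charAt(i)][j]; // one table lookup per text character
            states[i] = j;
        }
        return Arrays.copyOf(states, i);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(trace("AABAABACB", "AABACB")));
        // prints [1, 2, 3, 4, 2, 3, 4, 5, 6]
    }
}
```

At the fifth character the state drops from 4 to 2 instead of restarting from 0, so each text character is read exactly once. The final state 6 equals the pattern length, and the match starts at index 9 - 6 = 3.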
------------------------------------------------
Full C++ code
#include <cstdio>
#include <cstring>
#include <iostream>
#include <string>
using namespace std;
const int maxn = 1e4 + 10;
const int R = 256; // alphabet size: character codes 0..255
int dfa[R][maxn];

string text, pat;
void init() {
    int M = pat.length();
    /** Clear column 0 so entries from a previous pattern do not leak in */
    for (int c = 0; c < R; c++) {
        dfa[c][0] = 0;
    }
    dfa[(unsigned char)pat[0]][0] = 1;
    for (int x = 0, j = 1; j < M; j++) {
        /** Copy directly from dfa[][x] to dfa[][j] */
        for (int c = 0; c < R; c++) {
            dfa[c][j] = dfa[c][x];
        }
        /** On a match, continue to the right */
        dfa[(unsigned char)pat[j]][j] = j + 1;
        x = dfa[(unsigned char)pat[j]][x];
    }
}
int search1() {
    init();
    int i, j, N = text.length(), M = pat.length();
    for (i = 0, j = 0; i < N && j < M; i++) {
        j = dfa[(unsigned char)text[i]][j];
    }
    return j == M ? (i - M) : -1;
}
int main() {
    freopen("datain.txt", "r", stdin);
    while (cin >> text >> pat) {
        cout << search1() << endl;
    }
    return 0;
}
References:
[1] Algorithms (4th edition) - She Luyun
[2] http://baike.baidu.com/link?url=_WLufLz1lw2e4eMgU6DI8IblUkp838Qf595Nqxfg2JN3aqNED2FFe3U6J9yPmUv_zKfFqAAQJid7Gzho3ork8K
Classic KMP algorithm: C++ and Java implementation code