Locating a pattern inside a string is usually called string pattern matching, and it is one of the most important operations in all kinds of text-processing systems.
The simplest pattern matching algorithm is the brute-force (backtracking) method. The pattern string is compared with the main string one character at a time. When a character does not match, the main string index backs up to the position right after the one where the current attempt began, the pattern index goes back to its first character, and matching starts again. With n the length of the main string and m the length of the pattern, the time complexity is O(m*n). The algorithm is as follows:
#include <string.h>

// Brute-force pattern matching: S is the main string, T is the pattern string.
// Returns the 0-based index of the first occurrence of T in S at or after pos, or -1 if T does not occur.
int Index(const char *S, const char *T, int pos)   // pos: where to start matching; pass 0 to search the whole string
{
    int i = pos, j = 0;
    int n = (int)strlen(S), m = (int)strlen(T);   // compute the lengths once, not on every iteration
    while (i < n && j < m)                        // stay within both strings
    {
        if (S[i] == T[j])
        { ++i; ++j; }                             // characters match: keep comparing forward
        else
        { i = i - j + 1; j = 0; }                 // mismatch: back S up to the next start, restart T
    }
    if (j == m)
        return i - m;                             // success: starting index of T in S
    else
        return -1;                                // failure (0 would be ambiguous: it is a valid index)
}
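To see where the O(m*n) bound comes from, the sketch below repeats the brute-force loop but also counts character comparisons. The function name IndexCounting and its comparisons out-parameter are hypothetical, added only for this illustration; it returns a 0-based index, or -1 when T does not occur.

```c
#include <assert.h>
#include <string.h>

// Hypothetical helper for illustration: the brute-force search,
// instrumented to count S[i] vs T[j] character comparisons.
// Returns the 0-based index of the first match, or -1 if T does not occur in S.
int IndexCounting(const char *S, const char *T, long *comparisons)
{
    int i = 0, j = 0;
    int n = (int)strlen(S), m = (int)strlen(T);
    assert(comparisons != NULL);
    *comparisons = 0;
    while (i < n && j < m)
    {
        ++*comparisons;                    // one character comparison per iteration
        if (S[i] == T[j]) { ++i; ++j; }    // match: move forward in both strings
        else { i = i - j + 1; j = 0; }     // mismatch: back up S, restart T
    }
    return j == m ? i - m : -1;
}
```

For S = "aaaaaaab" and T = "aaab", each start position 0 through 4 costs four comparisons before the match at index 4 is found, so the count is 20, that is (n - m + 1) * m with n = 8 and m = 4. Inputs with long runs of a repeated character like this are the worst case for backtracking.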
An O(m*n) running time is rather large, which is why the KMP (Knuth-Morris-Pratt) algorithm was developed. Its core idea: when a mismatch occurs, the main string index never moves backward; only the pattern index moves back, to an "appropriate" position. Which position is appropriate depends only on the pattern string, so for every character of the pattern we can compute in advance where to fall back when a mismatch occurs at that character. The overall time complexity is O(m+n).
The algorithm is as follows:
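// Convention (textbook style): strings are 1-indexed. S[0] and T[0] hold the
// lengths and the characters occupy S[1..S[0]] and T[1..T[0]], so a length is
// at most 255 and a next array of 256 ints covers every possible pattern.
// next[j] is the pattern position to fall back to when T[j] mismatches.
void GetNext(const unsigned char *T, int *next)
{
    int i = 1, j = 0;
    next[1] = 0;
    while (i < T[0])
    {
        if (j == 0 || T[i] == T[j])
        {
            ++i; ++j;
            next[i] = j;
        }
        else
            j = next[j];            // fall back within the pattern itself
    }
}

// Returns the 1-based position of the first occurrence of T in S at or after
// position pos (pos >= 1), or 0 if T does not occur there.
int KMP(const unsigned char *S, const unsigned char *T, int pos)
{
    int next[256];                  // T[0] <= 255, so next[1..T[0]] always fits
    int i = pos, j = 1;
    GetNext(T, next);
    while (i <= S[0] && j <= T[0])
    {
        if (j == 0 || S[i] == T[j])
        {
            ++i; ++j;               // match (or fresh start): advance both
        }
        else
            j = next[j];            // mismatch: i stays put, only the pattern moves
    }
    if (j > T[0])
        return i - T[0];            // match: starting position of T in S
    else
        return 0;                   // no match
}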
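The KMP code above uses 1-based pattern positions, which is awkward with ordinary C strings. As an alternative, here is a sketch of the same search adapted to 0-based, NUL-terminated strings: next[0] is -1 and each other entry equals the corresponding 1-based value minus one. The names GetNext0 and KMP0 are hypothetical, and this is an adaptation rather than the article's own code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// 0-based failure table: next[0] = -1; for j > 0, next[j] is the length of the
// longest proper prefix of T[0..j-1] that is also a suffix of it.
static void GetNext0(const char *T, int m, int *next)
{
    int i = 0, j = -1;
    next[0] = -1;
    while (i < m - 1)
    {
        if (j == -1 || T[i] == T[j])
        {
            ++i; ++j;
            next[i] = j;
        }
        else
            j = next[j];
    }
}

// Returns the 0-based index of the first occurrence of T in S at or after pos,
// or -1 if there is none or memory allocation fails.
int KMP0(const char *S, const char *T, int pos)
{
    int n = (int)strlen(S), m = (int)strlen(T);
    int i = pos, j = 0, result = -1;
    int *next;
    assert(pos >= 0);
    if (m == 0)
        return pos <= n ? pos : -1;        // the empty pattern matches immediately
    next = malloc((size_t)m * sizeof *next);
    if (next == NULL)
        return -1;
    GetNext0(T, m, next);
    while (i < n && j < m)
    {
        if (j == -1 || S[i] == T[j]) { ++i; ++j; }  // match or restart: advance both
        else j = next[j];                           // mismatch: S never moves back
    }
    if (j == m)
        result = i - m;
    free(next);
    return result;
}
```

For example, KMP0("acabaabaabcacaabc", "abaabcac", 0) returns 5, and for that pattern GetNext0 produces -1 0 0 1 1 2 0 1. The table is allocated with malloc so the pattern length is not capped by a fixed-size array.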
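The next array computed above is not optimal, because it does not account for patterns such as "aaaaaaaaaaaaaaaaaaab". For such a pattern next is 0, 1, 2, 3, ..., so after a mismatch on one of the 'a's the pattern index falls back only one position at a time, comparing the same main-string character with 'a' again and again. The bound stays O(m+n), but those comparisons are wasted: whenever T[j] equals T[next[j]], falling back to next[j] is guaranteed to fail on the same main-string character. A small change to GetNext skips such positions: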
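// T uses the 1-based layout: T[0] is the length, characters in T[1..T[0]].
// Computes the improved table (often called nextval); KMP can use it in place of next.
void GetNextEx(const unsigned char *T, int *next)
{
    int i = 1, j = 0;
    next[1] = 0;
    while (i < T[0])
    {
        if (j == 0 || T[i] == T[j])
        {
            ++i; ++j;
            if (T[i] == T[j])
                next[i] = next[j];   // falling back to j would fail the same way: skip further
            else
                next[i] = j;         // same as next[i] = j in GetNext
        }
        else
            j = next[j];
    }
}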
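The effect of the change can be measured by running the same search with both tables. The sketch below assumes the 1-based layout (T[0] holds the length, characters in T[1..T[0]]) and redefines GetNext and GetNextEx in that form so it compiles on its own. It adds two hypothetical helpers for illustration: MakeSString, which builds that layout from a C string, and KMPCounting, a KMP search that counts character comparisons.

```c
#include <assert.h>
#include <string.h>

#define MAXLEN 255   // the length must fit in one unsigned char at index 0

// Hypothetical helper: copies a C string into the 1-based layout
// (dst[0] = length, characters in dst[1..length]); dst needs MAXLEN + 1 bytes.
// Returns 0 if src is too long, 1 on success.
int MakeSString(unsigned char *dst, const char *src)
{
    size_t len = strlen(src);
    if (len > MAXLEN)
        return 0;
    dst[0] = (unsigned char)len;
    memcpy(dst + 1, src, len);
    return 1;
}

// Plain next table, as in GetNext.
void GetNext(const unsigned char *T, int *next)
{
    int i = 1, j = 0;
    next[1] = 0;
    while (i < T[0])
    {
        if (j == 0 || T[i] == T[j]) { ++i; ++j; next[i] = j; }
        else j = next[j];
    }
}

// Improved table, as in GetNextEx: skips fallbacks that must fail again.
void GetNextEx(const unsigned char *T, int *next)
{
    int i = 1, j = 0;
    next[1] = 0;
    while (i < T[0])
    {
        if (j == 0 || T[i] == T[j])
        {
            ++i; ++j;
            next[i] = (T[i] == T[j]) ? next[j] : j;
        }
        else j = next[j];
    }
}

// Hypothetical helper: KMP search from position 1 using the given table,
// counting S[i] vs T[j] comparisons. Returns the 1-based match position or 0.
int KMPCounting(const unsigned char *S, const unsigned char *T,
                const int *next, long *comparisons)
{
    int i = 1, j = 1;
    assert(comparisons != NULL);
    *comparisons = 0;
    while (i <= S[0] && j <= T[0])
    {
        if (j == 0) { ++i; ++j; continue; }   // fresh start: nothing compared
        ++*comparisons;
        if (S[i] == T[j]) { ++i; ++j; }
        else j = next[j];
    }
    return j > T[0] ? i - T[0] : 0;
}
```

For S = "aaabaaaab" and T = "aaaab", next is 0 1 2 3 4 and the improved table is 0 0 0 0 4. Both searches report position 5 (1-based), but the search with next needs 12 comparisons while the improved table needs 9, because the mismatch at S[4] is resolved in one step instead of four.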