Definition of a string
A string is a finite sequence of zero or more characters.
The number of characters n in a string is called the length of the string.
A string of 0 characters is called an empty string.
Abstract data types for strings
Sequential storage structure of strings
Chained storage structure of strings
A node can store a single character or several characters. If the last node is not full, the remaining slots can be filled with "#" or another character that cannot occur in the string.
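The chained layout described above can be sketched roughly as follows. This is only an illustration under assumptions: the names Chunk, CHUNK_SIZE, PAD_CHAR, chunk_from_cstr and chunk_free are hypothetical, and the choice of four characters per node is arbitrary.

```c
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 4      /* characters per node (arbitrary, for illustration) */
#define PAD_CHAR   '#'    /* filler for unused slots in the last node */

typedef struct Chunk {
    char ch[CHUNK_SIZE];  /* exactly CHUNK_SIZE characters, not NUL-terminated */
    struct Chunk *next;   /* next node, or NULL at the end of the string */
} Chunk;

/* Releases every node of the list. */
void chunk_free(Chunk *head)
{
    while (head != NULL) {
        Chunk *next = head->next;
        free(head);
        head = next;
    }
}

/* Builds a chained string from a C string. Returns NULL for an empty
   string or when an allocation fails. */
Chunk *chunk_from_cstr(const char *s)
{
    Chunk *head = NULL, *tail = NULL;
    size_t len = strlen(s);

    for (size_t pos = 0; pos < len; pos += CHUNK_SIZE) {
        Chunk *node = malloc(sizeof *node);
        if (node == NULL) {
            chunk_free(head);
            return NULL;
        }
        /* Copy up to CHUNK_SIZE characters; pad the rest of the last node. */
        for (size_t k = 0; k < CHUNK_SIZE; ++k)
            node->ch[k] = (pos + k < len) ? s[pos + k] : PAD_CHAR;
        node->next = NULL;
        if (tail != NULL)
            tail->next = node;
        else
            head = node;
        tail = node;
    }
    return head;
}
```

With these assumptions, "hello" occupies two nodes: the first holds "hell" and the second holds "o###". Larger nodes waste less space on pointers but make insertion and deletion in the middle of the string more awkward.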
A simple pattern matching algorithm
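Take each character of the main string in turn as the start of a candidate substring and compare it with the string to be matched. In other words, an outer loop walks through the main string, and from each starting character a small inner loop of at most the length of T runs, until a match succeeds or the whole main string has been traversed.
In the favorable case, where every failed attempt is rejected at its first character, the time complexity is O(n+m), where n is the length of the main string and m is the length of the substring.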
/* Returns the position of the first occurrence of substring T in main string S, */
/* searching from position pos. Returns 0 if there is no such occurrence. */
/* Requires: T is non-empty, 1 <= pos <= StrLength(S). */
int Index(String S, String T, int pos)
{
    int i = pos;  /* current position in the main string S; matching starts from pos */
    int j = 1;    /* current position in the substring T */
    while (i <= S[0] && j <= T[0])  /* continue while i is within S and j is within T */
    {
        if (S[i] == T[j])  /* the two characters are equal: continue */
        {
            ++i;
            ++j;
        }
        else  /* mismatch: back up and start matching again */
        {
            i = i - j + 2;  /* i returns to the position after the start of the last attempt */
            j = 1;          /* j returns to the first character of T */
        }
    }
    if (j > T[0])
        return i - T[0];
    else
        return 0;
}
Suppose we want to find the substring "google" in the main string "goodgoogle" using the algorithm above. The steps are as follows:
Now consider the main string S = "00000000000000000000000000000000000000000000000000001" and the substring to match T = "0000000001".
In other words, at each of the first 40 positions of S, T has to be compared 10 times before the mismatch is found; only at the 41st position do all characters match.
So the worst-case time complexity is O((n-m+1)*m), where n is the length of S and m is the length of T.
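The worst-case count can be checked with a small experiment. The sketch below repeats the naive algorithm with a comparison counter added. The counter parameter, the helper names str_assign and make_zeros_then_one, and the use of unsigned char for the length-prefixed String type are assumptions made here for illustration. The test string is built as 49 zeros followed by a 1 (n = 50), matching the "first 40 positions" described above.

```c
#include <string.h>

#define MAXSIZE 255
/* Length-prefixed string: element 0 stores the length (assumed layout). */
typedef unsigned char String[MAXSIZE + 1];

/* Copies a C string into the length-prefixed layout, truncating if needed. */
void str_assign(String dst, const char *src)
{
    size_t len = strlen(src);
    if (len > MAXSIZE)
        len = MAXSIZE;
    dst[0] = (unsigned char)len;
    memcpy(dst + 1, src, len);
}

/* Fills dst with `zeros` characters '0' followed by a single '1'.
   Assumes zeros + 1 <= MAXSIZE. */
void make_zeros_then_one(String dst, int zeros)
{
    memset(dst + 1, '0', (size_t)zeros);
    dst[zeros + 1] = '1';
    dst[0] = (unsigned char)(zeros + 1);
}

/* Naive search that also stores the number of character comparisons
   in *count. Returns the match position, or 0 if T does not occur. */
int index_naive_counted(String S, String T, int pos, long *count)
{
    int i = pos;
    int j = 1;
    *count = 0;
    while (i <= S[0] && j <= T[0]) {
        ++*count;                /* one comparison of S[i] with T[j] */
        if (S[i] == T[j]) {
            ++i;
            ++j;
        } else {
            i = i - j + 2;       /* restart one position further right */
            j = 1;
        }
    }
    return (j > T[0]) ? i - T[0] : 0;
}
```

Calling make_zeros_then_one(S, 49) and make_zeros_then_one(T, 9) and then searching from position 1 finds the match at position 41 after 41 * 10 = 410 comparisons, which agrees with (n-m+1)*m for n = 50 and m = 10.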
Principle of KMP pattern matching algorithm
Suppose the main string is S = "abcdefgab" and the substring to match is T = "abcdex".
With the naive algorithm, the matching proceeds as follows:
Notice that the first character "a" of T = "abcdex" differs from every character in the rest of the string, "bcdex". Since step 1 has already shown that the first five characters of T equal the first five characters of S, the first character of T cannot equal the 2nd through 5th characters of S. The comparisons in steps 2, 3, 4 and 5 are therefore redundant.
Even when T does contain later characters equal to its first character, part of the unnecessary comparison steps can still be skipped.
We record how the value of j changes at each position of T in an array called next, so the length of next equals the length of T.
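One common way to write this definition formally is shown below. The notation T = p_1 p_2 ... p_m for the characters of T is introduced here only for illustration.

```latex
\[
\mathit{next}[j] =
\begin{cases}
0, & j = 1,\\[4pt]
\max\{\, k \mid 1 < k < j,\ p_1 \cdots p_{k-1} = p_{j-k+1} \cdots p_{j-1} \,\}, & \text{if this set is non-empty},\\[4pt]
1, & \text{otherwise.}
\end{cases}
\]
```

In words: next[j] is one more than the length of the longest proper prefix of p_1 ... p_{j-1} that is also a suffix of it, and it is 0 only at the first position.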
Example derivation of next array values
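A hand derivation can be checked against a brute-force computation that follows the definition literally: for each position j, look for the longest proper prefix of the first j-1 characters that is also their suffix. The function name next_by_definition is hypothetical, and the pattern is passed as an ordinary C string for brevity, while next is still indexed from 1. This is deliberately not the efficient method, only a cross-check.

```c
#include <string.h>

/* Computes next[1..m] for pattern t straight from the definition:
   next[1] = 0; otherwise next[j] is the largest k with 1 < k < j such that
   the first k-1 characters equal the k-1 characters ending at position j-1,
   or 1 if no such k exists. The caller provides room for m + 1 ints. */
void next_by_definition(const char *t, int *next)
{
    int m = (int)strlen(t);

    for (int j = 1; j <= m; ++j) {
        if (j == 1) {
            next[j] = 0;
            continue;
        }
        next[j] = 1;                        /* default: no matching prefix */
        for (int k = j - 1; k > 1; --k) {   /* try the largest k first */
            /* 1-based p_1..p_{k-1} is t[0..k-2];
               1-based p_{j-k+1}..p_{j-1} starts at t[j-k]. */
            if (memcmp(t, t + (j - k), (size_t)(k - 1)) == 0) {
                next[j] = k;
                break;
            }
        }
    }
}
```

For T = "abcdex" this gives 0 1 1 1 1 1, and for T = "abcabx" it gives 0 1 1 1 2 3: at position 6 the prefix "ab" equals the suffix "ab" of "abcab", so next[6] = 3.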
KMP Pattern matching algorithm implementation code
/* Computes the next array of the substring T. */
void Get_next(String T, int *next)
{
    int i, j;
    i = 1;
    j = 0;
    next[1] = 0;
    while (i < T[0])  /* T[0] holds the length of string T */
    {
        if (j == 0 || T[i] == T[j])  /* T[i] is a character of the suffix, T[j] a character of the prefix */
        {
            ++i;
            ++j;
            next[i] = j;
        }
        else
            j = next[j];  /* the characters differ: j backtracks */
    }
}

/* Returns the position of the first occurrence of substring T in main string S, */
/* searching from position pos. Returns 0 if there is no such occurrence. */
/* Requires: T is non-empty, 1 <= pos <= StrLength(S). */
int INDEX_KMP(String S, String T, int pos)
{
    int i = pos;    /* current position in the main string S; matching starts from pos */
    int j = 1;      /* current position in the substring T */
    int next[255];  /* the next array */
    Get_next(T, next);  /* analyze T to obtain the next array */
    while (i <= S[0] && j <= T[0])  /* continue while i is within S and j is within T */
    {
        if (j == 0 || S[i] == T[j])  /* characters equal: continue; compared with the naive algorithm, the j == 0 test is added */
        {
            ++i;
            ++j;
        }
        else  /* mismatch: j backs up and matching resumes */
            j = next[j];  /* j returns to the appropriate position; i is unchanged */
    }
    if (j > T[0])
        return i - T[0];
    else
        return 0;
}
The time complexity of Get_next above is O(m), and the while loop in INDEX_KMP runs in O(n) because i never moves backward, so the time complexity of the whole algorithm is O(n+m).
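To see the linear bound in practice, the sketch below adds a comparison counter to the KMP search. The lowercase function names, the counter parameter, the str_assign helper and the unsigned char String layout are assumptions made for this illustration; the matching logic follows the code above.

```c
#include <string.h>

#define MAXSIZE 255
/* Length-prefixed string: element 0 stores the length (assumed layout). */
typedef unsigned char String[MAXSIZE + 1];

/* Copies a C string into the length-prefixed layout, truncating if needed. */
void str_assign(String dst, const char *src)
{
    size_t len = strlen(src);
    if (len > MAXSIZE)
        len = MAXSIZE;
    dst[0] = (unsigned char)len;
    memcpy(dst + 1, src, len);
}

/* Fills next[1..T[0]] for pattern T. */
void get_next(String T, int *next)
{
    int i = 1;
    int j = 0;
    next[1] = 0;
    while (i < T[0]) {
        if (j == 0 || T[i] == T[j]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];         /* fall back within the pattern */
        }
    }
}

/* KMP search that also stores the number of character comparisons
   in *count. Returns the match position, or 0 if T does not occur. */
int index_kmp_counted(String S, String T, int pos, long *count)
{
    int i = pos;
    int j = 1;
    int next[MAXSIZE + 1];

    get_next(T, next);
    *count = 0;
    while (i <= S[0] && j <= T[0]) {
        if (j != 0)
            ++*count;            /* characters are compared only when j != 0 */
        if (j == 0 || S[i] == T[j]) {
            ++i;
            ++j;
        } else {
            j = next[j];         /* i stays where it is */
        }
    }
    return (j > T[0]) ? i - T[0] : 0;
}
```

Searching for "google" in "goodgoogle" from position 1 returns 5, and searching for "abcdex" in "abcdefgab" returns 0. On the zeros-then-one input used for the naive worst case, the number of comparisons stays within 2n because i never decreases.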
Improvement of KMP pattern matching algorithm
For example, with the main string S = "aaaabcde" and the substring T = "aaaaax", the next array values are 012345.
The process of matching with the KMP algorithm is as follows:
When i = 5 and j = 5, "b" is not equal to "a", as in step 1.
j = next[5] = 4, as in step 2; "b" and the "a" in the fourth position are still unequal.
j = next[4] = 3, as in step 3, and so on.
On reflection, steps 2, 3, 4 and 5 are redundant. The 2nd, 3rd, 4th and 5th characters of T are all equal to the first character "a", so the value of next[1] can replace the next[j] value of each later character that is equal to it.
Improved version of the KMP algorithm implementation code
/* Computes the corrected next values of pattern string T and stores them in the array nextval. */
void Get_nextval(String T, int *nextval)
{
    int i, j;
    i = 1;
    j = 0;
    nextval[1] = 0;
    while (i < T[0])  /* T[0] holds the length of string T */
    {
        if (j == 0 || T[i] == T[j])  /* T[i] is a character of the suffix, T[j] a character of the prefix */
        {
            ++i;
            ++j;
            if (T[i] != T[j])  /* the current character differs from the prefix character */
                nextval[i] = j;  /* so the current j is the nextval value at position i */
            else
                nextval[i] = nextval[j];  /* same as the prefix character: reuse the prefix */
                                          /* character's nextval value at position i */
        }
        else
            j = nextval[j];  /* the characters differ: j backtracks */
    }
}
Derivation of nextval array values
(The detailed analysis diagram is as follows:)
Another example (check whether you derived it correctly)
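One way to check a hand derivation is to run the nextval computation on a few patterns and compare the results. The sketch below follows the logic of Get_nextval above; the lowercase names, the str_assign helper and the unsigned char String layout are assumptions for illustration.

```c
#include <string.h>

#define MAXSIZE 255
/* Length-prefixed string: element 0 stores the length (assumed layout). */
typedef unsigned char String[MAXSIZE + 1];

/* Copies a C string into the length-prefixed layout, truncating if needed. */
void str_assign(String dst, const char *src)
{
    size_t len = strlen(src);
    if (len > MAXSIZE)
        len = MAXSIZE;
    dst[0] = (unsigned char)len;
    memcpy(dst + 1, src, len);
}

/* Fills nextval[1..T[0]] with the corrected next values of pattern T. */
void get_nextval(String T, int *nextval)
{
    int i = 1;
    int j = 0;
    nextval[1] = 0;
    while (i < T[0]) {
        if (j == 0 || T[i] == T[j]) {
            ++i;
            ++j;
            if (T[i] != T[j])
                nextval[i] = j;           /* falling back to j may still succeed */
            else
                nextval[i] = nextval[j];  /* same character: skip straight past j */
        } else {
            j = nextval[j];
        }
    }
}
```

For T = "aaaaax", whose next values are 012345, this yields nextval = 000005. For T = "ababaaaba", whose next values are 011234223, it yields nextval = 010104210.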
Big Talk Data Structures, Part 4 (Strings)