Big Talk Data Structures, Part 4: Strings

Source: Internet
Author: User

Definition of a string

A string is a finite sequence of zero or more characters.

The number of characters n in a string is called the length of the string.

A string of zero characters is called the empty string.

Abstract data types for strings

Sequential storage structure of strings

Linked (chained) storage structure of strings

A node can store a single character or several characters. If the last node is not full, its remaining slots can be filled with "#" or another character that does not occur in the string.
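As a rough illustration of the node layout described above, the following sketch stores several characters per node and pads the last node with "#". The names Chunk, CHUNK_SIZE, PAD_CHAR, chunk_string_create, and chunk_string_free are hypothetical and chosen only for this example, and the node size of 4 is arbitrary.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 4   /* characters per node; an arbitrary choice for this sketch */
#define PAD_CHAR   '#' /* filler for unused slots; must not occur in the string */

/* One node of a linked string: a small block of characters plus a link. */
typedef struct Chunk {
    char data[CHUNK_SIZE];
    struct Chunk *next;
} Chunk;

/* Releases every node of the list. */
void chunk_string_free(Chunk *head)
{
    while (head != NULL) {
        Chunk *next = head->next;
        free(head);
        head = next;
    }
}

/* Builds a linked string from a C string. Returns NULL for the empty
   string or when an allocation fails. */
Chunk *chunk_string_create(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    Chunk *head = NULL, *tail = NULL;

    for (size_t pos = 0; pos < len; pos += CHUNK_SIZE) {
        Chunk *node = malloc(sizeof *node);
        if (node == NULL) {
            chunk_string_free(head);
            return NULL;
        }
        /* Copy up to CHUNK_SIZE characters; pad the rest of the last node. */
        for (size_t k = 0; k < CHUNK_SIZE; ++k)
            node->data[k] = (pos + k < len) ? s[pos + k] : PAD_CHAR;
        node->next = NULL;
        if (tail == NULL)
            head = node;
        else
            tail->next = node;
        tail = node;
    }
    return head;
}
```

With a node size of 4, the string "hello" occupies two nodes holding "hell" and "o###". Larger nodes waste less space on links but make insertion and deletion in the middle of the string more awkward.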

A simple pattern matching algorithm

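Take each character of the main string in turn as the start of a candidate substring and try to match the target substring from there. In other words, an outer loop runs over the main string, and for each starting character an inner loop of up to length(T) comparisons runs, until a match succeeds or the whole main string has been traversed.

In typical cases the time complexity is O(n+m), where n is the length of the main string and m is the length of the substring. The worst case is worse and is analyzed after the code.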

/* Returns the position of the first occurrence of substring T in main string S
   at or after position pos. If T does not occur, the function returns 0. */
/* T must be non-empty, and 1 <= pos <= StrLength(S). */
/* String is a character array whose element 0 holds the length of the string. */
int Index(String S, String T, int pos)
{
    int i = pos; /* i is the current position in the main string S; if pos is not 1, matching starts at pos */
    int j = 1;   /* j is the current position in the substring T */
    while (i <= S[0] && j <= T[0]) /* continue while i is within S and j is within T */
    {
        if (S[i] == T[j]) /* the two characters are equal: advance both */
        {
            ++i;
            ++j;
        }
        else /* mismatch: back up and start matching again */
        {
            i = i - j + 2; /* i returns to the character after the start of the previous attempt */
            j = 1;         /* j returns to the first character of T */
        }
    }
    if (j > T[0])
        return i - T[0];
    else
        return 0;
}

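Suppose we use the algorithm above to find T = "google" in the main string S = "goodgoogle". Starting at position 1, the first three characters match and the fourth ("d" against "g") fails. Starting at positions 2, 3, and 4, the first comparison fails each time. Starting at position 5, all six characters match, so the result is 5.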
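The search can be tried out with the sketch below. It assumes ordinary NUL-terminated C strings rather than a length-prefixed String type, but keeps 1-based positions so that results line up with the description above. The name naive_index is chosen here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the 1-based position of the first occurrence of t in s at or
   after position pos, or 0 if there is none. */
size_t naive_index(const char *s, const char *t, size_t pos)
{
    size_t n = strlen(s);
    size_t m = strlen(t);
    assert(m > 0 && pos >= 1 && pos <= n);

    size_t i = pos; /* 1-based position in s */
    size_t j = 1;   /* 1-based position in t */

    while (i <= n && j <= m) {
        if (s[i - 1] == t[j - 1]) {
            ++i;
            ++j;
        } else {
            i = i - j + 2; /* one past the start of the failed attempt */
            j = 1;
        }
    }
    return (j > m) ? i - m : 0;
}
```

Here naive_index("goodgoogle", "google", 1) evaluates to 5, while naive_index("goodgoogle", "google", 6) evaluates to 0 because no occurrence starts at or after position 6.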

Now consider searching the main string S = "00000000000000000000000000000000000000000000000000001" for the substring T = "0000000001".

In other words, at each of the first 40 starting positions in S, T is compared 10 times before the mismatch on its last character is found; only at the 41st position do all the characters match.

So the worst-case time complexity is O((n-m+1)*m).
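To make the worst-case count concrete, the sketch below counts character comparisons made by the naive scan. It assumes n = 50 and m = 10, that is, a main string of 49 zeros followed by a 1 and a substring of 9 zeros followed by a 1, which is the shape consistent with the 41 starting positions mentioned above. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Runs the naive scan of t over s and returns the number of character
   comparisons performed (not the match position). */
size_t naive_comparisons(const char *s, const char *t)
{
    size_t n = strlen(s);
    size_t m = strlen(t);
    size_t i = 1, j = 1, count = 0;
    assert(m > 0);

    while (i <= n && j <= m) {
        ++count;
        if (s[i - 1] == t[j - 1]) {
            ++i;
            ++j;
        } else {
            i = i - j + 2;
            j = 1;
        }
    }
    return count;
}

/* Writes (len - 1) '0' characters followed by a single '1' into buf,
   which must have room for len + 1 bytes. */
void fill_zeros_then_one(char *buf, size_t len)
{
    assert(len >= 1);
    memset(buf, '0', len - 1);
    buf[len - 1] = '1';
    buf[len] = '\0';
}
```

For these inputs the count is (50 - 10 + 1) * 10 = 410 comparisons: 10 at each of the 40 failing start positions and 10 more for the successful match at position 41.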

Principle of KMP pattern matching algorithm

Suppose the main string is S = "abcdefgab" and the substring to match is T = "abcdex".

With the naive algorithm, matching takes six steps: step 1 compares T with S starting at position 1 and fails at the sixth character, and steps 2 through 6 restart the comparison at positions 2 through 6 of S.

Notice that the first character of T, "a", differs from every character that follows it in T ("bcdex"). In step 1 the first five characters of T matched the first five characters of S, so the first character of T cannot equal the 2nd through 5th characters of S. The comparisons made in steps 2, 3, 4, and 5 are therefore redundant.

If T does contain later characters equal to its first character, part of the redundant comparison work can still be skipped.

For each position of T, we record the value that j should fall back to after a mismatch at that position, and store these values in an array called next. The length of next is the length of T.

Example of next array value deduction
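One way to check hand-derived next values is to compute them. The sketch below follows the 1-based convention used in this article (next[1] = 0, index 0 unused) but takes an ordinary NUL-terminated C string; the name build_next is chosen for illustration, and the pattern "abcabx" is an extra example not taken from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fills next[1..m] for pattern t, where m = strlen(t). next[0] is unused,
   so the caller must provide room for m + 1 ints. */
void build_next(const char *t, int *next)
{
    int m = (int)strlen(t);
    int i = 1; /* position of the suffix character, 1-based */
    int j = 0; /* position of the prefix character, 1-based; 0 means "none" */
    assert(m > 0);

    next[1] = 0;
    while (i < m) {
        if (j == 0 || t[i - 1] == t[j - 1]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j]; /* characters differ: j falls back */
        }
    }
}
```

For "abcdex" this gives 011111, since no later character repeats the first one. For "abcabx" it gives 011123, because the prefix "ab" reappears at positions 4 and 5.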
KMP Pattern matching algorithm implementation code
/* Computes the next array of substring T. */
void Get_next(String T, int *next)
{
    int i, j;
    i = 1;
    j = 0;
    next[1] = 0;
    while (i < T[0]) /* T[0] holds the length of the string T */
    {
        if (j == 0 || T[i] == T[j]) /* T[i] is a character of the suffix, T[j] a character of the prefix */
        {
            ++i;
            ++j;
            next[i] = j;
        }
        else
            j = next[j]; /* the characters differ: j falls back */
    }
}

/* Returns the position of the first occurrence of substring T in main string S
   at or after position pos. If T does not occur, the function returns 0. */
/* T must be non-empty, and 1 <= pos <= StrLength(S). */
int INDEX_KMP(String S, String T, int pos)
{
    int i = pos;   /* i is the current position in the main string S; if pos is not 1, matching starts at pos */
    int j = 1;     /* j is the current position in the substring T */
    int next[255]; /* the next array; large enough only if the length of T is at most 254 */
    Get_next(T, next); /* analyze T to obtain the next array */
    while (i <= S[0] && j <= T[0]) /* continue while i is within S and j is within T */
    {
        if (j == 0 || S[i] == T[j]) /* characters are equal, or j == 0 (the check added relative to the naive algorithm) */
        {
            ++i;
            ++j;
        }
        else /* mismatch: only j falls back */
            j = next[j]; /* j returns to the appropriate position; i is unchanged */
    }
    if (j > T[0])
        return i - T[0];
    else
        return 0;
}

The time complexity of Get_next is O(m). The while loop in INDEX_KMP is O(n) because i never moves backward. The time complexity of the whole algorithm is therefore O(n+m).
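For experimenting with the algorithm, here is a self-contained sketch that works on NUL-terminated C strings while keeping 1-based positions in its result. The names kmp_index, build_next_table, and MAX_PATTERN are assumptions made for this example, and the fixed table size simply mirrors the fixed-size next array in the listing above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PATTERN 254 /* longest pattern this sketch accepts */

/* Fills next[1..m] for pattern t of length m; next[0] is unused. */
static void build_next_table(const char *t, size_t m, size_t *next)
{
    size_t i = 1, j = 0;
    next[1] = 0;
    while (i < m) {
        if (j == 0 || t[i - 1] == t[j - 1]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];
        }
    }
}

/* Returns the 1-based position of the first occurrence of t in s at or
   after position pos, or 0 if there is none. */
size_t kmp_index(const char *s, const char *t, size_t pos)
{
    size_t n = strlen(s);
    size_t m = strlen(t);
    assert(m > 0 && m <= MAX_PATTERN && pos >= 1 && pos <= n);

    size_t next[MAX_PATTERN + 1];
    build_next_table(t, m, next);

    size_t i = pos; /* never decreases, which is why the scan is O(n) */
    size_t j = 1;
    while (i <= n && j <= m) {
        if (j == 0 || s[i - 1] == t[j - 1]) {
            ++i;
            ++j;
        } else {
            j = next[j]; /* only j falls back */
        }
    }
    return (j > m) ? i - m : 0;
}
```

With this sketch, kmp_index("goodgoogle", "google", 1) evaluates to 5, and kmp_index("abcdefgab", "abcdex", 1) evaluates to 0 because "abcdex" does not occur in "abcdefgab".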

Improvement of KMP pattern matching algorithm

For example, take the main string S = "aaaabcde" and the substring T = "aaaaax", whose next array is 012345.

The KMP algorithm then compares as follows:

At i = 5, j = 5, "b" does not equal "a" (step 1).

j = next[5] = 4 (step 2): "b" and the "a" in the fourth position of T are still unequal.

j = next[4] = 3 (step 3), and so on.

On reflection, steps 2, 3, 4, and 5 are redundant. The second, third, fourth, and fifth characters of T all equal the first character "a", so a mismatch at any of them is bound to be a mismatch at the position next points back to as well. The next[j] value of each such character can therefore be replaced by the value belonging to the equal character it points to, which here is ultimately next[1].
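The saving can be measured by counting character comparisons under both tables. The sketch below hard-codes the next values 012345 for "aaaaax" and the corrected values 000005 that result from the replacement just described; the function and array names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Runs the KMP scan of t over s using the given 1-based jump table
   (table[0] is unused) and returns the number of character comparisons. */
size_t kmp_comparisons(const char *s, const char *t, const size_t *table)
{
    size_t n = strlen(s);
    size_t m = strlen(t);
    size_t i = 1, j = 1, count = 0;
    assert(m > 0);

    while (i <= n && j <= m) {
        if (j == 0) { /* fell off the front of t: advance without comparing */
            ++i;
            ++j;
            continue;
        }
        ++count;
        if (s[i - 1] == t[j - 1]) {
            ++i;
            ++j;
        } else {
            j = table[j];
        }
    }
    return count;
}

/* Jump tables for T = "aaaaax"; the leading 0 is padding for index 0. */
static const size_t NEXT_AAAAAX[]    = {0, 0, 1, 2, 3, 4, 5};
static const size_t NEXTVAL_AAAAAX[] = {0, 0, 0, 0, 0, 0, 5};
```

Scanning "aaaabcde" for "aaaaax" takes 12 comparisons with the plain next table and 8 with the corrected one. The four comparisons saved are exactly the redundant steps 2 through 5.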

Improved version of the KMP algorithm implementation code
/* Computes the corrected next values of pattern string T and stores them in the array nextval. */
void Get_nextval(String T, int *nextval)
{
    int i, j;
    i = 1;
    j = 0;
    nextval[1] = 0;
    while (i < T[0]) /* T[0] holds the length of the string T */
    {
        if (j == 0 || T[i] == T[j]) /* T[i] is a character of the suffix, T[j] a character of the prefix */
        {
            ++i;
            ++j;
            if (T[i] != T[j]) /* the current character differs from the prefix character */
                nextval[i] = j; /* so the current j is the nextval value at position i */
            else
                nextval[i] = nextval[j]; /* the characters are the same: position i takes */
                                         /* the nextval value of the prefix character */
        }
        else
            j = nextval[j]; /* the characters differ: j falls back */
    }
}

Derivation of nextval array values

[Figure: detailed analysis diagram]

Another example (check whether your derivation is correct)
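A hand derivation can be checked against a small program. The sketch below computes nextval for a NUL-terminated C string using the same 1-based convention as the article (index 0 unused); the name build_nextval and the test pattern "ababaaaba" are choices made for this example and are not taken from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fills nextval[1..m] for pattern t, where m = strlen(t). nextval[0] is
   unused, so the caller must provide room for m + 1 ints. */
void build_nextval(const char *t, int *nextval)
{
    int m = (int)strlen(t);
    int i = 1, j = 0;
    assert(m > 0);

    nextval[1] = 0;
    while (i < m) {
        if (j == 0 || t[i - 1] == t[j - 1]) {
            ++i;
            ++j;
            if (t[i - 1] != t[j - 1])
                nextval[i] = j;          /* falling back to j can still succeed */
            else
                nextval[i] = nextval[j]; /* same character: reuse its fallback */
        } else {
            j = nextval[j];
        }
    }
}
```

For "aaaaax" the result is 000005, and for "ababaaaba" it is 010104210.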
