String: a finite sequence of zero or more characters.
Strings are compared by comparing the encodings of their characters, where a character's encoding is its ordinal number in the corresponding character set (for example ASCII).
Given two strings s = "a1a2...an" and t = "b1b2...bm", s < t when either of the following conditions holds.
1. n < m and ai = bi (i = 1, 2, ..., n)
For example, with s = "hap" and t = "happy", t simply has 2 more characters than s, so s < t.
2. There exists some k <= min(m, n) such that ai = bi (i = 1, 2, ..., k-1) and ak < bk
For example, with s = "happen" and t = "happy", the first 4 characters of the two strings are the same and they first differ at the 5th character (k = 5). The ASCII code of the letter e is 101 and the ASCII code of the letter y is 121, so e < y and therefore s < t.
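The two rules above can be sketched in C as follows. This is only an illustration: the function name str_compare is made up for this example, and it assumes ordinary NUL-terminated C strings whose characters are compared by their unsigned byte values.

```c
#include <stddef.h>
#include <string.h>

/* Returns -1, 0 or 1 when s is less than, equal to, or greater than t.
   Characters are compared by their encoding values (as unsigned char). */
int str_compare(const char *s, const char *t)
{
    size_t n = strlen(s);
    size_t m = strlen(t);
    size_t min = n < m ? n : m;

    /* Rule 2: the first position where the characters differ decides. */
    for (size_t k = 0; k < min; ++k) {
        unsigned char a = (unsigned char)s[k];
        unsigned char b = (unsigned char)t[k];
        if (a != b)
            return a < b ? -1 : 1;
    }

    /* Rule 1: one string is a prefix of the other, so the shorter is smaller. */
    if (n == m)
        return 0;
    return n < m ? -1 : 1;
}
```

With this sketch, str_compare("hap", "happy") and str_compare("happen", "happy") are both negative, matching the two examples.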
Abstract data type for strings
ADT String
Data
Each element of a string is a single character, and adjacent elements have a predecessor/successor relationship.
Operation
StrAssign(T, *chars): generates a string T whose value equals the string constant chars.
StrCopy(T, S): string S exists; copies string S into string T.
ClearString(S): string S exists; empties the string.
StringEmpty(S): returns true if string S is empty, otherwise false.
StrLength(S): returns the number of elements in string S, that is, the length of the string.
StrCompare(S, T): returns a value > 0 if S > T, 0 if S = T, and a value < 0 if S < T.
Concat(T, S1, S2): uses T to return the new string formed by joining S1 and S2.
SubString(Sub, S, pos, len): string S exists, 1 <= pos <= StrLength(S) and 0 <= len <= StrLength(S)-pos+1; uses Sub to return the substring of S of length len that starts at the pos-th character.
Index(S, T, pos): strings S and T exist, T is non-empty and 1 <= pos <= StrLength(S); if the main string S contains a substring equal to T at or after the pos-th character, returns the position in S of the first such occurrence, otherwise returns 0.
Replace(S, T, V): strings S, T and V exist and T is non-empty; replaces with V every non-overlapping substring of the main string S that is equal to T.
StrInsert(S, pos, T): strings S and T exist, 1 <= pos <= StrLength(S)+1; inserts string T before the pos-th character of string S.
StrDelete(S, pos, len): string S exists, 1 <= pos <= StrLength(S)-len+1; deletes from S the substring of length len that starts at the pos-th character.
endADT
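As a rough sketch of how a few of these operations might look in C, the block below picks one possible representation: a fixed array whose element 0 holds the length. The capacity MAXSIZE, the int return codes and the exact signatures are assumptions made for this example; the ADT itself does not prescribe them.

```c
#include <string.h>

#define MAXSIZE 255                          /* assumed capacity */
typedef unsigned char String[MAXSIZE + 1];   /* element 0 holds the length */

/* StrAssign: make T equal to the C string constant chars.
   Returns 1 on success, 0 if chars does not fit. */
int StrAssign(String T, const char *chars)
{
    size_t len = strlen(chars);
    if (len > MAXSIZE)
        return 0;
    T[0] = (unsigned char)len;
    memcpy(T + 1, chars, len);
    return 1;
}

/* StrLength: number of characters stored in S. */
int StrLength(const unsigned char *S)
{
    return S[0];
}

/* SubString: copy len characters of S, starting at the pos-th character
   (1-based), into Sub. Returns 0 when the preconditions
   1 <= pos <= StrLength(S) and 0 <= len <= StrLength(S)-pos+1 are violated. */
int SubString(String Sub, const unsigned char *S, int pos, int len)
{
    if (pos < 1 || pos > S[0] || len < 0 || len > S[0] - pos + 1)
        return 0;
    Sub[0] = (unsigned char)len;
    memcpy(Sub + 1, S + pos, (size_t)len);
    return 1;
}
```

For example, after StrAssign(s, "happy"), the call SubString(sub, s, 2, 3) stores the 3 characters "app" in sub, while SubString(sub, s, 4, 3) is rejected because only 2 characters remain from position 4.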
Index implementation algorithm
/* T is a non-empty string. If the main string S contains a substring equal to T at or after the pos-th character, */
/* returns the position of the first such substring in S, otherwise returns 0. */
int Index(String S, String T, int pos)
{
    int n, m, i;
    String sub;
    if (pos > 0)
    {
        n = StrLength(S);                   /* length of the main string S */
        m = StrLength(T);                   /* length of the substring T */
        i = pos;
        while (i <= n - m + 1)
        {
            SubString(sub, S, i, m);        /* take the substring of S of length m starting at position i into sub */
            if (StrCompare(sub, T) != 0)    /* the two strings are not equal */
                ++i;
            else                            /* the two strings are equal */
                return i;                   /* return the position i */
        }
    }
    return 0;                               /* no substring equal to T was found */
}
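The same idea can be tried out directly on ordinary C strings. The sketch below is not the ADT version above: it assumes NUL-terminated strings and uses the standard strncmp in place of SubString plus StrCompare; the name index_substr is invented for the example. Positions stay 1-based to match the text.

```c
#include <string.h>

/* Returns the 1-based position of the first occurrence of t in s at or
   after position pos, or 0 if there is none (or if t is empty). */
int index_substr(const char *s, const char *t, int pos)
{
    int n = (int)strlen(s);
    int m = (int)strlen(t);

    if (pos < 1 || m == 0)
        return 0;
    for (int i = pos; i <= n - m + 1; ++i) {
        /* s + i - 1 points at the i-th character; compare m characters with t */
        if (strncmp(s + i - 1, t, (size_t)m) == 0)
            return i;
    }
    return 0;
}
```

For instance, index_substr("goodgoogle", "google", 1) gives 5, and index_substr("abcabc", "abc", 2) gives 4 because the search starts at the second character.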
Sequential storage structure of strings
The sequential storage structure of a string stores the character sequence of the string in a contiguous group of storage units. Each string variable is given a fixed-length storage area of a predefined size, typically a fixed-length array. The end of the string value can be marked with a terminator character such as '\0', or the length can be stored in element 0 of the array, which is what the matching code later in this article assumes with S[0] and T[0].
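One consequence of a fixed-length array is that operations such as Concat may have to truncate. A minimal sketch, assuming the length is kept in element 0 and a made-up capacity MAXSIZE of 40 (both the capacity and the 1/0 return convention are choices made for this example):

```c
#include <string.h>

#define MAXSIZE 40                           /* assumed fixed capacity */
typedef unsigned char String[MAXSIZE + 1];   /* S[0] = length, S[1..] = characters */

/* Concat: T = S1 followed by S2. T must not overlap S1 or S2.
   Returns 1 if everything fit, 0 if the result was truncated
   to MAXSIZE characters. */
int Concat(String T, const unsigned char *S1, const unsigned char *S2)
{
    int len1 = S1[0];
    int len2 = S2[0];
    int fits = (len1 + len2 <= MAXSIZE);
    int take2 = fits ? len2 : MAXSIZE - len1;   /* characters of S2 that fit */

    memcpy(T + 1, S1 + 1, (size_t)len1);
    memcpy(T + 1 + len1, S2 + 1, (size_t)take2);
    T[0] = (unsigned char)(len1 + take2);
    return fits;
}
```

Returning a flag lets the caller detect that part of S2 was dropped, which is easy to miss with fixed-length storage.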
Chained storage structure for strings
In the chained (linked) storage structure, a node can hold one character or several characters. If the last node is not fully occupied, the unused slots can be filled with # or another character that does not occur in the string.
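A small sketch of such a structure in C, assuming 4 characters per node and '#' as the filler (so the stored string itself must not contain '#'). The names Chunk, chunks_from and chunks_free are invented for this illustration.

```c
#include <stdlib.h>
#include <string.h>

#define CHUNK 4          /* assumed number of characters per node */
#define PAD   '#'        /* filler for unused slots in the last node */

typedef struct Chunk {
    char ch[CHUNK];
    struct Chunk *next;
} Chunk;

/* Releases every node of the list. */
void chunks_free(Chunk *head)
{
    while (head != NULL) {
        Chunk *next = head->next;
        free(head);
        head = next;
    }
}

/* Builds a linked list of nodes holding the characters of s.
   Returns NULL for an empty string or on allocation failure. */
Chunk *chunks_from(const char *s)
{
    Chunk *head = NULL, *tail = NULL;
    size_t len = strlen(s);

    for (size_t i = 0; i < len; i += CHUNK) {
        Chunk *node = malloc(sizeof *node);
        if (node == NULL) {
            chunks_free(head);
            return NULL;
        }
        size_t count = len - i < CHUNK ? len - i : CHUNK;
        memset(node->ch, PAD, CHUNK);       /* pre-fill with the filler */
        memcpy(node->ch, s + i, count);     /* then copy the real characters */
        node->next = NULL;
        if (tail != NULL)
            tail->next = node;
        else
            head = node;
        tail = node;
    }
    return head;
}
```

For "happen" this produces two nodes, holding "happ" and "en##".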
Pattern matching: the operation of locating a substring
The naive approach takes each character of the main string in turn as the start of a candidate substring and compares that candidate with the string to be matched.
Assume the lengths of the main string S and of the substring T to be matched are stored in S[0] and T[0].
/* Returns the position of the substring T at or after the pos-th character of the main string S; returns 0 if it does not occur. */
/* T is non-empty, 1 <= pos <= StrLength(S). */
int Index(String S, String T, int pos)
{
    int i = pos;    /* i is the current position in the main string S; if pos is not 1, matching starts from position pos */
    int j = 1;      /* j is the current position in the substring T */
    while (i <= S[0] && j <= T[0])  /* loop while i is within S and j is within T */
    {
        if (S[i] == T[j])           /* the two characters are equal, so continue */
        {
            ++i;
            ++j;
        }
        else                        /* mismatch: back up and start matching again */
        {
            i = i - j + 2;          /* i goes back to the position after the start of the last attempt */
            j = 1;                  /* j goes back to the first character of the substring T */
        }
    }
    if (j > T[0])
        return i - T[0];
    else
        return 0;
}
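With n and m the lengths of S and T:
Time complexity when every failed attempt is rejected at its first character: O(n+m)
Worst-case time complexity: O((n-m+1)*m)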
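To make the worst case concrete, here is a runnable sketch of the same backtracking loop on ordinary C strings with an added comparison counter. The function name index_naive and the comparisons parameter are additions for this illustration only; NUL-terminated strings are assumed instead of the S[0]/T[0] length convention.

```c
#include <stddef.h>
#include <string.h>

/* Naive matching with 1-based positions, as in the text.
   If comparisons is not NULL, it receives the number of character
   comparisons performed. Returns the match position or 0. */
int index_naive(const char *s, const char *t, int pos, long *comparisons)
{
    int n = (int)strlen(s);
    int m = (int)strlen(t);
    int i = pos;   /* current position in s */
    int j = 1;     /* current position in t */

    if (comparisons != NULL)
        *comparisons = 0;
    if (pos < 1 || m == 0)
        return 0;
    while (i <= n && j <= m) {
        if (comparisons != NULL)
            ++*comparisons;
        if (s[i - 1] == t[j - 1]) {
            ++i;
            ++j;
        } else {
            i = i - j + 2;   /* restart one position after the last start */
            j = 1;
        }
    }
    return j > m ? i - m : 0;
}
```

Searching for "0001" in "00000001" finds the match at position 5 after 20 comparisons: each of the 5 starting positions costs 4 comparisons, which is exactly (n-m+1)*m for n = 8 and m = 4.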
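KMP pattern matching algorithm
next[j] is the position in T that j falls back to when the character at position j fails to match. next[1] = 0. For j > 1, next[j] = k, where k-1 is the length of the longest prefix of T[1..j-1] that is also a suffix of it (excluding the whole string); if there is no such prefix (the "other cases"), next[j] = 1.
T = abcdex
j          123456
Pattern T  abcdex
next[j]    011111
1. When j=1, next[1]=0.
2. When j=2, the string from position 1 to j-1 is just the character a, which falls under the other cases, so next[2]=1.
3. When j=3, the string from position 1 to j-1 is ab; a and b are clearly not equal, so this also falls under the other cases and next[3]=1.
4. The remaining positions are handled the same way, so next[j] for this string T is 011111.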
T = abcabx
j          123456
Pattern T  abcabx
next[j]    011123
1. When j=1, next[1]=0.
2. When j=2, as before, next[2]=1.
3. When j=3, as before, next[3]=1.
4. When j=4, as before, next[4]=1.
5. When j=5, the string from position 1 to j-1 is abca; the prefix character a equals the suffix character a, so k is 2 and next[5]=2.
6. When j=6, the string from position 1 to j-1 is abcab; the prefix ab equals the suffix ab, so next[6]=3.
As a rule of thumb: if the prefix and suffix have 1 character in common, k is 2; if they have 2 characters in common, k is 3; if they have n characters in common, k is n+1.
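The two tables above can be checked with a short runnable sketch. It assumes ordinary NUL-terminated C strings rather than the length-in-element-0 convention, keeps the next array 1-based so the values line up with the tables, and uses the hypothetical name get_next_cstr.

```c
#include <string.h>

/* Fills next[1..m] for pattern t, where m = strlen(t); next[0] is unused.
   The caller must provide room for at least m + 1 ints. */
void get_next_cstr(const char *t, int *next)
{
    int m = (int)strlen(t);
    int i = 1;   /* position of the suffix character (1-based) */
    int j = 0;   /* position of the prefix character (1-based), 0 = none */

    if (m == 0)
        return;
    next[1] = 0;
    while (i < m) {
        if (j == 0 || t[i - 1] == t[j - 1]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];   /* characters differ: j falls back */
        }
    }
}
```

Calling it on "abcdex" fills next[1..6] with 0 1 1 1 1 1, and on "abcabx" with 0 1 1 1 2 3, the same values as in the two tables.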
The code is as follows
/* Computes the next array of the substring T */
void Get_next(String T, int *next)
{
    int i, j;
    i = 1;
    j = 0;
    next[1] = 0;
    while (i < T[0])    /* here T[0] is the length of the string T */
    {
        if (j == 0 || T[i] == T[j])     /* T[i] is a single character of the suffix, T[j] a single character of the prefix */
        {
            ++i;
            ++j;
            next[i] = j;
        }
        else
            j = next[j];                /* the characters differ, so j falls back */
    }
}
/* Returns the position of the substring T at or after the pos-th character of the main string S; returns 0 if it does not occur. */
/* T is non-empty, 1 <= pos <= StrLength(S). */
int Index_KMP(String S, String T, int pos)
{
    int i = pos;        /* i is the current position in the main string S; if pos is not 1, matching starts from position pos */
    int j = 1;          /* j is the current position in the substring T */
    int next[255];      /* define a next array */
    Get_next(T, next);  /* compute the next array of the string T being matched */
    while (i <= S[0] && j <= T[0])  /* loop while i is within S and j is within T */
    {
        if (j == 0 || S[i] == T[j]) /* the two characters are equal, so continue; compared with the naive algorithm, the j == 0 test is added */
        {
            ++i;
            ++j;
        }
        else                        /* mismatch: fall back and match again */
        {
            j = next[j];            /* j falls back to the appropriate position; i is unchanged */
        }
    }
    if (j > T[0])
        return i - T[0];
    else
        return 0;
}
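For experimenting, the whole algorithm can be put together on ordinary C strings. This is a sketch under assumptions, not the code above: it uses NUL-terminated strings, allocates the next array with malloc instead of a fixed int next[255], and the names build_next and index_kmp are made up for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Fills next[1..m] for pattern t of length m (1-based, next[0] unused). */
static void build_next(const char *t, int m, int *next)
{
    int i = 1, j = 0;

    next[1] = 0;
    while (i < m) {
        if (j == 0 || t[i - 1] == t[j - 1]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];
        }
    }
}

/* Returns the 1-based position of the first occurrence of t in s at or
   after position pos, or 0 if there is none. In this sketch 0 is also
   returned when t is empty or memory cannot be allocated. */
int index_kmp(const char *s, const char *t, int pos)
{
    int n = (int)strlen(s);
    int m = (int)strlen(t);
    int i = pos, j = 1, result = 0;
    int *next;

    if (pos < 1 || m == 0)
        return 0;
    next = malloc(((size_t)m + 1) * sizeof *next);
    if (next == NULL)
        return 0;
    build_next(t, m, next);

    while (i <= n && j <= m) {
        if (j == 0 || s[i - 1] == t[j - 1]) {
            ++i;
            ++j;
        } else {
            j = next[j];   /* only j moves back; i never decreases */
        }
    }
    if (j > m)
        result = i - m;
    free(next);
    return result;
}
```

For example, index_kmp("abcabcabx", "abcabx", 1) returns 4: after the mismatch at the 6th character, j falls back to next[6] = 3 and the main string position i is not rewound.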
The KMP algorithm shows its advantages only when there are many partial matches between the pattern and the main string.
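Improvement of the KMP pattern matching algorithm
/* Computes the corrected next function values of the pattern string T and stores them in the array nextval */
void Get_nextval(String T, int *nextval)
{
    int i, j;
    i = 1;
    j = 0;
    nextval[1] = 0;
    while (i < T[0])    /* T[0] is the length of the string T */
    {
        if (j == 0 || T[i] == T[j])
        {
            ++i;
            ++j;
            if (T[i] != T[j])               /* the current character differs from the prefix character */
                nextval[i] = j;             /* so j is the nextval at position i */
            else
                nextval[i] = nextval[j];    /* otherwise reuse the nextval of the prefix character */
        }
        else
            j = nextval[j];                 /* the characters differ, so j falls back */
    }
}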
Comparison after the improvement
T = ababaaaba
j           123456789
Pattern T   ababaaaba
next[j]     011234223
nextval[j]  010104210
1. When j=1, nextval[1]=0.
2. When j=2, the next value of the 2nd character b is 1, and the 1st character is a; they are not equal, so nextval[2]=next[2]=1 keeps the original value.
3. When j=3, the next value of the 3rd character a is 1, so it is compared with the 1st character a; they are equal, so nextval[3]=nextval[1]=0.
4. When j=4, the next value of the 4th character b is 2, so it is compared with the 2nd character b; they are equal, so nextval[4]=nextval[2]=1.
5. When j=5, the next value is 3; the 5th character a is equal to the 3rd character a, so nextval[5]=nextval[3]=0.
6. When j=6, the next value is 4; the 6th character a is not equal to the 4th character b, so nextval[6]=4.
7. When j=7, the next value is 2; the 7th character a is not equal to the 2nd character b, so nextval[7]=2.
8. When j=8, the next value is 2; the 8th character b is equal to the 2nd character b, so nextval[8]=nextval[2]=1.
9. When j=9, the next value is 3; the 9th character a is equal to the 3rd character a, so nextval[9]=nextval[3]=0.
In summary: once next[j] has been computed, compare the character at position j with the character at position next[j]. If they are equal, nextval[j] takes the nextval of position next[j]; if they are not equal, nextval[j] is simply next[j].
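The nextval row of the table can be reproduced with a short runnable sketch. As with the other examples it assumes NUL-terminated C strings and a 1-based output array; the name get_nextval_cstr is hypothetical.

```c
#include <string.h>

/* Fills nextval[1..m] for pattern t, where m = strlen(t); nextval[0] is
   unused. The caller must provide room for at least m + 1 ints. */
void get_nextval_cstr(const char *t, int *nextval)
{
    int m = (int)strlen(t);
    int i = 1, j = 0;

    if (m == 0)
        return;
    nextval[1] = 0;
    while (i < m) {
        if (j == 0 || t[i - 1] == t[j - 1]) {
            ++i;
            ++j;
            if (t[i - 1] != t[j - 1])
                nextval[i] = j;            /* characters differ: keep j */
            else
                nextval[i] = nextval[j];   /* same character: skip further back */
        } else {
            j = nextval[j];
        }
    }
}
```

For "ababaaaba" it fills nextval[1..9] with 0 1 0 1 0 4 2 1 0, matching the table.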
This article is from the "linux_oracle" blog; please keep this source link when reposting: http://pankuo.blog.51cto.com/8651697/1631335