Data structures: strings

Source: Internet
Author: User

String: a finite sequence of zero or more characters.

Strings are compared by the encodings of their characters, where the encoding of a character is its ordinal number in the corresponding character set.


Given two strings s = "a1a2...an" and t = "b1b2...bm", s < t when one of the following conditions is met.

1. n < m and ai = bi (i = 1, 2, ..., n)

For example, with s = "hap" and t = "happy", t has 2 more characters than s, so s < t.

2. There exists some k <= min(m, n) such that ai = bi (i = 1, 2, ..., k-1) and ak < bk

For example, with s = "happen" and t = "happy", the first 4 characters of the two strings are the same and they differ at the 5th letter (k = 5). The ASCII code of the letter e is 101 and the ASCII code of the letter y is 121, so e < y and therefore s < t.
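The two rules can be sketched in C as follows. This is a minimal illustration on ordinary NUL-terminated C strings, not the author's code; the name str_compare is an assumption made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Compare s and t by the two rules above.
 * Returns a value < 0 if s < t, 0 if s = t, and a value > 0 if s > t. */
int str_compare(const char *s, const char *t)
{
    size_t n = strlen(s);
    size_t m = strlen(t);
    size_t min = n < m ? n : m;

    /* Rule 2: the first position where the characters differ decides. */
    for (size_t k = 0; k < min; ++k) {
        unsigned char a = (unsigned char)s[k];
        unsigned char b = (unsigned char)t[k];
        if (a != b)
            return (int)a - (int)b;
    }
    /* Rule 1: one string is a prefix of the other; the shorter is smaller. */
    if (n < m)
        return -1;
    if (n > m)
        return 1;
    return 0;
}
```

With this sketch, str_compare("hap", "happy") and str_compare("happen", "happy") are both negative, matching the two examples.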


Abstract data type for strings

ADT String

Data

The elements of a string are single characters only, and adjacent elements have a predecessor and successor relationship.

Operation

StrAssign(T, *chars): Generates a string T whose value equals the string constant chars.

StrCopy(T, S): String S exists; copies string S to string T.

ClearString(S): String S exists; empties the string.

StringEmpty(S): Returns true if string S is empty, otherwise false.

StrLength(S): Returns the number of elements in string S, that is, the length of the string.

StrCompare(S, T): Returns a value > 0 if S > T, 0 if S = T, and a value < 0 if S < T.

Concat(T, S1, S2): Uses T to return the new string formed by joining S1 and S2.

SubString(Sub, S, pos, len): String S exists, 1 <= pos <= StrLength(S), and 0 <= len <= StrLength(S) - pos + 1; uses Sub to return the substring of S of length len that starts at the pos-th character.

Index(S, T, pos): Strings S and T exist, T is non-empty, and 1 <= pos <= StrLength(S). If the main string S contains a substring equal to T, returns the position of its first occurrence at or after the pos-th character of S; otherwise returns 0.

Replace(S, T, V): Strings S, T and V exist and T is non-empty; replaces with V all non-overlapping substrings of the main string S that are equal to T.

StrInsert(S, pos, T): Strings S and T exist, 1 <= pos <= StrLength(S) + 1; inserts string T before the pos-th character of S.

StrDelete(S, pos, len): String S exists, 1 <= pos <= StrLength(S) - len + 1; deletes from S the substring of length len that starts at the pos-th character.

endADT



Implementation of Index

T is a non-empty string. If the main string S contains a substring equal to T at or after the pos-th character, return the position of the first such substring in S; otherwise return 0.

int Index(String S, String T, int pos)
{
    int n, m, i;
    String sub;
    if (pos > 0)
    {
        n = StrLength(S);   /* length of the main string S */
        m = StrLength(T);   /* length of the substring T */
        i = pos;
        while (i <= n - m + 1)
        {
            SubString(sub, S, i, m);        /* sub = the m characters of S starting at position i */
            if (StrCompare(sub, T) != 0)    /* the two strings are not equal: try the next position */
                ++i;
            else                            /* the two strings are equal */
                return i;                   /* return i */
        }
    }
    return 0;   /* no substring equal to T was found */
}



Sequential storage structure of strings

The sequential storage structure of a string stores the sequence of characters in a contiguous set of storage units. Each string variable is given a fixed-length storage area of a predefined size, typically a fixed-length array. The length of the string can be kept in element 0 of the array, or the end of the string value can be marked with a terminator character that is not counted in the length, such as "\0".
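A minimal sketch of this layout in C, assuming the variant in which element 0 of a fixed-length array holds the length. MAXSIZE, the int return codes, and the exact signatures are assumptions made for this example; only the operation names come from the ADT.

```c
#include <string.h>

#define MAXSIZE 255                         /* assumed capacity; the length fits in one byte */
typedef unsigned char String[MAXSIZE + 1];  /* element 0 holds the length */

/* Build T from a C string constant. Returns 0 if chars does not fit, else 1. */
int StrAssign(String T, const char *chars)
{
    size_t len = strlen(chars);
    if (len > MAXSIZE)
        return 0;
    T[0] = (unsigned char)len;
    memcpy(T + 1, chars, len);      /* the characters occupy T[1..len] */
    return 1;
}

/* The length is read directly from element 0. */
int StrLength(const String S)
{
    return S[0];
}

/* Copy the len characters of S that start at position pos (1-based) into Sub.
 * Returns 0 if pos or len is out of range, else 1. */
int SubString(String Sub, const String S, int pos, int len)
{
    if (pos < 1 || pos > S[0] || len < 0 || len > S[0] - pos + 1)
        return 0;
    memcpy(Sub + 1, S + pos, (size_t)len);
    Sub[0] = (unsigned char)len;
    return 1;
}
```

Keeping the length in element 0 makes StrLength a constant-time lookup and lets character positions run from 1, at the cost of limiting the length to what one array element can store.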


Chained storage structure for strings

In the chained storage structure, a node can hold one character or several characters. If the last node is not fully occupied, the unused slots can be filled with "#" or another value that cannot occur in the string.
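A minimal sketch of such a chained structure in C, assuming four characters per node and "#" as the filler. The names Chunk, CHUNK_SIZE, PAD, chunk_from_cstr and chunk_free are hypothetical and chosen only for this illustration.

```c
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 4    /* characters per node (assumed value) */
#define PAD '#'         /* filler for unused slots in the last node */

typedef struct Chunk {
    char ch[CHUNK_SIZE];
    struct Chunk *next;
} Chunk;

/* Release every node of the list. */
void chunk_free(Chunk *head)
{
    while (head != NULL) {
        Chunk *next = head->next;
        free(head);
        head = next;
    }
}

/* Build a chained string from a C string. Returns NULL for an empty
 * string or if an allocation fails. */
Chunk *chunk_from_cstr(const char *s)
{
    size_t len = strlen(s);
    Chunk *head = NULL, *tail = NULL;

    for (size_t i = 0; i < len; i += CHUNK_SIZE) {
        Chunk *node = malloc(sizeof *node);
        if (node == NULL) {
            chunk_free(head);
            return NULL;
        }
        /* Copy up to CHUNK_SIZE characters and pad the rest of the node. */
        for (size_t k = 0; k < CHUNK_SIZE; ++k)
            node->ch[k] = (i + k < len) ? s[i + k] : PAD;
        node->next = NULL;
        if (tail == NULL)
            head = node;
        else
            tail->next = node;
        tail = node;
    }
    return head;
}
```

For the string "happy" this produces two nodes holding "happ" and "y###". Larger nodes waste less space on pointers but make insertion and deletion inside a node more awkward.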



Pattern matching: locating a substring

The naive approach takes each character of the main string in turn as the start of a candidate substring and compares that candidate with the string to be matched.

Assume the lengths of the main string S and of the substring T to be matched are stored in S[0] and T[0].

Returns the position of the first occurrence of the substring T in the main string S at or after the pos-th character, or 0 if there is none.

T is non-empty, 1 <= pos <= StrLength(S).

int Index(String S, String T, int pos)
{
    int i = pos;    /* current position in the main string S; if pos is not 1, matching starts at pos */
    int j = 1;      /* current position in the substring T */
    while (i <= S[0] && j <= T[0])  /* loop while i is within S and j is within T */
    {
        if (S[i] == T[j])   /* the two characters are equal: continue */
        {
            ++i;
            ++j;
        }
        else                /* mismatch: back up and start matching again */
        {
            i = i - j + 2;  /* i returns to the position after the start of the last attempt */
            j = 1;          /* j returns to the first character of T */
        }
    }
    if (j > T[0])
        return i - T[0];
    else
        return 0;
}

Here n and m are the lengths of S and T.

Average time complexity: O(n+m)

Worst-case time complexity: O((n-m+1)·m)
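The worst case can be observed by counting character comparisons. The following is a sketch under stated assumptions: it works on ordinary C strings rather than the length-prefixed String type, and the name index_naive and the comparisons out-parameter are hypothetical additions for this illustration.

```c
#include <string.h>

/* Naive matching on C strings. Returns the 1-based position of the first
 * occurrence of t in s at or after pos, or 0 if there is none.
 * *comparisons receives the number of character comparisons performed. */
int index_naive(const char *s, const char *t, int pos, long *comparisons)
{
    int n = (int)strlen(s);
    int m = (int)strlen(t);
    int i = pos;    /* 1-based position in s */
    int j = 1;      /* 1-based position in t */

    *comparisons = 0;
    if (m == 0 || pos < 1)
        return 0;
    while (i <= n && j <= m) {
        ++*comparisons;
        if (s[i - 1] == t[j - 1]) {
            ++i;
            ++j;
        } else {
            i = i - j + 2;  /* restart one position after the previous start */
            j = 1;
        }
    }
    return (j > m) ? i - m : 0;
}
```

Searching for "0001" in "0000000001" (n = 10, m = 4) finds the match at position 7 after 28 comparisons: each of the 7 candidate starting positions costs 4 comparisons, which is exactly (n-m+1)·m.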


KMP Pattern Matching algorithm

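T = abcdex

j            1 2 3 4 5 6
Pattern T    a b c d e x
next[j]      0 1 1 1 1 1

Here next[j] is 0 when j = 1; otherwise it is the largest k (1 < k < j) for which the first k-1 characters of T equal the k-1 characters that end at position j-1; in all other cases it is 1.

1. When j = 1, next[1] = 0.

2. When j = 2, the string from position 1 to j-1 is the single character a, which falls under the other cases, so next[2] = 1.

3. When j = 3, the string from position 1 to j-1 is ab; a and b are clearly unequal, which again falls under the other cases, so next[3] = 1.

4. The remaining positions follow in the same way, so next[j] for this string T is 011111.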

T = abcabx

j            1 2 3 4 5 6
Pattern T    a b c a b x
next[j]      0 1 1 1 2 3

1. When j = 1, next[1] = 0.

2. When j = 2, as above, next[2] = 1.

3. When j = 3, as above, next[3] = 1.

4. When j = 4, as above, next[4] = 1.

5. When j = 5, the string from position 1 to j-1 is abca. The prefix character a equals the suffix character a, so k is 2 and next[5] = 2.

6. When j = 6, the string from position 1 to j-1 is abcab. The prefix ab equals the suffix ab, so next[6] = 3.

As a rule of thumb, if the prefix and the suffix have one character in common, k is 2; if they have two characters in common, k is 3; if they have n characters in common, k is n+1.
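The two tables can be checked by computing next[j] directly from the prefix and suffix comparison described in the steps above. This is a deliberately simple sketch, not the efficient algorithm; the name next_by_definition and the use of ordinary C strings are assumptions for this illustration.

```c
#include <string.h>

/* Fill next[1..m] for pattern t by comparing prefixes and suffixes directly.
 * next[1] = 0; otherwise next[j] = len + 1, where len is the length of the
 * longest proper prefix of t[1..j-1] that is also a suffix of it
 * (next[j] = 1 when there is none). next needs room for m + 1 ints. */
void next_by_definition(const char *t, int *next)
{
    int m = (int)strlen(t);

    if (m >= 1)
        next[1] = 0;
    for (int j = 2; j <= m; ++j) {
        next[j] = 1;
        /* Try the longest candidate first. t[1..j-1] in the text's 1-based
         * notation is t[0..j-2] in C, so its suffix of length len starts
         * at C index j - 1 - len. */
        for (int len = j - 2; len >= 1; --len) {
            if (memcmp(t, t + (j - 1 - len), (size_t)len) == 0) {
                next[j] = len + 1;
                break;
            }
        }
    }
}
```

For "abcdex" this yields 0 1 1 1 1 1 and for "abcabx" it yields 0 1 1 1 2 3, in agreement with the tables.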


The code is as follows.

Computes the next array of the substring T.

void Get_next(String T, int *next)
{
    int i, j;
    i = 1;
    j = 0;
    next[1] = 0;
    while (i < T[0])    /* T[0] holds the length of the string T */
    {
        if (j == 0 || T[i] == T[j])     /* T[i] is a single character of the suffix, T[j] a single character of the prefix */
        {
            ++i;
            ++j;
            next[i] = j;
        }
        else
            j = next[j];    /* the characters differ: j backtracks */
    }
}


Returns the position of the first occurrence of the substring T in the main string S at or after the pos-th character, or 0 if there is none.

T is non-empty, 1 <= pos <= StrLength(S).

int Index_KMP(String S, String T, int pos)
{
    int i = pos;    /* current position in the main string S; if pos is not 1, matching starts at pos */
    int j = 1;      /* current position in the substring T */
    int next[255];  /* define a next array */
    Get_next(T, next);  /* compute the next array of the string T being matched */
    while (i <= S[0] && j <= T[0])  /* loop while i is within S and j is within T */
    {
        if (j == 0 || S[i] == T[j])     /* characters equal: continue; compared with the naive algorithm, the j == 0 test is new */
        {
            ++i;
            ++j;
        }
        else                /* mismatch: back up and start matching again */
            j = next[j];    /* j returns to the appropriate position; i is unchanged */
    }
    if (j > T[0])
        return i - T[0];
    else
        return 0;
}

The KMP algorithm shows its advantages only when there are many partial matches between the pattern and the main string.
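For experimenting with the algorithm, here is a self-contained sketch of the same logic on ordinary C strings. It is an adaptation rather than the code above: positions stay 1-based as in the text, the next array is allocated to fit the pattern instead of using a fixed size, and the lowercase names get_next and index_kmp are chosen for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Compute next[1..m] for pattern t of length m (1-based positions). */
static void get_next(const char *t, int m, int *next)
{
    int i = 1, j = 0;

    next[1] = 0;
    while (i < m) {
        if (j == 0 || t[i - 1] == t[j - 1]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];    /* fall back within the pattern */
        }
    }
}

/* Returns the 1-based position of the first occurrence of t in s at or
 * after pos, or 0 if there is none (also 0 if allocation fails). */
int index_kmp(const char *s, const char *t, int pos)
{
    int n = (int)strlen(s);
    int m = (int)strlen(t);
    int i = pos, j = 1;
    int *next;

    if (m == 0 || pos < 1 || pos > n)
        return 0;
    next = malloc(((size_t)m + 1) * sizeof *next);
    if (next == NULL)
        return 0;
    get_next(t, m, next);

    while (i <= n && j <= m) {
        if (j == 0 || s[i - 1] == t[j - 1]) {
            ++i;
            ++j;
        } else {
            j = next[j];    /* i never moves backwards */
        }
    }
    free(next);
    return (j > m) ? i - m : 0;
}
```

Searching for "abcabx" in "abcabcabx" shows the benefit: after the mismatch at the sixth character, j falls back to 3 and matching continues without moving i, and the match is reported at position 4.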


Improvement of KMP pattern matching algorithm

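The next function of the pattern string T is corrected and stored in the array nextval.

void Get_nextval(String T, int *nextval)
{
    int i, j;
    i = 1;
    j = 0;
    nextval[1] = 0;
    while (i < T[0])
    {
        if (j == 0 || T[i] == T[j])
        {
            ++i;
            ++j;
            if (T[i] != T[j])               /* the current character differs from the prefix character */
                nextval[i] = j;             /* j is the nextval at position i */
            else
                nextval[i] = nextval[j];    /* same character: take the nextval of the prefix character */
        }
        else
            j = nextval[j];
    }
}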


Comparison after the improvement

T = ababaaaba

j            1 2 3 4 5 6 7 8 9
Pattern T    a b a b a a a b a
next[j]      0 1 1 2 3 4 2 2 3
nextval[j]   0 1 0 1 0 4 2 1 0

1. When j = 1, nextval[1] = 0.

2. When j = 2, the next value of the second character b is 1, and the first character is a. They are not equal, so nextval[2] = next[2] = 1 keeps the original value.

3. When j = 3, the next value of the third character a is 1. Comparing it with the first character a shows they are equal, so nextval[3] = nextval[1] = 0.

4. When j = 4, the next value of the fourth character b is 2. Comparing it with the second character b shows they are equal, so nextval[4] = nextval[2] = 1.

5. When j = 5, the next value is 3 and the fifth character a equals the third character a, so nextval[5] = nextval[3] = 0.

6. When j = 6, the next value is 4 and the sixth character a is not equal to the fourth character b, so nextval[6] = 4.

7. When j = 7, the next value is 2 and the seventh character a is not equal to the second character b, so nextval[7] = 2.

8. When j = 8, the next value is 2 and the eighth character b equals the second character b, so nextval[8] = nextval[2] = 1.

9. When j = 9, the next value is 3 and the ninth character a equals the third character a, so nextval[9] = nextval[3] = 0.

In summary, once the next values are known: if the character at position a equals the character at position b that its next value points to, then nextval of position a takes the nextval of position b; if they are not equal, nextval of position a is its own next value.
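The nextval row of the table can be reproduced with a small self-contained version of the routine. This sketch works on an ordinary C string and takes the length from strlen; the lowercase name get_nextval and that signature are assumptions for this illustration.

```c
#include <string.h>

/* Compute nextval[1..m] for pattern t (1-based positions).
 * nextval needs room for m + 1 ints. */
void get_nextval(const char *t, int *nextval)
{
    int m = (int)strlen(t);
    int i = 1, j = 0;

    if (m == 0)
        return;
    nextval[1] = 0;
    while (i < m) {
        if (j == 0 || t[i - 1] == t[j - 1]) {
            ++i;
            ++j;
            if (t[i - 1] != t[j - 1])
                nextval[i] = j;             /* characters differ: keep the plain next value */
            else
                nextval[i] = nextval[j];    /* same character: reuse the earlier nextval */
        } else {
            j = nextval[j];
        }
    }
}
```

For T = "ababaaaba" the routine fills nextval[1..9] with 0 1 0 1 0 4 2 1 0, the same values as in the table.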


This article is from the "linux_oracle" blog; please keep this source attribution: http://pankuo.blog.51cto.com/8651697/1631335
