A summary of string search algorithms, with Microsoft's strstr source code

Source: Internet
Author: User

First, what string search means: finding the position at which a given string B occurs in a given string A, or counting how many times B occurs in A.
  Microsoft's C runtime provides strstr, with the prototype extern char * strstr (char * str1, char * str2); declared in the header <string.h>. The following code, however, can be used directly without that header (apart from NULL, which needs a definition such as the one in <stddef.h>):


char * __cdecl strstr (
        const char * str1,
        const char * str2
        )
{
        char *cp = (char *) str1;
        char *s1, *s2;

        if ( !*str2 )
                return((char *)str1);

        while (*cp)
        {
                s1 = cp;
                s2 = (char *) str2;

                while ( *s1 && *s2 && !(*s1-*s2) )
                        s1++, s2++;

                if (!*s2)
                        return(cp);

                cp++;
        }

        return(NULL);
}

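For wide characters you can simply replace char with wchar_t. This brute-force approach, however, is the least efficient of the methods discussed here: in the worst case it needs on the order of n * m character comparisons for a text of length n and a pattern of length m.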
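The opening paragraph also mentions counting how many times B occurs in A. A minimal sketch of that on top of the standard strstr, written in C++; the helper name countOccurrences is made up for this illustration, overlapping occurrences are counted, and treating an empty search string as zero hits is an assumption of the sketch:

```cpp
#include <cstring>

// Hypothetical helper (not part of any library): counts how many times
// `needle` occurs in `haystack`, including overlapping occurrences.
int countOccurrences(const char* haystack, const char* needle)
{
    if (*needle == '\0')
        return 0;               // assumption: an empty needle counts as zero hits

    int count = 0;
    const char* p = haystack;
    while ((p = std::strstr(p, needle)) != nullptr)
    {
        ++count;
        ++p;                    // resume one character after the start of this hit
    }
    return count;
}
```

For example, countOccurrences("aaaa", "aa") gives 3 because overlapping hits are counted; advancing p by the length of needle instead of by 1 would count only non-overlapping hits.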
  For details about the KMP algorithm, refer to the following: 1. The KMP algorithm 2. The KMP algorithm in detail
In fact, the array that the KMP algorithm precomputes from the pattern can itself be optimized.
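For readers who do not follow the references, here is a minimal sketch of KMP in C++. It is one common textbook formulation and not the code from the referenced articles; the names buildFailure and kmpSearch are chosen for this illustration. failure[i] holds the length of the longest proper prefix of pattern[0..i] that is also a suffix of it, and this table decides how far the pattern moves after a mismatch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// failure[i] = length of the longest proper prefix of pattern[0..i]
// that is also a suffix of pattern[0..i].
std::vector<int> buildFailure(const std::string& pattern)
{
    std::vector<int> failure(pattern.size(), 0);
    int k = 0;                                   // length of the current border
    for (std::size_t i = 1; i < pattern.size(); ++i)
    {
        while (k > 0 && pattern[i] != pattern[k])
            k = failure[k - 1];                  // fall back to a shorter border
        if (pattern[i] == pattern[k])
            ++k;
        failure[i] = k;
    }
    return failure;
}

// Returns the index of the first occurrence of pattern in text, or -1.
int kmpSearch(const std::string& text, const std::string& pattern)
{
    if (pattern.empty())
        return 0;                                // same convention as strstr
    const std::vector<int> failure = buildFailure(pattern);
    int k = 0;                                   // pattern characters matched so far
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        while (k > 0 && text[i] != pattern[k])
            k = failure[k - 1];                  // reuse what is already matched
        if (text[i] == pattern[k])
            ++k;
        if (k == static_cast<int>(pattern.size()))
            return static_cast<int>(i) - k + 1;  // start index of the match
    }
    return -1;
}
```

Unlike the brute-force strstr above, the text index i never moves backwards, so the search takes time proportional to the combined length of the text and the pattern.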
  Sunday is a variant of the BM (Boyer-Moore) search algorithm and is often faster in practice. The following is taken from an article by Aga.J, a fellow blogger on Blog Garden:
The Sunday algorithm is a matching algorithm that is often more efficient in practice than KMP and BM. Its idea is similar to that of the BM algorithm. The difference is that when a match attempt fails, the Sunday algorithm looks at the character of the source string just after the part currently involved in the match, that is, the character at offset M from the start of the current alignment, where M is the length of the pattern string. If this character does not appear in the pattern string, it is skipped altogether: shift = length of the pattern string + 1. Otherwise, shift = (distance from the rightmost occurrence of this character in the pattern string to the end of the pattern string) + 1.
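The shift rule just described can be sketched as a small function. This is an illustration in C++; the name sundayShift and its parameters are made up here, and next stands for the source character just after the current alignment.

```cpp
#include <string>

// Hypothetical helper: how far the pattern moves after a failed attempt,
// given `next`, the source character just after the current alignment.
int sundayShift(const std::string& pattern, char next)
{
    const int m = static_cast<int>(pattern.size());
    // Scan from right to left so that the rightmost occurrence wins.
    for (int j = m - 1; j >= 0; --j)
    {
        if (pattern[j] == next)
            return m - j;       // (distance from index j to the end) + 1
    }
    return m + 1;               // `next` is not in the pattern: jump past it
}
```

With the pattern "EXAMPLE" (M = 7), sundayShift returns 8 for a space, which does not occur in the pattern, and 1 for 'E', whose rightmost occurrence is the last pattern character.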
Suppose we want to find "EXAMPLE" in "HERE IS A SIMPLE EXAMPLE" (the text has a single space between words; extra spaces are used below only to align the pattern under the text).
Text:    HERE IS A SIMPLE EXAMPLE
Pattern: EXAMPLE

We first align the pattern string with the start of the source string and find that the very first character does not match. The naive algorithm would simply shift the pattern one position to the right, and KMP would decide the shift from the array it precomputes from the pattern. The Sunday algorithm instead examines the character just after the aligned part of the source string, here the space after "IS" in the text. Whatever shift we choose, this character always takes part in the next attempt: as long as we move by no more than the length of the pattern string, the space is covered by the new alignment, and moving further without examining it could miss a possible match. If none of the shifts up to the length of the pattern string can produce a match, the next possible starting point is the position right after the space.
Since this character is certain to take part in the next attempt, we compare it with the characters of the pattern string. If it does not occur in the pattern string (as with the space here), we jump directly to the character after it and start matching there (this time the alignment starts at the A):
Text:    HERE IS A SIMPLE EXAMPLE
Pattern:         EXAMPLE
We now try again and find that the first character does not match, so, as before, we look at the character just after the alignment, which is an E. We search for E in the pattern string from right to left; since E does occur in the pattern string, we move the pattern string until the two E's line up. (The idea of working from right to left comes from the BM algorithm, where comparing from the end of the pattern lets a mismatch produce a larger shift: if the later characters do not match, matches on the earlier ones are useless. Here we search the pattern string from right to left to find the first position that matches E. For details, refer to an analysis of the BM algorithm.)
Text:    HERE IS A SIMPLE EXAMPLE
Pattern:          EXAMPLE
Next we compare the first character of the pattern string with the new start position in the source string and again find a mismatch, so we examine the character just after the alignment. This time it is a space, which does not occur in the pattern string, so the whole pattern string is moved past that space, and the match succeeds.
The above example is taken from the Internet. It is not very typical: in the second attempt, the character just after the alignment equals the last character of the pattern string, so the pattern moves by only one position and the example does not really show how the Sunday algorithm shifts the pattern string.
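The walkthrough can be replayed with a short function. The sketch below, in C++, records the start position of every alignment that is tried; the name sundayTrace is made up for this illustration, and it searches the pattern on each mismatch rather than using a precomputed table.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the start position of every alignment that the
// Sunday algorithm tries, stopping at the first full match or at the end.
std::vector<int> sundayTrace(const std::string& text, const std::string& pattern)
{
    std::vector<int> starts;
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(pattern.size());
    if (m == 0)
        return starts;                      // nothing to trace for an empty pattern

    int start = 0;
    while (start + m <= n)
    {
        starts.push_back(start);
        if (text.compare(start, m, pattern) == 0)
            break;                          // full match at `start`
        if (start + m == n)
            break;                          // no character after the alignment
        const char next = text[start + m];
        const std::size_t j = pattern.rfind(next);   // rightmost occurrence
        start += (j == std::string::npos) ? m + 1 : m - static_cast<int>(j);
    }
    return starts;
}
```

For the text "HERE IS A SIMPLE EXAMPLE" and the pattern "EXAMPLE" this yields the start positions 0, 8, 9 and 17, which correspond to the three shifts described above (by 8, by 1 and by 8).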
 


/*
 * Algorithm outline:
 * 1. Starting from the first character, compare pattern and source character by character.
 * 2. If a mismatch occurs, check whether the source character just after the end of the
 *    aligned pattern equals some character of the pattern.
 *    If yes, align the rightmost such pattern character with it and compare again.
 *    If no, restart the comparison at the source character that follows it.
 * 3. If every character matches, the search succeeds.
 */
#include <iostream>
using namespace std;

// The caller must make sure the alignment does not run past the end of source.
// With the pattern aligned at source[sourceStartPos], returns the index in pattern
// of the first character that does not match, or patternLength if all match.
int compare(const char *source, const char *pattern, int sourceStartPos, int patternLength)
{
    int i = 0;
    for (; (i < patternLength) && (source[sourceStartPos + i] == pattern[i]); i++)
    {
        ;
    }
    return i; // index i of the first pattern[i] that could not be matched
}

bool sundayMatch(const char *source, const char *pattern, int sourceLength, int patternLength)
{
    int startPos = 0;     // start point of the current alignment in the source string
    int failMatchPos = 0; // failure point
    int j = 0;
    while (startPos + patternLength <= sourceLength) // the rest of source is at least as long as the pattern, so keep comparing
    {
        failMatchPos = compare(source, pattern, startPos, patternLength); // index of the first mismatched character
        cout << "failMatchPos:" << failMatchPos << " ";
        if (failMatchPos == patternLength) // every pattern character matched: success
            return true;
        else
        {
            // Does the source character just after the alignment occur in the pattern?
            // Note: search from right to left. At the end of source this reads the
            // terminating '\0', which matches nothing.
            for (j = patternLength - 1; j >= 0; j--)
                if (source[startPos + patternLength] == pattern[j])
                {
                    startPos += patternLength - j;
                    break; // found: this sets the next start point, the so-called alignment
                }
            if (j < 0) // not found: skip past that character
                startPos = startPos + patternLength + 1;
        }
        cout << "newStartPos" << startPos << endl;
    }
    return false;
}

int main()
{
    const char *s1 = "this is a example"; // or "HERE IS A SIMPLE EXAMPLE", length 24
    const char *s2 = "EXAMPLE";
    bool result = sundayMatch(s1, s2, 17, 7);
    if (result)
        cout << "yes";
    int i = 0;
    cin >> i; // wait for input so the console window stays open
    return 0;
}

The implementation commonly found on the Internet works as follows. A preprocessing step stores, for every character, the distance the pattern has to move when that character is the one just after the alignment in the source string. The stored distances are used directly during matching, so there is no need, as in the version I wrote above, to scan the pattern string each time to decide whether that character occurs in it and where. (This method costs space but saves time.)
  


/* Like BM/KMP, preprocess the pattern: compute the moving step in advance and use it directly when a mismatch occurs */
#include <iostream>
#include <stdlib.h>
#include <string.h>
using namespace std;

#define MAX_CHAR_SIZE 256 // a char (8 bits) can take at most 256 values
/*
 * Record the moving step for each character, based on its rightmost position in the substring.
 * If the character just to the right of the matching range in the main string is not in the substring, the main string moves by the length of the substring + 1.
 * If that character is in the substring, the main string moves by the length of the substring - the position of the character in the substring.
 */
int *setCharStep(const char *subStr)
{
    int *charStep = new int[MAX_CHAR_SIZE]; // the code does not pay attention to safety :) the caller must delete[] it
    int subStrLen = (int)strlen(subStr);
    for (int i = 0; i < MAX_CHAR_SIZE; i++)
        charStep[i] = subStrLen + 1;
    // Default: the character is not in the substring, so the step is the length of the substring + 1.
    // Scan from left to right, so the rightmost occurrence of each character is the one that is kept.
    for (int j = 0; j < subStrLen; j++)
    {
        charStep[(unsigned char)subStr[j]] = subStrLen - j;
        // The character is in the substring: step = length of the substring - position of the character in the substring
    }
    return charStep;
}
/*
 * The core idea of the algorithm: match from left to right. On a mismatch, take the first character to the right of the matching range in the main string and use its rightmost position in the substring.
 * Move the main string pointer by the precomputed step, and repeat until a match is found.
 */
int sundaySearch(const char *mainStr, const char *subStr, const int *charStep)
{
    int mainStrLen = (int)strlen(mainStr);
    int subStrLen = (int)strlen(subStr);
    int main_i = 0;
    int sub_j = 0;
    while (main_i < mainStrLen)
    {
        int tem = main_i; // remember where this attempt starts, to make moving the pointer easier
        while (sub_j < subStrLen)
        {
            if (mainStr[main_i] == subStr[sub_j])
            {
                main_i++;
                sub_j++;
                continue;
            }
            else
            { // if there is no character to the right of the matching range, the search fails
                if (tem + subStrLen > mainStrLen)
                    return -1;
                // otherwise, move by the precomputed step and match again
                char firstRightChar = mainStr[tem + subStrLen];
                main_i = tem + charStep[(unsigned char)firstRightChar];
                sub_j = 0;
                break; // leave this failed attempt and start a new one
            }
        }
        if (sub_j == subStrLen)
            return (main_i - subStrLen);
    }
    return -1;
}
int main()
{
    const char *mainStr = "absaddsasfasdfasdf";
    const char *subStr = "dd";
    int *charStep = setCharStep(subStr);
    cout << "Location:" << sundaySearch(mainStr, subStr, charStep) << endl;
    delete[] charStep;
    system("pause"); // Windows only: keeps the console window open
    return 0;
}
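The table-driven version above allocates its step table with new[] and leaves ownership to the caller. A possible variant, sketched in C++ with standard containers so that nothing has to be freed by hand; the name sundayFind is made up for this illustration and this is not the code from the quoted article:

```cpp
#include <array>
#include <string>

// Hypothetical variant of sundaySearch: returns the index of the first
// occurrence of `pattern` in `text`, or -1 if there is none.
int sundayFind(const std::string& text, const std::string& pattern)
{
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(pattern.size());
    if (m == 0)
        return 0;                            // same convention as strstr

    // shift[c] = how far to move when c is the character after the alignment.
    std::array<int, 256> shift;
    shift.fill(m + 1);                       // default: c is not in the pattern
    for (int j = 0; j < m; ++j)              // later (rightmost) occurrences overwrite
        shift[static_cast<unsigned char>(pattern[j])] = m - j;

    int start = 0;
    while (start + m <= n)
    {
        if (text.compare(start, m, pattern) == 0)
            return start;
        if (start + m == n)
            break;                           // nothing after the alignment to inspect
        start += shift[static_cast<unsigned char>(text[start + m])];
    }
    return -1;
}
```

With the strings from main above, sundayFind("absaddsasfasdfasdf", "dd") returns 4. The table lives on the stack, so the function needs neither a separate setup call nor a matching delete[].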

[With thanks to the referenced article]
The preceding part is taken from the study notes on string matching algorithms: the Sunday algorithm.

Here are several articles about string search:
1. The KMP algorithm
2. The KMP algorithm is not optimized for string search. [repost]
3. Exact string matching (the BM algorithm) [repost]
4. An introduction to the Sunday algorithm
