Scattered Notes on Learning Algorithms: String Matching Revisited

Source: Internet
Author: User


String matching is an old topic, but one we still like to study and explore, and one we use all the time. For example, when we open a text file in Vim and want to search for a string in it, we only need to type /string in normal mode. Likewise, on a Linux terminal, if we want to list all the C files in the current directory, we use a wildcard pattern to do the matching (all C files can be written as *.c) instead of looking for them one by one.

 

Now, back to the subject. This article has two purposes:

 


First, I previously wrote an article about string matching. However, that article only covered fixed string matching and the KMP algorithm and did not cover dynamic string matching, so I have always wanted to write about dynamic matching.

Second, for an algorithm enthusiast the rule is: say what you think and show your code! That is easier said than done.

 

Chapter 1 Fixed String Matching

 

Why do I call it a fixed string? Let me give an example (my writing skills are limited, so an example will have to do). Suppose the source string is ...fcsupering... and we want to know whether the substring superfc appears in it. It is easy to see that the substring is a definite string (it contains no wildcard characters), so I call this kind of problem fixed string matching.

 

There is not much to say about which algorithm solves this problem: it is naturally the KMP algorithm, whose worst-case running time is O(strlen(source_string) + k), where k is the length of the substring. As mentioned earlier, I will not describe the algorithm in detail here. Its key step is to generate the prefix array of the substring (the pattern). The following example shows how to generate a prefix array:
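As a minimal sketch of this step, the function below builds a prefix array under one common convention: the first entry is the sentinel -1, and entry i holds the length of the longest proper prefix of the first i characters that is also a suffix of them. The name build_prefix and the sample pattern are illustrative assumptions, not taken from the original article.

```c
/* Hypothetical helper (not from the original article).
 * Fills prefix[0..m-1] for pattern[0..m-1]; requires m >= 1.
 * prefix[0] is the sentinel -1. For i >= 1, prefix[i] is the length of
 * the longest proper prefix of pattern[0..i-1] that is also its suffix. */
void build_prefix(const char *pattern, int m, int *prefix)
{
    int i = 0;   /* number of characters already processed */
    int k = -1;  /* candidate border length, -1 means "none left" */

    prefix[0] = -1;
    while (i < m - 1) {
        if (k == -1 || pattern[i] == pattern[k]) {
            /* The border can be extended by one character. */
            ++i;
            ++k;
            prefix[i] = k;
        } else {
            /* Fall back to the next shorter border. */
            k = prefix[k];
        }
    }
}
```

For the pattern abcabd this produces the array -1, 0, 0, 0, 1, 2: by the time d is reached, the prefix ab has already been seen again as a suffix, so a mismatch at d can fall back to position 2 instead of starting over.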

 

 

The KMP algorithm then scans the source string once, using the prefix array to decide how far to fall back in the substring after a mismatch, and records each position where the substring appears in the source string.
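The scan can be sketched as follows. This is an illustration under stated assumptions rather than the article's own implementation: the name kmp_find_all, the positions output array, and the choice to report overlapping matches are all hypothetical. The prefix array here has one extra entry so that the scan can continue correctly after a complete match.

```c
#include <stdlib.h>

/* Hypothetical sketch: stores the start index of each match of
 * pattern[0..m-1] in text[0..n-1] into positions[] (at most
 * max_positions entries) and returns the total number of matches,
 * or -1 if memory allocation fails. Overlapping matches are counted. */
int kmp_find_all(const char *text, int n, const char *pattern, int m,
                 int *positions, int max_positions)
{
    int count = 0, i = 0, j, k = -1;
    int *prefix;

    if (m <= 0 || n < m)
        return 0;
    prefix = malloc((size_t)(m + 1) * sizeof *prefix);
    if (prefix == NULL)
        return -1;

    /* Build the prefix array of the pattern; prefix[0] is the sentinel -1. */
    prefix[0] = -1;
    while (i < m) {
        if (k == -1 || pattern[i] == pattern[k]) {
            ++i;
            ++k;
            prefix[i] = k;
        } else {
            k = prefix[k];
        }
    }

    /* Scan the text; i never moves backwards. */
    i = 0;
    j = 0;
    while (i < n) {
        if (j == -1 || text[i] == pattern[j]) {
            ++i;
            ++j;
            if (j == m) {               /* full match ending at i - 1 */
                if (count < max_positions)
                    positions[count] = i - m;
                ++count;
                j = prefix[m];          /* keep the longest reusable border */
            }
        } else {
            j = prefix[j];              /* fall back inside the pattern */
        }
    }

    free(prefix);
    return count;
}
```

Searching for aba in abababa with this sketch reports three matches, at positions 0, 2, and 4, while searching for superfc in fcsupering reports none.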

 

Note: in the prefix function of the KMP algorithm, when array subscripts start at 0, do not set the initial value of the prefix array to 0; set it to -1 instead. Otherwise an endless loop may occur. If you are interested, you can try it yourself.

 

That is all there is to fixed string matching, and we have a way to solve it. Next, let's look at the code:

 

Meanings of the variables used in the code:

/*
** source: source string
** sourcelength: length of the source string
** prefix: prefix array of the pattern (at least p_length elements)
** pattern: the string to match, that is, the substring
** p_length: length of the substring (at least 1)
** buffer: each time the substring matches in the source string, the matched substring is saved in buffer
*/
#include <string.h>

/** Get prefix array for the KMP algorithm **/
void getprefixarray(char *pattern, int *prefix, int p_length)
{
    int i = 0, k = -1;
    prefix[0] = -1; /* when subscripts start at 0, the starting value must be -1 (see the note above) */
    while (i < p_length - 1) {
        if (k == -1 || pattern[i] == pattern[k]) {
            ++i;
            ++k;
            prefix[i] = k; /* longest border of pattern[0..i-1] */
        } else {
            k = prefix[k]; /* fall back to a shorter border */
        }
    }
}

void stringmatchofkmp(char *source, int sourcelength, int *prefix, char *pattern, int p_length, char *buffer)
{
    int i = 0, j = 0, index = 0;
    getprefixarray(pattern, prefix, p_length); /* the prefix array belongs to the pattern */
    while (i < sourcelength) {
        if (j == -1 || source[i] == pattern[j]) {
            i++;
            j++;
            if (j == p_length) {
                /* when multiple matches exist, index * p_length is the
                   starting position of each substring in the buffer */
                memcpy(buffer + index * p_length, pattern, p_length);
                index++;
                j = 0; /* start over, so saved matches do not overlap */
            }
        } else {
            j = prefix[j];
        }
    }
}

Chapter 2 Single-Character Dynamic Matching

 

Single-character dynamic matching means that when the character ? appears in the substring, it matches any one character (it stands for exactly one character, hence the name). How can this be solved? It is very simple: we only need to walk through the source string, and we do not need the KMP algorithm as in fixed string matching. While matching the substring against the source string, whenever we meet a ? in the substring, we simply treat it as matching the character at the corresponding position in the source string. For example, if the source string is hellosuperfc and the substring is s?per, then after the s in the source string matches the s in the substring, both subscripts are incremented, and the u in the source string is compared with the ? in the substring. We treat ? as being the character u, so the match succeeds and we go on to the next character.
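The idea above can be sketched as a plain nested loop. This is a simplified illustration that returns only the first match position; the name find_with_question and its return convention are assumptions made for this example, not part of the original article.

```c
/* Hypothetical sketch: returns the index of the first position in
 * source[0..n-1] where pattern[0..m-1] matches, with '?' in the pattern
 * matching any single character. Returns -1 if there is no match. */
int find_with_question(const char *source, int n, const char *pattern, int m)
{
    int start, j;

    for (start = 0; start + m <= n; ++start) {
        /* Compare the pattern against the window beginning at start. */
        for (j = 0; j < m; ++j) {
            if (pattern[j] != '?' && pattern[j] != source[start + j])
                break;  /* real mismatch: try the next window */
        }
        if (j == m)
            return start;  /* every pattern character matched */
    }
    return -1;
}
```

With the source string hellosuperfc and the pattern s?per, the sketch returns 5, the position of the s in super. Because each window is checked from scratch, the worst case is proportional to the product of the two lengths.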

 

OK. Here is an illustration:

 

Single-character dynamic matching is like this. Let's take a look at its implementation code:

#include <string.h>

void stringmatchofask(char *source, int sourcelength, char *pattern, int p_length, char *buffer)
{
    int i = 0, j = 0;
    int index = 0;
    while (i < sourcelength) {
        if (pattern[j] == source[i] || pattern[j] == '?') {
            if (j + 1 == p_length) {
                /* matched: the match is source[i - p_length + 1 .. i] */
                memcpy(buffer + index * p_length, source + i - p_length + 1, p_length);
                index++;
                j = 0;
            } else {
                j++;
            }
            i++;
        } else {
            i = i - j + 1; /* restart one position after where this attempt began */
            j = 0;
        }
    }
}

Chapter 3 Multi-Character Dynamic Matching

 

Multi-character dynamic matching means that when the character * appears in the substring, it can match zero or more characters (it can stand for several characters, hence the name). How can this be solved? I think it is even simpler than single-character dynamic matching: when the * in the substring is compared with the corresponding character in the source string, we treat * as covering the current position in the source string, and it keeps covering characters until the character after * in the substring equals the current character in the source string. This is hard to explain in words, so I will use a figure to illustrate:
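One way to sketch this in code is shown below. It is an illustration under assumptions, not the article's implementation: the name find_with_star and the match_len output parameter are hypothetical, and the sketch adds a backtracking step so that * can also cover zero characters and a failed attempt after * can be retried one position later.

```c
/* Hypothetical sketch: finds the first place in source[0..n-1] where
 * pattern[0..m-1] matches, with '*' in the pattern matching zero or more
 * characters. Returns the start index and stores the matched length in
 * *match_len, or returns -1 if there is no match. */
int find_with_star(const char *source, int n, const char *pattern, int m,
                   int *match_len)
{
    int i = 0, j = 0;   /* current positions in source and pattern */
    int start = 0;      /* where the current match attempt begins */
    int star_i = 0;     /* source position right after what '*' covers */
    int star_j = -1;    /* pattern position right after the last '*' */

    while (j < m) {
        if (pattern[j] == '*') {
            star_j = ++j;       /* remember where to resume in the pattern */
            star_i = i;         /* '*' covers zero characters so far */
        } else if (i >= n) {
            return -1;          /* source exhausted, pattern not finished */
        } else if (source[i] == pattern[j]) {
            ++i;
            ++j;
        } else if (star_j >= 0) {
            i = ++star_i;       /* let '*' cover one more character */
            j = star_j;
        } else {
            i = ++start;        /* no '*' seen yet: slide the start */
            j = 0;
        }
    }
    *match_len = i - start;
    return start;
}
```

With the source string hellosuperfc and the pattern s*r, the sketch returns 5 with a matched length of 5 (the text super), where * covers upe. The pattern su*per gives the same result with * covering nothing.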

 

Let's take a look at the implementation of the Code:

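#include <string.h>

void stringmatchofstar(char *source, int sourcelength, char *pattern, int p_length, char *buffer)
{
    int i = 0, j = 0;
    int index = 0;               /* subscript where the match starts */
    int star_i = 0, star_j = -1; /* where to resume after the last '*' */
    while (j < p_length) {
        if (pattern[j] == '*') {
            star_j = ++j;        /* '*' covers zero characters at first */
            star_i = i;
        } else if (i >= sourcelength) {
            return;              /* no match */
        } else if (source[i] == pattern[j]) {
            i++;
            j++;
        } else if (star_j >= 0) {
            i = ++star_i;        /* let '*' cover one more character */
            j = star_j;
        } else {
            i = ++index;         /* no '*' seen yet: move the start forward */
            j = 0;
        }
    }
    /* matched: save the matched part of the source string */
    memcpy(buffer, source + index, i - index);
}
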
Chapter 4 Concluding Remarks

 

Think about, write, and draw ......

 
