C implementation of the KMP string matching algorithm

Source: Internet
Author: User

String matching is one of the basic tasks of a computer.

For example, given the string "BBC ABCDAB ABCDABCDABDE", I want to know whether it contains another string, "ABCDABD".

The following steps explain the KMP algorithm.

1.

First, the first character of the string "BBC ABCDAB ABCDABCDABDE" is compared with the first character of the search word "ABCDABD". Because B does not match A, the search word is shifted one position to the right.

2.

Because B again does not match A, the search word is shifted one more position to the right.

3.

This continues until a character in the string is the same as the first character of the search word.

4.

Then the next character of the string is compared with the next character of the search word, and they are also the same.

5.

This continues until a character in the string differs from the corresponding character of the search word.

6.

A basic fact is that when the space does not match D, you already know that the first six characters are "ABCDAB". The idea of the KMP algorithm is to make use of this known information: instead of moving the search position back over characters that have already been compared, it keeps shifting the search word to the right, which improves efficiency.
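For contrast, here is a minimal sketch of the brute-force approach that KMP improves on: after every mismatch the search word is shifted by exactly one position and the comparison restarts from its first character, so characters that were already compared are looked at again. The function name `naive_search` and its comparison counter are illustrative assumptions, not part of the implementation in this article.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: brute-force search.
 * Returns the index of the first occurrence of pattern in text, or -1.
 * If comparisons is not NULL, it receives the number of character
 * comparisons that were performed.
 */
long naive_search(const char *text, const char *pattern, long *comparisons)
{
    size_t n = strlen(text);
    size_t m = strlen(pattern);
    long count = 0;
    long found = -1;

    if (m == 0)
        found = 0;                      /* an empty pattern matches at index 0 */

    for (size_t i = 0; found < 0 && i + m <= n; i++) {
        size_t j = 0;
        while (j < m) {
            count++;                    /* one character comparison */
            if (text[i + j] != pattern[j])
                break;
            j++;
        }
        if (j == m)
            found = (long)i;            /* every pattern character matched */
    }

    if (comparisons != NULL)
        *comparisons = count;
    return found;
}
```

Calling `naive_search("BBC ABCDAB ABCDABCDABDE", "ABCDABD", &c)` finds the match at index 15; the counter makes it easy to see how much work the repeated restarts cost.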

8.

How is this done? You can compute a Partial Match Table for the search word. How this table is produced is explained later; for now it is enough to know how to use it. For "ABCDABD" the table is:

Search word:          A  B  C  D  A  B  D
Partial match value:  0  0  0  0  1  2  0

9.

When the space is found not to match D, the first six characters "ABCDAB" have already been matched. The table shows that the last matched character, B, has a partial match value of 2, so the number of positions to shift is given by the following formula:

Shift = number of matched characters - corresponding partial match value

Because 6 - 2 equals 4, the search word is shifted 4 positions to the right.
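The shift formula can be sketched as a one-line function. The table values below are the partial match values of "ABCDABD" as derived later in this article; the names `example_table` and `shift_for` are assumptions made for this illustration, and shifting by 1 when nothing matched follows steps 1 to 3 of the walkthrough.

```c
/* Partial match values of "ABCDABD" (A B C D A B D), assumed to be given. */
static const int example_table[7] = {0, 0, 0, 0, 1, 2, 0};

/*
 * Hypothetical helper: number of positions to shift the search word
 * after `matched` characters have matched.
 * shift = matched - table[matched - 1]; with no match at all, shift by 1.
 */
int shift_for(const int *table, int matched)
{
    if (matched <= 0)
        return 1;
    return matched - table[matched - 1];
}
```

With this table, `shift_for(example_table, 6)` gives 4 and `shift_for(example_table, 2)` gives 2, the two shifts used in the walkthrough.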

10.

Because the space does not match C, the search word has to be shifted again. At this point the number of matched characters is 2 ("AB"), and the corresponding partial match value is 0. So shift = 2 - 0 = 2, and the search word is shifted 2 positions to the right.

11.

Because the space does not match A, the search word is shifted one more position to the right.

12.

The comparison continues character by character until C is found not to match D. So shift = 6 - 2 = 4, and the search word is shifted another 4 positions to the right.

13.

The comparison continues character by character up to the last character of the search word; a complete match is found and the search is finished. To continue searching (that is, to find all matches), shift = 7 - 0 = 7, so the search word is shifted 7 positions to the right. The steps are not repeated here.
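The walkthrough above can be replayed with a short sketch that records every alignment of the search word against the string until the first complete match. The function name `trace_alignments` and its parameters are hypothetical, and the partial match table is assumed to be supplied by the caller.

```c
#include <string.h>

/*
 * Hypothetical helper: stores in out[] each position at which the search
 * word is aligned against text, stopping at the first complete match.
 * Returns the number of alignments stored (at most max).
 */
int trace_alignments(const char *text, const char *pattern,
                     const int *table, int *out, int max)
{
    int n = (int)strlen(text);
    int m = (int)strlen(pattern);
    int pos = 0;    /* current alignment of the search word */
    int used = 0;   /* number of alignments recorded so far */

    while (m > 0 && pos + m <= n && used < max) {
        int matched = 0;

        out[used++] = pos;
        while (matched < m && text[pos + matched] == pattern[matched])
            matched++;
        if (matched == m)
            break;                                  /* complete match */

        /* shift = matched characters - partial match value (1 if none matched) */
        pos += (matched == 0) ? 1 : matched - table[matched - 1];
    }
    return used;
}
```

For the string "BBC ABCDAB ABCDABCDABDE", the search word "ABCDABD" and the table {0, 0, 0, 0, 1, 2, 0}, the recorded alignments are 0, 1, 2, 3, 4, 8, 10, 11, 15, which are the shifts of 1, 1, 1, 1, 4, 2, 1 and 4 described in the steps.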

14.

The following describes how the partial matching table is produced.

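First, understand two concepts: "prefix" and "suffix". The prefixes of a string are all of its leading substrings except the string itself, that is, every head that leaves out at least the last character. The suffixes are all of its trailing substrings except the string itself, that is, every tail that leaves out at least the first character.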
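As a small illustration of these two definitions, the following sketch extracts the prefix or suffix of a given length from a string. The function names `proper_prefix` and `proper_suffix` are assumptions chosen for this example; the length k must be at least 1 and smaller than the length of the string, because the whole string does not count.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the prefix of length k of s into out.
 * out must have room for k + 1 bytes. Returns 0 on success, or -1 when
 * k is 0 or not smaller than strlen(s).
 */
int proper_prefix(const char *s, size_t k, char *out)
{
    size_t n = strlen(s);

    if (k == 0 || k >= n)
        return -1;
    memcpy(out, s, k);              /* the first k characters */
    out[k] = '\0';
    return 0;
}

/* Hypothetical helper: same as above, for the suffix of length k. */
int proper_suffix(const char *s, size_t k, char *out)
{
    size_t n = strlen(s);

    if (k == 0 || k >= n)
        return -1;
    memcpy(out, s + (n - k), k);    /* the last k characters */
    out[k] = '\0';
    return 0;
}
```

For "ABCDA", the prefix of length 4 is "ABCD" and the suffix of length 1 is "A"; a single character such as "A" has neither a prefix nor a suffix in this sense.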

15.

A partial match value is the length of the longest element shared by the prefixes and the suffixes. Take "ABCDABD" as an example:

- The prefixes and suffixes of "A" are both empty sets, so the length of the common element is 0;

- "AB" has the prefix [A] and the suffix [B]; the length of the common element is 0;

- "ABC" has the prefixes [A, AB] and the suffixes [BC, C]; the length of the common element is 0;

- "ABCD" has the prefixes [A, AB, ABC] and the suffixes [BCD, CD, D]; the length of the common element is 0;

- "ABCDA" has the prefixes [A, AB, ABC, ABCD] and the suffixes [BCDA, CDA, DA, A]; the common element is "A", with length 1;

- "ABCDAB" has the prefixes [A, AB, ABC, ABCD, ABCDA] and the suffixes [BCDAB, CDAB, DAB, AB, B]; the common element is "AB", with length 2;

- "ABCDABD" has the prefixes [A, AB, ABC, ABCD, ABCDA, ABCDAB] and the suffixes [BCDABD, CDABD, DABD, ABD, BD, D]; the length of the common element is 0.
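The list above can be checked mechanically. The sketch below computes each partial match value directly from the definition, by trying the longest candidate length first and comparing the prefix of that length with the suffix of the same length. The names `partial_match_value` and `build_table_bruteforce` are hypothetical, and this direct method favors clarity over speed.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: length of the longest prefix of s[0..n-1] that is
 * also a suffix of it, not counting the whole string.
 */
size_t partial_match_value(const char *s, size_t n)
{
    size_t k;

    for (k = (n > 0) ? n - 1 : 0; k > 0; k--) {
        /* compare the first k characters with the last k characters */
        if (memcmp(s, s + (n - k), k) == 0)
            return k;
    }
    return 0;
}

/* Hypothetical helper: table[i] is the value for the first i + 1 characters. */
void build_table_bruteforce(const char *pattern, size_t *table)
{
    size_t m = strlen(pattern);

    for (size_t i = 0; i < m; i++)
        table[i] = partial_match_value(pattern, i + 1);
}
```

For "ABCDABD" this yields 0 0 0 0 1 2 0, matching the seven items of the list.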

16.

The essence of "partial matching" is that the head and the tail of a string are sometimes repeated. For example, "ABCDAB" contains "AB" twice, so its partial match value is 2 (the length of "AB"). When the search word is shifted, moving the first "AB" 4 positions to the right (string length - partial match value) brings it to the position of the second "AB".

Next is my own implementation of the KMP algorithm.

The implementation mainly involves three aspects:

1) Building the partial match table for the search string

2) Moving the search pointer through the target string during the search

3) Recording the positions where matches are found

Next, I'll post the code I implemented.
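Because each partial match value describes a repetition of head and tail, the value for one position can be derived from the values before it. The sketch below is the commonly used linear-time construction of the table (often called the failure function); it is not the method used by the implementation in this article, and the name `build_table_linear` is an assumption.

```c
#include <string.h>

/*
 * Hypothetical helper: fill table[0..m-1] with the partial match values
 * of pattern, reusing already computed values instead of comparing every
 * prefix with every suffix.
 */
void build_table_linear(const char *pattern, int *table)
{
    int m = (int)strlen(pattern);
    int k = 0;  /* length of the current longest head that equals the tail */

    if (m == 0)
        return;

    table[0] = 0;
    for (int i = 1; i < m; i++) {
        /* on a mismatch, fall back to the next shorter repeated head */
        while (k > 0 && pattern[i] != pattern[k])
            k = table[k - 1];
        if (pattern[i] == pattern[k])
            k++;
        table[i] = k;
    }
}
```

For "ABCDABD" this produces 0 0 0 0 1 2 0, the same values as the prefix and suffix comparison.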

  

/*
 * String matching search implemented with the KMP algorithm.
 * This program searches a fixed target string for the search string given
 * on the command line, prints the partial match table of the search string,
 * and marks every position in the target string where a match starts.
 * Number of positions to move the search pointer = matched characters - corresponding partial match value
 */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define KEYWORD_MAX_LENGTH 100              // Maximum length of the search string

int kmp_table[KEYWORD_MAX_LENGTH];          // KMP table for the search string
int keyword_length = 0;                     // Length of the search string
int record_position[KEYWORD_MAX_LENGTH];    // Positions in the target string where the search string matches

/*
 * getmatchvalue: get the partial match value of the string src, that is,
 * the length of the longest prefix of src that is also a suffix of src
 * (the whole string itself does not count)
 */
int getmatchvalue(char *src)
{
    int src_len = strlen(src);
    int len;

    // Try the longest candidate first: the prefix of length len is compared
    // with the suffix of length len, which starts at src + (src_len - len)
    for (len = src_len - 1; len > 0; len--)
    {
        if (strncmp(src, src + (src_len - len), len) == 0)
            return len;
    }
    return 0;
}

/*
 * Create the KMP table of the search string
 */
int create_kmp_table(char *str, int *table)
{
    int i;
    char *dst;

    keyword_length = strlen(str);
    for (i = 0; i < keyword_length; i++)
    {
        if (i == 0) {
            table[i] = 0;                   // The first character has no prefix or suffix, so its value is 0
        }
        else {
            dst = (char *)malloc(i + 2);
            if (dst == NULL)
            {
                printf("malloc space error!\n");
                return EXIT_FAILURE;
            }
            strncpy(dst, str, i + 1);       // Copy the first (i+1) characters of str
            dst[i + 1] = '\0';              // Note that the string must end with '\0'
            table[i] = getmatchvalue(dst);
            free(dst);
        }
    }
    return EXIT_SUCCESS;
}

// Print the KMP table corresponding to the search string
void table_print(char *str, int *table)
{
    int i;
    char c = *str;

    while (c != '\0')
    {
        printf("%-4c", c);                  // Left-align each character of the search string
        c = *++str;
    }
    printf("\n");
    for (i = 0; i < keyword_length; i++)
    {
        printf("%-4d", table[i]);           // Left-align the partial match value of each character
    }
    printf("\n");
}

// Search for the key string search_str in the target string dst_str, record the
// positions of the matches, and return the number of matches with the key string
int search_keyword(char *dst_str, char *search_str)
{
    char *p = dst_str;
    char *q = search_str;
    char *temp;
    int move = 0;                           // Number of positions to move the search pointer
    int count = 0;                          // Number of characters matched at the current position
    int k = 0;                              // Number of matches with the key string found so far

    if (create_kmp_table(search_str, kmp_table) != EXIT_SUCCESS)
        return 0;

    while (*p != '\0')                      // Until the last character of the target string has been searched
    {
        temp = p;
        while (*q != '\0')
        {
            if (*q == *temp)
            {
                count++;
                temp++;
                q++;
            }
            else break;
        }

        if (count == 0)
            p++;
        else {
            if (count == keyword_length && k < KEYWORD_MAX_LENGTH)
            {
                record_position[k++] = (temp - dst_str) - keyword_length;
            }
            move = count - kmp_table[count - 1];
            p += move;
        }
        count = 0;
        q = search_str;                     // The comparison restarts from the first character of the key string
    }
    return k;
}

int main(int argc, char **argv)
{
    char *search_str = (argc > 1) ? argv[1] : NULL;
    //char dst_str[] = "Hello woshijpf woshijpf woshij woshijp WOSHIJPF";
    char dst_str[] = "BBC ABCDAB ABCDABCDABDE";

    if (search_str == NULL || *search_str == '\0')
    {
        printf("Please input search string\n");
        return EXIT_FAILURE;
    }
    if (strlen(search_str) > KEYWORD_MAX_LENGTH)
    {
        printf("The search string is longer than %d characters\n", KEYWORD_MAX_LENGTH);
        return EXIT_FAILURE;
    }

    int result = search_keyword(dst_str, search_str);   // Number of matches found by the search
    table_print(search_str, kmp_table);
    printf("%s\n", dst_str);                // Output the target string that was searched
    if (result == 0)
    {
        printf("Sorry! Didn't find the string %s\n", search_str);
        return EXIT_SUCCESS;
    }
    else {
        int i, j, num;
        int before = 0;
        for (i = 0; i < result; i++)
        {
            num = record_position[i] - before;  // Mark the position of the search string in the target string
            before = record_position[i] + 1;
            for (j = 1; j <= num; j++)
                printf(" ");
            printf("*");
        }
        printf("\n");
    }

    return EXIT_SUCCESS;
}
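The search loop above restarts the comparison from the first character of the key string after every shift. A common variant, sketched below under the assumption that the partial match table has already been built, keeps the part of the key string that is known to match and never moves the index into the target string backwards. The function name `kmp_find_all` and its parameters are hypothetical and not part of the program above.

```c
#include <string.h>

/*
 * Hypothetical helper: find every occurrence of pattern in text.
 * table[] must hold the partial match values of pattern.
 * Up to max_positions start indices are stored in positions[].
 * Returns the total number of occurrences, which may exceed max_positions.
 */
int kmp_find_all(const char *text, const char *pattern, const int *table,
                 int *positions, int max_positions)
{
    int n = (int)strlen(text);
    int m = (int)strlen(pattern);
    int j = 0;      /* number of pattern characters currently matched */
    int found = 0;

    if (m == 0)
        return 0;

    for (int i = 0; i < n; i++) {           /* i only ever moves forward */
        /* on a mismatch, keep the longest head that is still valid */
        while (j > 0 && text[i] != pattern[j])
            j = table[j - 1];
        if (text[i] == pattern[j])
            j++;
        if (j == m) {                       /* complete match ends at i */
            if (found < max_positions)
                positions[found] = i - m + 1;
            found++;
            j = table[m - 1];               /* continue, allowing overlaps */
        }
    }
    return found;
}
```

With the table {0, 0, 0, 0, 1, 2, 0}, searching "BBC ABCDAB ABCDABCDABDE" for "ABCDABD" reports a single match starting at index 15.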

