Approximate string matching algorithm


Approximate string matching allows a certain number of errors in a match. For example, searching the string "before the master long time no see" for "before is a master" can still succeed. There are three kinds of errors: inserted characters, missing characters, and substituted characters. The function presented later in this article finds the pattern pat in text with at most k errors allowed. It returns the end position of the match (I have not yet worked out how to determine the start position, hehe).
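Before that function, a slow reference version may help make "at most k errors" concrete. The following is only a minimal sketch of the classic dynamic-programming formulation of approximate search, not the bit-parallel method this article uses. The name approx_find_end and the 64-character cap on the pattern are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Slow reference search (illustrative, not the article's bit-parallel code).
   Returns the index just past the end of the first substring of text that
   matches pat with at most k errors, or -1 if there is none. */
int approx_find_end(const char *text, const char *pat, int k)
{
    int m = (int)strlen(pat);
    int col[65];            /* assumed cap: patterns of at most 64 characters */
    int i, pos;

    assert(m <= 64 && k >= 0);

    /* col[i] = fewest errors needed to match pat[0..i) against some
       substring of the text that ends at the current position. */
    for (i = 0; i <= m; i++)
        col[i] = i;
    if (m <= k)
        return 0;           /* even the empty substring is within k errors */

    for (pos = 0; text[pos] != '\0'; pos++) {
        int diag = col[0];  /* col[0] stays 0: a match may start anywhere */
        for (i = 1; i <= m; i++) {
            int best = diag + (pat[i - 1] != text[pos]);  /* match or substitution */
            if (col[i] + 1 < best)
                best = col[i] + 1;                        /* extra character in the text */
            if (col[i - 1] + 1 < best)
                best = col[i - 1] + 1;                    /* pattern character missing */
            diag = col[i];
            col[i] = best;
        }
        if (col[m] <= k)
            return pos + 1;
    }
    return -1;
}
```

With this sketch, approx_find_end("xxabcxx", "abc", 0) gives 5. Allowing one error on "xxabdxx" gives 4, because "ab" already matches "abc" with one missing character.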

As for the principle behind the algorithm, I can't explain it clearly in a few words. All I can say for now is that it simulates a nondeterministic finite automaton, and I will go into detail when I have time. If you are interested, read R. Baeza-Yates and G. Navarro, "Faster Approximate String Matching", Algorithmica (1999) 23:127-158.

Limitation of the algorithm: (m-k) * (k+2) <= 64, where m is the length of the pattern. The 64 is there because I encode the state of the automaton in a 64-bit integer. With two errors allowed, the pattern can be up to 18 characters long, which is enough for most applications.
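The length limit is easy to check with a little arithmetic: the largest m that satisfies (m-k) * (k+2) <= 64 is k + 64 / (k+2), using integer division. A small sketch, assuming a hypothetical helper named max_pattern_len that is not part of the original code:

```c
#include <assert.h>

/* Largest pattern length m with (m - k) * (k + 2) <= bits,
   where bits is the width of the integer holding the automaton state. */
int max_pattern_len(int k, int bits)
{
    assert(k >= 0 && bits > 0);
    return k + bits / (k + 2);   /* integer division rounds down */
}
```

For a 64-bit state this gives 32 characters with no errors allowed, 22 with one error, 18 with two, and 15 with three.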

OK, enough talk, here is the algorithm. Don't understand it? That's all right, I only half understand it myself.

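    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Find pat in text with at most k errors. Returns a pointer just past the
       last character of the first match, or NULL if no match is found. */
    char* amatch(const char* text, const char* pat, int k)
    {
        int m = (int)strlen(pat);
        assert(k >= 0 && k + 3 < 64);       /* keeps every shift count below 64 */
        assert(m - k > 0);
        assert((m - k) * (k + 2) <= 64);    /* the state must fit in 64 bits */
        int j;
        /* The state consists of m-k fields of k+2 bits each. */
        uint64_t Din = 0;                                   /* initial state */
        uint64_t M1 = 0;
        uint64_t M2 = 0;
        uint64_t G = (uint64_t)1 << k;                      /* bit tested for a match */
        uint64_t onekp1 = ((uint64_t)1 << (k + 1)) - 1;     /* k+1 one bits */
        for (j = 0; j < m - k; j++)
        {
            Din = (Din << (k + 2)) | onekp1;
            M1 = (M1 << (k + 2)) | 1;
            if (j < m - k - 1)
                M2 = (M2 << (k + 2)) | 1;
        }
        M2 = (M2 << (k + 2)) | onekp1;
        uint64_t D = Din;
        const char* s = text;
        int c = *s++;
        while (c)
        {
            /* Start the automaton only on a character that occurs among the
               first k+1 characters of the pattern. */
            int found = 0;
            for (j = 0; j < k + 1; j++)
            {
                if (c == pat[j])
                {
                    found = 1;
                    break;
                }
            }
            if (found)
            {
                do
                {
                    /* tc: bit j is set when pat[j] differs from c. */
                    uint64_t tc = 0;
                    for (j = 0; j < m; j++)
                    {
                        if (c != pat[j])
                            tc |= (uint64_t)1 << j;
                    }
                    /* Tc: one (k+1)-bit window of tc per field. */
                    uint64_t Tc = 0;
                    for (j = 0; j < m - k; j++)
                        Tc = (Tc << (k + 2)) | ((tc >> j) & onekp1);
                    uint64_t x = (D >> (k + 2)) | Tc;
                    D = ((D << 1) | M1) & ((D << (k + 3)) | M2)
                        & (((x + M1) ^ x) >> 1) & Din;
                    if ((D & G) == 0)
                        return (char*)s;    /* s is already one past the match end */
                    if (D != Din)
                        c = *s++;
                }
                while (D != Din && c);
            }
            if (c)
                c = *s++;
        }
        return NULL;
    }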
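The function above reports only where a match ends. One common way to recover the start, sketched here under assumptions rather than taken from the cited paper, is to run the classic edit-distance dynamic program backward from the end position, comparing the reversed pattern with the text read right to left. The helper name match_start_offset, its index-based interface, and the 64-character cap are all hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: given that a match of pat with at most k errors ends
   just before text[end], return the index where the best such match starts,
   or -1 if no substring ending there is within k errors. Among substrings
   with the fewest errors, the shortest one is chosen. */
int match_start_offset(const char *text, int end, const char *pat, int k)
{
    int m = (int)strlen(pat);
    int col[65];                    /* assumed cap: m <= 64 */
    int i, len, max_len, best_err, best_len = 0;

    assert(m <= 64 && k >= 0 && end >= 0);

    max_len = m + k;                /* a match spans at most m+k text characters */
    if (max_len > end)
        max_len = end;

    /* col[i] = errors between the last i pattern characters and the last
       len text characters before end (len = 0 at first). */
    for (i = 0; i <= m; i++)
        col[i] = i;
    best_err = col[m];

    for (len = 1; len <= max_len; len++) {
        char c = text[end - len];   /* walk the text backward */
        int diag = col[0];
        col[0] = len;               /* anchored: every text character counts */
        for (i = 1; i <= m; i++) {
            int v = diag + (pat[m - i] != c);             /* match or substitution */
            if (col[i] + 1 < v) v = col[i] + 1;           /* extra character in the text */
            if (col[i - 1] + 1 < v) v = col[i - 1] + 1;   /* pattern character missing */
            diag = col[i];
            col[i] = v;
        }
        if (col[m] < best_err) {
            best_err = col[m];
            best_len = len;
        }
    }
    return best_err <= k ? end - best_len : -1;
}
```

To use it with the search function, convert the returned pointer to an index by subtracting text, then pass that index as end. The extra cost is small, since the backward pass looks at no more than m+k characters.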
