Approximate string matching allows a certain number of errors in the match, so that searching the string "before the master long time no see" for "before is a master" can still succeed. Specifically, there are three types of errors: inserted characters, missing characters, and substituted characters. The function below searches text for the substring pat, allowing at most k errors. It returns the end point of the match (I have not yet figured out how to determine the starting point, hehe).
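To pin down what "at most k errors" means, here is a minimal reference sketch using the classic dynamic-programming approach (often attributed to Sellers). It is not the bit-parallel method used by the function in this post. The name dp_amatch_end, the 64-character cap, and the index-based return value are assumptions made for this illustration.

```c
#include <assert.h>
#include <string.h>

static int min3(int a, int b, int c)
{
    int m = a < b ? a : b;
    return m < c ? m : c;
}

/* Hypothetical helper, not from the original post: returns the index just
 * past the first text position where some substring ending there matches
 * pat with at most k errors, or -1 if there is no such position. */
long dp_amatch_end(const char *text, const char *pat, int k)
{
    int m = (int)strlen(pat);
    int col[65];    /* col[i]: fewest errors matching pat[0..i) against a
                       substring ending at the current text position */
    int i;
    long pos;

    assert(k >= 0 && m > k && m <= 64);   /* 64 is an arbitrary cap */

    for (i = 0; i <= m; i++)
        col[i] = i;                       /* no text read yet: i missing characters */

    for (pos = 0; text[pos] != '\0'; pos++) {
        int diag = col[0];                /* old col[i-1] */
        col[0] = 0;                       /* a match may start anywhere */
        for (i = 1; i <= m; i++) {
            int subst = diag + (pat[i - 1] != text[pos]);   /* match or substitution */
            int extra = col[i] + 1;       /* extra character in the text */
            int missing = col[i - 1] + 1; /* pattern character missing from the text */
            diag = col[i];
            col[i] = min3(subst, extra, missing);
        }
        if (col[m] <= k)
            return pos + 1;
    }
    return -1;
}
```

For example, dp_amatch_end("abcdefg", "cxe", 1) stops after "cde", which differs from "cxe" by one substituted character. This version costs O(m) work per text character, which makes it handy mainly as a cross-check for a faster implementation.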
As for the principle of the algorithm, it is hard to explain in a few words. For now I can only say that it simulates a nondeterministic finite automaton; I will go into detail later when I have time. If you are interested, you can read the article "Faster Approximate String Matching" by R. Baeza-Yates and G. Navarro, Algorithmica (1999) 23:127-158.
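As a rough illustration of the automaton idea, the sketch below simulates a k-error automaton naively, with one flag per state instead of packed bits. State (e, i) means "i pattern characters consumed with e errors so far". The name nfa_amatch_end and the limits MAXM and MAXK are assumptions for this example; the encoding used in the article is far more compact.

```c
#include <assert.h>
#include <string.h>

#define MAXM 64   /* assumed cap on pattern length for this sketch */
#define MAXK 8    /* assumed cap on the number of errors */

/* Hypothetical helper: returns the index just past the first text position
 * where the automaton reaches a final state, or -1 if it never does. */
long nfa_amatch_end(const char *text, const char *pat, int k)
{
    unsigned char cur[MAXK + 1][MAXM + 1], next[MAXK + 1][MAXM + 1];
    int m = (int)strlen(pat);
    int e, i;
    long pos;

    assert(k >= 0 && k <= MAXK && m > k && m <= MAXM);

    memset(cur, 0, sizeof cur);
    for (e = 0; e <= k; e++)
        cur[e][e] = 1;   /* start state, plus skipping e leading pattern characters */

    for (pos = 0; text[pos] != '\0'; pos++) {
        char c = text[pos];
        memset(next, 0, sizeof next);
        next[0][0] = 1;  /* self-loop: a match may start at any text position */
        for (e = 0; e <= k; e++) {
            for (i = 0; i <= m; i++) {
                if (i > 0 && cur[e][i - 1] && pat[i - 1] == c)
                    next[e][i] = 1;   /* characters agree */
                if (e > 0 && i > 0 && cur[e - 1][i - 1])
                    next[e][i] = 1;   /* substituted character */
                if (e > 0 && cur[e - 1][i])
                    next[e][i] = 1;   /* extra character in the text */
                if (e > 0 && i > 0 && next[e - 1][i - 1])
                    next[e][i] = 1;   /* pattern character missing from the text */
            }
        }
        for (e = 0; e <= k; e++)
            if (next[e][m])
                return pos + 1;       /* a final state is active */
        memcpy(cur, next, sizeof cur);
    }
    return -1;
}
```

Many states can be active at once, which is what makes the automaton nondeterministic. A bit-parallel implementation packs these flags into machine words so that one text character updates all of them with a handful of shifts and masks.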
Limitation of the algorithm: (m-k) * (k+2) <= 64, where m is the length of the substring. The 64 is there because I used a 64-bit integer to encode the state of the automaton. If two errors are allowed, the substring can be up to 18 characters long, which is enough for general use.
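The limit can be solved for m: since m-k <= 64/(k+2), the longest usable substring is k plus 64/(k+2) rounded down. A small sketch of that arithmetic follows; the helper name max_pattern_length is made up for this example.

```c
#include <assert.h>

/* Hypothetical helper: largest m with (m - k) * (k + 2) <= 64. */
int max_pattern_length(int k)
{
    assert(k >= 0 && k <= 62);   /* beyond 62 no length m > k fits in 64 bits */
    return k + 64 / (k + 2);     /* integer division rounds down */
}
```

This gives 32 characters for k = 0, 22 for k = 1, 18 for k = 2, and 15 for k = 3.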
OK, enough talk, here is the algorithm. Don't understand it? That's all right, I only half understand it myself.
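#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns a pointer just past the end of the first substring of text that
   matches pat with at most k errors, or NULL if there is none. */
char* Amatch(const char* text, const char* pat, int k)
{
    int m = (int)strlen(pat);
    assert(k >= 0 && k <= 60);          /* keeps every shift count below 64 */
    assert(m - k > 0);
    assert((m - k) * (k + 2) <= 64);
    int j;
    /* The state holds m-k diagonals of the automaton, k+2 bits each. */
    uint64_t Din = 0;                   /* initial state: 0 1^(k+1) in every diagonal */
    uint64_t M1 = 0;                    /* lowest bit of every diagonal */
    uint64_t M2 = 0;                    /* like M1, but the last diagonal is 0 1^(k+1) */
    uint64_t G = (uint64_t)1 << k;      /* bit k of the last diagonal; clear means a match */
    uint64_t onekp1 = ((uint64_t)1 << (k + 1)) - 1;    /* k+1 one bits */
    for (j = 0; j < m - k; j++)
    {
        Din = (Din << (k + 2)) | onekp1;
        M1 = (M1 << (k + 2)) | 1;
        if (j < m - k - 1)
            M2 = (M2 << (k + 2)) | 1;
    }
    M2 = (M2 << (k + 2)) | onekp1;
    uint64_t D = Din;
    const char* s = text;
    int c = *s++;
    while (c)
    {
        /* While the automaton is idle, a character that does not occur among
           the first k+1 characters of pat cannot activate it, so skip it. */
        int found = 0;
        for (j = 0; j < k + 1; j++)
        {
            if (c == pat[j])
            {
                found = 1;
                break;
            }
        }
        if (found)
        {
            do
            {
                /* tc: bit j is set when pat[j] differs from c. */
                uint64_t tc = 0;
                for (j = 0; j < m; j++)
                {
                    if (c != pat[j])
                        tc |= (uint64_t)1 << j;
                }
                /* Tc: the k+1 mismatch bits seen by each diagonal. */
                uint64_t Tc = 0;
                for (j = 0; j < m - k; j++)
                    Tc = (Tc << (k + 2)) | ((tc >> j) & onekp1);
                uint64_t x = (D >> (k + 2)) | Tc;
                D = ((D << 1) | M1) & ((D << (k + 3)) | M2) & (((x + M1) ^ x) >> 1) & Din;
                if ((D & G) == 0)
                    return (char*)s;    /* s is already one past the last matched character */
                if (D != Din)
                    c = *s++;
            }
            while (D != Din && c);
        }
        if (c)
            c = *s++;
    }
    return NULL;
}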