String pattern matching -- the Shift-And and Shift-Or algorithms

Source: Internet
Author: User

The Shift-And algorithm is simpler than the KMP algorithm. Let the pattern string be P. The algorithm maintains a set D, which records every prefix of P that matches a suffix of the text read so far. Each time a new text character is read, the algorithm uses bit parallelism to update D.

If the length of P is m, the set D can be written as a bit vector D = d[m]...d[1], where D[j] stands for the bit d[j].

D[j] = 1 if and only if p[1]...p[j] is a suffix of t[1]...t[i], where t[1]...t[i] is the text read so far. When D[m] = 1, P has matched the text.

When the next character t[i+1] is read, the new set D' must be computed. D'[j+1] = 1 if and only if D[j] = 1 and t[i+1] equals p[j+1]. This holds because D[j] = 1 means that p[1]...p[j] is a suffix of t[1]...t[i]; when t[i+1] equals p[j+1], p[1]...p[j+1] is a suffix of t[1]...t[i+1]. In addition, D'[1] = 1 whenever t[i+1] equals p[1], because the empty prefix always matches. The whole set can be updated with bitwise operations.
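
To make the recurrence concrete before it is packed into bits, here is a minimal sketch that applies it literally, with one flag per prefix and an inner loop over j. The function name `naive_prefix_search` and the 64-character limit are assumptions made for this illustration; they are not part of the original article.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: applies the recurrence one flag at a time.
   d[j] (1-based) is 1 when p[0..j-1] is a suffix of the text read so far.
   The empty prefix (j = 0) is treated as always matching. */
int naive_prefix_search(const char *t, const char *p)
{
    size_t n = strlen(t), m = strlen(p);
    unsigned char d[65] = {0};          /* sketch limit: m <= 64 */

    if (m == 0 || m > 64)
        return -1;

    for (size_t i = 0; i < n; i++) {
        /* Walk j from high to low so d[j - 1] still holds the old value. */
        for (size_t j = m; j >= 1; j--) {
            int prev = (j == 1) ? 1 : d[j - 1];
            d[j] = (unsigned char)(prev && t[i] == p[j - 1]);
        }
        if (d[m])
            return (int)(i - m + 1);    /* start index of the match */
    }
    return -1;
}
```

This version costs O(m) work per text character. The bit-parallel algorithm performs the same update for all j at once with a shift, an OR, and an AND on a single machine word.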

The algorithm first builds an array B with one entry per character of the alphabet (for example, B has 26 entries for the alphabet A-Z). If the j-th character of P equals c, bit j of B[c] is set to 1.
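
The construction of B can be sketched as follows. The helper name `build_masks` is hypothetical, and the sketch assumes 7-bit ASCII input and a pattern of at most 32 characters (a 32-bit `unsigned int`). Note that the code counts bits from 0, so the character p[i] sets bit i, while the text above counts positions from 1.

```c
#include <string.h>

/* Hypothetical helper: fills B so that bit i of B[c] is 1 exactly when
   p[i] == c. Assumes 7-bit ASCII and m <= 32. */
void build_masks(const char *p, int m, unsigned int B[128])
{
    int i;

    memset(B, 0, 128 * sizeof(B[0]));
    for (i = 0; i < m; i++)
        B[(unsigned char)p[i] & 127] |= 1u << i;   /* & 127 keeps the index in range */
}
```

For the pattern "abab", this gives B['a'] = 0101 in binary (5) and B['b'] = 1010 in binary (10); every other entry stays 0.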

Preprocessing and storing B is not cost-effective if the character set is large, and the algorithm becomes inconvenient if m exceeds the machine word size, because D no longer fits in one word. The algorithm is therefore best suited to a small character set and a pattern no longer than the machine word. For longer patterns it can still be faster than brute force, but the logic is more complicated.
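
As an illustration of the word-size limit, the following sketch uses a 64-bit state word, which raises the maximum pattern length to 64 characters. The function name `shift_and64` is an assumption for this example, and the sketch again assumes 7-bit ASCII input; it is one possible way to stretch the limit, not a general solution for long patterns.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variant: Shift-And with a 64-bit state word.
   Returns the index of the first match, or -1 (also -1 when the pattern
   is empty or longer than 64 characters). */
int shift_and64(const char *t, const char *p)
{
    size_t n = strlen(t), m = strlen(p);
    uint64_t B[128] = {0};
    uint64_t D = 0;

    if (m == 0 || m > 64)
        return -1;

    for (size_t i = 0; i < m; i++)
        B[(unsigned char)p[i] & 127] |= (uint64_t)1 << i;

    for (size_t i = 0; i < n; i++) {
        D = ((D << 1) | 1) & B[(unsigned char)t[i] & 127];
        if (D & ((uint64_t)1 << (m - 1)))
            return (int)(i - m + 1);
    }
    return -1;
}
```

Beyond 64 characters the state would have to span several words, with carries propagated between them on every shift, which is where the extra complexity mentioned above comes from.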


The Shift-And code is as follows. Assume that the character set size is 128.


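#include <string.h>

/* Returns the index of the first occurrence of p in s, or -1 if there is none.
   Assumes 7-bit ASCII input and 1 <= len_p <= 32 (one bit per pattern character). */
int shift_and(const char *s, int len_s, const char *p, int len_p)
{
    unsigned int B[128];
    unsigned int D = 0;
    int i;

    memset(B, 0, sizeof(B));
    for (i = 0; i < len_p; i++)
        B[(unsigned char)p[i]] |= 1u << i;

    for (i = 0; i < len_s; i++)
    {
        /* Shifting D left and OR-ing in 1 lets a match start at the current
           character at any time; the AND with B updates all prefixes in parallel. */
        D = ((D << 1) | 1u) & B[(unsigned char)s[i]];
        if (D & (1u << (len_p - 1)))
            return i - len_p + 1;
    }
    return -1;
}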
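
The function above stops at the first match. Because D keeps tracking every matching prefix, the same loop can also report all occurrences, including overlapping ones. The sketch below counts them; the name `shift_and_count` is hypothetical, and the code assumes 7-bit ASCII input and a 32-bit `unsigned int`.

```c
/* Hypothetical variant: counts every (possibly overlapping) occurrence
   of p in s. Returns 0 when the pattern is empty or longer than 32. */
int shift_and_count(const char *s, int len_s, const char *p, int len_p)
{
    unsigned int B[128] = {0};
    unsigned int D = 0;
    int i, count = 0;

    if (len_p <= 0 || len_p > 32)
        return 0;

    for (i = 0; i < len_p; i++)
        B[(unsigned char)p[i] & 127] |= 1u << i;

    for (i = 0; i < len_s; i++) {
        D = ((D << 1) | 1u) & B[(unsigned char)s[i] & 127];
        if (D & (1u << (len_p - 1)))
            count++;                 /* a match ends at position i */
    }
    return count;
}
```

For example, the pattern "aa" occurs three times in "aaaa", and "aba" occurs three times in "abababa"; no state reset is needed after a match.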

The Shift-Or algorithm is based on the same idea as Shift-And; it only reduces the number of bit operations per character, which makes it faster. Shift-Or inverts the encoding: a 0 bit means that a prefix is in the set and a 1 bit means that it is not, so

D = ((D << 1) | 1) & B[s[i]];

becomes D = (D << 1) | B[s[i]]; which saves one bit operation, because the left shift already brings in the 0 that marks the empty prefix. Of course, the initial values of B and D must be changed to match: B starts as all ones with zeros at the pattern positions, and D starts as all ones.
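
A minimal sketch of these changes, written in the same style as the Shift-And function, might look like this. The name `shift_or` is hypothetical, and the code assumes 7-bit ASCII input and a 32-bit `unsigned int`.

```c
/* Hypothetical sketch of Shift-Or: a 0 bit means "prefix matches".
   Returns the index of the first match, or -1. */
int shift_or(const char *s, int len_s, const char *p, int len_p)
{
    unsigned int B[128];
    unsigned int D = ~0u;            /* empty set: every bit is 1 */
    int i;

    if (len_p <= 0 || len_p > 32)
        return -1;

    for (i = 0; i < 128; i++)
        B[i] = ~0u;                  /* no character matches anywhere yet */
    for (i = 0; i < len_p; i++)
        B[(unsigned char)p[i] & 127] &= ~(1u << i);   /* clear bit i where p[i] == c */

    for (i = 0; i < len_s; i++) {
        D = (D << 1) | B[(unsigned char)s[i] & 127];  /* one operation fewer than Shift-And */
        if ((D & (1u << (len_p - 1))) == 0)
            return i - len_p + 1;
    }
    return -1;
}
```

The match test here checks that bit len_p - 1 is 0, the mirror image of the Shift-And test for a 1 bit.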




========================================================== ======================================


My code:


Shift-and



#include <cstring>

const int size = 128; // The default character set size is 128 (7-bit ASCII).

// Preprocessing: bit i of s[c] is 1 when p[i] == c.
void preshiftand(const char *p, int m, unsigned int *s) {
    for (int i = 0; i < size; i++)
        s[i] = 0;
    for (int i = 0; i < m; i++) {
        s[(unsigned char)p[i]] |= 1u << i;
    }
}

// Shift-And: returns the index of the first match, or -1.
int shiftand(const char *t, const char *p) {
    int tlen = strlen(t);
    int plen = strlen(p);
    unsigned int state = 0; // that is, the set D
    unsigned int s[size];

    // The pattern must be non-empty and fit in one 32-bit word.
    if (plen == 0 || plen > 32 || tlen < plen) return -1;

    preshiftand(p, plen, s); // preprocessing

    for (int i = 0; i < tlen; i++) {
        state = ((state << 1) | 1u) & s[(unsigned char)t[i]];
        if (state & (1u << (plen - 1))) // the highest pattern bit is 1: match found
            return i - plen + 1;
    }
    return -1;
}



Shift-or


#include <iostream>

#define wordsize (sizeof(int) * 8)
#define asize 256 // only single-byte (ASCII) characters are considered

unsigned int preso(const char *x, int m, unsigned int s[]) {
    unsigned int j, lim;
    int i;
    for (i = 0; i < asize; ++i)
        s[i] = ~0u; // initially every bit is 1
    for (lim = i = 0, j = 1; i < m; ++i, j <<= 1) {
        s[(unsigned char)x[i]] &= ~j; // if x[i] == c, bit i of s[c] is 0
        lim |= j;
    }
    /*
    for (i = 0; i < m; i++) {
        // debug output, needs <bitset>
        std::cout << x[i] << " feature " << std::bitset<sizeof(int) * 8>(s[(unsigned char)x[i]]) << std::endl;
    }
    */
    lim = ~(lim >> 1); // bits m-1 and above are 1, lower bits are 0
    return lim;
}

int so(const char *x, int m, const char *y, int n) {
    unsigned int lim, state;
    unsigned int s[asize];
    int j;
    if (m > (int)wordsize) {
        std::cout << "so: use pattern size <= word size";
        return -1;
    }

    /* Preprocessing */
    lim = preso(x, m, s);

    /* Searching */
    for (state = ~0u, j = 0; j < n; ++j) {
        state = (state << 1) | s[(unsigned char)y[j]];
        if (state < lim) // bit m-1 of state is 0: match found
            return j - m + 1;
    }
    return -1;
}

