Summary of elementary string matching topics [KMP] [Manacher] [Trie Tree] [AC Automaton]

Source: Internet
Author: User

I have spent the last three days grinding string matching problems.

Before moving on to more problems, I would like to summarize what I have learned.

The most basic string matching algorithm is brute-force enumeration, which takes O(n^2) time (more precisely O(n*m) for a text of length n and a pattern of length m).
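A minimal sketch of the enumeration approach, assuming std::string input (the function name bruteForceFind is hypothetical). It tries every starting position and compares character by character, which is where the O(n*m) cost comes from.

```cpp
#include <string>

// Brute-force matching: try every alignment of the pattern against the text.
// Returns the 0-based index of the first occurrence, or -1 if there is none.
int bruteForceFind(const std::string& text, const std::string& pattern) {
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(pattern.size());
    for (int start = 0; start + m <= n; ++start) {
        int k = 0;
        // Compare until a mismatch or until the whole pattern has matched.
        while (k < m && text[start + k] == pattern[k]) ++k;
        if (k == m) return start;
        // On a mismatch, simply restart one position further right.
    }
    return -1;
}
```

For the example used below, bruteForceFind("AACAACAAB", "AACAAB") returns 3, but only after restarting from scratch at every earlier position.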

A more advanced algorithm is KMP.

KMP was covered in my data structures class, but the teacher only lectured and did not answer questions, so I never understood it and put it aside. In all my years of ACM competition I never really understood these basic algorithms.

So I finally sat down and learned KMP.

Let's start with the main idea of KMP. KMP is used for pattern matching: finding a pattern string inside a main (text) string.

Take the text string: (1) AACAACAAB

and the pattern to search for: (2) AACAAB

First, match character by character from the start:

(1) AACAACAAB

(2) AACAA

The first five characters match. When matching continues, the next pattern character is 'B' but the text has 'C', so there is a mismatch. How should the pattern slide?

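In pattern (2), the two characters "AA" just before 'B' are the same as the pattern's first two characters, and they have already matched the two characters just before 'C' in text (1). So the pattern can slide right until its prefix "AA" lines up with them: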

(1) AACAACAAB

(2)    AA

Matching then resumes at 'C', and the pattern turns out to match the text exactly, starting at position 3 (counting from 0).

Okay. So how do we know where to slide? We build a next array that records, for each position of the pattern, which pattern index to jump back to after a mismatch there.

It can be summarized in one sentence:

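If, to the left of position j, the prefix t[0..i-1] and the substring t[j-i..j-1] are identical, then on a mismatch at j the pattern can slide so that position i is compared next, i.e. next[j] = i (taking the largest such i).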

This works because a mismatch at j means every character before j has already matched the text. So we only need the longest proper prefix of the pattern that is also a suffix of the matched part t[0..j-1]: that prefix is guaranteed to match the text as well, and lining it up gives the smallest shift, so no possible occurrence is skipped.
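To make the definition concrete, here is a deliberately naive sketch that computes next straight from the sentence above by trying every candidate length (O(m^3), for illustration only; nextByDefinition is a hypothetical name, and the array uses the same -1 sentinel at next[0] as the template that follows).

```cpp
#include <string>
#include <vector>

// next[0] = -1; for j >= 1, next[j] is the largest i < j such that
// t[0..i-1] == t[j-i..j-1] (0 if no such non-empty prefix exists).
std::vector<int> nextByDefinition(const std::string& t) {
    const int m = static_cast<int>(t.size());
    std::vector<int> next(m + 1, 0);
    next[0] = -1;
    for (int j = 1; j <= m; ++j) {
        // Try the longest proper prefix first and stop at the first that fits.
        for (int i = j - 1; i >= 1; --i) {
            if (t.compare(0, i, t, j - i, i) == 0) {
                next[j] = i;
                break;
            }
        }
    }
    return next;
}
```

For the pattern AACAAB this gives next = {-1, 0, 1, 0, 1, 2, 0}; in particular next[5] = 2, which is exactly the slide to the prefix "AA" in the example.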

The following function builds next; t is the pattern string and lenT is its length.

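// next[j]: longest proper prefix of t[0..j-1] that is also its suffix (needs lenT+1 slots).
// With "using namespace std;", rename next to avoid clashing with std::next.
void setNext()
{
    int j = 0, k = -1;
    next[0] = -1;
    while (j < lenT)
    {
        if (k == -1 || t[j] == t[k])
            next[++j] = ++k;
        else
            k = next[k];   // mismatch: fall back to a shorter border
    }
}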

Whenever t[j] and t[k] do not match, k is rolled back to next[k] until they match or k becomes -1.

The following is a KMP matching template; s is the text and lenS is its length.

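// Returns the 0-based index of the first occurrence of t in s, or -1.
int kmp()
{
    int i = 0, j = 0;
    while (i < lenS && j < lenT)
    {
        if (j == -1 || s[i] == t[j])
            i++, j++;
        else
            j = next[j];   // mismatch: slide the pattern
    }
    if (j == lenT)
        return i - lenT;
    else
        return -1;
}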

While characters match, both i and j advance; on a mismatch, the pattern slides by setting j = next[j].
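For experimenting outside a judge, the two templates can be combined into a self-contained sketch that uses std::string instead of the global arrays s, t, lenS, lenT and next. The names buildNext and kmpFind are hypothetical; the logic is the same recurrence and the same matching loop as above.

```cpp
#include <string>
#include <vector>

// Same recurrence as setNext: nxt[j] is where the pattern index jumps back to
// after a mismatch at position j.
std::vector<int> buildNext(const std::string& t) {
    const int m = static_cast<int>(t.size());
    std::vector<int> nxt(m + 1);
    nxt[0] = -1;
    int j = 0, k = -1;
    while (j < m) {
        if (k == -1 || t[j] == t[k]) nxt[++j] = ++k;
        else k = nxt[k];
    }
    return nxt;
}

// Returns the 0-based position of the first occurrence of t in s, or -1.
int kmpFind(const std::string& s, const std::string& t) {
    const std::vector<int> nxt = buildNext(t);
    const int n = static_cast<int>(s.size());
    const int m = static_cast<int>(t.size());
    int i = 0, j = 0;
    while (i < n && j < m) {
        if (j == -1 || s[i] == t[j]) { ++i; ++j; }  // match: advance both
        else j = nxt[j];                             // mismatch: slide the pattern
    }
    return j == m ? i - m : -1;
}
```

kmpFind("AACAACAAB", "AACAAB") returns 3, the same position found in the sliding example, and the text index i never moves backwards.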

KMP is mainly used to solve the following problems:

1. Finding the position of the pattern string in the main string

2. Counting how many times the pattern appears in the main string

3. Counting how many non-overlapping copies of the pattern can be cut out of the main string

4. Counting the repetitions (cycles) in prefixes of the pattern string
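Uses 2 to 4 all reuse the same next array. A hedged sketch with hypothetical function names: countOccurrences covers uses 2 and 3 (the overlap flag chooses whether matches may share characters, which is one reading of use 3), and prefixRepetitions covers use 4 via the standard fact that a prefix of length len has shortest period len - next[len].

```cpp
#include <string>
#include <vector>

// Same recurrence as setNext: nxt[0] = -1, nxt[j] = longest proper border of t[0..j-1].
std::vector<int> buildBorders(const std::string& t) {
    const int m = static_cast<int>(t.size());
    std::vector<int> nxt(m + 1);
    nxt[0] = -1;
    int j = 0, k = -1;
    while (j < m) {
        if (k == -1 || t[j] == t[k]) nxt[++j] = ++k;
        else k = nxt[k];
    }
    return nxt;
}

// Uses 2 and 3: with overlap = true every occurrence is counted; with
// overlap = false matches are cut out left to right and share no characters.
int countOccurrences(const std::string& s, const std::string& t, bool overlap) {
    if (t.empty()) return 0;
    const std::vector<int> nxt = buildBorders(t);
    const int m = static_cast<int>(t.size());
    int count = 0, j = 0;                         // j = pattern characters matched
    for (char c : s) {
        while (j >= 0 && t[j] != c) j = nxt[j];   // mismatch: slide the pattern
        ++j;
        if (j == m) {                             // a full match ends at c
            ++count;
            j = overlap ? nxt[m] : 0;             // keep the border, or start over
        }
    }
    return count;
}

// Use 4: for 1 <= len <= t.size(), the prefix t[0..len-1] has shortest period
// len - nxt[len]. If that period divides len, the prefix is the period repeated
// len / period times; otherwise it is not a repetition and 1 is returned.
int prefixRepetitions(const std::vector<int>& nxt, int len) {
    const int period = len - nxt[len];
    return len % period == 0 ? len / period : 1;
}
```

For example, "AA" occurs 3 times in "AAAA" with overlap but only 2 times without, and the prefix "abab" is "ab" repeated 2 times. In period-style problems, typically only prefixes with a result greater than 1 are reported.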

That is my (not very polished) write-up of KMP.


Now, Manacher.

This algorithm is mainly used to find palindromic substrings, in particular the longest palindromic substring of a string, in linear time.

Suppose there is a palindrome centered at position id, and let p[id] be its radius.

Here is an example palindrome:

     id   p[id] = radius
 |<--|-->|
CABAAKAABAA

Here the palindrome is ABAAKAABA, centered at K (position id).

We record mx = id + p[id], the position just past the right end of this palindrome. While scanning, id is always the center of the palindrome whose right end mx reaches furthest so far.

Now look at the 'B' to the right of K. This B lies inside mx, so it is still covered by the palindrome centered at id!

Therefore, the palindrome around this B is related to the palindrome around its mirror image about id, the 'B' on the left. Why? Because the palindrome centered at id is symmetric.

By inspection, the left B is the center of the palindrome ABA, so the right B, its mirror image, also has ABA around it.

But this holds only while the right end of the mirrored palindrome still lies within mx.

Why?

Consider this string:

AABAAKAABAC

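Here the left B has p[B] = 2 (the palindrome AABAA), but the right B clearly cannot be that wide: the palindrome centered at K is only ABAAKAABA, so mx stops just before the final C, and a radius of 2 around the right B would reach beyond mx.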

Therefore, for the right B (position i, whose mirror about id is 2*id - i), the radius we can take for granted is the smaller of two values: the distance from i to the right boundary mx, and the radius of the mirror point.
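The rule from the last few paragraphs can be written as a formula. It uses the template's convention that p[i] counts the center itself (so p[i] = 1 is a single character); the prose examples count only one side, so their values are one smaller.

```latex
% t: transformed string, id: center of the palindrome reaching furthest right,
% mx = id + p[id], and 2\,id - i is the mirror of i about id.
p[i] \;\ge\;
\begin{cases}
  \min\bigl(p[2\,\mathit{id} - i],\ \mathit{mx} - i\bigr) & \text{if } i < \mathit{mx},\\[2pt]
  1 & \text{otherwise,}
\end{cases}
\qquad \text{then } p[i] \mathrel{+}= 1 \text{ while } t[i - p[i]] = t[i + p[i]].
```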

This is only a lower bound, though: in the original string the palindrome around the right B may extend further, so we keep expanding it outward character by character.

If, after expanding, the right end of the new palindrome goes past mx, we update mx and set id to this new center.

After scanning from left to right, the largest p[i] gives the longest palindrome.

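To avoid handling odd- and even-length palindromes separately, insert '#' between the characters (and a sentinel such as '$' at the front), choosing characters that never occur in the input.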
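A self-contained sketch of the whole algorithm, including the '$'/'#' transformation, written as a function on std::string. The name longestPalindrome is hypothetical, and it assumes '$' and '#' never occur in the input. It follows the same steps as the template that follows, but stops the right-hand expansion with a bounds check instead of relying on the terminating '\0'.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns the length of the longest palindromic substring of s.
int longestPalindrome(const std::string& s) {
    // "$#a#b#...#": every palindrome in t has odd length; '$' stops the left side.
    std::string t = "$#";
    for (char c : s) { t += c; t += '#'; }
    const int n = static_cast<int>(t.size());
    std::vector<int> p(n, 0);   // p[i]: radius around i, counting i itself
    int id = 0, mx = 0;         // mx = id + p[id], the furthest right end seen
    int best = 0;
    for (int i = 1; i < n; ++i) {
        // Reuse the mirror's radius, capped by the distance to mx.
        p[i] = (mx > i) ? std::min(p[2 * id - i], mx - i) : 1;
        // Expand while both sides match.
        while (i + p[i] < n && t[i - p[i]] == t[i + p[i]]) ++p[i];
        if (i + p[i] > mx) { mx = i + p[i]; id = i; }
        best = std::max(best, p[i] - 1);  // p[i] - 1 is the length in s
    }
    return best;
}
```

For example, longestPalindrome("CABAAKAABAA") returns 9, the length of ABAAKAABA, and longestPalindrome("abba") returns 4 thanks to the '#' centers.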

#include<iostream>
#include<cstdio>
#include<cstring>
#include<algorithm>
using namespace std;

int p[2222222];                    // p[i]: radius at i in str1, counting i itself
char str[1111111], str1[2222222];
int len;

// Transform str into str1 = "$#a#b#...#" so every palindrome has odd length.
void Init()
{
    str1[0] = '$';
    str1[1] = '#';
    len = 2;
    for (int i = 0; str[i] != 0; i++)
    {
        str1[len++] = str[i];
        str1[len++] = '#';
    }
    str1[len] = 0;
}

int main()
{
    int T = 0;
    while (scanf("%s", str) != EOF)
    {
        if (strcmp(str, "END") == 0)
            break;
        memset(p, 0, sizeof(p));
        Init();
        int id = 0, mx = 0;
        for (int i = 1; i < len; i++)
        {
            if (mx > i)
                p[i] = min(p[(id << 1) - i], mx - i);
            else
                p[i] = 1;
            while (str1[i - p[i]] == str1[i + p[i]])
                p[i]++;
            if (mx < i + p[i])       // only move id when this palindrome reaches further
            {
                mx = i + p[i];
                id = i;
            }
        }
        printf("Case %d: ", ++T);
        int ans = 0;
        for (int i = 1; i < len; i++)
            ans = max(p[i], ans);
        printf("%d\n", ans - 1);    // p[i] - 1 is the palindrome length in str
    }
    return 0;
}

Okay, let's continue.

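A trie (dictionary tree, also called a letter tree or prefix tree) stores a set of words. Starting from the root, each node represents one letter, and the k-th letter of a word sits on the k-th level of the tree, so words with a common prefix share the same path from the root.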

It is just a data structure and is not difficult to implement, but it is the foundation of suffix trees and the AC (Aho-Corasick) automaton.
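A minimal sketch of the data structure itself, independent of the template's problem. It assumes lowercase words ('a' to 'z') and stores nodes in a std::vector with child indices instead of a static pool of pointers; the class name Trie and its methods are hypothetical.

```cpp
#include <array>
#include <string>
#include <vector>

// Hypothetical minimal trie over lowercase letters 'a'..'z'.
class Trie {
public:
    Trie() : nodes_(1) {}  // node 0 is the root

    void insert(const std::string& word) {
        int cur = 0;
        ++nodes_[0].pass;                 // every word passes through the root
        for (char ch : word) {
            const int c = ch - 'a';       // assumes lowercase input
            if (nodes_[cur].child[c] == 0) {
                nodes_[cur].child[c] = static_cast<int>(nodes_.size());
                nodes_.emplace_back();    // new empty node
            }
            cur = nodes_[cur].child[c];
            ++nodes_[cur].pass;
        }
        ++nodes_[cur].end;
    }

    // Number of inserted words that start with prefix.
    int countPrefix(const std::string& prefix) const {
        const int cur = walk(prefix);
        return cur < 0 ? 0 : nodes_[cur].pass;
    }

    // Whether word itself was inserted.
    bool contains(const std::string& word) const {
        const int cur = walk(word);
        return cur >= 0 && nodes_[cur].end > 0;
    }

private:
    struct Node {
        std::array<int, 26> child{};  // 0 = no child (the root is never a child)
        int pass = 0;                 // words passing through this node
        int end = 0;                  // words ending exactly here
    };

    // Follows s from the root; returns the node index, or -1 if the path is missing.
    int walk(const std::string& s) const {
        int cur = 0;
        for (char ch : s) {
            cur = nodes_[cur].child[ch - 'a'];
            if (cur == 0) return -1;
        }
        return cur;
    }

    std::vector<Node> nodes_;
};
```

Using index 0 as "no child" works because the root can never be anyone's child; the AC automaton template below relies on the same trick with next[] == 0.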

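Without further ado, here is the template. It reads several lists of digit strings and prints NO if some string in a list is a prefix of another (or appears twice), and YES otherwise.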

#include<iostream>
#include<string>
#include<cstdio>
#define MAX 10                // alphabet: digits '0'..'9'
using namespace std;

char s[11111][11];
int allocp;                   // next free slot in the static node pool

struct TireNode
{
    int nCount;               // how many inserted strings pass through this node
    TireNode *next[MAX];
};
TireNode Memeroy[1111111];

void InitTire(TireNode **root)
{
    *root = NULL;
}

TireNode *CreateTire()
{
    TireNode *p = &Memeroy[allocp++];
    p->nCount = 1;
    for (int i = 0; i < MAX; i++)
        p->next[i] = NULL;
    return p;
}

void InsertTire(TireNode **root, char *s)
{
    int i = 0, k;
    TireNode *p;
    if (!(p = *root))
        p = *root = CreateTire();
    while (s[i])
    {
        k = s[i++] - '0';
        if (p->next[k])
            p->next[k]->nCount++;
        else
            p->next[k] = CreateTire();
        p = p->next[k];
    }
}

// True if another string also passes through the last node of s,
// i.e. s is a prefix of another string (or a duplicate).
bool SearchTire(TireNode **root, char *s)
{
    int i = 0, k;
    TireNode *p = *root;
    int cnt = 0;
    while (s[i])
    {
        k = s[i++] - '0';
        cnt = p->next[k]->nCount;
        p = p->next[k];
    }
    return cnt != 1;
}

int main()
{
    int T;
    scanf("%d", &T);
    while (T--)
    {
        allocp = 0;
        TireNode *root = NULL;
        int len = 0;
        scanf("%d", &len);
        for (int i = 0; i < len; i++)
        {
            scanf("%s", s[i]);
            InsertTire(&root, s[i]);
        }
        bool found = true;    // stays true if no string is a prefix of another
        for (int i = 0; i < len; i++)
        {
            if (SearchTire(&root, s[i]))
            {
                found = false;
                break;
            }
        }
        if (found == false)
            printf("NO\n");
        else
            printf("YES\n");
    }
    return 0;
}

Tired... I'll write up the AC automaton tomorrow.

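The AC (Aho-Corasick) automaton can be understood as running KMP on a trie: each node's fail pointer plays the role of the next array, pointing to the node for the longest proper suffix of the current path that is also a path in the trie.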
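To make the "KMP on a trie" idea concrete, here is a compact sketch that builds the trie, computes fail links by BFS and scans the text once. It assumes lowercase keywords and text, and counts how many of the keywords (duplicates counted separately) occur at least once, which appears to be the quantity the template that follows computes; countKeywords is a hypothetical name.

```cpp
#include <array>
#include <queue>
#include <string>
#include <vector>

int countKeywords(const std::vector<std::string>& keywords, const std::string& text) {
    struct Node {
        std::array<int, 26> next{};  // trie edges; later completed into goto edges
        int fail = 0;                // plays the role of KMP's next[]
        int cnt = 0;                 // keywords ending here; -1 once counted
    };
    std::vector<Node> tr(1);         // node 0 is the root

    // 1. Insert every keyword into the trie.
    for (const std::string& w : keywords) {
        int p = 0;
        for (char ch : w) {
            const int c = ch - 'a';
            if (tr[p].next[c] == 0) {
                tr[p].next[c] = static_cast<int>(tr.size());
                tr.emplace_back();
            }
            p = tr[p].next[c];
        }
        ++tr[p].cnt;
    }

    // 2. BFS by depth: a child's fail link is where its parent's fail node goes on
    //    the same letter; a missing edge copies that same transition.
    std::queue<int> q;
    for (int c = 0; c < 26; ++c)
        if (tr[0].next[c] != 0) q.push(tr[0].next[c]);  // depth-1 nodes fail to the root
    while (!q.empty()) {
        const int u = q.front();
        q.pop();
        for (int c = 0; c < 26; ++c) {
            const int v = tr[u].next[c];
            if (v != 0) {
                tr[v].fail = tr[tr[u].fail].next[c];
                q.push(v);
            } else {
                tr[u].next[c] = tr[tr[u].fail].next[c];
            }
        }
    }

    // 3. Scan the text once, collecting keywords along the fail chain.
    int total = 0, p = 0;
    for (char ch : text) {
        p = tr[p].next[ch - 'a'];
        for (int t = p; t != 0 && tr[t].cnt != -1; t = tr[t].fail) {
            total += tr[t].cnt;
            tr[t].cnt = -1;  // visited: its whole fail chain has been counted too
        }
    }
    return total;
}
```

For instance, with the keywords she, he, say, shr, her and the text yasherhs it returns 3 (she, he and her occur). The text pointer never moves backwards, just like i in KMP.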

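Here is the template. For each of T test cases it reads N lowercase keywords and then a text, and prints how many of the keywords appear in the text.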

#include<iostream>
#include<cstdio>
#include<cstring>
#define MAX 26
using namespace std;

int root, tot;
struct node
{
    int fail;
    int cnt;              // keywords ending at this node (-1 once counted)
    int next[MAX];        // trie edges; 0 means "no child" (the root is node 0)
    void init()
    {
        memset(next, 0, sizeof(next));
        fail = -1; cnt = 0;
    }
} Tire[500010];           // total keyword length + 1; 500000 is an assumed limit
int que[500010];          // BFS queue

void init()
{
    root = tot = 0;
    Tire[root].init();
}

void insert(int root, char *s)
{
    int p = root;
    int i = 0, k;
    while (s[i])
    {
        k = s[i++] - 'a';
        if (!Tire[p].next[k])
        {
            Tire[++tot].init();
            Tire[p].next[k] = tot;
        }
        p = Tire[p].next[k];
    }
    Tire[p].cnt++;
}

// BFS: set fail links and fill each missing edge with the fail node's edge,
// so next[] becomes a complete goto table.
void build_ac_automation()
{
    int head = 0, tail = 0;
    que[tail++] = root;
    while (head < tail)
    {
        int cur = que[head++];
        for (int i = 0; i < MAX; i++)
        {
            int p = Tire[cur].fail;
            if (Tire[cur].next[i])
            {
                int son = Tire[cur].next[i];
                Tire[son].fail = (cur == root) ? root : Tire[p].next[i];
                que[tail++] = son;
            }
            else
                Tire[cur].next[i] = (cur == root) ? root : Tire[p].next[i];
        }
    }
}

// Counts how many keywords occur in s; each keyword node is counted once.
int query(char *s)
{
    int i = 0, k, p = root;
    int ret = 0;
    while (s[i])
    {
        k = s[i++] - 'a';
        p = Tire[p].next[k];      // the goto table is complete, no fail loop needed
        for (int temp = p; temp != root && Tire[temp].cnt != -1; temp = Tire[temp].fail)
        {
            ret += Tire[temp].cnt;
            Tire[temp].cnt = -1;  // mark as counted
        }
    }
    return ret;
}

char str[1111111];
int main()
{
    int T;
    scanf("%d", &T);
    while (T--)
    {
        init();
        int N;
        scanf("%d", &N);
        while (N--)
        {
            scanf("%s", str);
            insert(root, str);
        }
        build_ac_automation();
        scanf("%s", str);
        printf("%d\n", query(str));
    }
    return 0;
}









