This week's training contest included a KMP template problem. I hadn't reviewed string-processing algorithms in a long time and had never studied KMP thoroughly (I only roughly understood the idea), so I was slow on that problem. In the end I pulled out the school's reference notes, relearned it on the spot, and only then got it AC (accepted). I'm taking this opportunity to organize the common string-processing algorithms and templates.
String processing in contests is generally not especially difficult (at least in the problems I have met). Some problems combine string processing with DP to add difficulty, but plain string processing is relatively easy to write.
First, strstr
The function strstr(str1, str2) checks whether str2 is a substring of str1. If it is, the function returns a pointer to the first occurrence of str2 in str1; otherwise it returns NULL. It is said that strstr is about as efficient as KMP in practice, but the C standard does not specify its complexity, and a naive implementation can take O(nm) time in the worst case.
There isn't much to say about strstr itself. Just remember that it returns an address: to get a subscript, subtract the starting address of the array from the return value.
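A minimal sketch of the pointer-to-subscript conversion, including how to resume the search from a given index. `findFrom` is a hypothetical helper name, not a standard library function; only `std::strstr` is standard.

```cpp
#include <cassert>
#include <cstring>

// Returns the 0-based index of the first occurrence of `pattern` in `text`
// at or after index `from`, or -1 if there is none.
int findFrom(const char *text, const char *pattern, int from) {
    assert(from >= 0 && from <= static_cast<int>(std::strlen(text)));
    const char *hit = std::strstr(text + from, pattern); // pointer into text, or nullptr
    if (hit == nullptr) return -1;
    return static_cast<int>(hit - text);                  // pointer difference = subscript
}
```

For example, `findFrom("abcabc", "bc", 2)` skips the first match and returns 4.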
Example: http://acm.fzu.edu.cn/problem.php?pid=2128
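Analysis: find the positions of every occurrence of the given strings in str. The answer may not contain any occurrence completely, so a substring that ends at position r must start no earlier than the second character of every occurrence that ends at or before r; in other words, a candidate runs from the second character of one occurrence to the second-to-last character of the next one. Sweep r from left to right, maintain that left bound, and keep the maximum length.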
Code:
#include <cstdio>
#include <cstring>
#include <algorithm>
using namespace std;

const int MAXN = 1000010;
char str[MAXN], tmp[MAXN];   // the original size of tmp was garbled; MAXN is a safe upper bound
int maxStart[MAXN];          // maxStart[e]: largest start index of an occurrence ending at e, or -1

int main() {
    while (scanf("%s", str) == 1) {
        int n;
        scanf("%d", &n);
        int len = strlen(str);
        for (int i = 0; i < len; i++) maxStart[i] = -1;
        while (n--) {
            scanf("%s", tmp);
            int ltmp = strlen(tmp);
            // strstr returns a pointer, so subtract str to get the subscript.
            // Restart one position past each hit so overlapping occurrences are found
            // (restarting at the last character loops forever on 1-character patterns).
            for (const char *p = strstr(str, tmp); p != NULL; p = strstr(p + 1, tmp)) {
                int start = p - str;
                int endn = start + ltmp - 1;
                maxStart[endn] = max(maxStart[endn], start);
            }
        }
        // A substring ending at r must start after the first character of every
        // occurrence that ends at or before r.
        int left = -1, res = 0;
        for (int r = 0; r < len; r++) {
            left = max(left, maxStart[r]);
            res = max(res, r - left);   // length of str[left+1 .. r]
        }
        printf("%d\n", res);
    }
    return 0;
}
strstr application
Second, string hashing
Hashing in general deserves its own post; here I only paste the templates commonly used for string hashing:
1.SDBMHash
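unsigned int SDBMHash(const char *str) {
    unsigned int hash = 0;   // unsigned, so overflow wraps instead of being undefined
    while (*str) {
        // equivalent to: hash = 65599*hash + (*str++);
        hash = (*str++) + (hash << 6) + (hash << 16) - hash;
    }
    return (hash & 0x7FFFFFFF);
}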
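The comment in SDBMHash claims the shift form equals multiplying by 65599. That holds because 65599 = 2^16 + 2^6 - 1 and unsigned 32-bit arithmetic wraps modulo 2^32 in both forms. A small sketch to check one step of each; the two function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// One SDBM step using shifts, as in the template.
std::uint32_t sdbmStepShift(std::uint32_t hash, unsigned char c) {
    return c + (hash << 6) + (hash << 16) - hash;
}

// The same step written as a multiplication by 65599.
std::uint32_t sdbmStepMul(std::uint32_t hash, unsigned char c) {
    return 65599u * hash + c;
}
```

Shifts were historically cheaper than multiplication, which is why the template uses them.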
2.BKDRHash
unsigned int BKDRHash(const char *str) {
    unsigned int seed = 131; // 131 1313 13131 131313 etc.
    unsigned int hash = 0;
    while (*str) {
        hash = hash * seed + (*str++);
    }
    return (hash & 0x7FFFFFFF);
}
3.APHash
unsigned int APHash(const char *str) {
    unsigned int hash = 0;
    int i;
    for (i = 0; *str; i++) {
        if ((i & 1) == 0) {
            hash ^= ((hash << 7) ^ (*str++) ^ (hash >> 3));
        } else {
            hash ^= (~((hash << 11) ^ (*str++) ^ (hash >> 5)));
        }
    }
    return (hash & 0x7FFFFFFF);
}
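The BKDR recurrence hash = hash * seed + c is a polynomial in the seed. A common contest extension (not one of the templates above) stores the hash of every prefix so that any substring's hash can be read off in O(1). A minimal sketch, assuming seed 131 and natural modulo-2^64 overflow; `PrefixHash` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// h[i] is the BKDR-style hash of s[0..i-1]; pw[i] = 131^i (both modulo 2^64).
struct PrefixHash {
    std::vector<unsigned long long> h, pw;
    explicit PrefixHash(const std::string &s) : h(s.size() + 1, 0), pw(s.size() + 1, 1) {
        for (std::size_t i = 0; i < s.size(); ++i) {
            h[i + 1] = h[i] * 131 + static_cast<unsigned char>(s[i]); // same recurrence as BKDRHash
            pw[i + 1] = pw[i] * 131;
        }
    }
    // Hash of the substring s[l..r): subtract the prefix shifted up by r - l positions.
    unsigned long long get(std::size_t l, std::size_t r) const {
        return h[r] - h[l] * pw[r - l];
    }
};
```

Equal hashes do not prove equal strings. Inputs can be constructed (Thue-Morse strings are the usual example) that collide under modulo-2^64 overflow, so some people use a prime modulus or two hashes instead.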
Third, KMP
KMP is the most commonly used string-matching algorithm in ACM contests. It may not be as fast in practice as the Sunday or BM (Boyer-Moore) algorithms, but its importance is beyond question.
The essence of KMP is to add a next array to brute-force search. The idea is to preprocess the pattern string first, then use the information from characters that have already matched so that, on a mismatch, the pattern can shift right by more than one position and the pointer into the main string never moves backwards.
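For comparison, a sketch of the brute-force search that KMP improves on. After a mismatch it throws away everything it learned and restarts one position later, which costs O(nm) comparisons in the worst case. `bruteForceFind` is a hypothetical name.

```cpp
#include <cassert>
#include <cstring>

// Tries every alignment of pattern against text and compares character by character.
// Returns the first 0-based match index, or -1 if pattern does not occur.
int bruteForceFind(const char *text, const char *pattern) {
    int n = static_cast<int>(std::strlen(text));
    int m = static_cast<int>(std::strlen(pattern));
    for (int i = 0; i + m <= n; ++i) {           // alignment: pattern[0] sits on text[i]
        int j = 0;
        while (j < m && text[i + j] == pattern[j]) ++j;
        if (j == m) return i;                     // every character matched
    }                                             // mismatch: restart at i + 1, discarding j
    return -1;
}
```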
For example:
Suppose the main string is asdfvagbasdfgabsdfasdabcbdsfb and the pattern string is abcabd.
Comparing from left to right, we find that the second character does not match. Brute force would restart the comparison from the second character of the main string, but that isn't necessary: we can jump straight to the next a in the main string and start comparing from there.
What we need is to use the information we already have to build the next array, which tells us how to speed up the search.
To do this, we introduce the concepts of prefix and suffix. A prefix is any leading substring that contains the first character but not the last one; a suffix is any trailing substring that contains the last character but not the first one. The partial match value of a string is the length of the longest string that appears in both its set of prefixes and its set of suffixes.
For example, for the string ABCDABD:
- "A" has empty prefix and suffix sets, so the common element length is 0;
- "AB" has prefixes [A] and suffixes [B]; the common element length is 0;
- "ABC" has prefixes [A, AB] and suffixes [BC, C]; the common element length is 0;
- "ABCD" has prefixes [A, AB, ABC] and suffixes [BCD, CD, D]; the common element length is 0;
- "ABCDA" has prefixes [A, AB, ABC, ABCD] and suffixes [BCDA, CDA, DA, A]; the common element is "A", length 1;
- "ABCDAB" has prefixes [A, AB, ABC, ABCD, ABCDA] and suffixes [BCDAB, CDAB, DAB, AB, B]; the common element is "AB", length 2;
- "ABCDABD" has prefixes [A, AB, ABC, ABCD, ABCDA, ABCDAB] and suffixes [BCDABD, CDABD, DABD, ABD, BD, D]; the common element length is 0.
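The list above can be reproduced directly from the definition. A sketch that, for each prefix of the pattern, tries the longest candidate length first and checks whether that many leading characters equal that many trailing characters. It is O(m^3) and only meant for checking tables by hand; `partialMatchNaive` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// pm[i] = length of the longest string that is both a proper prefix and a
// proper suffix of s[0..i], computed straight from the definition.
std::vector<int> partialMatchNaive(const std::string &s) {
    std::vector<int> pm(s.size(), 0);
    for (std::size_t i = 0; i < s.size(); ++i) {
        std::size_t len = i + 1;                       // length of the current prefix
        for (std::size_t k = len - 1; k > 0; --k) {    // longest proper candidate first
            if (s.compare(0, k, s, len - k, k) == 0) { // first k chars == last k chars
                pm[i] = static_cast<int>(k);
                break;
            }
        }
    }
    return pm;
}
```

For "ABCDABD" this yields 0, 0, 0, 0, 1, 2, 0, matching the list.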
Put another way, the next value at a position records how many characters at the end of the pattern's first i characters are identical to the same number of characters counted from the head of the pattern.
For example, for ABCDABD, next[6] == 2: within the first six characters (ABCDAB), a two-character string appears both as a prefix and as a suffix, namely AB, which repeats the beginning of the pattern.
When KMP hits a mismatch, the pattern does not simply move right by one position. Instead it moves by: shift = number of matched characters - partial match value of the last matched character.
For example, take the pattern ABCDABD and the main string ABCDABACDABCDABD; the pattern's partial match values are 0, 0, 0, 0, 1, 2, 0. The first attempt matches six characters and fails at the seventh, so the pattern moves right by 6 (matched characters) - 2 (partial match value of the last matched character, B) = 4 positions instead of 1, which greatly improves efficiency.
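The shifting rule can be replayed on this example. A sketch that takes the partial match table as input, records every shift, and returns the first match position; it illustrates the rule only and is not the template. `slideWithPartialMatch` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// After `matched` characters agree and the next does not, slide the pattern by
// matched - pm[matched - 1] (or by 1 if nothing matched). Returns the first match or -1.
int slideWithPartialMatch(const std::string &text, const std::string &pat,
                          const std::vector<int> &pm, std::vector<int> &shifts) {
    std::size_t start = 0;    // current alignment of pat[0] in text
    std::size_t matched = 0;  // characters already known to agree at this alignment
    while (start + pat.size() <= text.size()) {
        while (matched < pat.size() && text[start + matched] == pat[matched]) ++matched;
        if (matched == pat.size()) return static_cast<int>(start);
        int shift = matched == 0 ? 1 : static_cast<int>(matched) - pm[matched - 1];
        shifts.push_back(shift);
        start += shift;
        // The border of the matched part still lines up after the slide.
        matched = matched == 0 ? 0 : pm[matched - 1];
    }
    return -1;
}
```

On this example the recorded shifts are 4, 2, 1, 1, 1 and the match is found at index 9.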
#include <cstring>
using namespace std;

const int maxn_t = 1000010;
const int maxn_p = 10010;
char p[maxn_p], t[maxn_t];   // both strings are stored from index 1, e.g. scanf("%s", p + 1)
int _next[maxn_p];

void init_next(char *p) {
    int m = strlen(p + 1);   // array subscripts start from 1
    _next[1] = 0;            // the first entry of the next array is 0
    for (int k = 0, q = 2; q <= m; q++) {   // q: pattern index; k: current longest equal prefix/suffix length
        while (k > 0 && p[k + 1] != p[q])   // mismatch: fall back to the next shorter equal prefix/suffix
            k = _next[k];
        if (p[k + 1] == p[q])               // the two characters are equal: the length grows by 1
            k++;
        _next[q] = k;
    }
}
next array template
int kmp(char *p, char *t) {
    int n = strlen(t + 1), m = strlen(p + 1);
    init_next(p);
    int sum = 0;
    for (int i = 1, q = 0; i <= n; i++) {   // i: index in t; q: number of pattern characters matched
        while (q > 0 && p[q + 1] != t[i])   // on a mismatch, fall back along the next array
            q = _next[q];
        if (p[q + 1] == t[i])
            q++;
        if (q == m) {                       // the whole pattern has matched
            sum++;                          // count this occurrence
            q = _next[q];                   // keep going to count every occurrence of p in t
        }                                   // if you only need to know whether p occurs, return here
    }
    return sum;
}
KMP template
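The template above uses 1-based char arrays and only counts matches. A sketch of a 0-based variant on std::string that collects every match position instead, as the comment about returning early hints at. Here fail[i] corresponds to the template's _next[i + 1]; `kmpAllPositions` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns every 0-based start index where pat occurs in text, overlaps included.
std::vector<int> kmpAllPositions(const std::string &text, const std::string &pat) {
    std::vector<int> res;
    int m = static_cast<int>(pat.size());
    if (m == 0) return res;
    std::vector<int> fail(m, 0);                 // fail[i]: longest proper border of pat[0..i]
    for (int q = 1, k = 0; q < m; ++q) {
        while (k > 0 && pat[k] != pat[q]) k = fail[k - 1];
        if (pat[k] == pat[q]) ++k;
        fail[q] = k;
    }
    for (int i = 0, q = 0; i < static_cast<int>(text.size()); ++i) {
        while (q > 0 && pat[q] != text[i]) q = fail[q - 1];
        if (pat[q] == text[i]) ++q;
        if (q == m) {
            res.push_back(i - m + 1);            // match ends at i
            q = fail[q - 1];                     // continue to find overlapping matches
        }
    }
    return res;
}
```

For instance, "aa" occurs in "aaaa" at positions 0, 1 and 2.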
As for the remaining string-processing algorithms, such as automata, suffix arrays, BM and Sunday, I don't understand them deeply yet; I'll come back and fill them in when I have the time and opportunity!
Algorithm summary + templates ②: string processing