Algorithm summary + templates ②: string processing

Last Update:2016-04-29 Source: Internet

Author: User


This week's training contest included a KMP template problem. I had not reviewed string processing algorithms for a long time and had never studied KMP thoroughly, only roughly understood the idea, so I was slow to solve it during the contest. In the end I had to pull out the materials compiled by my school and relearn the algorithm on the spot before the problem was accepted (AC). I am taking this opportunity to tidy up the common string processing algorithms and templates.

String processing problems in contests are generally not particularly hard (at least the ones I have met were not). Some are combined with DP to raise the difficulty, but plain string processing is actually fairly easy to write.

First, strstr

The strstr(str1, str2) function determines whether the string str2 is a substring of str1. If it is, the function returns the address at which str2 first appears in str1; otherwise it returns NULL. It is said that the efficiency of strstr is similar to that of KMP, although the C standard does not specify its complexity, so this depends on the library implementation.

There is not much else to say about strstr. Note that it returns an address: to get a subscript, subtract the first address of the array from the return value.
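The address-to-subscript conversion described above can be sketched as follows. The helper name find_index is made up for this illustration; only strstr itself comes from the C library.

```cpp
#include <cstring>

// Hypothetical helper: returns the 0-based index of the first occurrence
// of needle in haystack at or after position from, or -1 if there is none.
// Assumes 0 <= from <= strlen(haystack).
int find_index(const char *haystack, const char *needle, int from = 0) {
    const char *hit = std::strstr(haystack + from, needle);
    if (hit == NULL) return -1;
    return (int)(hit - haystack);  // subtract the first address to get a subscript
}
```

Passing a non-zero from continues the search after an earlier hit, which is how all occurrences of a substring can be enumerated.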

Example: http://acm.fzu.edu.cn/problem.php?pid=2128

Analysis: Find the positions of all occurrences of the given substrings and sort them. Then, for each pair of adjacent occurrences, take the distance from the second letter of the earlier one to the second-to-last letter of the later one, and maintain the maximum.

Code:

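#include <cstdio>
#include <cstring>
#include <algorithm>
#include <vector>
#define MAXN 1000010
using namespace std;

const int MAXT = 110;  // the size was lost in the source; 110 is an assumed bound
char str[MAXN], tmp[MAXT];

struct node {
    int start, endn;  // first and last subscript of one occurrence
};

bool cmp(node a, node b) {
    return a.start < b.start;
}

int main() {
    while (scanf("%s", str) == 1) {
        int n;
        scanf("%d", &n);
        int len = strlen(str);
        vector<node> no;  // grows as needed, so many occurrences cannot overflow it
        while (n--) {
            scanf("%s", tmp);
            int pos = 0;
            int ltmp = strlen(tmp);
            char *hit;
            while ((hit = strstr(str + pos, tmp)) != NULL) {
                int ans = hit - str;  // address minus first address = subscript
                node cur;
                cur.start = ans;
                cur.endn = ans + ltmp - 1;
                no.push_back(cur);
                // Step one character: overlapping occurrences are found and
                // a one-letter pattern cannot loop forever.
                pos = ans + 1;
            }
        }
        node tail;  // sentinel just past the end of str
        tail.start = tail.endn = len;
        no.push_back(tail);
        sort(no.begin(), no.end(), cmp);
        // Stretch before the last letter of the first occurrence;
        // this equals len when nothing was found.
        int res = no[0].endn;
        // Second letter of one occurrence up to the second-to-last letter of the next.
        // Like the line above, this assumes no occurrence is nested inside another.
        for (size_t i = 0; i + 1 < no.size(); i++)
            res = max(res, no[i + 1].endn - no[i].start - 1);
        printf("%d\n", res);
    }
    return 0;
}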

strstr application

Second, string hash

A detailed summary of hashing will get its own post; here only the templates used for string hashing are pasted:

1.SDBMHash

unsigned int SDBMHash(char *str) {
    unsigned int hash = 0;
    while (*str) {
        // equivalent to: hash = 65599*hash + (*str++);
        hash = (*str++) + (hash << 6) + (hash << 16) - hash;
    }
    return (hash & 0x7FFFFFFF);
}
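The equivalence noted in the comment of the SDBM template can be checked directly: 65599 = 65536 + 64 - 1, so multiplying by 65599 gives the same result as (hash << 16) + (hash << 6) - hash in unsigned 32-bit arithmetic. A small sketch, with the function names sdbm_shift and sdbm_mul chosen only for this illustration:

```cpp
#include <cstdint>

// Shift form, in the style of the SDBM template.
std::uint32_t sdbm_shift(const char *str) {
    std::uint32_t hash = 0;
    while (*str)
        hash = (unsigned char)(*str++) + (hash << 6) + (hash << 16) - hash;
    return hash & 0x7FFFFFFF;
}

// Multiplication form: 65599 = 2^16 + 2^6 - 1.
std::uint32_t sdbm_mul(const char *str) {
    std::uint32_t hash = 0;
    while (*str)
        hash = 65599u * hash + (unsigned char)(*str++);
    return hash & 0x7FFFFFFF;
}
```

Both functions should return the same value for any input, since unsigned overflow wraps around identically in the two forms.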

2.BKDRHash

unsigned int BKDRHash(char *str) {
    unsigned int seed = 131;  // 131 1313 13131 131313 etc..
    unsigned int hash = 0;
    while (*str) {
        hash = hash * seed + (*str++);
    }
    return (hash & 0x7FFFFFFF);
}

3.APHash

unsigned int APHash(char *str) {
    unsigned int hash = 0;
    int i;
    for (i = 0; *str; i++) {
        if ((i & 1) == 0) {
            hash ^= ((hash << 7) ^ (*str++) ^ (hash >> 3));
        } else {
            hash ^= (~((hash << 11) ^ (*str++) ^ (hash >> 5)));
        }
    }
    return (hash & 0x7FFFFFFF);
}

Third, KMP

KMP is the most commonly used string processing algorithm in ACM contests. Its efficiency may not match that of the Sunday or BM (Boyer-Moore) algorithms, but its status is beyond question.

The essence of the KMP algorithm is to add a next array on top of brute-force search. The idea is to preprocess the pattern string first, and then use the information from the characters already matched to decide how far the pattern string can move during the search.

As an example:

For example, the main string is ASDFVAGBASDFGABSDFASDABCBDSFB and the pattern string is ABCABD.

We compare from left to right and find that the second character does not match. With the brute-force idea we would now restart the match from the second character of the main string, but that is actually unnecessary: we can skip straight to the next A in the main string and start comparing there.
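For comparison, the brute-force idea mentioned above can be sketched like this. It uses 0-based subscripts, and the function name brute_force_find is an assumption made for this illustration. After a mismatch it simply restarts one position further along the main string, which costs O(n*m) comparisons in the worst case.

```cpp
#include <cstring>

// Returns the 0-based index of the first occurrence of pat in text, or -1.
int brute_force_find(const char *text, const char *pat) {
    int n = (int)std::strlen(text), m = (int)std::strlen(pat);
    for (int start = 0; start + m <= n; start++) {
        int j = 0;
        // Compare character by character from this starting position.
        while (j < m && text[start + j] == pat[j]) j++;
        if (j == m) return start;  // every character of pat matched
    }
    return -1;
}
```

Nothing learned from a failed attempt is reused here, which is exactly the waste that the next array removes.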

What we have to do is use the known information to compute the next array, which is how the search process gets optimized.

To do this, we introduce the concept of the prefixes and suffixes of a string. A prefix is a substring that contains the first letter but not the last letter; a suffix is a substring that contains the last letter but not the first letter.

As an example:

For the string ABCDABD:

- the prefix and suffix sets of "A" are both empty, so the length of the common element is 0;

- the prefix of "AB" is [A], the suffix is [B], the length of the common element is 0;

- the prefixes of "ABC" are [A, AB], the suffixes are [BC, C], the length of the common element is 0;

- the prefixes of "ABCD" are [A, AB, ABC], the suffixes are [BCD, CD, D], the length of the common element is 0;

- the prefixes of "ABCDA" are [A, AB, ABC, ABCD], the suffixes are [BCDA, CDA, DA, A], the common element is "A", the length is 1;

- the prefixes of "ABCDAB" are [A, AB, ABC, ABCD, ABCDA], the suffixes are [BCDAB, CDAB, DAB, AB, B], the common element is "AB", the length is 2;

- the prefixes of "ABCDABD" are [A, AB, ABC, ABCD, ABCDA, ABCDAB], the suffixes are [BCDABD, CDABD, DABD, ABD, BD, D], the length of the common element is 0.
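The list above can be reproduced mechanically. The sketch below is a deliberately naive check, not the efficient template: for each prefix of the pattern it tries every proper prefix length, longest first, and compares it with the suffix of the same length. The function name partial_match_table is an assumption, and subscripts are 0-based.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// result[i] = length of the longest proper prefix of s[0..i]
// that is also a suffix of s[0..i].
std::vector<int> partial_match_table(const std::string &s) {
    std::vector<int> result(s.size(), 0);
    for (std::size_t i = 0; i < s.size(); i++) {
        // A proper prefix of s[0..i] has at most i characters.
        for (std::size_t len = i; len >= 1; len--) {
            if (s.compare(0, len, s, i + 1 - len, len) == 0) {
                result[i] = (int)len;
                break;
            }
        }
    }
    return result;
}
```

For ABCDABD this yields the same lengths as the list: 0, 0, 0, 0, 1, 2, 0.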

Put another way, next[i] stores how many characters ending at position i are the same as the characters counted from the head of the pattern string.

For example, for ABCDABD, next[6] == 2: up to the sixth character, the substring ABCDAB has two characters that appear both as a prefix and as a suffix, namely AB, which repeats the head of the pattern string.

So when running KMP, the pattern string does not simply move forward by one position after a mismatch. Instead, shift = number of matched characters - partial match value of the last matched character.

For example, take the pattern string ABCDABD and the main string ABCDABACDABCDABD. The next array of the pattern string is 0,0,0,0,1,2,0. The first attempt fails at the seventh character, so the pattern moves forward by (6 matched characters - next value 2 of the last matched character) = 4 positions, which greatly improves efficiency.
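The shift rule can be traced with a short sketch. It uses 0-based subscripts and takes the partial match values as an input table; the function name trace_shifts is hypothetical. It records every position at which the pattern is lined up against the main string and stops at the first full match.

```cpp
#include <string>
#include <vector>

// Returns the 0-based alignments tried, in order. table[j] is the partial
// match value of pat[0..j]. If a full match exists, it is the last entry.
std::vector<int> trace_shifts(const std::string &text, const std::string &pat,
                              const std::vector<int> &table) {
    std::vector<int> visited;
    int n = (int)text.size(), m = (int)pat.size();
    int start = 0, matched = 0;
    while (m > 0 && start + m <= n) {
        visited.push_back(start);
        while (matched < m && text[start + matched] == pat[matched]) matched++;
        if (matched == m) break;            // full match at this alignment
        if (matched == 0) {
            start += 1;                     // nothing matched: move one position
        } else {
            int keep = table[matched - 1];  // partial match value
            start += matched - keep;        // shift = matched - partial match
            matched = keep;                 // these characters are known to match
        }
    }
    return visited;
}
```

For the two strings in the example, with the table 0,0,0,0,1,2,0, the pattern is tried at positions 0, 4, 6, 7, 8 and 9. The jump from 0 to 4 is the 6 - 2 = 4 shift described in the text, and the full match is found at position 9.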

#include <cstring>

const int MAXN_T = 1000010;
const int MAXN_P = 10010;
char p[MAXN_P], t[MAXN_T];
int _next[MAXN_P];

void init_next(char *p) {
    int m = strlen(p + 1);  // array subscripts start from 1
    _next[1] = 0;           // the first entry of the next array is 0
    for (int k = 0, q = 2; q <= m; q++) {  // q is the pattern subscript, k is the maximum prefix length
        while (k > 0 && p[k + 1] != p[q])  // on a mismatch, fall back to the next shorter common prefix of p
            k = _next[k];
        if (p[k + 1] == p[q])              // if the two characters are equal, the prefix length grows by 1
            k++;
        _next[q] = k;
    }
}

Template for computing the next array

int kmp(char *p, char *t) {
    int n = strlen(t + 1), m = strlen(p + 1);  // both strings are stored from subscript 1
    init_next(p);
    int sum = 0;
    for (int i = 1, q = 0; i <= n; i++) {  // i is the t subscript, q is the number of characters of p matched
        while (q > 0 && p[q + 1] != t[i])  // on a mismatch, fall back according to the maximum prefix length
            q = _next[q];
        if (p[q + 1] == t[i])
            q++;
        if (q == m) {      // p has been matched up to its last character
            sum++;         // the count plus 1
            q = _next[q];  // keep going to count how many times p appears in t
        }                  // if only existence is needed, return directly here
    }
    return sum;
}

KMP template

The other string processing algorithms, such as automata, suffix arrays, BM and Sunday, I do not yet understand deeply enough. I will come back and fill them in when I have the time and opportunity.
