Algorithm series: string matching with the naive algorithm and the KMP algorithm

Source: Internet
Author: User

This article covers two string matching algorithms: the basic brute-force solution, also called the naive algorithm, and the KMP algorithm. It gives the simplest implementation of each, easy to remember and write in an interview, although the emphasis is still on understanding the ideas behind the algorithms.

Naive matching algorithm

The string being searched is called the main string, and the string being searched for is called the pattern string. The basic idea of the naive pattern matching algorithm is:

Take each character of the main string as the start of a substring and match that substring against the pattern string. An outer loop runs over the main string, and at each starting character an inner loop of up to the pattern string's length compares characters, until a match succeeds or the whole main string has been traversed.

The Java implementation is as follows:

    /**
     * Naive matching algorithm: finds the first occurrence of the
     * pattern string t in the main string s.
     *
     * @param s main string
     * @param t pattern string
     * @return index of the first match, or -1 if there is none
     */
    public int simple(String s, String t) {
        char[] sChars = s.toCharArray();
        char[] tChars = t.toCharArray();
        int lenS = sChars.length;
        int lenT = tChars.length;
        int i, j; // i traverses s, j traverses t
        for (i = 0; i <= lenS - lenT; i++) {
            for (j = 0; j < lenT; j++) {
                if (sChars[i + j] != tChars[j]) {
                    break;
                }
            }
            if (j == lenT) {
                return i;
            }
        }
        return -1;
    }
Analysis of time complexity

The time complexity of the naive pattern matching algorithm is as follows (n is the main string length, m is the pattern string length):

Situation      Time complexity    Notes
Best case      O(m)               The match succeeds at the first position, after m comparisons.
Worst case     O((n-m+1)*m)       Every failed attempt mismatches only at the last character of the pattern string.
Average case   O(n+m)             Under the equal-probability assumption, about (n+m)/2 comparisons on average.
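
To make the worst-case row concrete, here is a minimal sketch that counts character comparisons in the naive search. The class and method names (NaiveComparisonCount, countComparisons) are hypothetical and not part of the original article. With a main string of ten a's and the pattern "aaab", every one of the n-m+1 = 7 alignments fails only at the last pattern character, so the count is 7 * 4 = 28.

```java
class NaiveComparisonCount {
    // Hypothetical helper: counts the character comparisons the naive
    // search performs until the first match or the end of the main string.
    static long countComparisons(String s, String t) {
        long comparisons = 0;
        int n = s.length();
        int m = t.length();
        for (int i = 0; i <= n - m; i++) {
            int j = 0;
            while (j < m) {
                comparisons++;
                if (s.charAt(i + j) != t.charAt(j)) {
                    break; // mismatch: give up on this alignment
                }
                j++;
            }
            if (j == m) {
                break; // first match found, stop counting
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        // Worst case: each alignment mismatches at the last pattern character.
        System.out.println(countComparisons("aaaaaaaaaa", "aaab")); // 28
        // Match at the first position: exactly m comparisons.
        System.out.println(countComparisons("abcdef", "abc")); // 3
    }
}
```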

The space complexity of the naive matching algorithm is O(1).

KMP algorithm

The full name of the KMP algorithm is the Knuth-Morris-Pratt algorithm.

The string being searched is called the main string, and the string being searched for is called the pattern string.

Unlike the naive pattern matching algorithm, the KMP algorithm starts from the pattern string: it exploits information hidden in the pattern string to reduce the number of comparisons.

The key to the KMP algorithm is computing the values of the next array.

Computing the next array

For the pattern string t and its corresponding next array, next[i] is the length of the longest proper prefix of the substring t[0..i] that is also a suffix of that substring. The array is computed with a dynamic programming idea.
When i = 0, next[i] = 0. For i > 0, when t[i] == t[next[i-1]], next[i] = next[i-1] + 1; otherwise the candidate length k = next[i-1] falls back to next[k-1] and the comparison repeats until it matches or k reaches 0. This recurrence is hard to grasp at first, so here is an example.
Suppose t = "abcabc" and we have already found next[4] = 2; now we want next[5].
next[5] = next[4] + 1 holds only if the current character equals the third character, c.
That is, since t[5] == t[next[4]], next[5] = next[4] + 1 = 3. Deriving this recurrence from scratch is difficult, so it is worth simply memorizing it.
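
As a cross-check on what next[i] means, the following sketch computes the array directly from the definition (the longest proper prefix of t[0..i] that is also its suffix), without any recurrence. It is a slow brute-force illustration, not the method used in KMP itself, and the names NextByDefinition and nextByDefinition are hypothetical. The second example, "aabaaab", contains a position (i = 5) where t[i] != t[next[i-1]] and yet the value is 2 rather than 0, which is why a full implementation has to fall back to shorter candidates on a mismatch.

```java
import java.util.Arrays;

class NextByDefinition {
    // Brute force: for each prefix t[0..i], try every proper length k
    // from longest to shortest and keep the first k for which the
    // first k characters equal the last k characters.
    static int[] nextByDefinition(String t) {
        int[] next = new int[t.length()];
        for (int i = 0; i < t.length(); i++) {
            String prefix = t.substring(0, i + 1);
            for (int k = i; k > 0; k--) { // k <= i keeps the prefix proper
                String head = prefix.substring(0, k);
                String tail = prefix.substring(prefix.length() - k);
                if (head.equals(tail)) {
                    next[i] = k;
                    break;
                }
            }
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(nextByDefinition("abcabc")));  // [0, 0, 0, 1, 2, 3]
        System.out.println(Arrays.toString(nextByDefinition("aabaaab"))); // [0, 1, 0, 1, 2, 2, 3]
    }
}
```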

Next array
    // Dynamic programming: for each substring t[0..i], find the length of
    // the longest proper prefix that is also a suffix.
    public int[] getNext(String t) {
        char[] tChars = t.toCharArray();
        int len = tChars.length;
        // next[i] = k means the longest such prefix of t[0..i] has length k (k <= i)
        // next[0] is 0 because Java zero-initializes the array
        int[] next = new int[len];
        // i traverses [1, len-1]
        for (int i = 1; i < len; i++) {
            int k = next[i - 1];
            // On a mismatch, fall back to the next shorter candidate
            while (k > 0 && tChars[i] != tChars[k]) {
                k = next[k - 1];
            }
            // Remember this condition: for "abcabc", next[4] = 2,
            // and since t[5] == t[next[4]], next[5] = next[4] + 1
            if (tChars[i] == tChars[k]) {
                k++;
            }
            next[i] = k;
        }
        return next;
    }
KMP Algorithm

The following is the Java implementation of the KMP algorithm. Compared with the naive algorithm, KMP is efficient because the index i into the main string never backtracks; it only moves forward. The matching efficiency therefore improves significantly when the main string s and the pattern string t share many partial matches.

The most streamlined KMP
    // Time complexity is O(m+n), space complexity is O(m)
    public int kmp(String s, String t) {
        char[] sChars = s.toCharArray();
        char[] tChars = t.toCharArray();
        int lenS = sChars.length;
        int lenT = tChars.length;
        if (lenT == 0) {
            return 0; // an empty pattern matches at index 0, as in the naive version
        }
        int[] next = getNext(t);
        // Debug: print the next array of t here if needed
        // i always advances by 1; j moves according to the next array
        for (int i = 0, j = 0; i < lenS; ++i) {
            while (j > 0 && sChars[i] != tChars[j]) {
                j = next[j - 1]; // on a mismatch, j falls back to next[j-1]
            }
            if (sChars[i] == tChars[j]) {
                j++;
            }
            if (j == lenT) {
                return i - j + 1; // index of the first match
            }
        }
        return -1;
    }
Analysis of time complexity

The time complexity of the whole KMP algorithm is O(n+m), which is better than the worst case O((n-m+1)*m) of the naive pattern matching algorithm.
The space complexity is O(m), for a one-dimensional array of the pattern string's length.
The KMP algorithm is noticeably more efficient only when there are many partial matches between the main string and the pattern string, because it avoids unnecessary backtracking; in other cases the two algorithms do not differ significantly in efficiency.
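
The difference can be seen by counting comparisons. The sketch below is self-contained and uses hypothetical names (ComparisonCounts, naive, kmp, prefixTable); it counts only comparisons between main-string and pattern characters during the search, not those spent building the next array, and it assumes the prefix table is built with a fallback step on mismatch. On the highly repetitive input (ten a's against "aaab") the naive search makes 28 comparisons and KMP makes 17; on an input with no partial matches the counts are 8 and 10, so there is little to choose between them.

```java
class ComparisonCounts {
    // Builds the next array (longest proper prefix that is also a suffix).
    static int[] prefixTable(String t) {
        int[] next = new int[t.length()];
        for (int i = 1; i < t.length(); i++) {
            int k = next[i - 1];
            while (k > 0 && t.charAt(i) != t.charAt(k)) {
                k = next[k - 1];
            }
            if (t.charAt(i) == t.charAt(k)) {
                k++;
            }
            next[i] = k;
        }
        return next;
    }

    // Comparisons made by the naive search up to the first match.
    static long naive(String s, String t) {
        long count = 0;
        for (int i = 0; i <= s.length() - t.length(); i++) {
            int j = 0;
            while (j < t.length()) {
                count++;
                if (s.charAt(i + j) != t.charAt(j)) break;
                j++;
            }
            if (j == t.length()) break;
        }
        return count;
    }

    // Comparisons made by the KMP search up to the first match.
    static long kmp(String s, String t) {
        if (t.isEmpty()) return 0;
        int[] next = prefixTable(t);
        long count = 0;
        int j = 0;
        for (int i = 0; i < s.length(); i++) {
            while (true) {
                count++;
                if (s.charAt(i) == t.charAt(j)) { j++; break; }
                if (j == 0) break;      // nothing left to fall back to
                j = next[j - 1];        // fall back; i stays where it is
            }
            if (j == t.length()) break; // first match found
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(naive("aaaaaaaaaa", "aaab") + " vs " + kmp("aaaaaaaaaa", "aaab")); // 28 vs 17
        System.out.println(naive("abcdefghij", "xyz") + " vs " + kmp("abcdefghij", "xyz"));   // 8 vs 10
    }
}
```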
