Finding the shortest substring that meets a condition: the sliding window

Source: Internet
Author: User
Introduction

Iterate over a string with a resizable window, with a time complexity of roughly O(n). The technique applies to problems of the form "find the shortest substring that meets certain criteria".

Problem


Given a string S and a string T, find the shortest substring of S that contains all the characters of T. Return "" if no such substring exists.

Algorithm

Maintain a window with two pointers, left and right.

    1. Move the right pointer to the right until the window satisfies the condition, that is, until it contains all the characters of T.
    2. Move the left pointer to the right until the window no longer satisfies the condition. Update the shortest substring each time the left pointer moves.
    3. Repeat steps 1 and 2.
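
The three steps can be sketched in Java as follows. This is only a minimal illustration under stated assumptions: the class name WindowTrace, the helper covers, and the method validWindows are hypothetical names, the input is assumed to be ASCII, and the condition is rechecked from scratch on every iteration so that the pointer movement stays easy to follow. Here s is the string being searched and t supplies the required characters, as in the minWindow(String s, String t) signature used later.

```java
import java.util.ArrayList;
import java.util.List;

class WindowTrace {
    // Hypothetical helper: true if the window holds at least the required
    // count of every character.
    static boolean covers(int[] have, int[] need) {
        for (int c = 0; c < need.length; c++) {
            if (have[c] < need[c]) return false;
        }
        return true;
    }

    // Returns every window that satisfied the condition, in the order visited.
    static List<String> validWindows(String s, String t) {
        int[] need = new int[128]; // assumption: ASCII input only
        for (char c : t.toCharArray()) need[c]++;
        int[] have = new int[128];
        List<String> visited = new ArrayList<>();
        int left = 0;
        for (int right = 0; right < s.length(); right++) {
            have[s.charAt(right)]++;                      // step 1: stretch right
            while (left <= right && covers(have, need)) { // step 2: shrink left while valid
                visited.add(s.substring(left, right + 1));
                have[s.charAt(left)]--;
                left++;
            }
        }                                                 // step 3: the for loop repeats
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(validWindows("ADOBECODEBANC", "ABC"));
    }
}
```

For s = "ADOBECODEBANC" and t = "ABC", this sketch visits ten valid windows, from "ADOBEC" to "BANC"; the shortest of them, "BANC", is the answer.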

Why it works

Imagine how a naive algorithm iterates over all the substrings of S. It takes each character of S in turn as the starting character of a substring and, starting from length 1, grows the substring until it reaches the end of S. In this way every substring of S is visited.

For example, for the string "ABCD", the substrings beginning with 'A' are "A", "AB", "ABC", "ABCD"; those beginning with 'B' are "B", "BC", "BCD"; those beginning with 'C' are "C", "CD"; and the one beginning with 'D' is "D". The time complexity of this traversal is O(n^2).
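
The naive enumeration can be written as two nested loops. The sketch below is illustrative only; NaiveSubstrings and allSubstrings are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class NaiveSubstrings {
    // Enumerates every substring: one outer iteration per starting character,
    // one inner iteration per length.
    static List<String> allSubstrings(String s) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < s.length(); start++) {
            for (int end = start + 1; end <= s.length(); end++) {
                out.add(s.substring(start, end));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [A, AB, ABC, ABCD, B, BC, BCD, C, CD, D]
        System.out.println(allSubstrings("ABCD"));
    }
}
```

A string of length n has n(n+1)/2 substrings (10 for "ABCD"), which is where the quadratic cost comes from.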

Let us focus on the starting character and see what the sliding window buys us.

The first step of the sliding window algorithm starts from some character X; it is equivalent to searching for a substring that satisfies the condition with X as its starting character. Because the problem asks for the shortest substring, the search can stop as soon as the condition is met. No further search is needed, which saves part of the computation.

Assume that the substring found in the first step ends at a character Y, and that the substring from X to Y has length m. The length of the answer found so far is then <= m. (If there are other elements before X, steps 1 and 2 have already been repeated for several rounds.)

In the second step, the window is shrunk by moving the left pointer. Suppose the window stops satisfying the condition when the left pointer reaches element Z. As the left pointer moves, each element in the open interval (X, Z) is tried as the starting character, with Y as the end character.

Fixing the end character at Y is an important optimization over the naive solution, and it embodies the key principle that lets the sliding window algorithm find the correct answer:

For an element P between X and Z, the shortest substring that starts at P and meets the criteria must end at Y.

Proof: P lies to the left of Z, so the window has not yet stopped satisfying the condition, and the substring from P to Y is guaranteed to satisfy it. Suppose the substring from P to Y were not the shortest. Then there would be a substring from P to some character R that satisfies the condition, with R to the left of Y. The substring from X to R contains it and would therefore satisfy the condition as well, which contradicts the first step, where Y was the first position at which the window starting at X satisfied the condition.

Because of this principle, the elements between X and Z are covered simply by shrinking the left edge of the window. The time complexity drops from quadratic to linear.

In the second step, all candidate substrings whose starting character lies in the interval [X, Z) are covered. The first step of the next round then searches with Z as the starting character. In this way, as the window alternately stretches and shrinks, every possibility (that is, substrings starting at every element) is covered.
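
One way to gain confidence in this argument is to compare a sliding window implementation against a brute-force reference that tries every starting character and stops at the first end character that satisfies the condition. The sketch below is such a reference, not part of the original solution; the names BruteForceMinWindow and covers are hypothetical, and it assumes ASCII input and a non-empty t.

```java
class BruteForceMinWindow {
    // Hypothetical helper: true if the window holds at least the required
    // count of every character.
    static boolean covers(int[] have, int[] need) {
        for (int c = 0; c < need.length; c++) {
            if (have[c] < need[c]) return false;
        }
        return true;
    }

    static String minWindow(String s, String t) {
        int[] need = new int[128]; // assumption: ASCII input only
        for (char c : t.toCharArray()) need[c]++;
        String best = "";
        boolean found = false;
        for (int start = 0; start < s.length(); start++) {
            int[] have = new int[128];
            for (int end = start; end < s.length(); end++) {
                have[s.charAt(end)]++;
                if (covers(have, need)) {
                    if (!found || end - start + 1 < best.length()) {
                        best = s.substring(start, end + 1);
                        found = true;
                    }
                    break; // a longer window from the same start cannot be shorter
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(minWindow("ADOBECODEBANC", "ABC")); // prints BANC
    }
}
```

This reference runs in quadratic time, so it is only suitable for checking results on small inputs.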

Implementation

The analysis above fixes the general framework of the sliding window algorithm. When it comes to recording the state of the window and deciding whether the window satisfies the condition, however, the problem hides a small pitfall.

At first glance, it seems you could use a HashSet to store the characters of T (call them important characters) in order to check whether a character occurs in T, use another HashSet to record the important characters in the window, and use a counter to record the number of important characters in the window; the condition is met when the counter equals the length of T. This looks airtight, but if T contains repeated characters, such as "AABCC", the method no longer works.
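
To see the failure concretely, here is a minimal sketch of one possible set-based check. It is a simplified stand-in for the HashSet idea described above rather than code from the original solution, and the names SetCheckPitfall and coversBySet are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class SetCheckPitfall {
    // Flawed check: it compares sets of distinct characters, so it loses
    // track of how many times each character is required.
    static boolean coversBySet(String window, String t) {
        Set<Character> need = new HashSet<>();
        for (char c : t.toCharArray()) need.add(c);
        Set<Character> have = new HashSet<>();
        for (char c : window.toCharArray()) {
            if (need.contains(c)) have.add(c);
        }
        return have.size() == need.size();
    }

    public static void main(String[] args) {
        // Prints true, although "ABC" lacks the second 'A' and the second 'C'.
        System.out.println(coversBySet("ABC", "AABCC"));
    }
}
```

The window "ABC" is accepted for T = "AABCC" because a set cannot represent the requirement "two of A and two of C".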

A small improvement makes the method meet the problem's requirements: use a HashMap to store the important characters together with their occurrence counts. If T is "AABCC", it is stored as [A -> 2, B -> 1, C -> 2]. Use a second HashMap to record the important characters and their counts in the window, and use counter to record the number of distinct important characters whose required count has been reached in the window. counter increases by 1 only once A has appeared 2 times, by 1 once B has appeared 1 time, and likewise by 1 only once C has appeared 2 times. To decide whether a window satisfies the condition, compare the value of counter with the size of the first HashMap.
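
The bookkeeping can be tried in isolation before wiring it into the window. The following is a minimal sketch under stated assumptions: the class CounterSketch and the methods countChars and satisfiedKinds are hypothetical names, and the window is simply fed in one character at a time rather than slid.

```java
import java.util.HashMap;
import java.util.Map;

class CounterSketch {
    // Builds the first map: each character of t and how often it is required.
    static Map<Character, Integer> countChars(String t) {
        Map<Character, Integer> required = new HashMap<>();
        for (char c : t.toCharArray()) required.merge(c, 1, Integer::sum);
        return required;
    }

    // Feeds the window's characters in one at a time and returns counter:
    // the number of distinct important characters whose count has been reached.
    static int satisfiedKinds(String window, Map<Character, Integer> required) {
        Map<Character, Integer> contained = new HashMap<>();
        int counter = 0;
        for (char c : window.toCharArray()) {
            if (!required.containsKey(c)) continue;
            int now = contained.merge(c, 1, Integer::sum);
            int need = required.get(c);
            if (now == need) counter++; // compares int values, not references
        }
        return counter;
    }

    public static void main(String[] args) {
        Map<Character, Integer> required = countChars("AABCC");
        System.out.println(satisfiedKinds("ABC", required));   // prints 1 (only B)
        System.out.println(satisfiedKinds("AABCC", required)); // prints 3
    }
}
```

With T = "AABCC" the first map has size 3, so only the second window, where counter reaches 3, satisfies the condition.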

When writing code, thinking one statement at a time is slow and error-prone, especially around boundary conditions. A more reliable approach is to write a rough framework first and then fill in the details. As long as the framework is sound, the code will generally not go far wrong.

Use comments to outline the rough framework. (It can be read as a flowchart; the important part is how the two while loops are arranged.)

    public String minWindow(String s, String t) {
        // Create hashMap1 to store the characters of t and their occurrence counts
        // Initialize the window (l, r), the window's hashMap2, and counter
        // Create minLength to record the length of the shortest substring;
        // create result to hold the shortest substring found so far

        while (/* the right end of the window does not run past s */) {
            // Record the element at the right boundary in hashMap2
            // If the count of that element now satisfies the requirement, ++counter

            // If the window satisfies the condition, shrink the left edge step by step;
            // otherwise skip the inner while and keep stretching the right boundary.
            while (/* counter == hashMap1.size() */) {
                // If the window length is less than minLength, update minLength and result
                // To shrink the left boundary, subtract 1 from the count of the
                // left boundary element recorded in hashMap2
                // If the count of the left boundary element no longer satisfies
                // the requirement, --counter
                l++; // Shrink the left boundary
            }
            r++; // Stretch the right boundary
        }
        return result;
    }

Once you understand the framework above, filling in the details is not difficult. The detailed implementation is given below for reference. (Note: this is a correct solution but not the optimal one; see the Optimization section.)

            // Requires: import java.util.HashMap;
     1  public String minWindow(String s, String t) {
     2      if (s == null || t == null || t.length() == 0 || s.length() == 0)
     3          return "";
     4
     5      // Create hashMap1
     6      HashMap<Character, Integer> required = new HashMap<>();
     7      // Initialize window, window hashMap2, counter
     8      HashMap<Character, Integer> contained = new HashMap<>();
     9      int l = 0, r = 0, counter = 0;
    10      // Create minLength to record the length of the shortest substring; create result to hold the shortest substring found so far
    11      int minLength = Integer.MAX_VALUE;
    12      String result = "";
    13
    14      // Store the characters of t and their occurrence counts
    15      for (int i = 0; i < t.length(); i++) {
    16          int count = required.getOrDefault(t.charAt(i), 0);
    17          required.put(t.charAt(i), count + 1);
    18      }
    19
    20      while (r < s.length() /* the right end of the window does not run past s */) {
    21          char current = s.charAt(r);
    22          if (required.containsKey(current)) {
    23              // Record the element at the right boundary in hashMap2
    24              int count = contained.getOrDefault(current, 0);
    25              contained.put(current, count + 1);
    26              // If the count of that element now satisfies the requirement, ++counter
    27              if (contained.get(current).intValue() == required.get(current).intValue())
    28                  ++counter;
    29          }
    30
    31          // If the window satisfies the condition, shrink the left edge step by step; otherwise skip the while and keep stretching the right boundary.
    32          while (counter == required.size() /* counter == hashMap1.size() */) {
    33              // If the window length is less than minLength, update minLength and result
    34              if (r - l + 1 < minLength) {
    35                  result = s.substring(l, r + 1);
    36                  minLength = r - l + 1;
    37              }
    38              char toDelete = s.charAt(l);
    39              if (required.containsKey(toDelete)) {
    40                  // To shrink the left boundary, subtract 1 from the count of the left boundary element in hashMap2
    41                  contained.put(toDelete, contained.get(toDelete) - 1);
    42                  // If the count of the left boundary element no longer satisfies the requirement, --counter
    43                  if (contained.get(toDelete).intValue() == required.get(toDelete).intValue() - 1)
    44                      --counter;
    45              }
    46              l++; // Shrink the left boundary
    47          }
    48          r++; // Stretch the right boundary
    49      }
    50      return result;
    51  }

Algorithm implementation

Note lines 27 and 43: when comparing the values of two Integer objects, you must compare them with .intValue(); otherwise the references of the Integer objects are compared. When the values are small (boxed values from -128 to 127 are cached), both sides refer to the same cached object, so comparing directly with contained.get(current) == required.get(current) happens to work. When a value is larger and falls outside the cache, the two sides are different objects and the direct comparison fails, so counter is not updated and the method incorrectly returns an empty string.
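
The difference between reference comparison and value comparison can be observed with a few lines of Java. This is a small illustration only; IntegerCompare and sameValue are hypothetical names.

```java
class IntegerCompare {
    // Compares the numeric values; a.equals(b) would work as well.
    static boolean sameValue(Integer a, Integer b) {
        return a.intValue() == b.intValue();
    }

    public static void main(String[] args) {
        Integer small1 = 100, small2 = 100;
        Integer big1 = 1000, big2 = 1000;

        // true: boxing a value in the range -128..127 yields the cached object
        System.out.println(small1 == small2);

        // Reference comparison outside the cached range: the result is not
        // guaranteed to be true, so it must not be relied on.
        System.out.println(big1 == big2);

        // true: the values are compared
        System.out.println(sameValue(big1, big2));
    }
}
```

Unboxing into an int local before comparing, or calling equals(), avoids the problem in the same way as .intValue().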

Complexity

Space: two HashMaps are used, and the complexity is O(n + m), where n and m are the lengths of S and T respectively.

Time: the sliding window algorithm uses two pointers, left and right, both of which only move to the right. In the worst case every element is visited by both pointers, so the sliding window takes at most 2n steps. Building the character counts also requires one pass over T, so the total time complexity is O(n + m).
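
The 2n bound can be checked empirically by counting how far each pointer advances. The sketch below is an instrumented variant written for this purpose, not the implementation above; PointerMoves and countMoves are hypothetical names, and ASCII input is assumed.

```java
class PointerMoves {
    // Returns {advances of the right pointer, advances of the left pointer}.
    static int[] countMoves(String s, String t) {
        int[] need = new int[128]; // assumption: ASCII input only
        for (char c : t.toCharArray()) need[c]++;
        int missing = t.length();  // required characters not yet in the window
        int left = 0, right = 0;
        while (right < s.length()) {
            if (need[s.charAt(right)]-- > 0) missing--;
            right++;
            while (missing == 0 && left < right) {
                if (need[s.charAt(left)]++ == 0) missing++;
                left++;
            }
        }
        return new int[] {right, left};
    }

    public static void main(String[] args) {
        int[] moves = countMoves("ADOBECODEBANC", "ABC");
        System.out.println(moves[0] + " + " + moves[1]); // prints 13 + 10
    }
}
```

For this input of length 13 the two pointers advance 23 times in total, within the bound 2n = 26. Neither pointer ever moves backward, which is what makes the bound hold in general.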

Optimization

On LeetCode's judge, the implementation above runs in 33 ms, ranking at only 77% among all Java submissions.

The fastest implementation runs in 2 ms and is very concise. It is reproduced below:

    class Solution {
        public String minWindow(String s, String t) {
            // map[c] = how many more of c the window still needs (may go negative)
            int[] map = new int[128];
            for (char c : t.toCharArray())
                map[c]++;
            // counter = number of required characters still missing from the window
            int counter = t.length(), begin = 0, end = 0, distance = Integer.MAX_VALUE, head = 0;
            while (end < s.length()) {
                if (map[s.charAt(end++)]-- > 0)
                    counter--;
                while (counter == 0) { // valid
                    if (end - begin < distance)
                        distance = end - (head = begin);
                    if (map[s.charAt(begin++)]++ == 0)
                        counter++; // make it invalid
                }
            }
            return distance == Integer.MAX_VALUE ? "" : s.substring(head, head + distance);
        }
    }

The overall framework is similar to the implementation above, with the following optimizations:

    1. Use an array instead of a HashMap to look up characters. With no hashes to compute and no buckets to traverse, performance improves.
    2. Do the subtraction directly on the original array. This eliminates the second HashMap and the calls to containsKey(), which is both simple and efficient.
    3. When a shorter substring is found, do not build the string; record only head and distance, and call substring() just once at the end.

Another optimization idea is to record the positions of all the important elements in a first pass, and then move l and r only over those positions. The sliding window itself becomes cheaper, but the first pass still has to traverse S, so the time complexity remains O(n + m). On LeetCode's test cases the performance gain from this method is not obvious (it probably lands in the second tier). The method is better suited to cases where the number of important elements in S is much smaller than the length of S, that is, where T is relatively short and S contains many elements that are not in T.
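
A possible shape for this variant is sketched below. The original article gives no code for it, so everything here is an assumption made for illustration: the class name FilteredMinWindow, the use of int arrays indexed by ASCII code, and the missing counter borrowed from the array-based solution above.

```java
import java.util.ArrayList;
import java.util.List;

class FilteredMinWindow {
    static String minWindow(String s, String t) {
        if (s.isEmpty() || t.isEmpty()) return "";
        int[] need = new int[128];             // assumption: ASCII input only
        boolean[] important = new boolean[128];
        for (char c : t.toCharArray()) {
            need[c]++;
            important[c] = true;
        }

        // First pass: record where the important characters occur in s.
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            if (important[s.charAt(i)]) positions.add(i);
        }

        // Second pass: l and r index into positions, not into s.
        int missing = t.length();
        int head = 0, distance = Integer.MAX_VALUE;
        int l = 0;
        for (int r = 0; r < positions.size(); r++) {
            if (need[s.charAt(positions.get(r))]-- > 0) missing--;
            while (missing == 0) {
                int start = positions.get(l), end = positions.get(r);
                if (end - start + 1 < distance) {
                    distance = end - start + 1;
                    head = start;
                }
                if (need[s.charAt(start)]++ == 0) missing++;
                l++;
            }
        }
        return distance == Integer.MAX_VALUE ? "" : s.substring(head, head + distance);
    }

    public static void main(String[] args) {
        System.out.println(minWindow("ADOBECODEBANC", "ABC")); // prints BANC
    }
}
```

Because every window considered starts and ends on an important character, no candidate for the shortest substring is lost; the saving comes from skipping the unimportant characters in the second pass.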
