LeetCode | Implement strStr() | Implement the string search function
Return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.
For example, for haystack = "bcbcda" and needle = "bcd", return 2.
Solution: this is the string search function. The C library function strstr() returns the position (as a pointer) where a substring first appears in a string. Its prototype is:
char *strstr(const char *str, const char *substr);
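For comparison, here is a minimal sketch of how strstr()'s pointer result maps onto the index this problem asks for: subtract the start of the string, or return -1 when strstr() returns a null pointer. The wrapper name strstrIndex is hypothetical and used only for illustration; like strstr() itself, it treats both strings as NUL-terminated.

```cpp
#include <cstring>
#include <string>

// Hypothetical wrapper: converts strstr()'s pointer result into an index,
// returning -1 when needle does not occur in haystack.
int strstrIndex(const std::string& haystack, const std::string& needle) {
    const char* start = haystack.c_str();
    const char* hit = std::strstr(start, needle.c_str());
    return hit == nullptr ? -1 : static_cast<int>(hit - start);
}
```

Because strstr() returns its first argument for an empty needle, this wrapper returns 0 in that case, which matches the convention used by the code below.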
Idea 1: brute force. Easy to implement, but not good enough (the time complexity does not meet the requirements).
Two pointers: i indexes the current starting position in haystack, and j indexes needle. First advance i until haystack[i] == needle[0]; then advance j, breaking out as soon as haystack[i + j] != needle[j]. If all m characters match, needle occurs at position i, and i is returned. Otherwise, move the starting position in haystack forward by one and compare again from needle[0].
In other words, needle is compared with haystack starting at every position: each attempt takes at most m comparisons, and there are at most n attempts (n - m + 1, to be exact).
Hence the time complexity is O(m * n), which does not meet the LeetCode time requirement.
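The O(m * n) bound can be made concrete by counting character comparisons. The sketch below is an illustrative, instrumented version of the brute-force scan (the name countComparisons is hypothetical). On an assumed worst-case input, a haystack of n 'a's and a needle of m - 1 'a's followed by 'b', every one of the n - m + 1 starting positions compares all m characters before failing.

```cpp
#include <string>

// Counts the character comparisons made by a brute-force scan (illustration only).
long long countComparisons(const std::string& haystack, const std::string& needle) {
    int n = haystack.size(), m = needle.size();
    long long comparisons = 0;
    for (int i = 0; i + m <= n; i++) {   // each of the n - m + 1 starting positions
        int j = 0;
        while (j < m) {
            comparisons++;               // compare haystack[i + j] with needle[j]
            if (haystack[i + j] != needle[j]) break;
            j++;
        }
        if (j == m) break;               // first occurrence found, stop scanning
    }
    return comparisons;
}
```

For example, with n = 1000 and m = 10 the worst-case input above costs 991 * 10 = 9910 comparisons, while the problem's example ("bcbcda", "bcd") costs only 7.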
Note: clarify your approach before writing code:
1. Identify the algorithm that solves the problem.
2. Determine the time and space complexity of the algorithm, and check whether it meets the interviewer's requirements.
3. Work out which special cases need to be handled (for example, an empty needle, or a needle longer than haystack).
You must clarify your thinking before writing code.
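#include <string>
using std::string;

// Brute force: time complexity O(m * n), which does not meet the LeetCode time requirement
int strStr2(string haystack, string needle) {
    int m = needle.size();
    int n = haystack.size();
    if (m == 0) return 0;    // an empty needle matches at index 0
    if (m > n) return -1;    // a needle longer than haystack cannot match
    for (int i = 0; i < n; i++) {
        int j = 0;
        if (haystack[i] == needle[j]) {
            // compare the rest of needle with haystack starting at i
            for (; j < m && i + j < n; j++) {
                if (needle[j] != haystack[i + j]) break;
            }
            if (j == m) return i;    // all m characters matched
        }
    }
    return -1;
}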
Idea 2: the Rabin-Karp algorithm (hash search)
The Rabin-Karp algorithm is a string-searching algorithm used in computer science to find a fixed-length pattern in a large body of text (pattern search).
From Idea 1 we can see that checking whether needle occurs at a given position in haystack requires comparing all the characters of needle. Can we reuse the result of the previous comparison, so that moving to the next position costs only O(1) time?
The basic idea is to represent each string by a hash code. To keep the hash unique, we choose a prime larger than the size of the character set as the base, and weight each character by a power of this prime.
For example, for the lowercase alphabet we can choose the prime 29 as the base and map a -> 1, b -> 2, ..., z -> 26; the hash code of the string "abcd" is then
$\mathrm{hash}(\text{abcd}) = 1 \cdot 29^0 + 2 \cdot 29^1 + 3 \cdot 29^2 + 4 \cdot 29^3 = 100138$.
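A short sketch of this forward-order hash, assuming lowercase input and the a -> 1, ..., z -> 26 mapping (the name forwardHash is hypothetical). It is only meant for short strings: with a 64-bit long long, the powers of 29 overflow once the string is longer than 12 characters.

```cpp
#include <string>

// Forward-order polynomial hash: s[0]*29^0 + s[1]*29^1 + s[2]*29^2 + ...
// Assumes lowercase input mapped a -> 1, ..., z -> 26, and short strings (no overflow).
long long forwardHash(const std::string& s) {
    const long long base = 29;
    long long hash = 0, power = 1;
    for (char c : s) {
        hash += (c - 'a' + 1) * power;   // character value times its power of 29
        power *= base;
    }
    return hash;
}
```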
The hash code of the string "bcde", computed in the next step, is $\mathrm{hash}(\text{bcde}) = (\mathrm{hash}(\text{abcd}) - 1 \cdot 29^0)/29 + 5 \cdot 29^3$. This update is a constant-time O(1) operation, so computing the hash codes of all substrings takes O(m + (n - m)) = O(n) time, which makes this a linear algorithm (a rolling hash).
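The rolling step can be checked against a direct computation. In this sketch (the helper names forwardHash and rollForward are hypothetical, and lowercase input is assumed), topPower is 29^(m-1), which a real scan would compute once up front so that each step stays O(1).

```cpp
#include <string>

// Forward-order hash (lowercase only, a -> 1, ..., z -> 26; short strings only).
long long forwardHash(const std::string& s) {
    long long hash = 0, power = 1;
    for (char c : s) {
        hash += (c - 'a' + 1) * power;
        power *= 29;
    }
    return hash;
}

// One rolling step: remove the first character `out` (weight 29^0), divide by 29
// to shift every remaining weight down one power, then add `in` with weight 29^(m-1).
long long rollForward(long long hash, char out, char in, long long topPower) {
    return (hash - (out - 'a' + 1)) / 29 + (in - 'a' + 1) * topPower;
}
```

For example, rollForward(forwardHash("abcd"), 'a', 'e', 29 * 29 * 29) gives 125398, the same value as forwardHash("bcde").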
Note: in this example the hash code gives the lowest power to the first character. The following program uses the reverse order, giving the highest power to the first character, that is
$\mathrm{hash}(\text{abcd}) = 4 \cdot 29^0 + 3 \cdot 29^1 + 2 \cdot 29^2 + 1 \cdot 29^3$, similar to converting a hexadecimal number, but in base 29;
$\mathrm{hash}(\text{bcde}) = (\mathrm{hash}(\text{abcd}) - 1 \cdot 29^3) \cdot 29 + 5$
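The reverse-order hash and its rolling update can be sketched the same way, again assuming lowercase input and short strings (the names reverseHash and rollReverse are hypothetical). Computing the hash with Horner's rule is exactly the "read the string as a base-29 number" view.

```cpp
#include <string>

// Reverse-order hash: s[0] gets the highest power of 29, like reading s as a
// base-29 number (lowercase only, a -> 1, ..., z -> 26; short strings only).
long long reverseHash(const std::string& s) {
    long long hash = 0;
    for (char c : s) hash = hash * 29 + (c - 'a' + 1);   // Horner's rule
    return hash;
}

// One rolling step: remove `out` (weight 29^(m-1) = topPower), multiply by 29 to
// shift every remaining weight up one power, then append `in` with weight 29^0.
long long rollReverse(long long hash, char out, char in, long long topPower) {
    return (hash - (out - 'a' + 1) * topPower) * 29 + (in - 'a' + 1);
}
```

Here reverseHash("abcd") is 26162, and rolling it with topPower = 29^3 yields 51422, which equals reverseHash("bcde").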
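#include <string>
using std::string;

// Map 'a'..'z' to 1..26, matching the a -> 1, ..., z -> 26 mapping above
int charToInt(char c) { return (int)(c - 'a' + 1); }

// Rabin-Karp with a rolling hash: O(m + (n - m)) = O(n) expected time
int strStr(string haystack, string needle) {
    int m = needle.size();
    int n = haystack.size();
    if (m == 0) return 0;
    if (m > n) return -1;
    const int base = 29;
    // Unsigned arithmetic wraps around (works modulo 2^64) instead of overflowing,
    // so every hash match is confirmed by a direct comparison below.
    unsigned long long max_base = 1;   // ends up as base^(m-1), the weight of the first character
    unsigned long long needle_code = 0;
    unsigned long long haystack_code = 0;
    for (int j = m - 1; j >= 0; j--) {
        needle_code += charToInt(needle[j]) * max_base;
        haystack_code += charToInt(haystack[j]) * max_base;
        if (j > 0) max_base *= base;
    }
    if (haystack_code == needle_code && haystack.compare(0, m, needle) == 0) return 0;
    for (int i = m; i < n; i++) {
        // drop haystack[i - m], shift by one power, append haystack[i]
        haystack_code = (haystack_code - charToInt(haystack[i - m]) * max_base) * base
                        + charToInt(haystack[i]);
        if (haystack_code == needle_code && haystack.compare(i - m + 1, m, needle) == 0)
            return i - m + 1;
    }
    return -1;
}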
The drawback is that powers of the prime grow very quickly, so the hash needs a long type or even a big integer. Alternatively, the hash can be reduced modulo some number to keep it small; however, two different strings may then share a hash code, so a hash match carries a small probability of misjudgment unless it is confirmed by comparing the characters.
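A minimal sketch of the remainder variant, under assumptions that are not in the original: the hash is reduced modulo the prime 1,000,000,007 (a common but arbitrary choice), each character is mapped to its unsigned byte value plus 1 with base 257 (a prime larger than the 256 possible byte values, so any input works, not only lowercase), and every hash match is confirmed with a direct comparison so a collision cannot produce a wrong answer. The function name strStrMod is hypothetical.

```cpp
#include <string>

// Rabin-Karp with the hash reduced modulo a prime (hypothetical helper name).
int strStrMod(const std::string& haystack, const std::string& needle) {
    const long long MOD = 1000000007LL;   // assumed modulus
    const long long base = 257;           // prime larger than the 256 byte values
    int n = haystack.size(), m = needle.size();
    if (m == 0) return 0;
    if (m > n) return -1;
    auto code = [](char c) { return static_cast<long long>(static_cast<unsigned char>(c)) + 1; };
    long long topPower = 1;               // base^(m-1) mod MOD
    for (int k = 0; k < m - 1; k++) topPower = topPower * base % MOD;
    long long needleHash = 0, windowHash = 0;
    for (int j = 0; j < m; j++) {         // reverse-order hash: first character gets the highest power
        needleHash = (needleHash * base + code(needle[j])) % MOD;
        windowHash = (windowHash * base + code(haystack[j])) % MOD;
    }
    for (int i = 0; ; i++) {              // current window is haystack[i .. i + m - 1]
        if (windowHash == needleHash && haystack.compare(i, m, needle) == 0) return i;
        if (i + m >= n) break;            // no further window
        // remove haystack[i] (adding MOD keeps the value non-negative), shift, append haystack[i + m]
        windowHash = (windowHash - code(haystack[i]) * topPower % MOD + MOD) % MOD;
        windowHash = (windowHash * base + code(haystack[i + m])) % MOD;
    }
    return -1;
}
```

All intermediate values stay below about 2.6 * 10^11, so long long never overflows, and the final comparison turns the "small probability of misjudgment" into a small probability of extra work instead.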