Chinese word segmentation algorithm: a Baidu interview question
Question:
Given a string and an array of dictionary words, determine whether the string can be completely separated into words from the dictionary.
Dynamic programming algorithm
I wrote the following code during the interview.
public static boolean divied2(String s, String[] dict) {
    boolean result = false;
    if (s.length() == 0) return true;
    for (int i = 0; i < dict.length; i++) {
        int index = s.indexOf(dict[i]);
        if (index != -1) {
            System.out.println(index);
            // Delete the matched word and join what is left.
            String tmp1 = s.substring(0, index);
            String tmp2 = s.substring(index + dict[i].length(), s.length());
            // Returns on the first match, so no other word is ever tried.
            return divied2(tmp1 + tmp2, dict);
        }
    }
    return result;
}
However, for this test case:
String[] dict = {"", ""};
System.out.println(divied2("I will know about Baidu", dict));
it fails. The word "Baidu" was deleted first, which damaged a longer word that overlaps it.
Thinking about it afterwards, the cause is that the traversal terminates at the first match: the return inside the loop means no other dictionary word is tried.
The key to the fix is the |= operation, that is, OR together the results of all the candidates: if any one of them separates the string completely, the answer is true. After this improvement, the test passes.
public static boolean divied(String s, String[] dict) {
    boolean result = false;
    if (s.length() == 0) return true;
    for (int i = 0; i < dict.length; i++) {
        int index = s.indexOf(dict[i]);
        if (index != -1) {
            System.out.println(index);
            String tmp1 = s.substring(0, index);
            String tmp2 = s.substring(index + dict[i].length(), s.length());
            // OR the outcome of every candidate word instead of returning at once.
            result |= divied(tmp1 + tmp2, dict);
        }
    }
    return result;
}
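To make the difference between the two versions concrete, here is a minimal sketch that runs both on the same input. The class name, the method names, and the English dictionary are hypothetical stand-ins chosen for illustration (the short word "ab" is a prefix of the longer word "abc"); they are not the original test data.

```java
class FirstMatchVsOr {
    // Deletes the first dictionary word found and commits to that choice.
    static boolean firstMatchOnly(String s, String[] dict) {
        if (s.isEmpty()) return true;
        for (String word : dict) {
            int index = s.indexOf(word);
            if (index != -1) {
                String rest = s.substring(0, index) + s.substring(index + word.length());
                return firstMatchOnly(rest, dict); // never tries another word
            }
        }
        return false;
    }

    // Tries every dictionary word that occurs in s and ORs the outcomes.
    static boolean tryAll(String s, String[] dict) {
        if (s.isEmpty()) return true;
        boolean result = false;
        for (String word : dict) {
            int index = s.indexOf(word);
            if (index != -1) {
                String rest = s.substring(0, index) + s.substring(index + word.length());
                result |= tryAll(rest, dict);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary: "ab" damages "abc" if it is deleted first.
        String[] dict = {"ab", "abc", "d"};
        System.out.println(firstMatchOnly("abcd", dict)); // prints false
        System.out.println(tryAll("abcd", dict));         // prints true
    }
}
```

In the first version, deleting "ab" leaves "cd", then "c", which matches nothing, and there is no way back to try "abc".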
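The disadvantage is that the time complexity is too high.
Let the string length be m and the dictionary size be n.
Each call can branch into up to n recursive calls, and the recursion can go up to m levels deep, so the time complexity is roughly O(n^m).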
static int count = 0; // global counter, incremented once per loop iteration

public static boolean divied(String s, String[] dict) {
    boolean result = false;
    if (s.length() == 0) return true;
    for (int i = 0; i < dict.length; i++) {
        count++;
        int index = s.indexOf(dict[i]);
        if (index != -1) {
            System.out.println(index);
            String tmp1 = s.substring(0, index);
            String tmp2 = s.substring(index + dict[i].length(), s.length());
            result |= divied(tmp1 + tmp2, dict);
            if (result) {
                // return true; // uncomment to stop as soon as one split succeeds
            }
        }
    }
    return result;
}
Optimization idea 1: terminate the loop as soon as result == true.
A global variable is added to count how many times the function body runs.
Without the early exit, the count is about 180 (it depends on the order of the words in the dictionary).
With the early exit, the count is only 21.
Trying different dictionary orders, the count is at most about 30 when the string can be completely separated, but 374 when it cannot.
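The measurement described above can be reproduced with a sketch like the following. The `earlyExit` flag, the `iterations` helper, and the English dictionary are assumptions added for illustration, so the counts it prints will differ from the figures quoted in the text.

```java
class EarlyExitCount {
    static int count = 0; // loop iterations across all recursive calls

    static boolean divide(String s, String[] dict, boolean earlyExit) {
        if (s.isEmpty()) return true;
        boolean result = false;
        for (String word : dict) {
            count++;
            int index = s.indexOf(word);
            if (index != -1) {
                String rest = s.substring(0, index) + s.substring(index + word.length());
                result |= divide(rest, dict, earlyExit);
                // Stop exploring once one complete split has been found.
                if (earlyExit && result) return true;
            }
        }
        return result;
    }

    // Resets the counter, runs one search, and returns the iteration count.
    static int iterations(String s, String[] dict, boolean earlyExit) {
        count = 0;
        divide(s, dict, earlyExit);
        return count;
    }

    public static void main(String[] args) {
        String[] dict = {"ab", "abc", "d", "e"}; // hypothetical dictionary
        System.out.println("without early exit: " + iterations("abcde", dict, false));
        System.out.println("with early exit:    " + iterations("abcde", dict, true));
    }
}
```

The early exit only helps when a complete split exists. When none exists, every branch is still explored, which matches the much larger count reported for the failing case.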
Optimization idea 2:
If each word appears only once in the string, then once a word has been found and deleted from the string it can also be deleted from the dictionary. This avoids looping over words that can no longer match.
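A minimal sketch of this idea, assuming the dictionary is passed as a `List<String>` and copied at each level so that sibling branches are unaffected. The class name and the English dictionary are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UseEachWordOnce {
    static boolean divide(String s, List<String> dict) {
        if (s.isEmpty()) return true;
        for (int i = 0; i < dict.size(); i++) {
            String word = dict.get(i);
            int index = s.indexOf(word);
            if (index != -1) {
                String rest = s.substring(0, index) + s.substring(index + word.length());
                // Copy the dictionary without the word just used, so deeper
                // calls never loop over it again.
                List<String> remaining = new ArrayList<>(dict);
                remaining.remove(i);
                if (divide(rest, remaining)) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(divide("abcd", Arrays.asList("ab", "abc", "d"))); // prints true
        // The assumption matters: "d" occurs twice here, so this prints false.
        System.out.println(divide("dd", Arrays.asList("d")));
    }
}
```

The second call shows the cost of the assumption: a string that repeats a word is rejected, so this shortcut is only safe when repeats cannot occur.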
Optimization idea 3:
In fact, the OR operation is only needed for dictionary words of different lengths that start with the same character. The program can target exactly that case.
The improved version follows. Whether or not the string can be completely separated, the cost is about the same in both cases.
For the following test cases, the count is 44 when the string can be separated and 60 when it cannot.
The time complexity is reduced to roughly the factorial of the dictionary length.
String[] dict = {"", ""};
System.out.println(divied("I will know about Baidu", dict));
String[] dict = {"", ""};
System.out.println(divied("I will know about Baidu", dict));
public static boolean divied(String s, String[] dict) {
    boolean result = false;
    if (s.length() == 0) return true;
    char start = '\0'; // first character of the first word matched at this level
    for (int i = 0; i < dict.length; i++) {
        count++;
        int index = s.indexOf(dict[i]);
        // Recurse for the first match, then only for words that start
        // with the same character as that first match.
        if (index != -1 && (start == '\0' || dict[i].charAt(0) == start)) {
            System.out.println(index);
            String tmp1 = s.substring(0, index);
            String tmp2 = s.substring(index + dict[i].length(), s.length());
            start = dict[i].charAt(0);
            result |= divied(tmp1 + tmp2, dict);
            if (result) {
                return true;
            }
        }
    }
    return result;
}
Optimization idea 4:
This refines idea 3: recurse only when several words start with the same character. In all other cases, simply delete the matched word from the string and continue the loop without recursing.
public class Divide {
    static int count = 0;

    public static boolean divied(String s, String[] dict) {
        boolean result = false;
        if (s.length() == 0) return true;
        char start = '\0';
        for (int i = 0; i < dict.length; i++) {
            count++;
            int index = s.indexOf(dict[i]);
            if (start == '\0' && index != -1) {
                String tmp1 = s.substring(0, index);
                String tmp2 = s.substring(index + dict[i].length(), s.length());
                s = tmp1 + tmp2;
                start = dict[i].charAt(0);
            }
            if (index != -1 && dict[i].charAt(0) == start) {
                String tmp1 = s.substring(0, index);
                String tmp2 = s.substring(index + dict[i].length(), s.length());
                s = tmp1 + tmp2;
                result |= divied(tmp1 + tmp2, dict);
                if (result) {
                    return true;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] dict = {"Baidu 1", "Baidu", "me", "", "Zhi ", "path"};
        System.out.println(divied("I will know about Baidu", dict));
        System.out.println(count);
    }
}
With this version, the loop runs 6 times when the string can be completely separated.
When it cannot be completely separated, the loop runs 18 times.
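For comparison with the recursive versions above, the dynamic programming approach named at the start can be sketched as follows. This is the standard prefix-table formulation, not the code written in the interview, and the class name, method name, and English dictionary are hypothetical. It needs about m^2 substring lookups for a string of length m. Note that it requires the words to appear contiguously and in order, whereas the recursive versions delete a word anywhere in the string and join the remainder, so the two can disagree on some inputs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class WordBreakDp {
    // canEnd[i] is true when the first i characters of s can be written
    // as a sequence of dictionary words.
    static boolean canSegment(String s, String[] dict) {
        Set<String> words = new HashSet<>(Arrays.asList(dict));
        boolean[] canEnd = new boolean[s.length() + 1];
        canEnd[0] = true; // the empty prefix is trivially segmentable
        for (int end = 1; end <= s.length(); end++) {
            for (int start = 0; start < end; start++) {
                // A segmentable prefix followed by one dictionary word.
                if (canEnd[start] && words.contains(s.substring(start, end))) {
                    canEnd[end] = true;
                    break;
                }
            }
        }
        return canEnd[s.length()];
    }

    public static void main(String[] args) {
        String[] dict = {"ab", "abc", "d"}; // hypothetical dictionary
        System.out.println(canSegment("abcd", dict)); // prints true
        System.out.println(canSegment("abdc", dict)); // prints false
    }
}
```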