Chinese Word Segmentation Algorithm: a Baidu Interview Question

Source: Internet
Author: User


Question:
Given a string and an array of dictionary words, determine whether the string can be separated completely into words from the dictionary.

Dynamic Programming Algorithm
I wrote the following code during the interview.

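    public static boolean divied2(String s, String[] dict) {
        boolean result = false;
        // An empty string means every character has been matched.
        if (s.length() == 0)
            return true;
        for (int i = 0; i < dict.length; i++) {
            int index = s.indexOf(dict[i]);
            if (index != -1) {
                System.out.println(index);
                // Cut the matched word out and join what is left.
                String tmp1 = s.substring(0, index);
                String tmp2 = s.substring(index + dict[i].length(), s.length());
                // Returning here stops the loop at the first word found.
                return divied2(tmp1 + tmp2, dict);
            }
        }
        return result;
    }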

However, for the test case

    String[] dict = {"", ""};
    System.out.println(divied2("I will know about Baidu", dict));

this fails. The word "Baidu" is deleted first, and deleting it damages another word.

Thinking about it afterwards, the cause is that the return statement terminates the traversal at the first word found. After the fix, the test passes.
The key is the |= operation: OR together the results for all matching words, so the answer is true as long as one choice separates the string completely.

    public static boolean divied(String s, String[] dict) {
        boolean result = false;
        if (s.length() == 0)
            return true;
        for (int i = 0; i < dict.length; i++) {
            int index = s.indexOf(dict[i]);
            if (index != -1) {
                System.out.println(index);
                String tmp1 = s.substring(0, index);
                String tmp2 = s.substring(index + dict[i].length(), s.length());
                // OR the result instead of returning, so later words are still tried.
                result |= divied(tmp1 + tmp2, dict);
            }
        }
        return result;
    }
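The difference between returning on the first match and OR-ing all results can be seen on a small case. The sketch below is illustrative only: the class name, method names, and the ASCII dictionary are assumptions made for the example, not the test data from the interview, and dictionary words are assumed to be non-empty.

```java
// Hypothetical example: names and test data are made up for illustration.
class FirstMatchVsOr {
    // Returns on the first dictionary word found (the failing approach).
    static boolean firstMatchOnly(String s, String[] dict) {
        if (s.isEmpty()) return true;
        for (String word : dict) {
            int index = s.indexOf(word);
            if (index != -1) {
                String rest = s.substring(0, index) + s.substring(index + word.length());
                return firstMatchOnly(rest, dict);
            }
        }
        return false;
    }

    // ORs the results for every dictionary word found (the fixed approach).
    static boolean anyMatch(String s, String[] dict) {
        if (s.isEmpty()) return true;
        boolean result = false;
        for (String word : dict) {
            int index = s.indexOf(word);
            if (index != -1) {
                String rest = s.substring(0, index) + s.substring(index + word.length());
                result |= anyMatch(rest, dict);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] dict = {"ab", "abcd"};
        // Deleting "ab" first leaves "cd", which matches nothing.
        System.out.println(firstMatchOnly("abcd", dict)); // false
        // Trying "abcd" as well finds a complete separation.
        System.out.println(anyMatch("abcd", dict));       // true
    }
}
```

Here the shorter word "ab" plays the role that "Baidu" plays in the failing test: removing it first damages the longer word that contains it.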

The disadvantage is that the time complexity is too high.
If the string length is m and the dictionary size is n, each call tries up to n words and the recursion can go up to m levels deep, so the time complexity is roughly n^m.
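For comparison, the textbook dynamic programming formulation of this problem avoids the exponential blow-up. The sketch below is that standard prefix-based technique, not the deletion-based recursion used in this post, and it differs in one respect: it requires the words to appear one after another from left to right, whereas the recursion above may join the text on both sides of a deleted word. The class and method names are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the standard prefix-based DP; not the author's code.
class WordBreakDp {
    static boolean canSegment(String s, String[] dict) {
        Set<String> words = new HashSet<>(Arrays.asList(dict));
        // ok[k] is true when the first k characters can be split into words.
        boolean[] ok = new boolean[s.length() + 1];
        ok[0] = true; // the empty prefix needs no words
        for (int end = 1; end <= s.length(); end++) {
            for (int start = 0; start < end; start++) {
                if (ok[start] && words.contains(s.substring(start, end))) {
                    ok[end] = true;
                    break; // one valid split of this prefix is enough
                }
            }
        }
        return ok[s.length()];
    }

    public static void main(String[] args) {
        System.out.println(canSegment("abcd", new String[] {"ab", "abcd"}));  // true
        System.out.println(canSegment("abcde", new String[] {"ab", "cd"}));   // false
    }
}
```

This performs on the order of m^2 dictionary lookups for a string of length m, independent of the order of the dictionary.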

    static int count = 0; // global counter, incremented once per loop iteration

    public static boolean divied(String s, String[] dict) {
        boolean result = false;
        if (s.length() == 0)
            return true;
        for (int i = 0; i < dict.length; i++) {
            count++;
            int index = s.indexOf(dict[i]);
            if (index != -1) {
                System.out.println(index);
                String tmp1 = s.substring(0, index);
                String tmp2 = s.substring(index + dict[i].length(), s.length());
                result |= divied(tmp1 + tmp2, dict);
                if (result) {
                    // return true;
                }
            }
        }
        return result;
    }

Optimization Method 1:
Terminate the loop as soon as result is true.
To measure the effect, add a global counter that shows how much work the function does.
Without the early exit, the counter reaches about 180 (the exact value depends on the order of the dictionary).
With the early exit, the counter reaches only 21.

As the dictionary order is varied, the counter reaches at most 30 when the string can be completely separated. When it cannot, the counter reaches 374.
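The measurement can be reproduced in miniature. The following sketch counts loop iterations with and without the early exit; the class name, the `earlyExit` flag, and the ASCII test data are assumptions for illustration, so the counts it prints will not match the figures quoted above.

```java
// Hypothetical sketch: names and test data are made up for illustration.
class CallCounter {
    static int count = 0;

    static boolean divide(String s, String[] dict, boolean earlyExit) {
        if (s.isEmpty()) return true;
        boolean result = false;
        for (String word : dict) {
            count++; // one unit of work per dictionary word examined
            int index = s.indexOf(word);
            if (index != -1) {
                String rest = s.substring(0, index) + s.substring(index + word.length());
                result |= divide(rest, dict, earlyExit);
                if (result && earlyExit) return true; // stop once one split works
            }
        }
        return result;
    }

    // Resets the counter, runs the search, and reports the work done.
    static int countIterations(String s, String[] dict, boolean earlyExit) {
        count = 0;
        divide(s, dict, earlyExit);
        return count;
    }

    public static void main(String[] args) {
        String[] dict = {"a", "b", "c", "ab"};
        System.out.println("without early exit: " + countIterations("abc", dict, false));
        System.out.println("with early exit:    " + countIterations("abc", dict, true));
    }
}
```

The early exit only helps when a complete separation exists. If none exists, result never becomes true and every branch is still explored, which is consistent with the much larger count reported for the failing case.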

Optimization Method 2:
If each word appears at most once in the string, a word can be removed from the dictionary once it has been found and deleted from the string. This avoids unnecessary loop iterations.
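A minimal sketch of Method 2 follows, assuming a `List<String>` dictionary so that a used word can be dropped from a copy. The class and method names are hypothetical, and the approach is only valid under the stated assumption that no word occurs more than once in the string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of Method 2; assumes each word occurs at most once.
class UseEachWordOnce {
    static boolean divide(String s, List<String> dict) {
        if (s.isEmpty()) return true;
        for (int i = 0; i < dict.size(); i++) {
            String word = dict.get(i);
            int index = s.indexOf(word);
            if (index != -1) {
                String rest = s.substring(0, index) + s.substring(index + word.length());
                // Copy the dictionary and drop the word that was just used,
                // so deeper calls have fewer words to scan.
                List<String> remaining = new ArrayList<>(dict);
                remaining.remove(i);
                if (divide(rest, remaining)) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(divide("abcd", Arrays.asList("ab", "abcd"))); // true
        System.out.println(divide("abab", Arrays.asList("ab")));         // false
    }
}
```

The second call shows the limit of the assumption: "ab" occurs twice, but it is removed from the dictionary after its first use, so the remainder cannot be matched.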

Optimization Method 3:
In fact, the OR operation is only needed for dictionary words of different lengths that start with the same character. The program can target exactly those words.
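One way to see which words are affected is to group the dictionary by first character. This is a hypothetical helper written for illustration, not part of the author's code: a word needs the OR-style branching only if its group contains more than one word.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups dictionary words by their first character.
class FirstCharGroups {
    static Map<Character, List<String>> groupByFirstChar(String[] dict) {
        Map<Character, List<String>> groups = new LinkedHashMap<>();
        for (String word : dict) {
            if (word.isEmpty()) continue; // an empty word has no first character
            groups.computeIfAbsent(word.charAt(0), k -> new ArrayList<>()).add(word);
        }
        return groups;
    }

    // True when another dictionary word starts with the same character.
    static boolean needsBranching(Map<Character, List<String>> groups, String word) {
        List<String> group = groups.get(word.charAt(0));
        return group != null && group.size() > 1;
    }

    public static void main(String[] args) {
        Map<Character, List<String>> groups =
                groupByFirstChar(new String[] {"ab", "abcd", "cd"});
        System.out.println(groups);                       // {a=[ab, abcd], c=[cd]}
        System.out.println(needsBranching(groups, "ab")); // true
        System.out.println(needsBranching(groups, "cd")); // false
    }
}
```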

The improved version is shown below. Whether or not the string can be completely separated, the cost is now about the same in both cases.
For the following test case, the counter reaches 44 when the string can be separated and 60 when it cannot.
The time complexity is reduced to roughly the factorial of the dictionary length.

    String[] dict = {"", ""};
    System.out.println(divied("I will know about Baidu", dict));

    public static boolean divied(String s, String[] dict) {
        boolean result = false;
        if (s.length() == 0)
            return true;
        // First character of the first word matched in this call.
        char start = '\0';
        for (int i = 0; i < dict.length; i++) {
            count++;
            int index = s.indexOf(dict[i]);
            // Recurse for the first match, then only for words with the same first character.
            if (start == '\0' && index != -1 || index != -1 && dict[i].charAt(0) == start) {
                System.out.println(index);
                String tmp1 = s.substring(0, index);
                String tmp2 = s.substring(index + dict[i].length(), s.length());
                start = dict[i].charAt(0);
                result |= divied(tmp1 + tmp2, dict);
                if (result) {
                    return true;
                }
            }
        }
        return result;
    }

Optimization Method 4:
This refines Method 3: recurse only for words that share their first character with another word. Every other word is simply deleted from the string, and the loop continues without recursion.

    public class Divide {
        static int count = 0;

        public static boolean divied(String s, String[] dict) {
            boolean result = false;
            if (s.length() == 0)
                return true;
            char start = '\0';
            for (int i = 0; i < dict.length; i++) {
                count++;
                int index = s.indexOf(dict[i]);
                if (start == '\0' && index != -1) {
                    String tmp1 = s.substring(0, index);
                    String tmp2 = s.substring(index + dict[i].length(), s.length());
                    s = tmp1 + tmp2;
                    start = dict[i].charAt(0);
                }
                if (index != -1 && dict[i].charAt(0) == start) {
                    String tmp1 = s.substring(0, index);
                    String tmp2 = s.substring(index + dict[i].length(), s.length());
                    s = tmp1 + tmp2;
                    result |= divied(tmp1 + tmp2, dict);
                    if (result) {
                        return true;
                    }
                }
            }
            return result;
        }

        public static void main(String[] args) {
            String[] dict = {"Baidu 1", "Baidu", "me", "", "Zhi ", "path"};
            System.out.println(divied("I will know about Baidu", dict));
            System.out.println(count);
        }
    }

The final result is 6 loop iterations when the string can be completely separated.
When it cannot, the number of loop iterations is 18.
