Algorithm Description:
1. S1 is the string to be segmented, S2 is the (initially empty) output, and maxlen is the length of the longest word in the dictionary.
2. If S1 is empty, output S2 and stop.
3. Take the candidate string str from the right end of S1. Its length is maxlen, or the length of S1 if S1 is shorter.
4. If str is in the dictionary, go to step 5; otherwise go to step 6.
5. Set S2 = str + "/" + S2 (words are found right to left, so each new word goes in front), remove str from the right end of S1, and go to step 2.
6. Remove the leftmost character from str.
7. If str is now a single character, go to step 5; otherwise go to step 4.
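The seven steps can be followed almost literally in Java. This is a minimal sketch, not the article's own code: the class name, the toy dictionary `DICT` and the test sentence 研究生命起源 ("study the origin of life") are hypothetical, chosen for illustration, and `Set.of` assumes Java 9 or later.

```java
import java.util.Set;

public class ReverseMaxMatchSteps {
    // Hypothetical toy dictionary, for illustration only.
    static final Set<String> DICT = Set.of("研究", "研究生", "生命", "起源");

    // maxlen: length of the longest dictionary word (step 1).
    static int maxLen(Set<String> dict) {
        int m = 1;
        for (String w : dict) {
            m = Math.max(m, w.length());
        }
        return m;
    }

    static String segment(String s1, Set<String> dict) {
        int maxlen = maxLen(dict);
        String s2 = "";                        // step 1: S2 starts empty
        while (!s1.isEmpty()) {                // step 2: stop once S1 is empty
            // step 3: candidate from the right end, at most maxlen characters
            int len = Math.min(maxlen, s1.length());
            String str = s1.substring(s1.length() - len);
            // steps 4, 6, 7: drop the leftmost character until str is a
            // dictionary word or only a single character remains
            while (!dict.contains(str) && str.length() > 1) {
                str = str.substring(1);
            }
            // step 5: put str in front of S2 and cut it off the end of S1
            s2 = str + "/" + s2;
            s1 = s1.substring(0, s1.length() - str.length());
        }
        return s2;
    }

    public static void main(String[] args) {
        System.out.println(segment("研究生命起源", DICT)); // prints 研究/生命/起源/
    }
}
```

Note that `String.length()` counts UTF-16 code units, which equals the character count for common CJK characters but not for characters outside the Basic Multilingual Plane.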
Java implementation code:
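    import java.util.ArrayList;
    import java.util.List;
    import java.util.Stack;

    // Reverse maximum matching. `dic` (the dictionary, e.g. a Set<String>) and
    // `max_length` (its longest word length) are assumed to be static fields
    // defined elsewhere in the class.
    public static List<String> BMM(String text) {
        Stack<String> result = new Stack<String>();
        while (text.length() > 0) {
            // Candidate length: max_length, or the whole text if it is shorter
            int len = max_length;
            if (text.length() < len) {
                len = text.length();
            }
            // Take the candidate from the right end of the text
            String tryWord = text.substring(text.length() - len);
            // Drop the leftmost character until the candidate is a dictionary
            // word; a single character is accepted as a word regardless
            while (!dic.contains(tryWord)) {
                if (tryWord.length() == 1) {
                    break;
                }
                tryWord = tryWord.substring(1);
            }
            result.push(tryWord);
            text = text.substring(0, text.length() - tryWord.length());
        }
        // Words were found right to left; popping the stack restores sentence order
        int len = result.size();
        List<String> list = new ArrayList<String>(len);
        for (int i = 0; i < len; i++) {
            list.add(result.pop());
        }
        return list;
    }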
Summary:
Reverse maximum matching is another basic algorithm for Chinese word segmentation. Because it is mechanical, dictionary-driven segmentation, it is also fast. Reverse maximum matching tends to fit Chinese language habits better than forward maximum matching: it segments the sentence "Changchun Mayor Happy Spring Festival" from the previous article perfectly. It still has limitations, however. On sentences such as "Changchun pharmacy" and "Painting on the Lotus Monk painting" it performs worse than forward maximum matching, so building on both methods, bidirectional maximum matching can be used to solve this kind of problem.
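Bidirectional maximum matching runs both directions and picks one result. The sketch below is an illustration under stated assumptions: the toy dictionary `DICT`, `MAX_LEN` and all method names are hypothetical, and the selection rule (fewer words wins, then fewer single-character words, then prefer the reverse result) is one commonly used heuristic that this article does not itself specify. On the toy sentence 研究生命起源 the two directions disagree, which shows why a tie-breaking rule is needed.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

public class BidirectionalMaxMatch {
    // Hypothetical toy dictionary and its longest word length.
    static final Set<String> DICT = Set.of("研究", "研究生", "生命", "起源");
    static final int MAX_LEN = 3;

    // Forward maximum matching: scan left to right, longest match first.
    static List<String> fmm(String text) {
        List<String> words = new ArrayList<>();
        while (!text.isEmpty()) {
            String w = text.substring(0, Math.min(MAX_LEN, text.length()));
            while (w.length() > 1 && !DICT.contains(w)) {
                w = w.substring(0, w.length() - 1); // drop the rightmost character
            }
            words.add(w);
            text = text.substring(w.length());
        }
        return words;
    }

    // Reverse maximum matching: scan right to left, longest match first.
    static List<String> bmm(String text) {
        LinkedList<String> words = new LinkedList<>();
        while (!text.isEmpty()) {
            String w = text.substring(text.length() - Math.min(MAX_LEN, text.length()));
            while (w.length() > 1 && !DICT.contains(w)) {
                w = w.substring(1); // drop the leftmost character
            }
            words.addFirst(w); // prepend to keep sentence order
            text = text.substring(0, text.length() - w.length());
        }
        return words;
    }

    // Number of single-character words in a segmentation.
    static long singles(List<String> words) {
        return words.stream().filter(w -> w.length() == 1).count();
    }

    // Assumed heuristic: fewer words wins; on a tie, fewer single-character
    // words wins; on a further tie, the reverse result is kept.
    static List<String> bidirectional(String text) {
        List<String> f = fmm(text);
        List<String> b = bmm(text);
        if (f.size() != b.size()) {
            return f.size() < b.size() ? f : b;
        }
        return singles(f) < singles(b) ? f : b;
    }

    public static void main(String[] args) {
        String s = "研究生命起源";
        System.out.println(fmm(s));           // [研究生, 命, 起源]
        System.out.println(bmm(s));           // [研究, 生命, 起源]
        System.out.println(bidirectional(s)); // [研究, 生命, 起源]
    }
}
```

Here both directions produce three words, so the single-character count decides: the forward result contains the stray character 命, and the reverse result is chosen.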
A dictionary-based inverse maximum matching algorithm for Chinese word segmentation