Original article URL:
Http://www.52nlp.cn/maximum-matching-method-of-chinese-word-segmentation/
Reverse maximum matching follows the same idea as forward maximum matching; the only difference is that it segments the sentence from right to left. Here is an example:
Example: S1 = "计算语言学课程有意思" ("the computational linguistics course is interesting");
Definition: maxlen = 5 (the maximum candidate length, in characters); S2 = ""; separator = "/";
Suppose the word table contains: ..., 计算语言学 (computational linguistics), 课程 (course), 意思 (meaning), ...;
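As a small illustration of how a candidate substring is formed under these definitions, the sketch below lists the candidates that one round would try, longest first. It assumes the example sentence is the Chinese string 计算语言学课程有意思 and that maxlen counts characters; the helper name candidates is hypothetical and not part of the original article.

```python
def candidates(s1, maxlen):
    """Yield the substrings one round would try, longest first.

    Assumes maxlen >= 1 and that lengths are counted in characters.
    """
    w = s1[-maxlen:]      # at most maxlen characters from the right end of S1
    while w:
        yield w
        w = w[1:]         # remove the leftmost character of W


s1 = "计算语言学课程有意思"
print(list(candidates(s1, 5)))
# ['课程有意思', '程有意思', '有意思', '意思', '思']
```

In the actual algorithm the round stops at the first candidate found in the word table, or at the last single character if none is found.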
The reverse maximum matching word segmentation algorithm proceeds as follows:
(1) S2 = ""; S1 is not empty, so take the candidate substring W = "课程有意思" from the right end of S1;
(2) Look up the word table. W is not in it, so remove the leftmost character of W to get W = "程有意思";
(3) Look up the word table. W is not in it, so remove the leftmost character of W to get W = "有意思";
(4) Look up the word table. W is not in it, so remove the leftmost character of W to get W = "意思" ("meaning");
(5) Look up the word table. W is in it, so prepend W to S2, giving S2 = "意思/", and remove W from S1, giving S1 = "计算语言学课程有";
(6) S1 is not empty, so take the candidate substring W = "言学课程有" from the right end of S1;
(7) Look up the word table. W is not in it, so remove the leftmost character of W to get W = "学课程有";
(8) Look up the word table. W is not in it, so remove the leftmost character of W to get W = "课程有";
(9) Look up the word table. W is not in it, so remove the leftmost character of W to get W = "程有";
(10) Look up the word table. W is not in it, so remove the leftmost character of W to get W = "有" ("to have"). W is now a single character, so prepend W to S2, giving S2 = "有/意思/", and remove W from S1, giving S1 = "计算语言学课程";
(11) S1 is not empty, so take the candidate substring W = "语言学课程" from the right end of S1;
(12) Look up the word table. W is not in it, so remove the leftmost character of W to get W = "言学课程";
(13) Look up the word table. W is not in it, so remove the leftmost character of W to get W = "学课程";
(14) Look up the word table. W is not in it, so remove the leftmost character of W to get W = "课程" ("course");
(15) Look up the word table. W is in it, so prepend W to S2, giving S2 = "课程/有/意思/", and remove W from S1, giving S1 = "计算语言学" ("computational linguistics");
(16) S1 is not empty, so take the candidate substring W = "计算语言学" from the right end of S1;
(17) Look up the word table. W is in it, so prepend W to S2, giving S2 = "计算语言学/课程/有/意思/", and remove W from S1, giving S1 = "";
(18) S1 is empty, so output S2 as the segmentation result; the segmentation process ends.
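The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the article's own implementation: it assumes the example sentence is the Chinese string 计算语言学课程有意思, that the word table is held in a set, and that maxlen counts characters. The function name reverse_max_match is hypothetical.

```python
def reverse_max_match(s1, word_table, maxlen=5, separator="/"):
    """Segment s1 from right to left; assumes maxlen >= 1."""
    s2 = ""
    while s1:                                  # repeat until S1 is empty
        w = s1[-maxlen:]                       # candidate from the right end of S1
        # Remove the leftmost character until W is in the word table
        # or only a single character is left.
        while len(w) > 1 and w not in word_table:
            w = w[1:]
        s2 = w + separator + s2                # prepend W to S2
        s1 = s1[:-len(w)]                      # remove W from the right end of S1
    return s2


# Only the three entries named in the example; a real word table is much larger.
word_table = {"计算语言学", "课程", "意思"}
print(reverse_max_match("计算语言学课程有意思", word_table))
# 计算语言学/课程/有/意思/
```

Because each word is prepended to S2, the result comes out in the original left-to-right order even though the sentence is consumed from the right.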
An example of the reverse maximum matching word segmentation algorithm