An example of the reverse maximum matching word segmentation algorithm

Source: Internet
Author: User

Original article URL:
Http://www.52nlp.cn/maximum-matching-method-of-chinese-word-segmentation/

The reverse matching method follows the same idea as the forward method, except that it splits the sentence from right to left. Here is an example:
Example: S1 = "计算语言学课程有意思" ("the computational linguistics course is interesting");
Definition: maxlen = 5; S2 = ""; separator = "/";
Suppose the word table contains: ..., 计算语言学 (computational linguistics), 课程 (course), 意思 (meaning), ...;
The reverse maximum matching word segmentation process is as follows:
(1) S2 = ""; S1 is not empty, so the candidate substring W = "课程有意思" (the last maxlen characters) is taken from the right side of S1;
(2) Query the word table. W is not in the word table, so remove the leftmost character of W to get W = "程有意思";
(3) Query the word table. W is not in the word table, so remove the leftmost character of W to get W = "有意思";
(4) Query the word table. W is not in the word table, so remove the leftmost character of W to get W = "意思";
(5) Query the word table. "意思" is in the word table, so add W to the front of S2, giving S2 = "意思/", and remove W from S1. Now S1 = "计算语言学课程有";
(6) S1 is not empty, so the candidate substring W = "言学课程有" is taken from the right side of S1;
(7) Query the word table. W is not in the word table, so remove the leftmost character of W to get W = "学课程有";
(8) Query the word table. W is not in the word table, so remove the leftmost character of W to get W = "课程有";
(9) Query the word table. W is not in the word table, so remove the leftmost character of W to get W = "程有";
(10) Query the word table. W is not in the word table, so remove the leftmost character of W to get W = "有". W is now a single character, so add W to the front of S2, giving S2 = "有/意思/", and remove W from S1. Now S1 = "计算语言学课程";
(11) S1 is not empty, so the candidate substring W = "语言学课程" is taken from the right side of S1;
(12) Query the word table. W is not in the word table, so remove the leftmost character of W to get W = "言学课程";
(13) Query the word table. W is not in the word table, so remove the leftmost character of W to get W = "学课程";
(14) Query the word table. W is not in the word table, so remove the leftmost character of W to get W = "课程";
(15) Query the word table. "课程" is in the word table, so add W to the front of S2, giving S2 = "课程/有/意思/", and remove W from S1. Now S1 = "计算语言学";
(16) S1 is not empty, so the candidate substring W = "计算语言学" is taken from the right side of S1;
(17) Query the word table. "计算语言学" is in the word table, so add W to the front of S2, giving S2 = "计算语言学/课程/有/意思/", and remove W from S1. Now S1 = "";
(18) S1 is empty, so S2 is output as the segmentation result and the process ends.
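
The walkthrough above can be sketched in a few lines of Python. This is only a minimal illustration, not the original author's code: the function name reverse_max_match and its parameters are made up for this sketch, and the Chinese strings are the presumed original example sentence ("the computational linguistics course is interesting") and word table (computational linguistics, course, meaning). Instead of physically cutting S1, the sketch moves an end index leftwards, which has the same effect.

```python
def reverse_max_match(text, vocab, max_len=5):
    """Segment text from right to left, preferring the longest word in vocab.

    Hypothetical helper written for illustration only.
    """
    words = []
    end = len(text)  # everything left of `end` is the remaining S1
    while end > 0:
        # Take up to max_len characters from the right side of what remains.
        start = max(0, end - max_len)
        # Drop the leftmost character until the candidate W is in the word
        # table or only a single character is left.
        while end - start > 1 and text[start:end] not in vocab:
            start += 1
        words.append(text[start:end])
        end = start  # remove W from S1
    # Words were collected right to left, so restore the reading order.
    words.reverse()
    return words


# Assumed example data (see the note above).
vocab = {"计算语言学", "课程", "意思"}
result = reverse_max_match("计算语言学课程有意思", vocab)
print("/".join(result))  # 计算语言学/课程/有/意思
```

A character that never matches any entry, such as "有" here, falls out as a single-character word, which corresponds to step (10).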
