IK Chinese word segmenter: extending it with a custom dictionary

Source: Internet
Author: User
Tags: split, trim
1. Custom word-segmentation requirements and process design for a distributed system (see the figure at E:\plan\readingnote\ participle and index \ Word \2012-4-20)
2. How word segmentation works: the dictionary loading process

2.1. Loading the segmenter's dictionaries involves three classes: Configuration, Dictionary, and DictSegment. The first two read the configuration file and locate the dictionary files, preparing for the dictionary content to be loaded. DictSegment is the class that actually loads the words.

2.2. When the segmenter is invoked, the Dictionary object is used first, and its loadMainDict() method loads the contents of the custom dictionary.

2.2.1. To load a custom dictionary, Dictionary first calls a method of the Configuration class to get the custom dictionary file paths configured in IKAnalyzer.cfg.xml:

    List<String> extDictFiles = Configuration.getExtDictionarys();

Before this call can return anything, the configuration file itself has to be located and loaded into memory as a stream. Both steps are implemented in the Configuration class; the Dictionary class simply calls the interface that Configuration provides.

2.2.2. Now let's look at the two things the Configuration class does.

    private Configuration() {
        props = new Properties();
        // The author's debugging lines, used to inspect how resource paths resolve:
        // String path = Configuration.class.getResource(FILE_NAME).toString();
        String path2 = Configuration.class.getResource("").toString();
        String path3 = Configuration.class.getResource("/").toString();
        InputStream input = Configuration.class.getResourceAsStream(FILE_NAME);
        if (input != null) {
            try {
                props.loadFromXML(input);
            } catch (InvalidPropertiesFormatException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

(1) Initialization. The constructor loads IKAnalyzer.cfg.xml into memory as a stream. Note the getResourceAsStream call (shown in bold in the original): unless the code is modified, it can only load files from the classpath on which the Configuration class resides.
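The configuration step described above can be sketched in isolation. This is a minimal illustration, not IK's actual code: the class name ExtDictConfigSketch, the SAMPLE_XML content, and the dictionary paths in it are hypothetical, and the XML is read from an in-memory stream rather than from the classpath so that the example runs on its own. Only java.util.Properties.loadFromXML and the "ext_dict" key with ";" as separator are taken from the code discussed here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical stand-in for the Configuration class; not part of IK Analyzer.
final class ExtDictConfigSketch {

    // Assumed sample content in the java.util.Properties XML format.
    static final String SAMPLE_XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
        + "<properties>\n"
        + "  <comment>sample extension configuration</comment>\n"
        + "  <entry key=\"ext_dict\">/mydict.dic; /ext/other.dic ;</entry>\n"
        + "</properties>\n";

    // Loads the XML properties from the stream and returns the trimmed,
    // non-empty dictionary paths listed under "ext_dict".
    static List<String> parseExtDictPaths(InputStream xml) {
        Properties props = new Properties();
        try {
            props.loadFromXML(xml);
        } catch (IOException e) {
            // InvalidPropertiesFormatException is a subclass of IOException.
            throw new UncheckedIOException(e);
        }
        List<String> paths = new ArrayList<>();
        String cfg = props.getProperty("ext_dict");
        if (cfg != null) {
            for (String path : cfg.split(";")) {
                String trimmed = path.trim();
                if (!trimmed.isEmpty()) {
                    paths.add(trimmed);
                }
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(
            SAMPLE_XML.getBytes(StandardCharsets.UTF_8));
        System.out.println(parseExtDictPaths(in)); // prints [/mydict.dic, /ext/other.dic]
    }
}
```

In IK itself the stream comes from Configuration.class.getResourceAsStream(FILE_NAME), so the file has to be on the classpath; the parsing and splitting logic is otherwise the same idea.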
(For more on how Configuration.class.getResourceAsStream resolves this path, see E:\plan\readingnote\ participle and index \ participle \ Word breaker custom dictionary path.)

    public static List<String> getExtDictionarys() {
        List<String> extDictFiles = new ArrayList<String>(2);
        String extDictCfg = CFG.props.getProperty(EXT_DICT);
        if (extDictCfg != null) {
            // Use ";" to split multiple extension dictionary configurations
            String[] filePaths = extDictCfg.split(";");
            if (filePaths != null) {
                for (String filePath : filePaths) {
                    if (filePath != null && !"".equals(filePath.trim())) {
                        extDictFiles.add(filePath.trim());
                        // System.out.println(filePath.trim());
                    }
                }
            }
        }
        return extDictFiles;
    }

(2) This code reads the custom dictionary paths configured in IKAnalyzer.cfg.xml, puts them in a list, and returns that list.

2.2.3. Now back to the Dictionary class. Once the custom dictionary file paths have been obtained, each dictionary is located by its path, and DictSegment is then called to load it into memory.

    // "is" is an InputStream variable declared earlier in the method
    List<String> extDictFiles = Configuration.getExtDictionarys();
    if (extDictFiles != null) {
        for (String extDictName : extDictFiles) {
            // Read the extended dictionary file
            is = Dictionary.class.getResourceAsStream(extDictName);
            // If the extended dictionary is not found, skip it
            if (is == null) {
                continue;
            }
            try {
                BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"), 512);
                String theWord = null;
                do {
                    theWord = br.readLine();
                    if (theWord != null && !"".equals(theWord.trim())) {
                        // Load extended dictionary data into the main in-memory dictionary
                        // System.out.println(theWord);
                        _MainDict.fillSegment(theWord.trim().toCharArray());
                    }
                } while (theWord != null);
            } catch (IOException ioe) {
                System.err.println("Extension Dictionary loading exception.");
                ioe.printStackTrace();
            } finally {
                try {
                    if (is != null) {
                        is.close();
                        is = null;
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

Note the two lines shown in bold in the original. The first, Dictionary.class.getResourceAsStream(extDictName), means that only custom dictionary files on the classpath where the Dictionary class resides can be found; a file outside that path is silently skipped. The second, _MainDict.fillSegment(...), calls DictSegment to load the words of the custom dictionary.
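The loading loop above can be illustrated with a small self-contained sketch. Everything named here is an assumption made for illustration: DictTrieSketch is a hypothetical, simplified character trie standing in for DictSegment (whose real internals are not shown in this article), and the words are read from a StringReader instead of a classpath resource so that the example runs on its own. What it keeps from the original is the shape of the loop: read one word per line, trim it, skip blank lines, and fill the in-memory dictionary with the word's characters.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for DictSegment: one trie node per character.
final class DictTrieSketch {
    private final Map<Character, DictTrieSketch> children = new HashMap<>();
    private boolean wordEnd;

    // Adds one word to the trie, creating child nodes as needed.
    void fill(char[] word) {
        DictTrieSketch node = this;
        for (char c : word) {
            node = node.children.computeIfAbsent(c, k -> new DictTrieSketch());
        }
        node.wordEnd = true;
    }

    // Returns true only if the exact word was added (a mere prefix does not count).
    boolean contains(String word) {
        DictTrieSketch node = this;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return false;
            }
        }
        return node.wordEnd;
    }

    // Reads one word per line, skips blank lines, and returns the number of words loaded.
    static int loadWords(Reader source, DictTrieSketch root) {
        int count = 0;
        try (BufferedReader br = new BufferedReader(source, 512)) {
            String line;
            while ((line = br.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    root.fill(word.toCharArray());
                    count++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        DictTrieSketch root = new DictTrieSketch();
        // "\u5206\u8BCD\u5668" is a three-character sample word; "\u5206\u8BCD" is its prefix.
        int n = loadWords(new StringReader("\u5206\u8BCD\u5668\n\n  cloud  \n"), root);
        System.out.println(n + " " + root.contains("\u5206\u8BCD\u5668")
            + " " + root.contains("\u5206\u8BCD")); // prints 2 true false
    }
}
```

The try-with-resources block closes the reader on every path, which is one way to avoid having to null-check the stream by hand in a finally block.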
