1. Calling NLPIR (ICTCLAS2016) from Java for word segmentation

Last Update:2016-04-18 Source: Internet

Author: User

Tags netbeans


Environment: Windows 7 64-bit, NetBeans IDE

The NLPIR word segmentation system was formerly known as ICTCLAS, first released in 2000; it has carried the NLPIR name since 2009. It was created by Dr. Zhang Huaping.

Implementation steps:

1. In NetBeans, choose File → New Project → Java → Java Application; project name: Cwordseg;
2. Copy the code from NlpirTest.java, found under \sample\jnatest_nlpir\src\code in the NLPIR package, into Cwordseg.java;

The initial code changes are as follows:

(1) Change the package declaration to cwordseg;
(2) Rename the class NlpirTest to Cwordseg by refactoring;
Method: right-click Cwordseg.java → Refactor → Rename, enter Cwordseg as the new name → Refactor;

If you only change the class name in the code, you still need to refactor; otherwise the run fails with the error: main class cwordseg cannot be found.
(3) import utils.SystemParas; is not used, so it is commented out for now.

3. Copy the utils folder from ...\sample\jnatest_nlpir\src in the NLPIR package directly into the src folder of the Cwordseg project;

4. Add jna-4.0.0.jar, found under \sample\jnatest_nlpir\lib in the NLPIR package, to the project libraries;
Method (1): right-click Libraries → Add JAR → select jna-4.0.0.jar;
Method (2): copy jna-4.0.0.jar directly into the project's ...\cwordseg\lib folder.
After the import, the project directory is as follows:

5. Create a new folder named file in the Cwordseg project folder:
(1) Copy the entire Data folder from the NLPIR package into the file folder;
(2) Also copy the \lib\win64 folder into the file folder (Note: on Win32 or Linux, choose the corresponding folder instead).
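
The choice of folder in step 5 (2) can also be made programmatically. The following is a minimal sketch, not part of the original project: the class NativeFolder and its method are hypothetical, and the folder names win32, win64, linux32 and linux64 are assumed to match the subfolders of \lib in your NLPIR package.

```java
import java.util.Locale;

// Hypothetical helper: maps JVM system properties to a subfolder name under \lib.
final class NativeFolder {

    // Assumption: the NLPIR package ships win32, win64, linux32 and linux64 folders.
    static String choose(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        // os.arch values such as "amd64" or "x86_64" contain "64"; "x86" and "i386" do not.
        String bits = osArch.contains("64") ? "64" : "32";
        if (os.startsWith("windows")) {
            return "win" + bits;
        }
        if (os.startsWith("linux")) {
            return "linux" + bits;
        }
        throw new IllegalArgumentException("No native folder known for OS: " + osName);
    }

    public static void main(String[] args) {
        // os.arch describes the running JVM, which is what must match the native library.
        String folder = choose(System.getProperty("os.name"), System.getProperty("os.arch"));
        System.out.println("Copy \\lib\\" + folder + " into the file folder");
    }
}
```

Because os.arch reports the architecture of the JVM rather than of the operating system, a 32-bit JVM on 64-bit Windows would be directed to win32 by this sketch.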

6. Modify parts of the code from step 2:
(1) Change the path to NLPIR.dll, which was copied into the win64 folder in step 5, for example:
D:\\netbeansprojects\\cwordseg\\file\\win64\\NLPIR
Note: the trailing NLPIR is the file name; do not add the .dll suffix.
Also: testing showed that on a 64-bit operating system, using the 32-bit files causes an error.
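
To avoid typing the suffix by mistake, the path can be derived from the DLL's full path. This is a small sketch under the assumptions of this article (the example path from step 6); the class LibraryPath is hypothetical and only performs string handling plus an existence check, it does not load the library.

```java
import java.io.File;
import java.util.Locale;

// Hypothetical helper: prepares the library path in the form the article describes.
final class LibraryPath {

    // Removes a trailing ".dll" (in any letter case); other paths are returned unchanged.
    static String withoutDllSuffix(String dllPath) {
        boolean hasSuffix = dllPath.toLowerCase(Locale.ROOT).endsWith(".dll");
        return hasSuffix ? dllPath.substring(0, dllPath.length() - ".dll".length()) : dllPath;
    }

    public static void main(String[] args) {
        // Example location from step 6; adjust it to your own project folder.
        String dll = "D:\\netbeansprojects\\cwordseg\\file\\win64\\NLPIR.dll";
        if (!new File(dll).isFile()) {
            System.err.println("Not found: " + dll);
        }
        System.out.println("Path to pass to the loader: " + withoutDllSuffix(dll));
    }
}
```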

(2) Change the path of the folder that holds the Data folder (that is, the file folder from step 5), for example:
D:\\netbeansprojects\\cwordseg\\file
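
A wrong data path is a likely cause of a failed initialization, so it can help to check it before calling NLPIR_Init. The sketch below is an illustration only: the class DataPathCheck is hypothetical, and it assumes the folder is named Data exactly as in step 5.

```java
import java.io.File;

// Hypothetical pre-flight check for the initialization path.
final class DataPathCheck {

    // True when root has a direct subfolder named "Data".
    static boolean containsDataFolder(File root) {
        return new File(root, "Data").isDirectory();
    }

    public static void main(String[] args) {
        // Example location from step 6 (2); adjust it to your own project folder.
        File root = new File("D:\\netbeansprojects\\cwordseg\\file");
        if (containsDataFolder(root)) {
            System.out.println("Data folder found under " + root);
        } else {
            System.err.println("No Data folder under " + root + "; initialization would probably fail");
        }
    }
}
```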

(3) Other things that can be changed:
Encoding: int charset_type = 1; can be set to a different value.
The values are: 0 for GBK, 1 for UTF-8, 2 for BIG5, and 3 for GBK that contains traditional Chinese.
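
The numeric codes are easier to read when they are given names. The following sketch wraps the four values listed above in an enum; the enum NlpirEncoding and its constant names are hypothetical, only the numbers come from this article.

```java
import java.util.Locale;

// Hypothetical enum naming the encoding codes listed in step 6 (3).
enum NlpirEncoding {
    GBK(0), UTF8(1), BIG5(2), GBK_TRADITIONAL(3);

    final int code;

    NlpirEncoding(int code) {
        this.code = code;
    }

    // Accepts names such as "UTF-8", "utf8", "gbk" or "Big5".
    static NlpirEncoding fromCharsetName(String name) {
        switch (name.toUpperCase(Locale.ROOT).replace("-", "")) {
            case "GBK":
                return GBK;
            case "UTF8":
                return UTF8;
            case "BIG5":
                return BIG5;
            default:
                throw new IllegalArgumentException("No code listed for charset: " + name);
        }
    }

    public static void main(String[] args) {
        int charset_type = fromCharsetName("UTF-8").code;
        System.out.println("charset_type = " + charset_type);
    }
}
```

The value of code can then be passed where the article uses charset_type.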

The simplified code is as follows:

package cwordseg;

import java.io.UnsupportedEncodingException;
// import utils.SystemParas;
import com.sun.jna.Library;
import com.sun.jna.Native;

/**
 * Function: basic word segmentation
 * Last updated: March 14, 2016 21:01:21
 */
public class Cwordseg {
    // Define the interface CLibrary, which extends com.sun.jna.Library
    public interface CLibrary extends Library {
        // Define and initialize a static instance of the interface, used to load NLPIR.dll;
        // the path points to the file NLPIR.dll, but without the .dll suffix
        CLibrary Instance = (CLibrary) Native.loadLibrary("D:\\netbeansprojects\\cwordseg\\file\\win64\\NLPIR", CLibrary.class);
        // Initialization function: sDataPath is the initialization path, which holds the core
        // dictionary and the configuration files; encoding is the encoding of the input text
        public int NLPIR_Init(String sDataPath, int encoding, String sLicenceCode);
        // Segmentation function: sSrc is the string to segment; bPOSTagged=0 disables
        // part-of-speech tagging, bPOSTagged=1 enables it
        public String NLPIR_ParagraphProcess(String sSrc, int bPOSTagged);
        // Returns the last error message
        public String NLPIR_GetLastErrorMsg();
        // Exit function
        public void NLPIR_Exit();
    }

    public static String transString(String aidString, String ori_encoding, String new_encoding) {
        try {
            return new String(aidString.getBytes(ori_encoding), new_encoding);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String argu = "D:\\netbeansprojects\\cwordseg\\file"; // Folder that holds the Data folder (the system core dictionary)
        // String system_charset = "UTF-8";
        int charset_type = 1; // UTF-8; 0 is GBK, 2 is BIG5, 3 is GBK that contains traditional Chinese
        int init_flag = CLibrary.Instance.NLPIR_Init(argu, charset_type, "0"); // Run initialization; returns 1 on success, 0 on failure
        String nativeBytes;

        // Report an initialization failure
        if (0 == init_flag) {
            nativeBytes = CLibrary.Instance.NLPIR_GetLastErrorMsg(); // Get the error message
            System.err.println("Initialization failed! Reason: " + nativeBytes);
            return;
        }

        String sInput = "This is a book about information retrieval."; // Hard-coded input string
        try {
            nativeBytes = CLibrary.Instance.NLPIR_ParagraphProcess(sInput, 1); // Run the segmentation function
            System.out.println("Segmentation result: " + nativeBytes); // Print the segmentation result
            CLibrary.Instance.NLPIR_Exit(); // Exit
        } catch (Exception ex) {
            // TODO Auto-generated catch block
            ex.printStackTrace();
        }
    }
}
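
With part-of-speech tagging enabled, the result string can be split back into words and tags. The sketch below assumes the result is a sequence of word/tag tokens separated by whitespace; that format is an assumption to verify against your own output, and the class TaggedOutput and the sample string in main are purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for a tagged result string of the assumed form "word/tag word/tag".
final class TaggedOutput {

    // Returns one {word, tag} pair per token; the tag is empty when a token has no slash.
    static List<String[]> parse(String result) {
        List<String[]> pairs = new ArrayList<>();
        for (String token : result.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            // Split at the last slash so that a word containing '/' keeps its own slash.
            int slash = token.lastIndexOf('/');
            if (slash <= 0) {
                pairs.add(new String[] {token, ""});
            } else {
                pairs.add(new String[] {token.substring(0, slash), token.substring(slash + 1)});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Illustrative input only, not real NLPIR output.
        for (String[] pair : parse("information/n retrieval/n")) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```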

Run result:

Error resolved: main class cwordseg not found

When the class name is changed in step 2, it must be changed by refactoring (Refactor → Rename), not only by editing the name in the code.
