Environment: Windows 7 64-bit, NetBeans IDE.
The NLPIR word segmentation system was formerly known as ICTCLAS, first released in 2000, and took its current name in 2009. It was created by Dr. Zhang Huaping.
Implementation steps:
1. In NetBeans, choose File → New Project → Java → Java Application; project name: Cwordseg.
2. Copy the code from NlpirTest.java, found under \sample\jnatest_nlpir\src\code in the NLPIR package, into Cwordseg.java.
Initial modifications to the code:
(1) Change the package declaration to cwordseg;
(2) Rename the class from NlpirTest to Cwordseg by refactoring;
Method: right-click Cwordseg.java → Refactor → Rename, enter Cwordseg → Refactor.
If you only edit the class name in the code, you still need to refactor; otherwise the run fails with the error: main class Cwordseg cannot be found.
(3) import utils.SystemParas; is not used, so comment it out for now.
3. Copy the utils folder from ...\sample\jnatest_nlpir\src in the NLPIR package directly into the src folder of the Cwordseg project.
4. Import jna-4.0.0.jar, found under \sample\jnatest_nlpir\lib in the NLPIR package, into the project libraries:
Method (1): right-click Libraries → Add JAR/Folder → select jna-4.0.0.jar;
Method (2): copy jna-4.0.0.jar directly into the project's ...\cwordseg\lib folder.
After the import, the project directory structure is as follows:
5. Create a new folder named file in the Cwordseg project folder:
(1) Copy the entire Data folder from the NLPIR package into the file folder;
(2) Also copy the \lib\win64 folder into the file folder (Note: on Win32 or Linux, select the corresponding folder instead).
6. Modify the code copied in step 2:
(1) Change the path to the file NLPIR.dll, which was copied into the win64 folder in step 5, for example:
D:\\netbeansprojects\\cwordseg\\file\\win64\\NLPIR
Note: the final NLPIR is the file name; do not add the .dll suffix.
Also note: testing showed that using the 32-bit files on a 64-bit operating system causes an error.
(2) Change the path to the folder that holds the Data folder from step 5, for example:
D:\\netbeansprojects\\cwordseg\\file
(3) Other settings that can be changed:
Encoding: int charset_type = 1; can be set to a different value.
The values are: 0 for GBK, 1 for UTF-8, 2 for BIG5, and 3 for GBK including traditional Chinese.
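The two paths set in step 6 are easy to get wrong, so it may help to build and check them before handing them to the library. The following is only a minimal sketch using the JDK alone: the class NlpirPaths and its methods are hypothetical helpers, the folder names linux64 and linux32 are assumptions about the NLPIR lib layout (only Win64, Win32 and Linux are named above), and the os.arch check reflects the running JVM rather than the operating system.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;

public class NlpirPaths {
    // Pick the native sub-folder. "win64"/"win32" follow the text;
    // "linux64"/"linux32" are assumed names, check them against your NLPIR package.
    static String nativeFolder(String osName, String osArch) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        boolean is64Bit = osArch.contains("64");
        if (windows) {
            return is64Bit ? "win64" : "win32";
        }
        return is64Bit ? "linux64" : "linux32";
    }

    // Build the library path: folder plus the bare name NLPIR, without the .dll suffix.
    static String libraryPath(String fileDir, String nativeFolder) {
        return fileDir + File.separator + nativeFolder + File.separator + "NLPIR";
    }

    // The initialization path must be the folder that contains Data.
    static boolean hasDataFolder(String fileDir) {
        return Files.isDirectory(Paths.get(fileDir, "Data"));
    }

    public static void main(String[] args) {
        String fileDir = "D:\\netbeansprojects\\cwordseg\\file";
        String folder = nativeFolder(System.getProperty("os.name"),
                                     System.getProperty("os.arch"));
        System.out.println("Library path: " + libraryPath(fileDir, folder));
        System.out.println("Data folder present: " + hasDataFolder(fileDir));
    }
}
```

If the Data check prints false, initialization is likely to fail, so it is worth fixing the path before running the segmenter.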
The simplified code is as follows:
package cwordseg;

import java.io.UnsupportedEncodingException;
// import utils.SystemParas;
import com.sun.jna.Library;
import com.sun.jna.Native;

/**
 * Function: basic word segmentation
 * Last updated: March 14, 2016 21:01:21
 */
public class Cwordseg {
    // Define the interface CLibrary, which extends com.sun.jna.Library
    public interface CLibrary extends Library {
        // Define and initialize a static instance of the interface, used to load NLPIR.dll;
        // the path points to the file NLPIR.dll but omits the .dll suffix
        CLibrary Instance = (CLibrary) Native.loadLibrary("D:\\netbeansprojects\\cwordseg\\file\\win64\\NLPIR", CLibrary.class);
        // Initialization function: sDataPath is the initialization path, which holds the core
        // dictionary and the configuration files; encoding is the encoding of the input text
        public int NLPIR_Init(String sDataPath, int encoding, String sLicenceCode);
        // Segmentation function: sSrc is the string to segment; bPOSTagged=0 disables
        // part-of-speech tagging, bPOSTagged=1 enables it
        public String NLPIR_ParagraphProcess(String sSrc, int bPOSTagged);
        // Returns the last error message
        public String NLPIR_GetLastErrorMsg();
        // Exit function
        public void NLPIR_Exit();
    }

    public static String transString(String aidString, String ori_encoding, String new_encoding) {
        try {
            return new String(aidString.getBytes(ori_encoding), new_encoding);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String argu = "D:\\netbeansprojects\\cwordseg\\file"; // location of the Data folder (the system's core dictionary)
        // String system_charset = "UTF-8";
        int charset_type = 1; // UTF-8; the others are GBK = 0, BIG5 = 2, GBK including traditional Chinese = 3
        int init_flag = CLibrary.Instance.NLPIR_Init(argu, charset_type, "0"); // run initialization: returns 1 on success, 0 on failure
        String nativeBytes;

        // Report initialization failure
        if (0 == init_flag) {
            nativeBytes = CLibrary.Instance.NLPIR_GetLastErrorMsg(); // get the error message
            System.err.println("Initialization failed! Reason: " + nativeBytes);
            return;
        }

        String sInput = "This is a book about information retrieval."; // manually specified input string
        try {
            nativeBytes = CLibrary.Instance.NLPIR_ParagraphProcess(sInput, 1); // run the segmentation function
            System.out.println("Segmentation result: " + nativeBytes); // print the segmentation result
            CLibrary.Instance.NLPIR_Exit(); // exit
        } catch (Exception ex) {
            // TODO Auto-generated catch block
            ex.printStackTrace();
        }
    }
}
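The listing passes the encoding as a bare integer (charset_type = 1), which is easy to mistype. One possible way to make the intent readable is a small enum that carries the four codes listed in step 6. This is a sketch only: EncodingCodes and NlpirCharset are hypothetical names introduced here and are not part of the NLPIR sample code.

```java
public class EncodingCodes {
    // Encoding codes as given in step 6:
    // GBK = 0, UTF-8 = 1, BIG5 = 2, GBK including traditional Chinese = 3.
    enum NlpirCharset {
        GBK(0), UTF8(1), BIG5(2), GBK_TRADITIONAL(3);

        final int code;

        NlpirCharset(int code) {
            this.code = code;
        }

        // Look up the constant for an integer code, rejecting unknown values.
        static NlpirCharset fromCode(int code) {
            for (NlpirCharset c : values()) {
                if (c.code == code) {
                    return c;
                }
            }
            throw new IllegalArgumentException("Unknown charset_type: " + code);
        }
    }

    public static void main(String[] args) {
        int charsetType = NlpirCharset.UTF8.code; // the value passed as charset_type
        System.out.println("charset_type = " + charsetType);
        System.out.println("Code 3 means " + NlpirCharset.fromCode(3));
    }
}
```

With such an enum, the initialization call could take NlpirCharset.UTF8.code in place of the literal 1, so a wrong value fails early with a clear message.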
Run result:
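The segmenter hands back its result as one string, so later processing usually needs it split into words and tags. The sketch below assumes the tagged output is a sequence of word/tag items separated by whitespace; that format is an assumption to verify against your own output, the sample string in main is made up for illustration and is not real NLPIR output, and TaggedOutputParser and Token are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class TaggedOutputParser {
    // One segmented word with its part-of-speech tag (empty if no tag was present).
    static final class Token {
        final String word;
        final String tag;

        Token(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    // Split whitespace-separated "word/tag" items; the last '/' separates word and tag,
    // so a word that itself contains '/' keeps it.
    static List<Token> parse(String tagged) {
        List<Token> tokens = new ArrayList<>();
        for (String item : tagged.trim().split("\\s+")) {
            if (item.isEmpty()) {
                continue;
            }
            int slash = item.lastIndexOf('/');
            if (slash <= 0) {
                tokens.add(new Token(item, "")); // no tag attached
            } else {
                tokens.add(new Token(item.substring(0, slash), item.substring(slash + 1)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sample = "information/n retrieval/n"; // illustrative input only
        for (Token t : parse(sample)) {
            System.out.println(t.word + " -> " + t.tag);
        }
    }
}
```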
Error resolution: main class Cwordseg not found
When the class name is changed in step 2, it must be renamed properly, that is, through refactoring rather than by editing the code alone.
1. Calling NLPIR (ICTCLAS2016) from Java for word segmentation