First, what is NLPIR?
NLPIR is a Chinese word segmentation system developed by Dr. Zhang Huaping's team at the Chinese Academy of Sciences. Its main functions include Chinese word segmentation, part-of-speech (POS) tagging, named entity recognition, and user dictionary support. For details, see the official website: http://ictclas.nlpir.org/.
Second, using NLPIR in a Java environment:
This section is mainly based on the following reference: http://www.360doc.com/content/14/0926/15/19424404_412519063.shtml
What follows is my own way of using it, for reference only.
1. Download the NLPIR toolkit from the following link: http://ictclas.nlpir.org/newsdownloads?DocId=389
The toolkit contains the following content: (to be added)
2. NLPIR is written in C/C++, so to use it from Java you need to download the Java (JNI) interface it provides. I downloaded the Windows 64-bit JNI package (choose the one that matches your own machine): http://ictclas.nlpir.org/newsdownloads?DocId=353
So now there are two packages: the NLPIR toolkit and the JNI interface package.
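The JNI package has to match the bitness of the JVM that loads it, not only the operating system, so a quick check like the one below may help when choosing which package to download. This is only a sketch: the class name JvmBitness is made up for illustration, and the sun.arch.data.model property is specific to HotSpot-based JVMs, which is why the code falls back to the standard os.arch property.

```java
// Sketch: report whether the running JVM is 32-bit or 64-bit.
// JvmBitness is a hypothetical helper name, not part of NLPIR.
final class JvmBitness {

    /** Returns "64", "32", or "unknown" for the JVM that runs this code. */
    static String detect() {
        // Set by HotSpot-based JVMs; may be absent on other implementations.
        String model = System.getProperty("sun.arch.data.model");
        if ("64".equals(model) || "32".equals(model)) {
            return model;
        }
        // Fall back to the standard property, e.g. "amd64" or "x86".
        String arch = System.getProperty("os.arch", "");
        if (arch.contains("64")) {
            return "64";
        }
        if (arch.equals("x86") || arch.matches("i[3-6]86")) {
            return "32";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println("JVM bitness: " + detect());
    }
}
```

If this reports 32 on a 64-bit Windows machine, the 32-bit JNI package is probably the one to use with that JVM.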
3. Now you can start building your own project:
(1) Create a Java project. The final directory structure is as follows:
Where: the bai package contains the test program I wrote
kevin.zhang: the contents of the 64-bit JNI package, copied into the Java project you created
file: a directory you create yourself; the Data folder in it comes from the NLPIR toolkit
test: from the NLPIR toolkit
NLPIR.dll: from the lib directory of the NLPIR toolkit
NLPIR_JNI.dll: from the JNI interface package
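Before running anything, it can save time to confirm that the pieces listed above are actually in place. The following is a minimal sketch under some assumptions of mine: that the two DLLs sit in the project root, that the data lives under file/Data, and that the check is run from the project root. LayoutCheck is a hypothetical name, and the expected paths should be adjusted to your own layout.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Sketch: list which expected project entries are missing.
// The paths below are assumptions about the layout, not something NLPIR enforces.
final class LayoutCheck {

    static final String[] EXPECTED = {"file/Data", "NLPIR.dll", "NLPIR_JNI.dll"};

    /** Returns the expected entries that do not exist under projectRoot. */
    static List<String> missing(Path projectRoot) {
        List<String> result = new ArrayList<>();
        for (String relative : EXPECTED) {
            if (!Files.exists(projectRoot.resolve(relative))) {
                result.add(relative);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Use the first argument as the project root, or the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> absent = missing(root);
        System.out.println(absent.isEmpty()
                ? "Layout looks complete"
                : "Missing: " + absent);
    }
}
```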
4. Write the word segmentation program
The code is as follows:
package bai;

import kevin.zhang.NLPIR;

public class NLPIR_test {

    public static void main(String args[]) {
        try {
            test();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    static void test() throws Exception {
        // The data path "./file/" is used here as-is, without modification
        NLPIR nlpir = new NLPIR();
        if (!NLPIR.NLPIR_Init("./file/".getBytes("UTF-8"), 1)) {
            System.out.println("NLPIR initialization failed");
            return;
        }

        // Sentence segmentation test
        // (the sentence is shown in translation; use Chinese text when running)
        String temp = "Remember to send the daily report every day, so that the manager can keep track of the project's progress";
        byte[] resBytes = nlpir.NLPIR_ParagraphProcess(temp.getBytes("UTF-8"), 0);
        System.out.println("Segmentation result: " + new String(resBytes, "UTF-8"));

        // File segmentation test
        String utf8File = "E:/wbjddata/user_product_similarity/product_vector_pro.txt";
        String utf8FileResult = "E:/wbjddata/user_product_similarity/product_vector_pro_seg_result.txt";
        nlpir.NLPIR_FileProcess(utf8File.getBytes(), utf8FileResult.getBytes(), 0);

        // Exit and release resources
        NLPIR.NLPIR_Exit();

        // The last parameter of NLPIR_FileProcess and NLPIR_ParagraphProcess, 0,
        // means that only the segmented words are output, without part-of-speech tags
    }
}
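If the last parameter is set to 1 instead of 0, the result also carries part-of-speech tags. As far as I can tell the tagged result is a space-separated sequence of word/tag tokens, but treat that format as an assumption and check it against your own output. Under that assumption, a small helper such as the one below (TaggedOutput is a name I made up, and the sample string is hand-written with ASCII stand-ins rather than real NLPIR output) could split the result into word and tag pairs:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split "word/tag word/tag ..." into {word, tag} pairs.
// Assumes tokens are separated by whitespace and the tag follows the last '/'.
final class TaggedOutput {

    /** Returns one {word, tag} array per token; tag is "" when a token has no tag. */
    static List<String[]> parse(String segmented) {
        List<String[]> pairs = new ArrayList<>();
        for (String token : segmented.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only when the whole input is blank
            }
            // Use the last slash so that a word containing '/' stays intact.
            int slash = token.lastIndexOf('/');
            if (slash > 0) {
                pairs.add(new String[] {token.substring(0, slash), token.substring(slash + 1)});
            } else {
                pairs.add(new String[] {token, ""});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hand-written sample with ASCII stand-ins for Chinese words.
        String sample = "report/n send/v daily";
        for (String[] pair : parse(sample)) {
            System.out.println(pair[0] + " -> " + (pair[1].isEmpty() ? "(no tag)" : pair[1]));
        }
    }
}
```

In the program above, the string passed to parse would be the decoded result, that is, new String(resBytes, "UTF-8").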
Using the NLPIR word segmentation tool (in a Java environment)