CAS NLPIR Chinese word segmentation, Java edition

Source: Internet
Author: User
Tags cas

Original: http://blog.csdn.net/k21325/article/details/53052855

Abstract: To solve a Chinese search problem, I first used the PHP version of the open-source SCWS, but it handles person and place names poorly and often truncates personal names incorrectly. I then switched to the NLPIR segmenter, whose segmentation accuracy is better than that of SCWS. This article describes how to build the Java projects under Windows and generate a jar file that can be executed.

NLPIR official download page:

http://ictclas.nlpir.org/downloads

GitHub address:

https://github.com/NLPIR-team/NLPIR

The two versions differ; this article explains how to set up a project from each of them in Eclipse.

First, the official NLPIR release

After downloading and unpacking, look in the bin directory. Nlpir_windemo.exe there is an NLPIR demo program; run it to get a feel for what NLPIR can do.

The sample directory holds example source code in C, C++, Hadoop, Java, Python, and other languages.

Import the Java project directory JnaTest_NLPIR into Eclipse:

(1) In Eclipse, choose File -> Import

(2) Select the path where JnaTest_NLPIR is located and click Finish

(3) Check that the project now appears in Eclipse

(4) NlpirTest.java contains the main function and also declares the CLibrary interface. The following statement loads the library file that NLPIR requires:

    CLibrary Instance = (CLibrary) Native.loadLibrary("H:\\workspace\\ictclas\\1\\ictclas2015\\lib\\win64\\nlpir", CLibrary.class);

Native.loadLibrary takes the location of the library file. The download ships library files for several platforms; our project needs the one in the win64 folder.

(5) Set the path to the segmenter's Data folder

    String argu = "H:\\workspace\\ictclas\\1\\ictclas2015"; // parent folder of the Data folder
    String system_charset = "UTF-8";
    int charset_type = 1; // 1 selects UTF-8 in NLPIR's encoding constants
    int init_flag = CLibrary.Instance.NLPIR_Init(argu, charset_type, "0");

H:\\workspace\\ictclas\\1\\ictclas2015 is the parent folder of the Data folder.

Once this step is complete, you can debug the code. For the API itself, consult the manual.
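Since NLPIR_Init expects the parent of the Data folder, it can save debugging time to check the folder layout first. The following is only a sketch using the standard library; DataFolderCheck and hasDataFolder are hypothetical names that are not part of NLPIR, and the default path is simply the example path used above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of NLPIR: checks the folder layout before NLPIR_Init is called.
final class DataFolderCheck {

    // Returns true when `parent` contains a sub-folder named "Data".
    // The name match is case-sensitive on case-sensitive file systems.
    static boolean hasDataFolder(Path parent) {
        return Files.isDirectory(parent.resolve("Data"));
    }

    public static void main(String[] args) {
        // Falls back to the article's example path; pass your own location as the first argument.
        Path parent = Paths.get(args.length > 0 ? args[0] : "H:\\workspace\\ictclas\\1\\ictclas2015");
        if (hasDataFolder(parent)) {
            System.out.println("Data folder found under " + parent);
        } else {
            System.out.println("No Data folder under " + parent + "; check the path before calling NLPIR_Init");
        }
    }
}
```

If the check fails, fix the path rather than the code: the first argument of NLPIR_Init should name the folder that contains Data, not the Data folder itself.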

Second, the code downloaded from GitHub

The repository contains the NLPIR SDK directory, in which each subdirectory is a component provided by NLPIR. The NLPIR-ICTCLAS directory contains the code for the word segmentation component.

Import the Ictclas_java project into Eclipse.

The project does not come with a main function; you can add one to the NlpirTest.java file.

    import java.io.IOException;

    public class NlpirTest {
        public static void main(String[] args) throws Exception {
            NlpirTest t = new NlpirTest();
            t.testParticiple();
        }
        public void testParticiple() throws IOException {
            .....
        }
        .......
    }

Unlike the official release, the library-loading code here detects the system type automatically and looks for the library file under the project's current directory. The win32, win64, linux32, and linux64 folders in the project's current directory are the ones that contain the library files.

It also automatically uses the Data directory under the project's current directory as the segmenter's data directory. Once these directories are in place, you can start debugging.
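To make the automatic selection concrete, here is a minimal sketch of how a folder name such as win64 or linux32 could be derived from the standard os.name and os.arch system properties. It is an illustration only: the GitHub project has its own detection code, which may work differently, and NativeLibFolder and folderFor are names made up for this example.

```java
import java.util.Locale;

// Illustrative only: the GitHub project ships its own detection code, which may differ.
final class NativeLibFolder {

    // Maps os.name / os.arch values to one of the four folder names mentioned above.
    static String folderFor(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        // Any architecture string containing "64" (amd64, x86_64, ...) is treated as 64-bit.
        String bits = osArch.contains("64") ? "64" : "32";
        if (os.startsWith("windows")) {
            return "win" + bits;
        }
        if (os.contains("linux")) {
            return "linux" + bits;
        }
        throw new IllegalArgumentException("No library folder for OS: " + osName);
    }

    public static void main(String[] args) {
        try {
            String folder = folderFor(System.getProperty("os.name"), System.getProperty("os.arch"));
            System.out.println("Library folder: " + folder);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping the mapping in a pure function of two strings makes it easy to check each platform without having to run on it.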

Third, the keyword extraction component Key_extract on GitHub

The component's directory includes a Java version of the sample code; import that project into Eclipse.

As before, add a main function, this time to the KeyExtractor.java file. The first parameter of KeyExtract_GetKeyWords is the text to extract keywords from, and the second is the maximum number of keywords.

    public static void main(String[] args) {
        // Max is the maximum number of keywords to return; it must be defined before use.
        String keywordsStr = CLibraryKeyExtractor.instance.KeyExtract_GetKeyWords(args[0], Max, true);
        System.out.println(keywordsStr);
        CLibraryKeyExtractor.instance.KeyExtract_Exit();
    }


The project's current folder contains a Data directory, which holds the data that word segmentation and keyword extraction need. The license files must also be placed in this folder. If you cannot tell which files each component uses, the simplest approach for testing is to put all of them in the Data folder under the project's current directory.

With these settings in place, pass the program arguments in Eclipse through the menu item Run -> Run Configurations.
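Because the main function reads args[0] directly, running it without arguments fails with an ArrayIndexOutOfBoundsException. The sketch below shows one way to validate the arguments before handing them to KeyExtract_GetKeyWords. KeywordArgs, its fields, the optional second argument, and the default of 10 are all assumptions made for this example; none of them come from the NLPIR sample code.

```java
// Hypothetical helper for the main function: reads the text and an optional keyword count.
final class KeywordArgs {
    static final int DEFAULT_MAX_KEYWORDS = 10; // arbitrary default chosen for this sketch

    final String text;
    final int maxKeywords;

    private KeywordArgs(String text, int maxKeywords) {
        this.text = text;
        this.maxKeywords = maxKeywords;
    }

    // args[0] is the text; args[1], if present, is the maximum number of keywords.
    static KeywordArgs parse(String[] args) {
        if (args.length == 0 || args[0].trim().isEmpty()) {
            throw new IllegalArgumentException("usage: <text> [maxKeywords]");
        }
        int max = DEFAULT_MAX_KEYWORDS;
        if (args.length > 1) {
            // Integer.parseInt throws NumberFormatException, a kind of IllegalArgumentException.
            max = Integer.parseInt(args[1]);
            if (max <= 0) {
                throw new IllegalArgumentException("maxKeywords must be positive");
            }
        }
        return new KeywordArgs(args[0], max);
    }

    public static void main(String[] args) {
        try {
            KeywordArgs parsed = parse(args);
            System.out.println("text=" + parsed.text + ", maxKeywords=" + parsed.maxKeywords);
        } catch (IllegalArgumentException e) {
            System.out.println("Invalid arguments: " + e.getMessage());
        }
    }
}
```

In the real main function, parsed.text and parsed.maxKeywords would take the place of args[0] and the keyword count in the call to KeyExtract_GetKeyWords.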

Fourth, exporting the jar

In the Eclipse project tree, right-click the project and choose Export.

Select Runnable JAR file to generate the jar file.

You can then run the jar from cmd, passing the arguments on the command line.
