[Java] Using the ANSJ Chinese word segmentation tool to segment a paragraph

Source: Internet
Author: User

Word segmentation is a genuinely complex problem, enough to fill several papers. But if you just want to segment a sentence, a paragraph, or an article, you can use the tools written by the experts at the Chinese natural language open source organization. They are packaged as a jar and can be called directly, without having to work through the complex algorithms yourself.

Of course, this kind of segmentation is meant for natural language. For strings with a regular structure, use Java's built-in methods such as indexOf, substring, and split; there is no need for an additional Java package.
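For comparison, here is a minimal sketch of the built-in approach for regularly structured strings. The class name, the helper names, and the sample inputs are made up for illustration; only indexOf, substring, and split come from the text above.

```java
import java.util.Arrays;

public class RegularStringDemo {
    // Returns the part after the first '=' in a "key=value" string, or "" if there is no '='.
    static String valueAfterEquals(String pair) {
        int eq = pair.indexOf('=');
        return eq < 0 ? "" : pair.substring(eq + 1);
    }

    // Splits a comma-separated line into its fields.
    static String[] fields(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        System.out.println(valueAfterEquals("name=ansj"));          // prints "ansj"
        System.out.println(Arrays.toString(fields("2015,08,31")));  // prints "[2015, 08, 31]"
    }
}
```

Strings like these have fixed delimiters, so no dictionary or statistical model is needed; natural-language Chinese has no such delimiters between words, which is why a segmenter is used there.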

As an example, take this paragraph excerpted from Liang Qichao's "The Greatest Suffering and the Greatest Joy":

If one could go through life always like a two- or three-year-old child, with no responsibilities, there would be no suffering. Once you grow up, responsibility naturally falls on your shoulders; how can you hide from it? There is only a difference of size. Fulfill a great responsibility and you gain great joy; fulfill a small responsibility and you gain small joy. If you try to hide, you throw yourself into a sea of suffering and can never be released.

The goal is to split this paragraph into individual Chinese words and to identify the part of speech of each word.

1. First, go to http://maven.ansj.org/org/ansj/ansj_seg/ and download the latest version of ansj_seg.jar; version 2.0.x or later is recommended. Then go to http://maven.ansj.org/org/nlpcn/nlp-lang/ and download its auxiliary package nlp-lang.jar. If you are using a 1.0.x version of ansj_seg.jar, go to http://maven.ansj.org/org/ansj/tree_split/ and download the 1.0.x auxiliary package tree_split.jar instead. After downloading, create a new lib folder under the Java project folder in Eclipse, put ansj_seg-2.0.8.jar and nlp-lang-1.0.jar into it, and refresh the project in Eclipse. Then right-click the project, select Properties, and under Java Build Path add the two jars to the project.
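To confirm that the jars really ended up on the build path, a quick check like the following may help. This is only a sketch: ClasspathCheck and isOnClasspath are hypothetical names, and the two ansj class names are assumed to match the 2.0.x jar layout.

```java
public class ClasspathCheck {
    // Returns true if the named class can be found on the current classpath.
    // The class is looked up without being initialized, so no dictionaries are loaded.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed class names for ansj_seg 2.0.x; adjust them if your version differs.
        String[] required = {
            "org.ansj.splitWord.analysis.ToAnalysis",
            "org.ansj.domain.Term"
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

If a class is reported as missing, the usual cause is that the jar was copied into lib but not added under Java Build Path.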


2. Next, go to https://github.com/NLPchina/ansj_seg and download its segmentation dictionary. After downloading, unzip it, copy the library folder into the root directory of your Java project, and put library.properties into the project's bin directory.
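The file layout from this step can be sanity-checked before running the segmenter. The sketch below assumes the program is started from the project root and that the bin directory is the classpath root, as it is by default in Eclipse; DictionaryLayoutCheck and its methods are hypothetical helpers, not part of ANSJ.

```java
import java.io.File;

public class DictionaryLayoutCheck {
    // True if <projectRoot>/library exists and is a directory.
    static boolean hasLibraryFolder(File projectRoot) {
        return new File(projectRoot, "library").isDirectory();
    }

    // True if library.properties is visible at the root of the classpath.
    static boolean hasLibraryProperties() {
        return DictionaryLayoutCheck.class.getResource("/library.properties") != null;
    }

    public static void main(String[] args) {
        File projectRoot = new File("."); // assumption: working directory is the project root
        System.out.println("library folder found: " + hasLibraryFolder(projectRoot));
        System.out.println("library.properties on classpath: " + hasLibraryProperties());
    }
}
```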

3. Then create a new file WordSegmentTest.java containing the following code, compile and run it, and you get the result:

import java.util.List;

import org.ansj.domain.Term;
import org.ansj.splitWord.analysis.ToAnalysis;

public class WordSegmentTest {
    public static void main(String[] args) {
        // The paragraph to segment. In practice this holds the original Chinese text;
        // its English rendering is shown here.
        String str = "If one could go through life always like a two- or three-year-old child, "
                + "with no responsibilities, there would be no suffering. Once you grow up, "
                + "responsibility naturally falls on your shoulders; how can you hide from it? "
                + "There is only a difference of size. Fulfill a great responsibility and you gain "
                + "great joy; fulfill a small responsibility and you gain small joy. If you try to "
                + "hide, you throw yourself into a sea of suffering and can never be released.";
        List<Term> terms = ToAnalysis.parse(str);
        for (int i = 0; i < terms.size(); i++) {
            String word = terms.get(i).getName();        // get the word
            String nature = terms.get(i).getNatureStr(); // get the part of speech
            System.out.print(word + "\t" + nature + "\n");
        }
    }
}

Here str is the Chinese natural-language paragraph to be segmented. The result of the segmentation is a List of Term objects. Iterate over the list and use getName() to get each word and getNatureStr() to get its part of speech. The original post links to a list of the part-of-speech tags.
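Once you have the word and part-of-speech pairs, a common next step is to group the words by tag. The sketch below shows one way to do that. So that it runs without the ANSJ jars, it uses a hypothetical Token class in place of Term, and the sample words and the tags "n" and "v" are made up for illustration, not actual ANSJ output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByNature {
    // Hypothetical stand-in for org.ansj.domain.Term: a word plus its part-of-speech tag.
    static final class Token {
        final String name;
        final String nature;

        Token(String name, String nature) {
            this.name = name;
            this.nature = nature;
        }
    }

    // Groups words by tag, keeping tags and words in the order they first appear.
    static Map<String, List<String>> groupByNature(List<Token> tokens) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Token t : tokens) {
            groups.computeIfAbsent(t.nature, k -> new ArrayList<>()).add(t.name);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("responsibility", "n")); // illustrative tags only
        tokens.add(new Token("hide", "v"));
        tokens.add(new Token("shoulder", "n"));
        System.out.println(groupByNature(tokens));
    }
}
```

With real output you would fill the list inside the loop over the Term objects, passing getName() and getNatureStr() to the constructor.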


Copyright notice: This is an original article by the blog author and may not be reproduced without the author's permission.