Word segmentation is genuinely a complex problem, enough to fill several papers. But if you just want to segment a sentence, a paragraph, or an article, you can take advantage of the tools written by the experts at the Chinese natural language processing open-source organization. They have been packaged into a jar and can be called directly, so you do not have to deal with the complex algorithms yourself.
Of course, this kind of word segmentation is meant for natural language. For regular, structured strings, use Java's built-in methods such as indexOf, substring, and split; there is no need for an additional Java package.
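For comparison, here is a minimal sketch of cutting regular strings with only the built-in String methods. The class name RegularStringDemo, the helper names, and the sample strings are made up for illustration and do not come from the original post.

```java
import java.util.Arrays;
import java.util.List;

public class RegularStringDemo {

    // Splits a comma-separated record into its fields using String.split.
    static List<String> fields(String record) {
        return Arrays.asList(record.split(","));
    }

    // Returns the text after the first '=' using indexOf and substring,
    // or an empty string when the input contains no '='.
    static String valueAfterEquals(String pair) {
        int eq = pair.indexOf('=');
        return eq < 0 ? "" : pair.substring(eq + 1);
    }

    public static void main(String[] args) {
        // Illustrative inputs only (assumptions, not from the original post).
        System.out.println(fields("2015,08,21"));
        System.out.println(valueAfterEquals("name=ansj"));
    }
}
```

When the delimiters are fixed and known in advance like this, no dictionary or segmentation library is needed.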
First, suppose we have the following paragraph, excerpted from Liang Qichao's "The Greatest Suffering and the Greatest Joy":
If life could always be like that of a two- or three-year-old child, with no responsibilities, there would be no suffering. Once you grow up, responsibility naturally falls on your shoulders, so how can you hide from it? The only difference is one of size. Fulfilling a great responsibility brings great joy, and fulfilling a small responsibility brings small joy. If you try to hide, you only throw yourself into suffering and can never be freed from it.
The goal is to split this paragraph into its individual Chinese words and identify the part of speech of each word, printing one word and its part of speech per line. The steps are as follows:
1. First, go to http://maven.ansj.org/org/ansj/ansj_seg/ and download the latest version of ansj_seg.jar (version 2.0.x or later is recommended), then go to http://maven.ansj.org/org/nlpcn/nlp-lang/ and download its auxiliary package nlp-lang.jar. If you are using a 1.0.x version of ansj_seg.jar, go to http://maven.ansj.org/org/ansj/tree_split/ and download the 1.0.x auxiliary package tree_split.jar instead. After downloading, create a lib folder inside your new Eclipse Java project folder and copy ansj_seg-2.0.8.jar and nlp-lang-1.0.jar into it. Refresh the project in Eclipse, right-click the project, select Properties, and add the two jars to the project under Java Build Path.
2. Next, go to https://github.com/NLPchina/ansj_seg and download its segmentation dictionary. After downloading, unzip it, copy the library folder into the root directory of your Java project, and put library.properties into the project's bin directory.
3. Finally, create a new WordSegmentTest.java containing the following code, then compile and run it to get the result:

import java.util.List;

import org.ansj.domain.Term;
import org.ansj.splitWord.analysis.ToAnalysis;

public class WordSegmentTest {
    public static void main(String[] args) {
        // The paragraph to segment; in practice, put the original Chinese text here.
        String str = "If life could always be like that of a two- or three-year-old child, with no responsibilities, there would be no suffering. Once you grow up, responsibility naturally falls on your shoulders, so how can you hide from it? The only difference is one of size. Fulfilling a great responsibility brings great joy, and fulfilling a small responsibility brings small joy. If you try to hide, you only throw yourself into suffering and can never be freed from it.";
        // Segment the paragraph into a list of Term objects.
        List<Term> terms = ToAnalysis.parse(str);
        for (int i = 0; i < terms.size(); i++) {
            String words = terms.get(i).getName(); // get the word
            String nominal = terms.get(i).getNatureStr(); // get the part of speech
            System.out.print(words + "\t" + nominal + "\n");
        }
    }
}
Here str is the Chinese natural-language paragraph to be segmented. The result of the segmentation is a List of Term objects. While traversing the list, use getName() to get each segmented word and getNatureStr() to get its part of speech. The list of part-of-speech tags is as follows: (link)
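Once you have each word and its part of speech, a common next step is to group the words by tag. The following is only a sketch: TaggedWord is a hypothetical stand-in for ANSJ's Term so that the example runs without the jars, and the sample words and the tags "n" and "v" are made-up placeholders, not real segmenter output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByNatureDemo {

    // Hypothetical stand-in for an ANSJ Term: a word plus its part-of-speech tag.
    static final class TaggedWord {
        final String name;
        final String nature;

        TaggedWord(String name, String nature) {
            this.name = name;
            this.nature = nature;
        }
    }

    // Groups words by their tag, keeping tags in first-seen order.
    static Map<String, List<String>> groupByNature(List<TaggedWord> words) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (TaggedWord w : words) {
            groups.computeIfAbsent(w.nature, k -> new ArrayList<>()).add(w.name);
        }
        return groups;
    }

    // Made-up sample data for illustration; not produced by ANSJ.
    static List<TaggedWord> sample() {
        List<TaggedWord> words = new ArrayList<>();
        words.add(new TaggedWord("responsibility", "n"));
        words.add(new TaggedWord("hide", "v"));
        words.add(new TaggedWord("joy", "n"));
        return words;
    }

    public static void main(String[] args) {
        // Print one tag per line followed by the words carrying that tag.
        for (Map.Entry<String, List<String>> e : groupByNature(sample()).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

With the real library, the same loop could read getName() and getNatureStr() from each Term instead of the stand-in fields.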
Copyright notice: This is an original article by the blog author and may not be reproduced without the author's permission.
[Java] Using the ANSJ Chinese word segmentation tool to segment a paragraph