Start Using ICTCLAS Java in 10 Minutes

Source: Internet
Author: User

http://www.blogjava.net/zhenandaci/archive/2008/12/17/230269.html

ICTCLAS is a Chinese word segmentation package produced by the Institute of Computing Technology of the Chinese Academy of Sciences. It has a good reputation and is widely used in China. In the past only a C++ version was offered; C#, Delphi, and Java versions have since appeared. The following very small example will get you using ICTCLAS in 10 minutes, so that you can start developing your own text classifier or search engine.

Note first that, unlike earlier approaches that called the C++ version through JNI, this article uses the pure Java version of ICTCLAS, available at http://ictclas.org/down_opensrc.asp.

Okay. Suppose you have downloaded the Java version, ictclas4j. Decompress it now, then copy the entire data folder into your Eclipse project folder, copy the entire org folder from its bin directory into your Eclipse project's bin directory, and copy the entire org folder from its src directory into your Eclipse project's src directory. (This is the simplest and quickest way to use it; alternatively, you can build a jar yourself and import it through Build Path.)
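Since a misplaced data folder is the most common stumbling block, it may help to check the location before running anything else. The following is only a minimal sketch: the class name `DataDirCheck` and the helper `hasDataDir` are made up for illustration, and it assumes (based on the folder layout described above, not on the ictclas4j source) that the dictionary folder is called `data` and is looked up relative to the JVM working directory.

```java
import java.io.File;

public class DataDirCheck {

    /** Returns true if a folder named dirName exists directly under baseDir. */
    static boolean hasDataDir(File baseDir, String dirName) {
        return new File(baseDir, dirName).isDirectory();
    }

    public static void main(String[] args) {
        // Relative paths are resolved against the JVM working directory.
        // For a program launched from Eclipse this is normally the project folder.
        File workingDir = new File(System.getProperty("user.dir"));

        // "data" is the folder name assumed in this article; adjust if yours differs.
        if (hasDataDir(workingDir, "data")) {
            System.out.println("Found data folder under " + workingDir);
        } else {
            System.out.println("No data folder under " + workingDir);
        }
    }
}
```

If the second message is printed, the working directory it reports is where the data folder would need to go under the assumption above.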

Now you can create a new class in your project. I created one named OneMain; the code is as follows:

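import org.ictclas4j.bean.SegResult;
import org.ictclas4j.segment.SegTag;

public class OneMain {

    public static void main(String[] args) {
        System.out.println("this is onemain");

        // Create the segmenter; this loads the dictionaries from the data folder.
        SegTag st = new SegTag(1);

        // Segment the test text. The original post used a Chinese sentence here;
        // this is its English rendering. Note the embedded line break (\n).
        SegResult sr = st
                .split("A piece of diligent and beautiful money,/create an economic aircraft carrier. ABCD.#$% Hello world!\nAnother piece of text 123 vehicle!3.0");

        // Print the segmented, part-of-speech-tagged result as one string.
        System.out.println(sr.getFinalResult());
    }
}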

Obviously, the text "A piece of diligent and beautiful money,/create an economic aircraft carrier. ABCD.#$% Hello world!\nAnother piece of text 123 vehicle!3.0" is our test input. It contains Chinese characters, English letters, punctuation marks, junk symbols (heh), and Arabic numerals.

Run the program and check the output:

this is onemain

One piece/s Diligence/ Location/u Pretty/ Of/u One/m Block/q Money/n ,/w //nx Build/v Economic/n Of/u Aircraft carriers/n ./w abcd.#$%/nx Hello/nx world/nx !/w Again/d One/m Segment/q Text/n 123/m Vehicle/q

The segmentation result is one long String. Words are separated by spaces, and the tag after each "/" marks the word's part of speech. Let's look at a few interesting spots.

In the original text, "one piece" actually appears twice. The first, in "one piece of diligence", is correctly identified here as an adverb, and the "one" in the later "one dollar" is also correctly recognized as a numeral.

Arabic numerals are correctly recognized as numerals, including the decimal form "3.0". English text and junk symbols (including the invisible line break; did you spot it?) are lumped into a single class, /nx. (I don't know what the ICTCLAS people call this class internally: illegal characters, invalid characters, or something else? Feel free to name it yourself.)

There are two exclamation marks in the test text: one half-width English "!" and one full-width Chinese "！". Both are correctly recognized as punctuation, but the English period "." is treated as /nx.

Spaces in the test text are ignored completely.
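Because the result is just a space-separated string of word/tag items, it is easy to split back into pairs for further processing. Here is a rough sketch of one way to do that; the class `TaggedOutputParser` and its `Token` type are hypothetical helpers, not part of ictclas4j, and the sample string is hand-written in the output format shown above rather than produced by the library.

```java
import java.util.ArrayList;
import java.util.List;

public class TaggedOutputParser {

    /** One segmented word together with its part-of-speech tag. */
    static final class Token {
        final String word;
        final String tag;

        Token(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }

        @Override
        public String toString() {
            return word + " -> " + tag;
        }
    }

    /** Splits a "word/tag word/tag ..." string into tokens. */
    static List<Token> parse(String result) {
        List<Token> tokens = new ArrayList<>();
        for (String item : result.trim().split("\\s+")) {
            if (item.isEmpty()) {
                continue; // blank input
            }
            // Use the LAST slash so that a word that is itself "/" (as in "//nx") survives.
            int slash = item.lastIndexOf('/');
            if (slash <= 0) {
                // No tag found: keep the raw text with an empty tag.
                tokens.add(new Token(item, ""));
            } else {
                tokens.add(new Token(item.substring(0, slash), item.substring(slash + 1)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sample = "Hello/nx world/nx !/w //nx 123/m";
        for (Token t : parse(sample)) {
            System.out.println(t);
        }
    }
}
```

Splitting on the last slash rather than the first is a deliberate choice here: the test text contains a literal "/" that comes back as "//nx".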

Okay. It's very simple, right? Go and have fun.

 

 

Posted by Jasper. Reads (4198), Comments (27). Categories: Text Classification Technology, Java Technology


Comments

There are a lot of word segmentation programs, heh~~ This one is good too.

# Re: Start using ICTCLAS Java in 10 minutes
I used the 2.0 beta version, which has no part-of-speech tagging.
This one is quite good. Hehe
tinypig commented

# Re: Start using ICTCLAS Java in 10 minutes
It doesn't work!!
Are you sure you ran it successfully? Why does this error occur:
Exception in thread "main" java.lang.NullPointerException
    at org.ictclas4j.bean.Dictionary.getMaxMatch(Dictionary.java:571)
    at org.ictclas4j.segment.GraphGenerate.generate(GraphGenerate.java:93)
    at org.ictclas4j.segment.SegTag.split(SegTag.java:63)
    at testjava.testictclas4j.main(testictclas4j.java:12)
tinypig commented on 2008-09-22 20:36

# Re: Start using ICTCLAS Java in 10 minutes
You reminded me: the article was wrong about where to put the data folder. The correct location is your Eclipse project folder, not the bin directory. That should fix it.
Jasper commented

# Re: Start using ICTCLAS Java in 10 minutes
Thank you.
I may ask for more advice later.
tinypig commented

# Re: Start using ICTCLAS Java in 10 minutes
Thank you.
norm commented

# Re: Start using ICTCLAS Java in 10 minutes
It runs fine in that OneMain class, but an error occurs when the same code is put into a JSP.
Please advise~ It's urgent.
marine commented

# Re: Start using ICTCLAS Java in 10 minutes
Hello, how can I call ictclas4j in a plain JDK environment?
Where do I use OneMain.java? I'm a newbie and can't get it to work. Please help me. Thank you.
yjwmylm commented

# Re: Start using ICTCLAS Java in 10 minutes
Hello, I am using ictclas4j under JDK 1.6, and I also ran into java.lang.NullPointerException
    org.ictclas4j.bean.Dictionary.getMaxMatch(Dictionary.java:571)
    org.ictclas4j.segment.GraphGenerate.generate(GraphGenerate.java:93)
    org.ictclas4j.segment.SegTag.split(SegTag.java:63)
Has your problem been solved? Can you tell me why?
ysf-mylm commented

# Re: Start using ICTCLAS Java in 10 minutes @ysf-mylm
Because ICTCLAS has a commercial version that is sold for money, this open-source version still has many problems. For example, some words that are not in the dictionary throw null pointer errors, such as "Shenzhen" and "Osaka".
Certain special string patterns also trigger errors, such as single quotes separated by a few characters plus something else (it was a long time ago and I don't remember clearly). Some special characters may cause errors as well. If these elements don't matter much to you, I suggest modifying the source code to block such exceptions.
Jasper commented

# Re: Start using ICTCLAS Java in 10 minutes @yjwmylm
Data location error..
WWW commented

# Re: Start using ICTCLAS Java in 10 minutes
I can use this locally, but errors occur when I create a web project.
SegResult sr = st.split(input);
java.lang.NullPointerException
    at org.ictclas4j.bean.Dictionary.getMaxMatch(Dictionary.java:571)
    at org.ictclas4j.segment.GraphGenerate.generate(GraphGenerate.java:93)
    at org.ictclas4j.segment.SegTag.split(SegTag.java:63)
tttt commented

# Re: Start using ICTCLAS Java in 10 minutes
Thank you!
Hust commented

# Re: Start using ICTCLAS Java in 10 minutes
The string passed in from the JSP is correct; even if I use a new string directly, the same problem still occurs. The same operation works fine in a Java application.
tttt commented

# Re: Start using ICTCLAS Java in 10 minutes @tttt
Please note that resources a local application can find may not be found on a web server, so please put the ICTCLAS dictionary files (that is, the data folder) in the correct location.
Jasper commented

# Re: Start using ICTCLAS Java in 10 minutes @Jasper
Thank you.
Well, I also found that this was the cause, and even setting the data path as an environment variable does not work. In the end I put data in the Eclipse installation directory. Is there a way to change the default resource path?
tttt commented

# Re: Start using ICTCLAS Java in 10 minutes
How can I solve this problem?
Exception in thread "main" java.lang.NoClassDefFoundError: bean/filesutil
    at org.ictclas4j.segment.SegTag.<init>(SegTag.java:33)
    at OneMain.main(OneMain.java:11)
Caused by: java.lang.ClassNotFoundException: bean.filesutil
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClassInternal(Unknown Source)
    ... 2 more
this is onemain
jia15679 commented

# Re: Start using ICTCLAS Java in 10 minutes
Hello Jasper,
I am a newbie. Why is it that when I use ictclas4j
there are many compilation errors in org.ictclas4j.bean under src?
Most of them are
ReflectionToStringBuilder cannot be resolved
and
The import org.apache cannot be resolved

Also, have you tested whether the open-source version provides a comprehensive dictionary?
I just want to use it for word segmentation.
Thank you very much~
Miao commented

# Re: Start using ICTCLAS Java in 10 minutes @Miao
Check your JDK version and make sure it is JDK 5 or later. Also, I have not tested the non-open-source version of ICTCLAS. However, according to their documentation and the author of ictclas4j, the non-open-source version has a much more comprehensive dictionary and is faster.
Jasper commented

# Re: Start using ICTCLAS Java in 10 minutes
Hi Jasper. Thank you very much.

I have JDK 6/JRE/Eclipse/XP installed.
I looked carefully, and all the errors are related to ReflectionToStringBuilder.
It should come from:
import org.apache.commons.lang.builder.ReflectionToStringBuilder

However, I do not have org.apache.
I don't know when it was supposed to be installed. Or do I need to install the jar myself??
Miao commented

# Re: Start using ICTCLAS Java in 10 minutes @Miao
A jar package of Apache Commons can be found at www.apache.org.
Jasper commented

# Re: Start using ICTCLAS Java in 10 minutes
Thank you... It's from Google.
I went to apache.org, but I don't know how to find the jar I want.
The directory on the download page is very long...
How should I find Commons?

Should this have been installed manually?

What is the relationship between Apache and the Apache HTTP Server?

Thank you very much, great comrade Jasper~~
Miao commented

# Re: Start using ICTCLAS Java in 10 minutes
Hi Jasper~ (Again ;P)
Sorry, I have to ask you again~
After I run your test class:
this is onemain
Exception in thread "main" java.lang.NullPointerException
    at org.ictclas4j.bean.Dictionary.getMaxMatch(Dictionary.java:571)
    at org.ictclas4j.segment.GraphGenerate.generate(GraphGenerate.java:93)
    at org.ictclas4j.segment.SegTag.split(SegTag.java:63)
    at OneMain.main(OneMain.java:13)

The file location should be correct.
I tried removing the Chinese from the text, and it runs normally only when just the English is left.
I am on English XP, and in the Eclipse preferences the text encoding is set to UTF-8.
I don't know what the problem is. Thank you very much~
Miao commented

# Re: Start using ICTCLAS Java in 10 minutes
After I changed the encoding to GBK it was fine. Thank you very much, I'm in a great mood~
Miao commented

# Re: Start using ICTCLAS Java in 10 minutes
Are you a student of Mr. Liu Qun?
Rubby commented

# Re: Start using ICTCLAS Java in 10 minutes @Rubby
No.
Jasper commented

# Re: Start using ICTCLAS Java in 10 minutes
Hello~~
Why is the import always wrong when I run it?

Exception in thread "main" java.lang.Error: Unresolved compilation problems:
    SegTag cannot be resolved to a type
    SegTag cannot be resolved to a type
    SegResult cannot be resolved to a type
The import statement always reports an error.

hiswing commented
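
Several comments above report a NullPointerException thrown from inside the segmenter, and Jasper suggests modifying the source code to block such exceptions. A lighter alternative, sketched below, is to guard the call from the outside instead. This is not Jasper's approach and not part of ictclas4j: `SafeSegmenter` and `segmentOrFallback` are hypothetical names, and the segmenter is passed in as a plain function so the example runs without the library. It only hides the failure for one input; it does not fix a misplaced data folder or a wrong encoding.

```java
import java.util.function.Function;

public class SafeSegmenter {

    /**
     * Applies segmenter to text. If it throws a RuntimeException
     * (for example a NullPointerException), returns fallback instead.
     */
    static String segmentOrFallback(Function<String, String> segmenter,
                                    String text, String fallback) {
        try {
            return segmenter.apply(text);
        } catch (RuntimeException e) {
            System.err.println("Segmentation failed: " + e);
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Stand-in segmenter for illustration only. With ictclas4j on the classpath
        // it could be something like: text -> st.split(text).getFinalResult()
        Function<String, String> stub = text -> {
            if (text.contains("?")) {
                throw new NullPointerException("simulated dictionary miss");
            }
            return text + "/nx";
        };

        System.out.println(segmentOrFallback(stub, "Hello", "Hello"));
        System.out.println(segmentOrFallback(stub, "Hello?", "Hello?"));
    }
}
```

Returning the unsegmented text as the fallback keeps a batch job running when a single sentence trips the segmenter, at the cost of that sentence being left as one untagged chunk.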
