Two ways to do Chinese word segmentation in R: Rwordseg and jiebaR
Environment variables for R:
R_PATH:
C:\Program Files\R\R-3.1.2
Path:
%R_PATH%
First, Chinese word segmentation with the Rwordseg package
(1) Configure the environment variables for Java:
JAVA_HOME:
C:\Program Files\Java\jdk1.8.0_31
Path:
%JAVA_HOME%\bin;%JAVA_HOME%\jre\bin
CLASSPATH:
%JAVA_HOME%\lib\dt.jar;%JAVA_HOME%\lib\tools.jar
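Before going further, it can help to confirm from inside R that the variables configured above are actually visible to the R session. The following is a minimal sketch using only base R. It assumes the variable names R_PATH, JAVA_HOME and CLASSPATH used in this post; if you named yours differently, adjust the vector.

```r
# Variable names taken from the configuration steps in this post
vars <- c("R_PATH", "JAVA_HOME", "CLASSPATH")

# unset = NA lets us tell "not set" apart from "set to an empty string"
values <- Sys.getenv(vars, unset = NA)

for (v in vars) {
  status <- if (is.na(values[[v]])) "NOT SET" else values[[v]]
  cat(sprintf("%-10s %s\n", v, status))
}
```

If a variable shows as NOT SET, restart R (or the IDE) after editing the system settings, since a running session does not pick up changes made afterwards.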
(2) Download the Rwordseg package to the local hard drive. The current version of the package is available at https://r-forge.r-project.org/R/?group_id=1054
1 > install.packages("rJava")
2 > Add the following paths to the PATH environment variable:
    %JAVA_HOME%\jre\bin
    %JAVA_HOME%\jre\bin\server
    %R_PATH%\library\rJava\jri
3 > install.packages("<folder containing the downloaded Rwordseg package>/Rwordseg_0.2-1.zip", repos = NULL, type = "source")
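The local install in step 3 fails with a fairly unhelpful message if the path is wrong, so a small guard can save time. The sketch below builds the path with file.path() and only calls install.packages() when the archive exists. The folder C:/Downloads is a hypothetical placeholder, not a path from the original steps; note that R accepts forward slashes in Windows paths, which avoids having to escape backslashes.

```r
# Hypothetical download folder; replace with wherever you saved the archive
pkg_dir  <- "C:/Downloads"
pkg_file <- file.path(pkg_dir, "Rwordseg_0.2-1.zip")

if (file.exists(pkg_file)) {
  # Same call as in step 3, pointing at the local archive instead of a repository
  install.packages(pkg_file, repos = NULL, type = "source")
} else {
  message("Package archive not found: ", pkg_file)
}
```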
(3) Enter the following commands:
> library("rJava")
> library("Rwordseg")
> words = "Sanitation workers are dismissed for warming in the cold wind"  # the original input is a Chinese sentence, shown here in English translation
> segment.options(isNameRecognition = TRUE)  # enable personal-name recognition
> segmentCN(words)
Result:
[1] "sanitation" "work" "because" "in" "Cold Wind" "in" "Warm Fire" "heating" "was" "dismissed"
Now change the input to words = "My name is R language"
Result: [1] "I" "Name" "is" "R language"
Second, Chinese word segmentation with the jiebaR package
(1) Enter the following commands:
> install.packages("jiebaR")  # install the jiebaR package
> library("jiebaRD")  # load the jiebaRD package
> library("jiebaR")
> words = "Sanitation workers are dismissed for warming in the cold wind"  # the original input is a Chinese sentence, shown here in English translation
> test = worker()
> test <= words
(2) Output:
[1] "sanitation workers" "because in" "Cold Wind" "in" "Warm Fire" "heating" "was" "dismissed"
Now change the input to words = "My name is R language"
Result: [1] "I" "Name" "is" "R" "Language"
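The `<=` operator used above is shorthand; jiebaR also exposes the same operation as the function segment(), which can be easier to read inside scripts. The sketch below runs both forms on the same worker. The Chinese sentence is an illustrative example chosen here, not the input from the original post, and the variable names seg and sentence are likewise placeholders. The code is wrapped in a requireNamespace() check so it does nothing if jiebaR is not installed.

```r
if (requireNamespace("jiebaR", quietly = TRUE)) {
  library("jiebaR")

  seg <- worker()                       # segmentation worker with default settings
  sentence <- "这是一个中文分词的例子"  # illustrative sentence, not from the original post

  res_operator <- seg <= sentence          # operator form, as used above
  res_function <- segment(sentence, seg)   # function form: text first, then the worker

  print(res_function)
  print(identical(res_operator, res_function))
}
```

Creating the worker once and reusing it for every sentence is the intended pattern, since building a worker loads the dictionaries.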