1. Applications of pinyin analysis
Pinyin search is common in daily life, and you probably use it every day. Open Taobao and type the pinyin "zhonghua": the suggestion list shows products containing "中华" ("China"), the Chinese characters that this pinyin stands for.
Pinyin analysis suggests Chinese text from pinyin input, which improves the search experience and speeds up searching. The rest of this article describes how to configure the pinyin and IK analyzers together in Elasticsearch 5.1.1.
2. Downloading and installing the IK analyzer
There is little need to introduce the IK analyzer: it is one of the most widely used Chinese word segmentation plugins and gives good results. In Elasticsearch development, nine times out of ten the Chinese analyzer is IK.
Download address: https://github.com/medcl/elasticsearch-analysis-ik
Stop Elasticsearch before configuring the plugin, and restart it once the configuration is complete.
The IK version must match your ES version, as described in the README. I am using ES 5.1.1, so the IK version is also 5.1.1. (You may wonder why IK jumped from 1.x straight to 5.x. Elastic unified its version numbers: previously ES was 2.x, Logstash 2.x, Kibana 4.x, and IK 1.x, which was very confusing. Since 5.0 the numbers are aligned, so if you run ES 5.1.1, simply use 5.1.1 for the other software as well.)
After downloading, enter the elasticsearch-analysis-ik-master directory and package the plugin with Maven (install Maven yourself if it is missing) by running:
mvn package
When packaging succeeds, a target folder is generated. In the elasticsearch-analysis-ik-master/target/releases directory, find elasticsearch-analysis-ik-5.1.1.zip; this is the installation file we need. Unzip elasticsearch-analysis-ik-5.1.1.zip to get the following contents:
commons-codec-1.9.jar
commons-logging-1.2.jar
config
elasticsearch-analysis-ik-5.1.1.jar
httpclient-4.5.2.jar
httpcore-4.4.4.jar
plugin-descriptor.properties
Then create a new folder named ik in the elasticsearch-5.1.1/plugins directory and copy the unzipped contents of elasticsearch-analysis-ik-5.1.1.zip into elasticsearch-5.1.1/plugins/ik. The screenshot makes this easy to follow.
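Because the plugin version must match the ES version exactly, it can be worth checking the unpacked files before restarting. Below is a minimal Python sketch, assuming the descriptor records its target version under the elasticsearch.version key (as Elasticsearch plugin descriptors do). The helper names and the sample text are hypothetical; point the path at your own plugins/ik/plugin-descriptor.properties.

```python
from pathlib import Path


def parse_properties(text):
    """Parse simple key=value lines of a Java .properties file (no escapes or line continuations)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def plugin_matches_es(descriptor_text, es_version):
    """True if the descriptor's elasticsearch.version equals the given ES version."""
    return parse_properties(descriptor_text).get("elasticsearch.version") == es_version


# Hypothetical excerpt of plugins/ik/plugin-descriptor.properties.
SAMPLE = """
# Elasticsearch plugin descriptor file
name=analysis-ik
version=5.1.1
elasticsearch.version=5.1.1
"""

if __name__ == "__main__":
    descriptor = Path("elasticsearch-5.1.1/plugins/ik/plugin-descriptor.properties")
    text = descriptor.read_text(encoding="utf-8") if descriptor.exists() else SAMPLE
    print(plugin_matches_es(text, "5.1.1"))
```

The same check applies unchanged to the pinyin plugin's descriptor.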
3. Downloading and installing the pinyin analyzer
Download address: https://github.com/medcl/elasticsearch-analysis-pinyin
The installation process is the same as for IK: download, package, and add the plugin to ES. The steps are not repeated here; the screenshot shows the final configuration.
4. Analyzer tests
After configuring IK and pinyin, restart ES. If ES reports an error during startup, something is wrong with the installation; if it starts without errors, the configuration succeeded.
4.1 IK analyzer test
Create an index:
curl -XPUT "http://localhost:9200/index"
Test the segmentation:
curl -XPOST "http://localhost:9200/index/_analyze?analyzer=ik_max_word&text=中华人民共和国国歌"
Segmentation result for 中华人民共和国国歌 ("National Anthem of the People's Republic of China"):
{"tokens": [
{"token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0},
{"token": "中华人民", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 1},
{"token": "中华", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 2},
{"token": "华人", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 3},
{"token": "人民共和国", "start_offset": 2, "end_offset": 7, "type": "CN_WORD", "position": 4},
{"token": "人民", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 5},
{"token": "共和国", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 6},
{"token": "共和", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 7},
{"token": "国", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 8},
{"token": "国歌", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 9}
]}
Using ik_smart:
curl -XPOST "http://localhost:9200/index/_analyze?analyzer=ik_smart&text=中华人民共和国国歌"
Segmentation result:
{
  "tokens": [{
    "token": "中华人民共和国",
    "start_offset": 0,
    "end_offset": 7,
    "type": "CN_WORD",
    "position": 0
  }, {
    "token": "国歌",
    "start_offset": 7,
    "end_offset": 9,
    "type": "CN_WORD",
    "position": 1
  }]
}
A screenshot makes the comparison easy to follow.
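The difference between the two analyzers shows up in the offsets: ik_max_word emits every word it finds, so its token spans overlap, while ik_smart picks a single segmentation. The Python sketch below only inspects the (token, start_offset, end_offset) triples copied from the two responses above; has_overlap is a hypothetical helper, not part of the plugin.

```python
# (token, start_offset, end_offset) copied from the two _analyze responses above.
IK_MAX_WORD = [
    ("中华人民共和国", 0, 7), ("中华人民", 0, 4), ("中华", 0, 2), ("华人", 1, 3),
    ("人民共和国", 2, 7), ("人民", 2, 4), ("共和国", 4, 7), ("共和", 4, 6),
    ("国", 6, 7), ("国歌", 7, 9),
]
IK_SMART = [("中华人民共和国", 0, 7), ("国歌", 7, 9)]


def has_overlap(spans):
    """True if any two token spans share a character position.

    After sorting by start offset, if any pair overlaps then some
    adjacent pair overlaps too, so checking neighbours is enough.
    """
    ordered = sorted(spans, key=lambda t: (t[1], t[2]))
    for (_, _, prev_end), (_, start, _) in zip(ordered, ordered[1:]):
        if start < prev_end:
            return True
    return False


if __name__ == "__main__":
    print("ik_max_word overlaps:", has_overlap(IK_MAX_WORD))
    print("ik_smart overlaps:", has_overlap(IK_SMART))
```

Overlapping tokens give ik_max_word better recall at index time, at the cost of a larger index.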
4.2 Pinyin analyzer test
Test the pinyin analyzer on 张学友 (Jacky Cheung):
curl -XPOST "http://localhost:9200/index/_analyze?analyzer=pinyin&text=张学友"
Segmentation result:
{
  "tokens": [{
    "token": "zhang",
    "start_offset": 0,
    "end_offset": 1,
    "type": "word",
    "position": 0
  }, {
    "token": "xue",
    "start_offset": 1,
    "end_offset": 2,
    "type": "word",
    "position": 1
  }, {
    "token": "you",
    "start_offset": 2,
    "end_offset": 3,
    "type": "word",
    "position": 2
  }, {
    "token": "zxy",
    "start_offset": 0,
    "end_offset": 3,
    "type": "word",
    "position": 3
  }]
}
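The output above has a simple shape: one full-pinyin token per character, then a token made of the initials (zxy) spanning the whole input. A minimal Python sketch that reproduces that shape, assuming a hard-coded, hypothetical reading table in place of the plugin's own dictionary (it ignores polyphonic characters and non-Chinese input):

```python
# Hypothetical per-character readings; the real plugin ships its own dictionary.
READINGS = {"张": "zhang", "学": "xue", "友": "you", "刘": "liu", "德": "de", "华": "hua"}


def pinyin_tokens(text):
    """Mimic the token list above: full pinyin per character, then the initials."""
    syllables = [READINGS[ch] for ch in text]
    tokens = [
        {"token": s, "start_offset": i, "end_offset": i + 1, "type": "word", "position": i}
        for i, s in enumerate(syllables)
    ]
    # The abbreviation covers the whole input and comes last.
    initials = "".join(s[0] for s in syllables)
    tokens.append({"token": initials, "start_offset": 0, "end_offset": len(text),
                   "type": "word", "position": len(syllables)})
    return tokens


if __name__ == "__main__":
    for name in ("张学友", "刘德华"):
        print(name, [t["token"] for t in pinyin_tokens(name)])
```

The same shape explains the later searches: 刘德华 yields liu, de, hua and ldh, each of which can be matched on its own.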
5. IK + pinyin analyzer configuration
5.1 Creating the index and analyzer settings
Create an index and set its analyzer-related properties:
curl -XPUT "http://localhost:9200/medcl/" -d '
{
  "index": {
    "analysis": {
      "analyzer": {
        "ik_pinyin_analyzer": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "filter": ["my_pinyin", "word_delimiter"]
        }
      },
      "filter": {
        "my_pinyin": {
          "type": "pinyin",
          "first_letter": "prefix",
          "padding_char": " "
        }
      }
    }
  }
}'
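Typing nested JSON inside a single-quoted curl -d argument is error-prone. One alternative, sketched here in Python, is to build the same settings body as a dict, check that every filter the custom analyzer names is either defined in the body or built in, and print it with json.dumps; saved to a file, it could be sent with curl -XPUT "http://localhost:9200/medcl/" -d @settings.json. The helper names and the small set of built-in filters are assumptions for this example.

```python
import json


def index_settings():
    """The index settings body above, as a Python dict."""
    return {
        "index": {
            "analysis": {
                "analyzer": {
                    "ik_pinyin_analyzer": {
                        "type": "custom",
                        "tokenizer": "ik_smart",
                        "filter": ["my_pinyin", "word_delimiter"],
                    }
                },
                "filter": {
                    "my_pinyin": {
                        "type": "pinyin",
                        "first_letter": "prefix",
                        "padding_char": " ",
                    }
                },
            }
        }
    }


def undefined_filters(settings):
    """Filters referenced by an analyzer that are neither defined here nor in BUILTIN."""
    # Assumption: only the built-in filters this example needs, not ES's full list.
    BUILTIN = {"word_delimiter", "lowercase"}
    analysis = settings["index"]["analysis"]
    defined = set(analysis.get("filter", {}))
    return [name
            for analyzer in analysis["analyzer"].values()
            for name in analyzer.get("filter", [])
            if name not in defined and name not in BUILTIN]


if __name__ == "__main__":
    body = index_settings()
    assert not undefined_filters(body), undefined_filters(body)
    print(json.dumps(body, indent=2))
```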
Create a type and set its mapping:
curl -XPOST http://localhost:9200/medcl/folks/_mapping -d '
{
  "folks": {
    "properties": {
      "name": {
        "type": "keyword",
        "fields": {
          "pinyin": {
            "type": "text",
            "store": "no",
            "term_vector": "with_positions_offsets",
            "analyzer": "ik_pinyin_analyzer"
          }
        }
      }
    }
  }
}'
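The analyzer value of the name.pinyin sub-field must be exactly the name defined in the index settings (ik_pinyin_analyzer); a misspelled or differently cased name should be rejected when the mapping is applied. A small Python sketch of that consistency check, with hypothetical helper names and the defined analyzer names copied from step 5.1:

```python
import json

# Analyzer names defined in the index settings of step 5.1.
DEFINED_ANALYZERS = {"ik_pinyin_analyzer"}


def folks_mapping(analyzer):
    """The mapping body above: a keyword field with an analyzed .pinyin sub-field."""
    return {
        "folks": {
            "properties": {
                "name": {
                    "type": "keyword",
                    "fields": {
                        "pinyin": {
                            "type": "text",
                            "store": "no",
                            "term_vector": "with_positions_offsets",
                            "analyzer": analyzer,
                        }
                    },
                }
            }
        }
    }


def referenced_analyzers(mapping):
    """Collect analyzer names used by any field or sub-field in the mapping."""
    found = []

    def walk(props):
        for field in props.values():
            if "analyzer" in field:
                found.append(field["analyzer"])
            walk(field.get("fields", {}))      # multi-fields such as name.pinyin
            walk(field.get("properties", {}))  # nested object fields

    for type_body in mapping.values():
        walk(type_body["properties"])
    return found


if __name__ == "__main__":
    body = folks_mapping("ik_pinyin_analyzer")
    print(set(referenced_analyzers(body)) <= DEFINED_ANALYZERS)
    print(json.dumps(body, indent=2))
```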
5.2 Indexing test documents
Index two test documents.
Document 1:
curl -XPOST http://localhost:9200/medcl/folks/andy -d '{"name": "刘德华"}'
Document 2:
curl -XPOST http://localhost:9200/medcl/folks/tina -d '{"name": "中华人民共和国国歌"}'
5.3 Test (1): pinyin search
Each of the following four commands matches 刘德华 (Andy Lau):
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:liu"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:de"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:hua"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:ldh"
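These commands put the query string straight into the URL. That works for plain ASCII pinyin, but a query containing Chinese characters (or spaces) is safer percent-encoded. A minimal sketch using the standard library's urllib.parse.urlencode; query_string_url is a hypothetical helper name:

```python
from urllib.parse import urlencode

BASE = "http://localhost:9200/medcl/folks/_search"


def query_string_url(field, term):
    """Build a URI-search URL (?q=field:term) with the q parameter percent-encoded."""
    return BASE + "?" + urlencode({"q": f"{field}:{term}"})


if __name__ == "__main__":
    for term in ("liu", "de", "hua", "ldh", "国歌"):
        print(query_string_url("name.pinyin", term))
```

urlencode also encodes the colon as %3A, which Elasticsearch decodes back before parsing the query.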
5.4 Test (2): IK search
curl -XPOST "http://localhost:9200/medcl/_search?pretty" -d '
{
  "query": {
    "match": {
      "name.pinyin": "国歌"
    }
  },
  "highlight": {
    "fields": {
      "name.pinyin": {}
    }
  }
}'
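Both search tests in this section send the same body shape: a match query on name.pinyin plus a highlight request for that field. A minimal Python builder for it (the function name is hypothetical); ensure_ascii=False keeps the Chinese text readable in the printed JSON:

```python
import json


def match_with_highlight(field, text):
    """A match query on `field` that also asks Elasticsearch to highlight that field."""
    return {
        "query": {"match": {field: text}},
        "highlight": {"fields": {field: {}}},
    }


if __name__ == "__main__":
    for text in ("国歌", "zhonghua"):
        print(json.dumps(match_with_highlight("name.pinyin", text), ensure_ascii=False, indent=2))
```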
Result:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 16.698704,
    "hits": [
      {
        "_index": "medcl",
        "_type": "folks",
        "_id": "tina",
        "_score": 16.698704,
        "_source": {
          "name": "中华人民共和国国歌"
        },
        "highlight": {
          "name.pinyin": [
            "<em>中华人民共和国</em><em>国歌</em>"
          ]
        }
      }
    ]
  }
}
This shows that the IK analyzer is in effect.
5.5 Test (3): pinyin + IK search
curl -XPOST "http://localhost:9200/medcl/_search?pretty" -d '
{
  "query": {
    "match": {
      "name.pinyin": "zhonghua"
    }
  },
  "highlight": {
    "fields": {
      "name.pinyin": {}
    }
  }
}'
Result:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 5.9814634,
    "hits": [
      {
        "_index": "medcl",
        "_type": "folks",
        "_id": "tina",
        "_score": 5.9814634,
        "_source": {
          "name": "中华人民共和国国歌"
        },
        "highlight": {
          "name.pinyin": [
            "<em>中华人民共和国</em>国歌"
          ]
        }
      },
      {
        "_index": "medcl",
        "_type": "folks",
        "_id": "andy",
        "_score": 2.2534127,
        "_source": {
          "name": "刘德华"
        },
        "highlight": {
          "name.pinyin": [
            "<em>刘德华</em>"
          ]
        }
      }
    ]
  }
}
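When these responses are consumed by an application rather than read by eye, usually only the id, score and highlight fragment of each hit matter. A minimal Python sketch over an abbreviated copy of the response above (only the fields used are kept; summarize_hits is a hypothetical helper):

```python
import json

# Abbreviated copy of the 5.5 response above.
RESPONSE = """{"hits": {"total": 2, "hits": [
  {"_id": "tina", "_score": 5.9814634,
   "highlight": {"name.pinyin": ["<em>中华人民共和国</em>国歌"]}},
  {"_id": "andy", "_score": 2.2534127,
   "highlight": {"name.pinyin": ["<em>刘德华</em>"]}}]}}"""


def summarize_hits(raw, field="name.pinyin"):
    """Return (id, score, first highlight fragment) per hit, in ranking order."""
    hits = json.loads(raw)["hits"]["hits"]
    return [
        (h["_id"], h["_score"], h.get("highlight", {}).get(field, [""])[0])
        for h in hits
    ]


if __name__ == "__main__":
    for doc_id, score, fragment in summarize_hits(RESPONSE):
        print(f"{doc_id}\t{score:.3f}\t{fragment}")
```

Hits without a highlight for the field fall back to an empty fragment instead of raising a KeyError.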
The screenshot shows the same results.
After adding pinyin analysis, searches must target the field with the .pinyin suffix (name.pinyin); a search on the original name field returns no results (see screenshot).
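Why the original field finds nothing can be sketched with a toy model, using plain dicts as a deliberately simplified stand-in for Lucene's inverted index. name is a keyword field, so its whole value is indexed as one term, while only name.pinyin goes through ik_pinyin_analyzer. The pinyin tokens for 刘德华 are assumed from the four queries that matched in 5.3; the helper names are hypothetical.

```python
# Hypothetical data: the tokens implied by the four matching queries in 5.3.
DOCS = {"andy": "刘德华"}
PINYIN_OF = {"刘德华": ["liu", "de", "hua", "ldh"]}


def build_index(docs, pinyin_of):
    """Toy term -> doc-id indexes for the keyword field and its pinyin sub-field."""
    keyword, pinyin = {}, {}
    for doc_id, name in docs.items():
        keyword.setdefault(name, set()).add(doc_id)  # keyword: whole value is a single term
        for tok in pinyin_of[name]:
            pinyin.setdefault(tok, set()).add(doc_id)
    return {"name": keyword, "name.pinyin": pinyin}


def search(index, field, term):
    """Exact term lookup on one field; returns matching doc ids."""
    return sorted(index[field].get(term, set()))


if __name__ == "__main__":
    idx = build_index(DOCS, PINYIN_OF)
    print(search(idx, "name", "liu"))
    print(search(idx, "name.pinyin", "liu"))
```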
6. References
https://github.com/medcl/elasticsearch-analysis-ik
https://github.com/medcl/elasticsearch-analysis-pinyin
https://my.oschina.net/xiaohui249/blog/214505