Elasticsearch Learning Primer

Source: Internet
Author: User

Over the past few years the barrier to building search has dropped steadily. Every major language now has an open-source retrieval toolkit, the toolkits keep gaining features, and complete solutions built on top of them are increasingly common and widely used; on top of Lucene, for example, there are Solr, Elasticsearch, Sensei, and others. These cover most needs and spare teams from developing directly against a search toolkit, so they can focus on business development. I am personally optimistic about Elasticsearch (ES): it is very simple to use, it feels much like working with a NoSQL store, and many features can be added through plugins that you develop yourself. ES can be tested easily with any REST client, which also makes it easy to learn.

The official ES website documents the API fairly comprehensively, but I find the organization of that documentation a bit messy, at least not as easy to read as the MongoDB documentation. So let's get to know ES through a simple application: a Chinese news search engine. Each news item has five fields: title, content, author, category, and publish date. We want to provide the basic functions: search over title and content, sorting, highlighting, statistics, and filtering. ES provides the smartcn Chinese analyzer, but the IK analyzer is recommended here; see the examples given by the plugin author for how to set it up.

Download and install the plugin and start ES; then you can begin experimenting.

1. Create an index named test

PUT http://localhost:9200/test
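
For readers who prefer a script over a REST client, the same request can be sketched in Python with only the standard library. This is a minimal sketch: the helper name build_create_index_request is made up for illustration, and the code only builds the request; sending it requires a node listening on localhost:9200 as in the article.

```python
import urllib.request

ES_URL = "http://localhost:9200"  # local node address used throughout the article


def build_create_index_request(index_name, base_url=ES_URL):
    """Build (but do not send) the PUT request that creates an index.

    Hypothetical helper for illustration; it is not part of any ES client.
    """
    url = f"{base_url}/{index_name}"
    # An empty body is enough: the index is created with default settings.
    return urllib.request.Request(url, data=b"", method="PUT")


request = build_create_index_request("test")
print(request.get_method(), request.full_url)

# To send it against a running node, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Keeping the request construction separate from the network call makes the step easy to inspect without a running cluster.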

2. Create mapping

POST http://localhost:9200/test/news/_mapping

The request body is:

    {
      "news": {
        "properties": {
          "content": {
            "type": "string",
            "store": "no",
            "term_vector": "with_positions_offsets",
            "index_analyzer": "ik",
            "search_analyzer": "ik"
          },
          "title": {
            "type": "string",
            "store": "no",
            "term_vector": "with_positions_offsets",
            "index_analyzer": "ik",
            "search_analyzer": "ik",
            "boost": 5
          },
          "author": {
            "type": "string", "index": "not_analyzed"
          },
          "publish_date": {
            "type": "date", "format": "yyyy/MM/dd", "index": "not_analyzed"
          },
          "category": {
            "type": "string", "index": "not_analyzed"
          }
        }
      }
    }


In the URL, test and news are the index and the type, which roughly correspond to a database name and a table name in a relational database. The properties object in the POST body holds the mapping itself, with the five fields. type gives each field's data type. The content and title fields need to be tokenized and highlighted, so they set an analyzer and turn on term_vector. The available field types are described in the official mapping documentation.
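
The two kinds of fields in this mapping (tokenized text with term vectors, and exact-value fields) can also be generated rather than typed by hand. The sketch below assumes the lower-case key names of the ES 0.19-era mapping discussed in this article; the helper names analyzed_text_field, exact_field, and build_news_mapping are hypothetical.

```python
import json


def analyzed_text_field(analyzer="ik", boost=None):
    """Text field that is tokenized and keeps term vectors for highlighting."""
    field = {
        "type": "string",
        "store": "no",
        "term_vector": "with_positions_offsets",
        "index_analyzer": analyzer,
        "search_analyzer": analyzer,
    }
    if boost is not None:
        field["boost"] = boost  # weight matches in this field more heavily
    return field


def exact_field(field_type="string", **extra):
    """Field indexed as one untokenized value (for sorting, filtering, facets)."""
    return {"type": field_type, "index": "not_analyzed", **extra}


def build_news_mapping():
    """Assemble the five-field mapping for the news type."""
    return {
        "news": {
            "properties": {
                "content": analyzed_text_field(),
                "title": analyzed_text_field(boost=5),
                "author": exact_field(),
                "publish_date": exact_field("date", format="yyyy/MM/dd"),
                "category": exact_field(),
            }
        }
    }


# The printed JSON is what would be POSTed to /test/news/_mapping.
print(json.dumps(build_news_mapping(), indent=2))
```

Generating the mapping this way keeps the analyzer name in one place, which helps if the IK analyzer is later swapped for another one.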

3. Create and submit some data:

POST http://localhost:9200/test/news/

The body is a news article. Make up several of them, for example:

    {
      "content": "China's premier visits Europe",
      "title": "The U.S. economic situation is good this year",
      "publish_date": "2010/07/01",
      "author": "Zhang San",
      "category": "Finance"
    }
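
Submitting many articles by hand gets tedious, so a small script helps. The following is a sketch under assumptions: build_index_request is a hypothetical helper, the date check mirrors the yyyy/MM/dd format declared for publish_date, and the request is only built, not sent.

```python
import json
import urllib.request
from datetime import datetime

SAMPLE_DOC = {
    "content": "China's premier visits Europe",
    "title": "The U.S. economic situation is good this year",
    "publish_date": "2010/07/01",
    "author": "Zhang San",
    "category": "Finance",
}


def build_index_request(doc, base_url="http://localhost:9200",
                        index="test", doc_type="news"):
    """Build the POST request that indexes one document with an auto-generated id."""
    # Fail early if the date does not follow the year/month/day layout
    # that the mapping expects; strptime raises ValueError on a mismatch.
    datetime.strptime(doc["publish_date"], "%Y/%m/%d")
    body = json.dumps(doc, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/{doc_type}/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_index_request(SAMPLE_DOC)
print(request.get_method(), request.full_url)

# To send it against a running node, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Encoding with ensure_ascii=False keeps Chinese text readable in the request body, which matters for a Chinese news index.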


4. Search

POST http://localhost:9200/test/news/_search

The request body has several parts:

paging (from/size), returned fields (fields), sorting (sort), the query (query), filtering (filter), highlighting (highlight), and statistics (facets)

    {
      "from": 0,
      "size": 10,
      "fields": ["title", "content", "publish_date", "category", "author"],
      "sort": [
        {"publish_date": {"order": "asc"}},
        "_score"
      ],
      "query": {
        "bool": {
          "should": [
            {
              "term": {"title": "China"}
            },
            {
              "term": {"content": "China"}
            }
          ]
        }
      },
      "filter": {
        "range": {
          "publish_date": {
            "from": "2010/07/01",
            "to": "2010/07/21",
            "include_lower": true,
            "include_upper": false
          }
        }
      },
      "highlight": {
        "pre_tags": ["<tag1>", "<tag2>"],
        "post_tags": ["</tag1>", "</tag2>"],
        "fields": {
          "title": {},
          "content": {}
        }
      },
      "facets": {
        "cate": {"terms": {"field": "category"}}
      }
    }
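
A search body like this is usually assembled from user input. As a rough sketch, the function below builds the same structure from a keyword, a date range, and a page number; build_search_body and its parameters are hypothetical names, and the keys follow the ES 0.19-era request format used in this article.

```python
import json


def build_search_body(keyword, date_from, date_to, page=0, page_size=10):
    """Assemble the search request: paging, fields, sort, query, filter,
    highlight, and a terms facet on category."""
    return {
        "from": page * page_size,  # offset of the first hit on this page
        "size": page_size,
        "fields": ["title", "content", "publish_date", "category", "author"],
        "sort": [{"publish_date": {"order": "asc"}}, "_score"],
        "query": {
            "bool": {
                # "should" clauses: a hit needs the keyword in title OR content
                "should": [
                    {"term": {"title": keyword}},
                    {"term": {"content": keyword}},
                ]
            }
        },
        "filter": {
            "range": {
                "publish_date": {
                    "from": date_from,
                    "to": date_to,
                    "include_lower": True,   # date_from itself is included
                    "include_upper": False,  # date_to itself is excluded
                }
            }
        },
        "highlight": {
            "pre_tags": ["<tag1>", "<tag2>"],
            "post_tags": ["</tag1>", "</tag2>"],
            "fields": {"title": {}, "content": {}},
        },
        "facets": {"cate": {"terms": {"field": "category"}}},
    }


body = build_search_body("China", "2010/07/01", "2010/07/21")
print(json.dumps(body, indent=2))
```

Passing page=2 with the default page size yields "from": 20, which is how the from/size pair maps onto page-based navigation.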

The query here matches documents whose title or content contains "China". The results are as follows:

    {
      "took": 18,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 6,
        "max_score": null,
        "hits": [{
          "_index": "test",
          "_type": "news",
          "_id": "_fm13zccsnwatpouziok_a",
          "_score": 0.024621923,
          "fields": {
            "content": "Reuters reported how the United States in Iraq yesterday, how China will face the situation.",
            "author": "Zhang San",
            "title": "The United States left Iraq is a mess",
            "category": "Politics",
            "publish_date": "2010/07/10"
          },
          "highlight": {
            "content": ["Reuters reported how the United States in Iraq yesterday, how <tag2>China</tag2> will face the situation."]
          },
          "sort": [1278720000000, 0.024621923]
        }, {
          "_index": "test",
          "_type": "news",
          "_id": "4fey1t6-rmomojyts4foaq",
          "_score": 0.024621923,
          "fields": {
            "content": "AP reports, China will visit Russia's northern region today.",
            "author": "Zhang San",
            "title": "American Visit to Russia",
            "category": "Politics",
            "publish_date": "2010/07/11"
          },
          "highlight": {
            "content": ["AP reports, <tag2>China</tag2> will visit Russia's northern region today."]
          },
          "sort": [1278806400000, 0.024621923]
        }, {
          "_index": "test",
          "_type": "news",
          "_id": "Ll-8bzcntt2yekqs1owc_a",
          "_score": 0.61871845,
          "fields": {
            "content": "U.S. Secretary of State Hillary Clinton told reporters that most of South Korea's Seoul Jiangnan District are willing to move to North Korea.",
            "author": "John Doe",
            "title": "China's economy will face a downside risk",
            "category": "Economy",
            "publish_date": "2010/07/12"
          },
          "highlight": {
            "title": ["<tag2>China</tag2>'s economy will face a downside risk"]
          },
          "sort": [1278892800000, 0.61871845]
        }, {
          "_index": "test",
          "_type": "news",
          "_id": "dnb6gtpsraoexc1axcpz0q",
          "_score": 0.048311904,
          "fields": {
            "content": "China's premier talks about domestic sports, the gold medal is undoubtedly the most important, the United States",
            "author": "Zhang San",
            "title": "The National System of American Sports",
            "category": "Sports",
            "publish_date": "2010/07/14"
          },
          "highlight": {
            "content": ["<tag2>China</tag2>'s premier talks about domestic sports, the gold medal is undoubtedly the most important, the United States"]
          },
          "sort": [1279065600000, 0.048311904]
        }, {
          "_index": "test",
          "_type": "news",
          "_id": "4lh55yoaqve7cariyvfynq",
          "_score": 0.048311904,
          "fields": {
            "content": "China's military threat to South-East Asia will continue, Russia is closely concerned about developments",
            "author": "Zhang San",
            "title": "National efforts to develop education",
            "category": "Politics",
            "publish_date": "2010/07/15"
          },
          "highlight": {
            "content": ["<tag2>China</tag2>'s military threat to South-East Asia will continue, Russia is closely concerned about developments"]
          },
          "sort": [1279152000000, 0.048311904]
        }, {
          "_index": "test",
          "_type": "news",
          "_id": "HFOVOBDCTI-2ERYAQV-AKG",
          "_score": 0.12422675,
          "fields": {
            "content": "China's economic rebound is weak, the U.S. economy continues to slump, other economies are unreliable",
            "author": "John Doe",
            "title": "Europe's debt crisis sweeping the world",
            "category": "Economy",
            "publish_date": "2010/07/19"
          },
          "highlight": {
            "content": ["<tag2>China</tag2>'s economic rebound is weak, the U.S. economy continues to slump, other economies are unreliable"]
          },
          "sort": [1279497600000, 0.12422675]
        }]
      },
      "facets": {
        "cate": {
          "_type": "terms",
          "missing": 0,
          "total": 10,
          "other": 0,
          "terms": [{
            "term": "Politics",
            "count": 4
          }, {
            "term": "Economy",
            "count": 3
          }, {
            "term": "Sports",
            "count": 3
          }]
        }
      }
    }
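
Client code typically needs only a few pieces of such a response. The sketch below pulls out the title, date, and highlight snippets of each hit plus the facet counts. It assumes the lower-case key names that Elasticsearch returns (hits, fields, highlight, facets); summarize_response is a hypothetical helper and SAMPLE_RESPONSE is a trimmed copy of the result above.

```python
SAMPLE_RESPONSE = {
    "hits": {
        "total": 6,
        "hits": [
            {
                "_id": "Ll-8bzcntt2yekqs1owc_a",
                "_score": 0.61871845,
                "fields": {
                    "title": "China's economy will face a downside risk",
                    "category": "Economy",
                    "publish_date": "2010/07/12",
                },
                "highlight": {
                    "title": ["<tag2>China</tag2>'s economy will face a downside risk"]
                },
            }
        ],
    },
    "facets": {
        "cate": {
            "terms": [
                {"term": "Politics", "count": 4},
                {"term": "Economy", "count": 3},
                {"term": "Sports", "count": 3},
            ]
        }
    },
}


def summarize_response(response, facet_name="cate"):
    """Return (rows, facet_counts) extracted from a search response dict."""
    rows = []
    for hit in response["hits"]["hits"]:
        fields = hit.get("fields", {})
        # A hit may have highlights for title, content, both, or neither.
        snippets = [
            fragment
            for fragments in hit.get("highlight", {}).values()
            for fragment in fragments
        ]
        rows.append({
            "id": hit["_id"],
            "title": fields.get("title"),
            "publish_date": fields.get("publish_date"),
            "snippets": snippets,
        })
    terms = response.get("facets", {}).get(facet_name, {}).get("terms", [])
    facet_counts = {entry["term"]: entry["count"] for entry in terms}
    return rows, facet_counts


rows, facet_counts = summarize_response(SAMPLE_RESPONSE)
for row in rows:
    print(row["publish_date"], row["title"], row["snippets"])
print(facet_counts)
```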


The response contains all the parts we asked for. Note that facets are computed over the hits of the query, while filter only narrows the returned results and does not affect the facets. That is why the facet total here is 10 although only 6 hits are returned. If the statistics should cover only the filtered results, use a filter facet.
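
The difference between the two counts can be reproduced with a small in-memory simulation. This is not Elasticsearch code, only a sketch of the behavior described above: the six in-range documents use the categories and dates from the result, while the four out-of-range documents are invented so that the totals add up to the facet counts shown.

```python
from collections import Counter

# Every document here is assumed to match the query for "China".
QUERY_HITS = [
    {"category": "Politics", "publish_date": "2010/07/10"},
    {"category": "Politics", "publish_date": "2010/07/11"},
    {"category": "Economy", "publish_date": "2010/07/12"},
    {"category": "Sports", "publish_date": "2010/07/14"},
    {"category": "Politics", "publish_date": "2010/07/15"},
    {"category": "Economy", "publish_date": "2010/07/19"},
    # Hypothetical documents outside the date range (not from the article):
    {"category": "Politics", "publish_date": "2010/06/20"},
    {"category": "Economy", "publish_date": "2010/06/25"},
    {"category": "Sports", "publish_date": "2010/07/25"},
    {"category": "Sports", "publish_date": "2010/08/01"},
]


def in_range(doc, lower, upper):
    """Range filter with include_lower=true and include_upper=false.

    Zero-padded yyyy/MM/dd strings sort in date order, so plain string
    comparison is enough for this illustration.
    """
    return lower <= doc["publish_date"] < upper


def facet_counts(docs):
    """Terms facet on category: count documents per category value."""
    return Counter(doc["category"] for doc in docs)


returned_hits = [d for d in QUERY_HITS if in_range(d, "2010/07/01", "2010/07/21")]

# Top-level filter: narrows the returned hits, facets still see all query hits.
print(len(returned_hits), dict(facet_counts(QUERY_HITS)))

# Filtered facet: statistics computed only over the filtered hits.
print(dict(facet_counts(returned_hits)))
```

The first count corresponds to what the search above returns; the second is what a facet restricted by the same date filter would report.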

ES offers a great deal of functionality. Its JSON-based query composition is messier than familiar SQL syntax, but the combinations can serve all kinds of complex applications, and the many settings that affect performance are worth studying. Developers may also need to build on the built-in APIs for custom development, such as writing their own analyzer or synchronizing with other data stores. ES is still only at version 0.19.9, yet there is already a lot of official training available, so its prospects look very bright, and it may well become the first choice for search system solutions.
