In the past few years the barrier to building search has kept dropping. Every language has open-source retrieval toolkits, their features are increasingly complete, and more and more full solutions are built on top of them and widely used; on Lucene, for example, there are Solr, Elasticsearch, Sensei, and so on. They cover most needs and spare teams from developing directly against a search toolkit, so people can focus on business development. I am personally most optimistic about Elasticsearch (ES). ES is very simple to use, feels more like working with a NoSQL store, and lets you develop many features yourself as plugins. ES is easy to try out through a REST client, so it is easy to learn.
The official ES website has fairly comprehensive API documentation, but I find its organization a bit messy; at least it is not as easy to read as the MongoDB documentation. Let's get to know ES through a simple application: a Chinese news search engine. Each news item has five fields (title, content, author, category, publish date), and we want to provide basic functions such as searching title and content, sorting, highlighting, statistics, and filtering. ES offers the smartcn Chinese analyzer, but the IK analyzer is recommended here; for installation, see the example given by the plugin author.
Download and install the plugin and start ES; then you can begin trying it out.
1. Create an index named test
PUT http://localhost:9200/test
2. Create mapping
POST http://localhost:9200/test/news/_mapping
The contents are:
{
  "news": {
    "properties": {
      "content": {
        "type": "string",
        "store": "no",
        "term_vector": "with_positions_offsets",
        "index_analyzer": "ik",
        "search_analyzer": "ik"
      },
      "title": {
        "type": "string",
        "store": "no",
        "term_vector": "with_positions_offsets",
        "index_analyzer": "ik",
        "search_analyzer": "ik",
        "boost": 5
      },
      "author": {
        "type": "string", "index": "not_analyzed"
      },
      "publish_date": {
        "type": "date", "format": "yyyy/MM/dd", "index": "not_analyzed"
      },
      "category": {
        "type": "string", "index": "not_analyzed"
      }
    }
  }
}
In the URL, test/news is the index/type in ES, which roughly corresponds to the database name/table name relationship in a relational database. The properties object in the POST body holds the mapping, here with five fields. type gives the field type. The content and title fields need to be tokenized and highlighted, so they set the analyzer and turn on term_vector. The available field types are described in the mapping API documentation.
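The mapping can also be assembled in code rather than typed by hand. The following is a minimal Python sketch, assuming a node at http://localhost:9200 with the IK plugin installed; the helper names (analyzed_text, build_news_mapping, post_json) are invented for illustration and are not part of any ES client.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: the local node from the steps above


def analyzed_text(boost=None):
    """Mapping for a full-text field analyzed by IK, with term vectors for highlighting."""
    field = {
        "type": "string",
        "store": "no",
        "term_vector": "with_positions_offsets",
        "index_analyzer": "ik",
        "search_analyzer": "ik",
    }
    if boost is not None:
        field["boost"] = boost
    return field


def build_news_mapping():
    """Return the mapping body for the news type with its five fields."""
    return {
        "news": {
            "properties": {
                "content": analyzed_text(),
                "title": analyzed_text(boost=5),
                "author": {"type": "string", "index": "not_analyzed"},
                "publish_date": {
                    "type": "date",
                    "format": "yyyy/MM/dd",
                    "index": "not_analyzed",
                },
                "category": {"type": "string", "index": "not_analyzed"},
            }
        }
    }


def post_json(path, body):
    """POST a JSON body to the node and return the decoded reply."""
    request = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Print the body; no request is sent here.
    print(json.dumps(build_news_mapping(), indent=2))
```

With a node running, post_json("/test/news/_mapping", build_news_mapping()) would submit it. Sharing one helper for content and title keeps the two analyzed fields identical except for the boost.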
3. Create and submit some data:
POST http://localhost:9200/test/news/
Make up a few articles and submit each one as a request body, for example:
{
  "content": "China's premier visits Europe",
  "title": "The U.S. economic situation is good this year",
  "publish_date": "2010/07/01",
  "author": "Zhang San",
  "category": "Finance"
}
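Submitting several articles by hand gets tedious, so a small script can help. This is a sketch only, assuming the same local node and the test/news mapping from step 2; make_article and index_article are hypothetical helper names.

```python
import datetime
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: the local node used in the steps above


def make_article(title, content, author, category, publish_date):
    """Build one news document; publish_date is a datetime.date."""
    return {
        "content": content,
        "title": title,
        # Must match the date format declared in the mapping (yyyy/MM/dd).
        "publish_date": publish_date.strftime("%Y/%m/%d"),
        "author": author,
        "category": category,
    }


def index_article(article):
    """POST the document to /test/news/ so that ES generates the _id."""
    request = urllib.request.Request(
        ES_URL + "/test/news/",
        data=json.dumps(article).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    article = make_article(
        title="The U.S. economic situation is good this year",
        content="China's premier visits Europe",
        author="Zhang San",
        category="Finance",
        publish_date=datetime.date(2010, 7, 1),
    )
    # Print the body; call index_article(article) once a node is running.
    print(json.dumps(article, indent=2))
```

Formatting the date in one place avoids documents that the yyyy/MM/dd mapping would reject.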
4. Search
POST http://localhost:9200/test/news/_search
The request body has several parts:
paging: from/size; returned fields: fields; sorting: sort; query: query; filtering: filter; highlighting: highlight; statistics: facets
{
  "from": 0,
  "size": 10,
  "fields": ["title", "content", "publish_date", "category", "author"],
  "sort": [
    {"publish_date": {"order": "asc"}},
    "_score"
  ],
  "query": {
    "bool": {
      "should": [
        {
          "term": {"title": "China"}
        },
        {
          "term": {"content": "China"}
        }
      ]
    }
  },
  "filter": {
    "range": {
      "publish_date": {
        "from": "2010/07/01",
        "to": "2010/07/21",
        "include_lower": true,
        "include_upper": false
      }
    }
  },
  "highlight": {
    "pre_tags": ["<tag1>", "<tag2>"],
    "post_tags": ["</tag1>", "</tag2>"],
    "fields": {
      "title": {},
      "content": {}
    }
  },
  "facets": {
    "cate": {"terms": {"field": "category"}}
  }
}
The query here matches documents whose title or content contains "China".
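Because the request is plain JSON, it is easy to generate from parameters. The sketch below builds the same body as in step 4 for any keyword, date range, and page; build_search and its parameters are hypothetical names, and the keys follow the ES 0.19-era request format used in this article.

```python
import json


def build_search(keyword, date_from, date_to, page=0, size=10):
    """Assemble the step 4 search body for one page of results."""
    return {
        "from": page * size,  # offset of the first hit on this page
        "size": size,
        "fields": ["title", "content", "publish_date", "category", "author"],
        "sort": [{"publish_date": {"order": "asc"}}, "_score"],
        "query": {
            "bool": {
                # A hit in either field is enough.
                "should": [
                    {"term": {"title": keyword}},
                    {"term": {"content": keyword}},
                ]
            }
        },
        "filter": {
            "range": {
                "publish_date": {
                    "from": date_from,
                    "to": date_to,
                    "include_lower": True,
                    "include_upper": False,
                }
            }
        },
        "highlight": {
            "pre_tags": ["<tag1>", "<tag2>"],
            "post_tags": ["</tag1>", "</tag2>"],
            "fields": {"title": {}, "content": {}},
        },
        "facets": {"cate": {"terms": {"field": "category"}}},
    }


if __name__ == "__main__":
    body = build_search("China", "2010/07/01", "2010/07/21")
    print(json.dumps(body, indent=2))
```

The printed body can be sent to /test/news/_search with any REST client.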
The results are as follows:
{
  "took": 18,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": null,
    "hits": [{
      "_index": "test",
      "_type": "news",
      "_id": "_fm13zccsnwatpouziok_a",
      "_score": 0.024621923,
      "fields": {
        "content": "Reuters reported on what the United States did in Iraq yesterday and how China will face the situation.",
        "author": "Zhang San",
        "title": "The United States left Iraq is a mess",
        "category": "Politics",
        "publish_date": "2010/07/10"
      },
      "highlight": {
        "content": ["Reuters reported on what the United States did in Iraq yesterday and how <tag2>China</tag2> will face the situation."]
      },
      "sort": [1278720000000, 0.024621923]
    }, {
      "_index": "test",
      "_type": "news",
      "_id": "4fey1t6-rmomojyts4foaq",
      "_score": 0.024621923,
      "fields": {
        "content": "The Associated Press reports that China will visit the northern part of Russia today.",
        "author": "Zhang San",
        "title": "American Visit to Russia",
        "category": "Politics",
        "publish_date": "2010/07/11"
      },
      "highlight": {
        "content": ["The Associated Press reports that <tag2>China</tag2> will visit the northern part of Russia today."]
      },
      "sort": [1278806400000, 0.024621923]
    }, {
      "_index": "test",
      "_type": "news",
      "_id": "Ll-8bzcntt2yekqs1owc_a",
      "_score": 0.61871845,
      "fields": {
        "content": "U.S. Secretary of State Hillary Clinton told reporters that most of South Korea's Seoul Jiangnan District are willing to move to North Korea.",
        "author": "John Doe",
        "title": "China's economy will face a downside risk",
        "category": "Economy",
        "publish_date": "2010/07/12"
      },
      "highlight": {
        "title": ["<tag2>China</tag2>'s economy will face a downside risk"]
      },
      "sort": [1278892800000, 0.61871845]
    }, {
      "_index": "test",
      "_type": "news",
      "_id": "dnb6gtpsraoexc1axcpz0q",
      "_score": 0.048311904,
      "fields": {
        "content": "China's premier talks about domestic sports, the gold medal is undoubtedly the most important, the United States",
        "author": "Zhang San",
        "title": "The National System of American Sports",
        "category": "Sports",
        "publish_date": "2010/07/14"
      },
      "highlight": {
        "content": ["<tag2>China</tag2>'s premier talks about domestic sports, the gold medal is undoubtedly the most important, the United States"]
      },
      "sort": [1279065600000, 0.048311904]
    }, {
      "_index": "test",
      "_type": "news",
      "_id": "4lh55yoaqve7cariyvfynq",
      "_score": 0.048311904,
      "fields": {
        "content": "China's military threat to South-East Asia will continue, Russia is closely concerned about developments",
        "author": "Zhang San",
        "title": "National efforts to develop education",
        "category": "Politics",
        "publish_date": "2010/07/15"
      },
      "highlight": {
        "content": ["<tag2>China</tag2>'s military threat to South-East Asia will continue, Russia is closely concerned about developments"]
      },
      "sort": [1279152000000, 0.048311904]
    }, {
      "_index": "test",
      "_type": "news",
      "_id": "HFOVOBDCTI-2ERYAQV-AKG",
      "_score": 0.12422675,
      "fields": {
        "content": "China's economic rebound is weak, the U.S. economy continues to slump, other economies are unreliable",
        "author": "John Doe",
        "title": "Europe's debt crisis sweeping the world",
        "category": "Economy",
        "publish_date": "2010/07/19"
      },
      "highlight": {
        "content": ["<tag2>China</tag2>'s economic rebound is weak, the U.S. economy continues to slump, other economies are unreliable"]
      },
      "sort": [1279497600000, 0.12422675]
    }]
  },
  "facets": {
    "cate": {
      "_type": "terms",
      "missing": 0,
      "total": 10,
      "other": 0,
      "terms": [{
        "term": "Politics",
        "count": 4
      }, {
        "term": "Economy",
        "count": 3
      }, {
        "term": "Sports",
        "count": 3
      }]
    }
  }
}
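A client usually needs only a few pieces of such a response. As a rough sketch, the functions below pull out the per-hit fields, the highlight fragments, and the facet counts. They assume the lowercase keys that ES returns (hits, fields, highlight, facets); summarize_hits, facet_terms, and SAMPLE are illustrative names, and SAMPLE is a trimmed copy of the response above with a single hit.

```python
def summarize_hits(response):
    """Return one summary dict per hit: title, date, score, highlight fragments."""
    rows = []
    for hit in response["hits"]["hits"]:
        fields = hit.get("fields", {})
        fragments = []
        # highlight maps each field name to a list of fragments.
        for field_fragments in hit.get("highlight", {}).values():
            fragments.extend(field_fragments)
        rows.append({
            "title": fields.get("title"),
            "publish_date": fields.get("publish_date"),
            "score": hit.get("_score"),
            "fragments": fragments,
        })
    return rows


def facet_terms(response, name):
    """Return {term: count} for the terms facet called name."""
    facet = response.get("facets", {}).get(name, {})
    return {entry["term"]: entry["count"] for entry in facet.get("terms", [])}


# Trimmed copy of the response above, keeping one hit and the facet.
SAMPLE = {
    "hits": {
        "total": 6,
        "hits": [{
            "_score": 0.61871845,
            "fields": {
                "title": "China's economy will face a downside risk",
                "publish_date": "2010/07/12",
            },
            "highlight": {
                "title": ["<tag2>China</tag2>'s economy will face a downside risk"],
            },
        }],
    },
    "facets": {
        "cate": {
            "_type": "terms",
            "terms": [
                {"term": "Politics", "count": 4},
                {"term": "Economy", "count": 3},
                {"term": "Sports", "count": 3},
            ],
        },
    },
}

if __name__ == "__main__":
    for row in summarize_hits(SAMPLE):
        print(row["publish_date"], row["title"], row["fragments"])
    print(facet_terms(SAMPLE, "cate"))
```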
The response contains each of the parts we asked for. Note that facets are computed over the documents hit by the query, while filter only narrows the returned results; filter does not affect facets. To compute statistics over the filtered documents, use a filter facet.
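The difference between facet counts and filtered hits can be seen in a small in-memory simulation. This is not Elasticsearch code; it only mimics the behavior described above, and the sample documents are invented for the illustration.

```python
from collections import Counter

# Invented sample: (category, publish_date) of documents that matched the query.
QUERY_HITS = [
    ("Politics", "2010/07/10"),
    ("Politics", "2010/07/25"),
    ("Economy", "2010/07/12"),
    ("Economy", "2010/06/30"),
    ("Sports", "2010/07/14"),
]


def in_range(date, start, end):
    """Mimic the range filter: lower bound inclusive, upper bound exclusive."""
    # Zero-padded yyyy/MM/dd strings sort in chronological order.
    return start <= date < end


def facet_counts(hits):
    """Count hits per category, like a terms facet on category."""
    return Counter(category for category, _ in hits)


def search(hits, start, end):
    """Return the filtered hits plus facet counts with and without the filter."""
    returned = [hit for hit in hits if in_range(hit[1], start, end)]
    return {
        "returned": returned,
        "facet_default": facet_counts(hits),       # ignores the top-level filter
        "facet_filtered": facet_counts(returned),  # counts only the filtered hits
    }


if __name__ == "__main__":
    result = search(QUERY_HITS, "2010/07/01", "2010/07/21")
    print(len(result["returned"]), "hits returned")
    print("default facet: ", dict(result["facet_default"]))
    print("filtered facet:", dict(result["facet_filtered"]))
```

This is why the facet total in the response (10) can exceed the number of returned hits (6): the facet saw every document the query matched, before the date filter was applied.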
ES provides a lot of functionality. Composing requests in JSON is messier than the familiar SQL syntax, but the combinations can satisfy a wide variety of complex applications, and many of the settings that affect performance are worth studying. Developers may also need to use the built-in API for custom development, such as writing their own analyzer or synchronizing with other data stores. ES is still at version 0.19.9, but there are already many official training courses, so its prospects look bright, and it may become the first choice for search systems.