Elasticsearch: Getting started with search engines

Source: Internet
Author: User

Elasticsearch is a distributed, scalable, real-time search and analytics engine. It helps you search, analyze, and explore your data. People often don't realize they need these capabilities when a project starts. Elasticsearch gives new life to raw data that would otherwise sit on disk, seemingly useless.

None of Elasticsearch's individual parts is new. Full-text search has been implemented for a long time, and statistics systems and distributed databases have existed for years. The revolution lies in combining these separate features into a single coherent, real-time whole. The barrier to entry for new users is low, and Elasticsearch becomes more powerful in your hands as your skills grow.

Unfortunately, most databases today are poor at extracting useful information from data. They can filter by exact timestamps or exact values, but can they perform full-text searches that handle synonyms and rank results by relevance? Can they aggregate and summarize the data? Most importantly, can they do all of this in real time over large volumes of data?

This is where Elasticsearch stands out: it helps you explore and make use of data that is otherwise hard to work with, including data that already sits in your database.

Understanding Search

Elasticsearch is a search engine built on Apache Lucene(TM). Lucene is a full-text search library and is arguably the most advanced, efficient, full-featured open-source search engine framework available today. But Lucene is only a library. To take full advantage of it, you have to work in Java and integrate Lucene directly into your application. Worse, you have to invest a lot of study to understand how it works, because Lucene is genuinely complex.

Elasticsearch uses Lucene internally as its core engine. For full-text search, however, you only need a single, unified API, and you don't have to understand the complex inner workings of Lucene. Of course, Elasticsearch is more than a simple wrapper around Lucene. Besides full-text search, it can also serve as:
A distributed, real-time document store in which every field is indexed and searchable.
A distributed search engine with real-time analytics.
A system that scales out to hundreds of servers and handles petabytes of structured or unstructured data.

All of these features are packaged into a single server. You can talk to it through its RESTful API using a client or any programming language you like. The HTTP API listens on port 9200 by default, and you can change this with the http.port setting in the configuration file (elasticsearch.yml).
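Here is a minimal Python sketch of talking to the RESTful API with only the standard library's urllib.request. It builds HTTP requests but does not send them. It assumes a node on localhost listening on the default port 9200, and the helper name `es_request` is hypothetical.

```python
import json
import urllib.request

# Assumption: a local node listening on the default HTTP port 9200.
ES_URL = "http://localhost:9200"


def es_request(method, path, body=None):
    """Build (but do not send) a request for an Elasticsearch endpoint."""
    data = None
    headers = {}
    if body is not None:
        # Request bodies are JSON documents.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(ES_URL + path, data=data, headers=headers, method=method)


req = es_request("GET", "/")
print(req.get_method(), req.full_url)

# Sending it requires a running node:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Any HTTP client works the same way. Elasticsearch only sees a method, a path, and an optional JSON body.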

Document-oriented

Objects in an application are rarely simple lists of keys and values. More often they have complex structures containing dates, geographic locations, nested objects, arrays, and so on.
Sooner or later you will want to store these objects in a database. If you try to cram this rich data into the rows and columns of a relational database, you have to flatten it to fit the format of each field, and then reassemble the object every time you retrieve it.
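To see what that flattening involves, the Python sketch below splits one employee object into a row for a hypothetical employees table plus rows for a hypothetical employee_interests table, and then rebuilds it. The table layouts and function names are illustrative assumptions. In a real relational database, the rebuilding step would be a JOIN.

```python
def flatten(emp_id, emp):
    """Split one nested object into rows for two hypothetical tables."""
    employee_row = (emp_id, emp["first_name"], emp["last_name"], emp["about"])
    # The list field cannot live in a single column, so it gets its own table.
    interest_rows = [(emp_id, interest) for interest in emp["interests"]]
    return employee_row, interest_rows


def reassemble(employee_row, interest_rows):
    """Rebuild the original object, the equivalent of a JOIN on every read."""
    emp_id, first, last, about = employee_row
    return {
        "first_name": first,
        "last_name": last,
        "about": about,
        "interests": [i for (eid, i) in interest_rows if eid == emp_id],
    }


john = {"first_name": "John", "last_name": "Smith",
        "about": "I love to go rock climbing", "interests": ["sports", "music"]}
row, interests = flatten(1, john)
print(row)
print(interests)
print(reassemble(row, interests) == john)
```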


Elasticsearch is document-oriented, which means it stores entire objects, or documents. It does not just store them; it also indexes the contents of each document so that they can be searched. In Elasticsearch you index, search, sort, and filter documents rather than rows of column data. This is a fundamentally different way of thinking about data, and it is one reason Elasticsearch can perform complex full-text search.
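To make "stores the whole document and indexes every field" concrete, here is a toy in-memory inverted index in Python. It only illustrates the idea under simplifying assumptions: lowercase whitespace tokenization, no scoring, and string fields only. Lucene's real index structures are far more sophisticated. The class name `ToyIndex` is hypothetical.

```python
from collections import defaultdict


class ToyIndex:
    """Stores whole documents and indexes the terms of every string field."""

    def __init__(self):
        self.docs = {}                    # doc id -> full document, stored as-is
        self.postings = defaultdict(set)  # (field, term) -> ids of matching docs

    def index(self, doc_id, doc):
        self.docs[doc_id] = doc
        for field, value in doc.items():
            # Treat arrays and single values the same way.
            values = value if isinstance(value, list) else [value]
            for v in values:
                if isinstance(v, str):
                    for term in v.lower().split():
                        self.postings[(field, term)].add(doc_id)

    def search(self, field, term):
        """Return the stored documents whose field contains the term."""
        ids = sorted(self.postings.get((field, term.lower()), set()))
        return [self.docs[i] for i in ids]


idx = ToyIndex()
idx.index(1, {"first_name": "John", "about": "I love to go rock climbing",
              "interests": ["sports", "music"]})
idx.index(2, {"first_name": "Jane", "about": "I like to collect rock albums",
              "interests": ["music"]})
print([d["first_name"] for d in idx.search("interests", "music")])
print([d["first_name"] for d in idx.search("about", "climbing")])
```

A search looks up the term in the index and returns the complete stored documents. No rows need to be reassembled.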

Elasticsearch uses JSON (JavaScript Object Notation) as its document serialization format. Most programming languages support JSON, and it has become the standard format in the NoSQL world. It is simple, concise, and easy to read. Converting an object to JSON and indexing it in Elasticsearch is far easier than mapping the same object onto a table structure.
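To show how little work this takes, the Python sketch below turns an object into a JSON document using only the standard library. The `Employee` class, its `hired` field, and the sample values are hypothetical. JSON has no date type, so dates are rendered as ISO 8601 strings.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class Employee:
    # Hypothetical application object with a date and a list field.
    first_name: str
    last_name: str
    about: str
    hired: date
    interests: list = field(default_factory=list)


def to_json(obj):
    """Serialize a dataclass to a JSON document ready to be indexed."""
    # json cannot encode date objects directly, so fall back to isoformat().
    return json.dumps(asdict(obj), default=lambda v: v.isoformat())


emp = Employee("John", "Smith", "I love to go rock climbing",
               date(2014, 1, 1), ["sports", "music"])
doc = to_json(emp)
print(doc)
```

The nested list survives as a JSON array. Nothing has to be split into separate tables.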

You can manipulate data in ES over HTTP, for example with curl, using methods such as GET, POST, and DELETE. You can query it with match queries, filters, range queries, bool (Boolean) queries, and aggregations (which replaced the older facets). The official documentation (https://www.elastic.co/guide/index.html) includes a rich set of query examples. Elasticsearch also provides a bulk API that batches many operations into a single request, which reduces network round trips.
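As a sketch of what these request bodies look like, the Python functions below build them as plain dictionaries and strings. They do not talk to a server. `search_body` combines a scored `match` with an unscored `range` filter inside a `bool` query and adds a `terms` aggregation. `bulk_body` builds the newline-delimited JSON that the `_bulk` endpoint expects (sent with POST and `Content-Type: application/x-ndjson`). Both function names and the `age` and `interests` fields are assumptions. The aggregation also assumes `interests` is mapped as a keyword field; with dynamic mappings you would aggregate on `interests.keyword` instead. The bulk action lines omit a mapping type, which matches Elasticsearch 7 and later.

```python
import json


def search_body(text, min_age):
    """Scored full-text match plus an unscored range filter and an aggregation."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"about": text}},              # contributes to _score
                "filter": {"range": {"age": {"gte": min_age}}},  # yes/no only
            }
        },
        # Count matching documents per interest (assumes a keyword field).
        "aggs": {"all_interests": {"terms": {"field": "interests"}}},
    }


def bulk_body(index, docs):
    """Build an NDJSON body for _bulk: an action line, then the source line."""
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


print(json.dumps(search_body("rock climbing", 30), indent=2))
print(bulk_body("megacorp", [("1", {"first_name": "John"}),
                             ("2", {"first_name": "Jane"})]), end="")
```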

Full-Text Search

Full-text search is something traditional databases find hard to do. Let's search for all employees who like rock climbing:

    GET /megacorp/employee/_search
    {
        "query" : {
            "match" : {
                "about" : "rock climbing"
            }
        }
    }

Once again we use a match query, this time to search the about field for "rock climbing". We get back two matching documents:

    {
        ...
        "hits": {
            "total": 2,
            "max_score": 0.16273327,
            "hits": [
                {
                    ...
                    "_score": 0.16273327, <1>
                    "_source": {
                        "first_name": "John",
                        "last_name": "Smith",
                        "age": ...,
                        "about": "I love to go rock climbing",
                        "interests": [ "sports", "music" ]
                    }
                },
                {
                    ...
                    "_score": 0.016878016, <1>
                    "_source": {
                        "first_name": "Jane",
                        "last_name": "Smith",
                        "age": ...,
                        "about": "I like to collect rock albums",
                        "interests": [ "music" ]
                    }
                }
            ]
        }
    }

1. Relevance score
By default, Elasticsearch sorts results by relevance. In the first result, John Smith's about field explicitly mentions rock climbing. Jane Smith's about field mentions rock but not climbing, so her _score is lower than John's. Relevance is a relative measure of how well a document matches a given query: the higher the score, the more relevant the document.
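The ranking above can be reproduced in spirit with a toy scorer. The Python sketch below uses a simplified TF-IDF: term frequency multiplied by a BM25-style inverse document frequency. It is not Lucene's actual formula, which also accounts for field length and term-frequency saturation, so its numbers do not match Elasticsearch's. The point is the ordering: "rock" appears in both documents and carries little weight, while "climbing" is rare and pushes John ahead. The lowercase whitespace tokenizer stands in for a real analyzer.

```python
import math
from collections import Counter

docs = {
    "John Smith": "I love to go rock climbing",
    "Jane Smith": "I like to collect rock albums",
}


def tokenize(text):
    # Stand-in for an analyzer: lowercase and split on whitespace.
    return text.lower().split()


def idf(term, corpus):
    """BM25-style inverse document frequency: rarer terms weigh more."""
    n = len(corpus)
    df = sum(1 for text in corpus.values() if term in tokenize(text))
    return math.log(1 + (n - df + 0.5) / (df + 0.5))


def score(query, text, corpus):
    """Simplified TF-IDF: sum of term frequency times idf over query terms."""
    tf = Counter(tokenize(text))
    return sum(tf[t] * idf(t, corpus) for t in tokenize(query))


ranked = sorted(docs, key=lambda name: score("rock climbing", docs[name], docs), reverse=True)
for name in ranked:
    print(f'{score("rock climbing", docs[name], docs):.3f}  {name}')
```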

Computing scores has a performance cost, so when you don't need them you can use a filter instead. This example shows clearly how Elasticsearch performs full-text search. Relevance is central to Elasticsearch, and it is the biggest difference from traditional databases, which only decide whether a record matches or not.
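One way to run a condition in filter context is the `constant_score` query. Every hit gets the same score, no relevance is computed, and Elasticsearch can cache frequently used filters. The Python sketch below only builds the request body. The helper name `as_filter` and the `age > 30` condition are hypothetical.

```python
import json


def as_filter(query):
    """Wrap a query in constant_score so it runs in filter context (no scoring)."""
    return {"query": {"constant_score": {"filter": query}}}


# Hypothetical structured condition: employees older than 30.
body = as_filter({"range": {"age": {"gt": 30}}})
print(json.dumps(body))
```

Filters suit yes/no conditions such as ranges and exact values. Scored queries such as `match` suit full-text relevance.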

Phrase search

Finding individual words in a field is useful, but sometimes you need to match an exact sequence of words, that is, a phrase. For example, suppose we want only the employees whose about field contains the exact phrase "rock climbing". To do this, we change the match query to a match_phrase query.
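As a sketch, the Python below builds a `match_phrase` request body and approximates its core rule with a toy check: the phrase's terms must appear next to each other and in order. This corresponds to the default `slop` of 0. The helper names are hypothetical, and the whitespace tokenizer stands in for Elasticsearch's analyzer.

```python
def phrase_query(field, phrase):
    """Request body for a match_phrase query on one field."""
    return {"query": {"match_phrase": {field: phrase}}}


def contains_phrase(text, phrase):
    """Toy check: are the phrase's terms adjacent and in order in the text?"""
    tokens, terms = text.lower().split(), phrase.lower().split()
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))


about = {
    "John Smith": "I love to go rock climbing",
    "Jane Smith": "I like to collect rock albums",
}
print(phrase_query("about", "rock climbing"))
matches = [name for name, text in about.items() if contains_phrase(text, "rock climbing")]
print(matches)
```

Unlike the plain match query, Jane Smith no longer matches, because her about field contains "rock" but not the phrase "rock climbing".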


Copyright notice: This is an original article by the blog author and may not be reproduced without the author's permission.

