Getting Started with Elasticsearch, a Full-Text Search Engine

Elasticsearch (called Elastic below) can quickly store, search, and analyze massive amounts of data. It is used by Wikipedia, Stack Overflow, and GitHub.
Under the hood, Elastic is built on the open source library Lucene. However, you cannot use Lucene directly; you must write your own code to call its interfaces. Elastic wraps Lucene, exposes a REST API, and works out of the box.
This article starts from scratch and explains how to use Elastic to build your own full-text search engine. Each step is explained in detail so that you can follow along.
I. Installation
Elastic requires a Java 8 environment. If Java is not yet installed on your machine, install it first and make sure that the JAVA_HOME environment variable is set correctly.
After installing Java, you can follow the official documentation to install Elastic. Downloading the zip archive directly is the simplest approach.

$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.1.zip
$ unzip elasticsearch-5.5.1.zip

Next, go to the extracted directory and run the following command to start Elastic.

$ ./bin/elasticsearch

If it reports the error "max virtual memory areas vm.max_map_count [65530] is too low", run the following command.

$ sudo sysctl -w vm.max_map_count=262144

If everything works, Elastic runs on the default port 9200. Now open another command-line window and request that port to get descriptive information about the server.

$ curl localhost:9200

{
  "name": "Atntrtf",
  "cluster_name": "elasticsearch",
  "cluster_uuid": "Tf9250xhq6ee4h7yi11ana",
  "version": {
    "number": "5.5.1",
    "build_hash": "19c13d0",
    "build_date": "2017-07-18T20:44:24.823Z",
    "build_snapshot": false,
    "lucene_version": "6.6.0"
  },
  "tagline": "You Know, for Search"
}

In the code above, the request to port 9200 returns a JSON object that contains information about the current node, the cluster, the version, and so on.
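As a small illustration of reading that response in a program, the following Python sketch parses the JSON text and pulls out the node name, cluster name, and version number. The function name summarize_node_info and the trimmed sample string are assumptions made for this example; only the field names come from the response shown above.

```python
import json


def summarize_node_info(response_text):
    """Extract the node name, cluster name, and version from the root response."""
    info = json.loads(response_text)
    return {
        "node": info["name"],
        "cluster": info["cluster_name"],
        "version": info["version"]["number"],
    }


# Hypothetical sample, shaped like the reply to `curl localhost:9200` (trimmed).
sample = """
{
  "name": "Atntrtf",
  "cluster_name": "elasticsearch",
  "version": {"number": "5.5.1", "lucene_version": "6.6.0"},
  "tagline": "You Know, for Search"
}
"""
print(summarize_node_info(sample))
```

In practice the text would come from an HTTP client rather than a literal string; the parsing step stays the same.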
Press Ctrl + C to stop Elastic.
By default, Elastic only allows access from the local machine. If remote access is required, edit the config/elasticsearch.yml file in the Elastic installation directory, uncomment network.host, change its value to 0.0.0.0, and then restart Elastic.

network.host: 0.0.0.0

With the value 0.0.0.0, anyone can access the server. Do not use this setting for an online service; set it to a specific IP address instead.

II. Basic Concepts
2.1 Node and Cluster
Elastic is essentially a distributed database that allows multiple servers to work together, and each server can run multiple Elastic instances.
A single Elastic instance is called a node. A group of nodes forms a cluster.

2.2 Index
Elastic indexes all fields and, after processing, writes them to an inverted index. When searching for data, it looks up this index directly.
Therefore, the top-level unit of data management in Elastic is called an Index. It is the equivalent of a single database. The name of each Index (that is, each database) must be lowercase.
The following command lists all the Indexes on the current node.

$ curl -X GET 'http://localhost:9200/_cat/indices?v'

2.3 Document
A single record in an Index is called a Document. Many Documents make up an Index.
A Document is represented in JSON format. Here is an example.

{
  "user": "Zhang San",
  "title": "Engineer",
  "desc": "Database Management"
}

Documents within the same Index are not required to have the same structure (schema), but it is best to keep them the same, which helps search efficiency.

2.4 Type
Documents can be grouped. In a weather Index, for example, they can be grouped by city (Beijing and Shanghai) or by climate (sunny and rainy days). Such a group is called a Type; it is a virtual logical grouping used to filter Documents.
Different Types should have similar structures (schemas). For example, the id field cannot be a string in one group and a number in another. This is a difference from tables in a relational database. Data of completely different kinds (such as products and logs) should be stored in two Indexes rather than as two Types within one Index (although that can be done).
The following command lists the Types that each Index contains.

$ curl 'localhost:9200/_mapping?pretty=true'

According to the plan, Elastic 6.x allows each Index to contain only one Type, and 7.x will remove Types completely.

III. Creating and Deleting an Index
To create a new Index, send a PUT request directly to the Elastic server. The following example creates an Index named weather.

$ curl -X PUT 'localhost:9200/weather'

The server returns a JSON object whose acknowledged field indicates that the operation succeeded.

{
  "acknowledged": true,
  "shards_acknowledged": true
}

Then we send a DELETE request to delete this Index.

$ curl -X DELETE 'localhost:9200/weather'
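The same two requests can be sketched in Python with the standard library. The helper name index_request and the BASE_URL constant are assumptions for this example, and the lowercase check only reflects the naming rule mentioned earlier in this article. The requests are built but not sent, so no running server is needed to try it.

```python
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local server on the default port


def index_request(method, index_name):
    """Build (but do not send) a request that creates or deletes an Index."""
    if method not in ("PUT", "DELETE"):
        raise ValueError("method must be PUT or DELETE")
    if index_name != index_name.lower():
        raise ValueError("Index names must be lowercase")
    return urllib.request.Request(f"{BASE_URL}/{index_name}", method=method)


create = index_request("PUT", "weather")
delete = index_request("DELETE", "weather")
print(create.get_method(), create.full_url)
print(delete.get_method(), delete.full_url)
```

Passing either object to urllib.request.urlopen would send it to a running Elastic server.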

IV. Chinese Word Segmentation Settings
First, install a Chinese word segmentation plugin. ik is used here; other plugins (such as smartcn) are also worth considering.

$ ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.5.1/elasticsearch-analysis-ik-5.5.1.zip

The command above installs version 5.5.1 of the plugin, for use with Elastic 5.5.1.
Next, restart Elastic; it automatically loads the newly installed plugin.
Then create a new Index and specify the fields that need word segmentation. This step depends on the data structure, and the command below applies only to this article. Basically, every Chinese field that needs to be searched must be configured separately.

$ curl -X PUT 'localhost:9200/accounts' -d '
{
  "mappings": {
    "person": {
      "properties": {
        "user": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "title": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "desc": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        }
      }
    }
  }
}'

The code above creates a new Index named accounts, which contains a Type named person. person has three fields.

user
title
desc

All three fields hold Chinese text and their type is text, so a Chinese analyzer must be specified; the default English analyzer cannot be used.
In Elastic, a word segmenter is called an analyzer. We specify an analyzer for each field.

"user": {
  "type": "text",
  "analyzer": "ik_max_word",
  "search_analyzer": "ik_max_word"
}

In the code above, analyzer is the analyzer applied to the text of the field, and search_analyzer is the analyzer applied to search terms. The ik_max_word analyzer is provided by the ik plugin and splits text into as many words as possible.
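Because the three field definitions are identical, the request body can also be generated instead of typed by hand. The following is a minimal Python sketch under that assumption; the function name build_text_mapping is hypothetical, and the output is only the JSON body, which would still have to be sent with a PUT request as shown earlier.

```python
import json


def build_text_mapping(type_name, fields, analyzer="ik_max_word"):
    """Return a create-Index body that gives every field the same analyzer."""
    properties = {
        field: {
            "type": "text",
            "analyzer": analyzer,
            "search_analyzer": analyzer,
        }
        for field in fields
    }
    return {"mappings": {type_name: {"properties": properties}}}


body = build_text_mapping("person", ["user", "title", "desc"])
print(json.dumps(body, indent=2))
```

Generating the body this way keeps the analyzer name in one place, which helps if you later switch to another analyzer.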

V. Data Operations
5.1 Adding a Record
Sending a PUT request to the specified /Index/Type path, followed by the Id of the record, adds a new record to the Index. For example, a request to /accounts/person/1 adds a person record.

$ curl -X PUT 'localhost:9200/accounts/person/1' -d '
{
  "user": "Zhang San",
  "title": "Engineer",
  "desc": "Database Management"
}'

The JSON object returned by the server contains information such as the Index, Type, Id, and Version.

{
  "_index": "accounts",
  "_type": "person",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {"total": 2, "successful": 1, "failed": 0},
  "created": true
}

If you look closely, you will see that the request path is /accounts/person/1, and the final 1 is the Id of the record. It does not have to be a number; any string (such as abc) will do.
You can also add a record without specifying an Id; in that case, use a POST request instead.

$ curl -X POST 'localhost:9200/accounts/person' -d '
{
  "user": "John Doe",
  "title": "Engineer",
  "desc": "System Administration"
}'

In the code above, a POST request is sent to /accounts/person to add a record. This time, the _id field in the JSON object returned by the server is a random string.

{
  "_index": "accounts",
  "_type": "person",
  "_id": "av3qgfrc6jmbsbxb6k1p",
  "_version": 1,
  "result": "created",
  "_shards": {"total": 2, "successful": 1, "failed": 0},
  "created": true
}

Note that if you run the command above without first creating the Index (accounts in this example), Elastic does not report an error; it simply creates the specified Index. So be careful when typing, and do not misspell the Index name.
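The choice between PUT and POST described in this section can be sketched as a small Python helper. The name document_request and the BASE_URL constant are assumptions for this example, and the Content-Type header is an addition that the curl commands in this article do not set. The helper only builds the request; it does not send it, and it assumes the Id is already safe to place in a URL.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local server on the default port


def document_request(index, doc_type, document, doc_id=None):
    """Build a request that adds a record: PUT with an Id, POST without one."""
    url = f"{BASE_URL}/{index}/{doc_type}"
    if doc_id is None:
        method = "POST"  # the server generates a random Id
    else:
        method = "PUT"  # the caller chooses the Id
        url = f"{url}/{doc_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # not in the curl examples
        method=method,
    )


with_id = document_request("accounts", "person", {"user": "Zhang San"}, doc_id="1")
without_id = document_request("accounts", "person", {"user": "John Doe"})
print(with_id.get_method(), with_id.full_url)
print(without_id.get_method(), without_id.full_url)
```

Since a misspelled Index name silently creates a new Index, keeping the name in one variable rather than retyping it is a simple safeguard.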

5.2 Viewing a Record
You can view a record by sending a GET request to /Index/Type/Id.

$ curl 'localhost:9200/accounts/person/1?pretty=true'

The code above requests the record /accounts/person/1; the URL parameter pretty=true asks for the result in an easy-to-read format.
In the returned data, the found field indicates that the query succeeded, and the _source field holds the original record.

{
  "_index": "accounts",
  "_type": "person",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "user": "Zhang San",
    "title": "Engineer",
    "desc": "Database Management"
  }
}

If the Id is incorrect, no data is found and the found field is false.

$ curl 'localhost:9200/accounts/person/abc?pretty=true'

{
  "_index": "accounts",
  "_type": "person",
  "_id": "abc",
  "found": false
}
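A program reading these replies would typically check found before touching _source, since _source is absent when the record does not exist. The following Python sketch shows one way to do that; the function name extract_source and the two sample strings are assumptions for illustration.

```python
import json


def extract_source(response_text):
    """Return the original record if it was found, otherwise None."""
    reply = json.loads(response_text)
    if not reply.get("found", False):
        return None
    return reply["_source"]


# Hypothetical samples shaped like the two replies shown in this section.
found_reply = '{"_id": "1", "found": true, "_source": {"user": "Zhang San"}}'
missing_reply = '{"_id": "abc", "found": false}'

print(extract_source(found_reply))
print(extract_source(missing_reply))
```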

5.3 Deleting a Record
To delete a record, send a DELETE request.

$ curl -X DELETE 'localhost:9200/accounts/person/1'

Do not delete this record for now; it is used again later.

5.4 Updating a Record
To update a record, send the data again with a PUT request.

$ curl -X PUT 'localhost:9200/accounts/person/1' -d '
{
  "user": "Zhang San",
  "title": "Engineer",
  "desc": "Database management, software development"
}'

{
  "_index": "accounts",
  "_type": "person",
  "_id": "1",
  "_version": 2,
  "result": "updated",
  "_shards": {"total": 2, "successful": 1, "failed": 0},
  "created": false
}

In the code above, we changed the original data from "Database Management" to "Database management, software development". Several fields in the returned result have changed.

"_version": 2,
"result": "updated",
"created": false

As you can see, the Id of the record has not changed, but the version has changed from 1 to 2, the operation type (result) has changed from created to updated, and the created field has become false because this is not a new record.

VI. Data Query
6.1 Returning All Records
Sending a GET request directly to /Index/Type/_search returns all records.

$ curl 'localhost:9200/accounts/person/_search'

{
  "took": 2,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 2,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "av3qgfrc6jmbsbxb6k1p",
        "_score": 1.0,
        "_source": {
          "user": "John Doe",
          "title": "Engineer",
          "desc": "System Administration"
        }
      },
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "user": "Zhang San",
          "title": "Engineer",
          "desc": "Database management, software development"
        }
      }
    ]
  }
}

In the code above, the took field of the result indicates how long the operation took (in milliseconds), the timed_out field indicates whether it timed out, and the hits field holds the matching records. Its subfields have the following meanings.

total: the number of records returned, 2 in this example.
max_score: the highest match score, 1.0 in this example.
hits: an array of the returned records.

Each returned record has a _score field that indicates how well it matches; by default, records are sorted by this field in descending order.
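To make the nesting of hits concrete, here is a short Python sketch that walks a parsed search reply and lists each record's _id with its _score, best match first. The function name ranked_ids and the sample dictionary (including its scores and the Id abc) are assumptions for illustration; the explicit sort only mirrors the default ordering described above.

```python
def ranked_ids(search_reply):
    """Return (_id, _score) pairs from a parsed search reply, best match first."""
    hits = search_reply["hits"]["hits"]  # outer "hits" object, inner "hits" array
    ordered = sorted(hits, key=lambda hit: hit["_score"], reverse=True)
    return [(hit["_id"], hit["_score"]) for hit in ordered]


# Hypothetical reply, trimmed to the fields used here.
reply = {
    "took": 2,
    "timed_out": False,
    "hits": {
        "total": 2,
        "max_score": 1.0,
        "hits": [
            {"_id": "abc", "_score": 0.5, "_source": {"user": "John Doe"}},
            {"_id": "1", "_score": 1.0, "_source": {"user": "Zhang San"}},
        ],
    },
}
print(ranked_ids(reply))
```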

6.2 Full-Text Search
Elastic queries are unusual: they use their own query syntax and require a GET request with a request body.

$ curl 'localhost:9200/accounts/person/_search' -d '
{
  "query": {"match": {"desc": "Software"}}
}'

The code above uses a match query, and the specified condition is that the desc field contains the word "software". The result is as follows.

{
  "took": 3,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 0.28582606,
    "hits": [
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "1",
        "_score": 0.28582606,
        "_source": {
          "user": "Zhang San",
          "title": "Engineer",
          "desc": "Database management, software development"
        }
      }
    ]
  }
}

By default, Elastic returns 10 results at a time; this can be changed with the size field.

$ curl 'localhost:9200/accounts/person/_search' -d '
{
  "query": {"match": {"desc": "Management"}},
  "size": 1
}'

The code above specifies that only one result is returned at a time.
You can also specify an offset with the from field.

$ curl 'localhost:9200/accounts/person/_search' -d '
{
  "query": {"match": {"desc": "Management"}},
  "from": 1,
  "size": 1
}'

The code above specifies that results start at position 1 (the default is position 0) and that only one result is returned.
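Together, from and size give simple pagination: the offset of a page is the page number multiplied by the page size. The following Python sketch builds the query body for a zero-based page number; the function name paged_match_query and its parameters are assumptions for this example.

```python
def paged_match_query(field, text, page, page_size=10):
    """Build a match query body for a zero-based page number."""
    if page < 0 or page_size < 1:
        raise ValueError("page must be >= 0 and page_size must be >= 1")
    return {
        "query": {"match": {field: text}},
        "from": page * page_size,  # offset of the first result on this page
        "size": page_size,  # number of results per page
    }


# Second result only, matching the from/size example above.
print(paged_match_query("desc", "Management", page=1, page_size=1))
```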

6.3 Logical Operations
If there are multiple search keywords, Elastic treats them as an or relationship.

$ curl 'localhost:9200/accounts/person/_search' -d '
{
  "query": {"match": {"desc": "Software System"}}
}'

The code above searches for records whose desc field contains "software" or "system".
To perform an and search across multiple keywords, you must use a bool query.

$ curl 'localhost:9200/accounts/person/_search' -d '
{
  "query": {
    "bool": {
      "must": [
        {"match": {"desc": "Software"}},
        {"match": {"desc": "System"}}
      ]
    }
  }
}'
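With more than two keywords, writing one match clause per keyword by hand gets repetitive. As a minimal sketch, the bool query body can be generated from a list of keywords in Python; the function name and_query is hypothetical, and the result is only the request body, not a sent request.

```python
import json


def and_query(field, keywords):
    """Build a bool query whose must clauses require every keyword to match."""
    must_clauses = [{"match": {field: keyword}} for keyword in keywords]
    return {"query": {"bool": {"must": must_clauses}}}


body = and_query("desc", ["Software", "System"])
print(json.dumps(body, indent=2))
```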

VII. Reference Links

ElasticSearch Official Brochure
A Practical Introduction to Elasticsearch
