Getting started with Elasticsearch

Source: Internet
Author: User

Elasticsearch is a highly scalable open-source full-text search and analytics engine. It lets you store, search, and analyze large volumes of data quickly and in near real time. It is generally used as the underlying engine that powers applications with complex search features and requirements.

Elasticsearch can be used in these places:

The rest of this tutorial guides you through getting Elasticsearch up and running and shows some basic operations, such as indexing, searching, and modifying data. By the end of the tutorial, you should have a good understanding of what Elasticsearch is and how it works. I hope you will be inspired to use it to build sophisticated search applications and to discover useful things in your data.

Official documents: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html

Chinese community: http://elasticsearch.cn/article/

Reference: https://github.com/13428282016/elasticsearch-CN/wiki/es-gettting-started

Elasticsearch version: 5.4

Basic Concepts

A few concepts are at the core of Elasticsearch. Understanding them from the very beginning will make everything that follows much easier to learn.

Near real-time (NRT)

Elasticsearch is a near real-time search platform. This means that there is only a slight latency (normally about one second) from the time a document is indexed to the time it becomes searchable.

Cluster

A cluster is a collection of one or more nodes (servers) that together hold all of your data and provide indexing and search capabilities across all nodes. A cluster is identified by a unique name, which defaults to "elasticsearch". This name is important because a node can belong to only one cluster, and it joins that cluster by name.

Do not reuse the same cluster name in different environments. Otherwise, nodes may end up joining the wrong cluster. For example, you could use the cluster names logging-dev, logging-stage, and logging-prod for the development, staging, and production environments respectively.

Note that a cluster with only a single node is perfectly valid. You can also run multiple independent clusters, each with its own unique cluster name.

Node

A node is a single server that is part of a cluster. It stores data and participates in the cluster's indexing and search capabilities. Like a cluster, a node is identified by a unique name. The default name is a random UUID (Universally Unique IDentifier) that is assigned to the node when it starts. If you do not want the default, you can define any node name you like. The name matters for administration, because it lets you identify which servers in your network correspond to which nodes in your cluster.

A node can be configured to join a specific cluster by setting the cluster name. By default, every node is set up to join a cluster named elasticsearch. This means that if you start several nodes on your network and they can discover each other, they will all automatically form and join a single cluster named elasticsearch.
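The effect of joining by cluster name can be pictured with a small toy model. This is only an illustrative sketch, not how Elasticsearch discovery is implemented: the node names and the `group_by_cluster` helper are hypothetical, and real nodes must also be able to reach each other over the network.

```python
from collections import defaultdict


def group_by_cluster(nodes):
    """Group node names by the cluster name each node is configured with."""
    clusters = defaultdict(list)
    for node_name, cluster_name in nodes:
        clusters[cluster_name].append(node_name)
    return dict(clusters)


# Hypothetical nodes as (node name, configured cluster name) pairs.
nodes = [
    ("node-1", "elasticsearch"),     # left at the default cluster name
    ("node-2", "elasticsearch"),     # left at the default cluster name
    ("log-node-1", "logging-prod"),  # explicitly configured
]

clusters = group_by_cluster(nodes)
for cluster_name, members in clusters.items():
    print(cluster_name, members)
```

The two nodes left at the default name end up in the same group, which is why distinct names per environment are worth the small configuration effort.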

Index

An index is a collection of documents that have somewhat similar characteristics. For example, you can have an index for customer data, another for a product catalog, and another for order data. An index is identified by a name (which must be all lowercase). This name is used to refer to the index when indexing, searching, updating, and deleting the documents in it. In a single cluster, you can define as many indexes as you need.

Type

Within an index, you can define one or more types. A type is a logical category/partition of the index. In general, a type is defined for documents that share a set of common fields. For example, a blogging platform might store all of its data in a single index and define one type for user data, another for blog data, and another for comment data.

Document

A document is the basic unit of information that can be indexed. For example, you can have one document for a single customer, another for a single product, and another for a single order. Documents are expressed in JSON format. You can store as many documents as you want within an index/type. Note that although a document physically resides in an index, it must actually be indexed into, or assigned to, a type inside that index.
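To make the index/type/document relationship concrete, here is a minimal sketch that builds (but does not send) an HTTP request for indexing one JSON document. The host `http://localhost:9200`, the index name `customer`, the type name `external`, the ID `1`, and the helper `build_index_request` are all assumptions chosen for illustration; the path layout `/{index}/{type}/{id}` matches the 5.x REST API.

```python
import json
from urllib.request import Request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build a PUT request that would index `document` under index/type/id."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(url, data=body, method="PUT", headers=headers)


# Hypothetical values: a local node, a "customer" index, an "external" type.
request = build_index_request(
    "http://localhost:9200", "customer", "external", 1, {"name": "John Doe"}
)

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

To actually index the document, the request object could be passed to `urllib.request.urlopen` while a node is running at that address.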

Shards and replicas

An index can potentially store an amount of data that exceeds the hardware limits of a single node. For example, an index of 1 billion documents that takes up 1 TB of disk space may not fit on the disk of a single node, or a single node may be too slow to serve search requests against it alone.

To solve this problem, Elasticsearch lets you subdivide an index into multiple pieces called shards. When you create an index, you simply define the number of shards you want. Each shard is in itself a fully functional, independent "index" that can be hosted on any node in the cluster.

Sharding is important for two main reasons:

  • It allows you to horizontally split/scale your content volume.
  • It allows you to distribute and parallelize operations across shards (potentially on multiple nodes), which increases performance and throughput.

How shards are distributed, and how the documents in them are aggregated back into search responses, is managed entirely by Elasticsearch and is transparent to you as the user.
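Although this is handled for you, it can help to see the general idea of how a document ends up on a shard. The sketch below assumes the routing rule described in the Elasticsearch reference documentation, shard = hash(routing value) % number of primary shards, where the routing value defaults to the document ID. The hash used here (`zlib.crc32`) is only a stand-in, so the shard numbers it produces will not match a real cluster, and `pick_shard` is a hypothetical helper.

```python
import zlib


def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value (by default the document ID) to a shard number.

    zlib.crc32 is a stand-in hash for illustration only; Elasticsearch
    uses its own hash function internally.
    """
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_primary_shards


doc_ids = [f"doc-{i}" for i in range(10)]

# Where each document would live with 5 primary shards versus 6.
placement_5 = {doc_id: pick_shard(doc_id, 5) for doc_id in doc_ids}
placement_6 = {doc_id: pick_shard(doc_id, 6) for doc_id in doc_ids}

# Documents whose shard would change if the shard count changed.
moved = [d for d in doc_ids if placement_5[d] != placement_6[d]]

print(placement_5)
print(f"{len(moved)} of {len(doc_ids)} documents would map to a different shard")
```

Because the shard number depends on the number of primary shards, changing that number would make existing documents unfindable at their computed location, which is one way to understand why the primary shard count is fixed when an index is created.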

In a network or cloud environment, where failures can be expected at any time, it is very useful and highly recommended to have a failover mechanism in case a shard or node goes offline or disappears for whatever reason. To this end, Elasticsearch lets you make one or more copies of an index's shards. These copies are called replica shards, or replicas for short.

Replication is important for two main reasons:

  • It provides high availability in case a shard or node fails. For this reason, a replica shard is never allocated on the same node as the original/primary shard it was copied from.
  • It allows you to scale out your search volume/throughput, because searches can be executed on all replicas in parallel.
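The allocation rule in the first point can be expressed as a simple check. This is a toy sketch with a hypothetical data layout (tuples of shard number, role, and node name) and a hypothetical helper, `shards_sharing_a_node`; it only illustrates the constraint and does not reflect how Elasticsearch's allocator is implemented.

```python
def shards_sharing_a_node(allocation):
    """Return the shard numbers that have more than one copy on a node.

    `allocation` is a list of (shard_number, role, node_name) tuples,
    where role is "primary" or "replica".
    """
    seen = set()
    conflicts = set()
    for shard_number, _role, node_name in allocation:
        key = (shard_number, node_name)
        if key in seen:
            conflicts.add(shard_number)
        seen.add(key)
    return sorted(conflicts)


# A valid layout: every replica sits on a different node than its primary.
good_layout = [
    (0, "primary", "node-1"), (0, "replica", "node-2"),
    (1, "primary", "node-2"), (1, "replica", "node-1"),
]

# An invalid layout: losing node-1 would lose both copies of shard 0.
bad_layout = [
    (0, "primary", "node-1"), (0, "replica", "node-1"),
]

print(shards_sharing_a_node(good_layout))
print(shards_sharing_a_node(bad_layout))
```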

To summarize, each index can be split into multiple shards. An index can also be replicated zero times (meaning no replicas) or more. Once replicated, each index has primary shards (the original shards that were copied from) and replica shards (the copies of the primary shards). The number of shards and replicas can be defined per index when the index is created. After the index is created, you can change the number of replicas dynamically at any time, but you cannot change the number of shards after the fact.
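As a sketch of what these per-index settings look like, the snippet below builds the JSON bodies for creating an index and for later changing its replica count. The setting names `number_of_shards` and `number_of_replicas` are the ones Elasticsearch uses; the index name `customer`, the chosen values, and the two helper functions are assumptions for illustration, and nothing is sent to a server.

```python
import json


def create_index_body(number_of_shards, number_of_replicas):
    """Body for creating an index, e.g. PUT /customer (hypothetical name)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": number_of_shards,
                "number_of_replicas": number_of_replicas,
            }
        }
    }


def update_replicas_body(number_of_replicas):
    """Body for a later settings update, e.g. PUT /customer/_settings.

    Only the replica count appears here, because the number of primary
    shards cannot be changed after the index has been created.
    """
    return {"index": {"number_of_replicas": number_of_replicas}}


create_body = create_index_body(number_of_shards=3, number_of_replicas=1)
update_body = update_replicas_body(number_of_replicas=2)

print(json.dumps(create_body, indent=2))
print(json.dumps(update_body, indent=2))
```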

By default, each index is allocated 5 primary shards and 1 replica. This means that if your cluster has at least two nodes, your index will have 5 primary shards and 5 replica shards (one complete replica), for a total of 10 shards.
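The arithmetic behind that total is straightforward: every primary shard exists once, plus once more for each replica. A minimal sketch, with `total_shards` as a hypothetical helper name:

```python
def total_shards(primaries, replicas):
    """Total shard count for an index: each primary plus its replica copies."""
    return primaries * (1 + replicas)


print(total_shards(5, 1))  # the defaults described above
print(total_shards(5, 0))  # no replicas: only the primary shards exist
print(total_shards(3, 2))  # 3 primaries, each copied twice
```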

Each Elasticsearch shard is a Lucene index. A single Lucene index can hold only a limited number of documents: as of LUCENE-5843, the limit is 2,147,483,519 (= Integer.MAX_VALUE - 128) documents. You can monitor shard sizes using the _cat/shards API.
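The limit can be reproduced from Java's `Integer.MAX_VALUE` (2^31 - 1). The sketch below also builds the URL for a `_cat/shards` request; the host `http://localhost:9200`, the index name `customer`, and the helper names are assumptions for illustration, and no request is actually made.

```python
JAVA_INTEGER_MAX_VALUE = 2**31 - 1
LUCENE_MAX_DOCS = JAVA_INTEGER_MAX_VALUE - 128


def remaining_capacity(doc_count):
    """Documents a single shard could still accept before hitting the limit."""
    return LUCENE_MAX_DOCS - doc_count


def cat_shards_url(base_url, index):
    """URL listing the shards of one index; `?v` adds column headers."""
    return f"{base_url}/_cat/shards/{index}?v"


print(LUCENE_MAX_DOCS)
print(remaining_capacity(1_000_000_000))
print(cat_shards_url("http://localhost:9200", "customer"))
```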

 
