Elasticsearch learning notes-01 basics, elasticsearch-01

Source: Internet
Author: User

Elasticsearch has several core concepts. Understanding them first makes everything that follows easier to learn.

Near real-time (NRT)
Elasticsearch is a near real-time search platform. This means there is only a slight delay (usually one second) between the time you index a document and the time it becomes searchable.
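
The delay can be pictured with a toy model. The sketch below is not Elasticsearch code: `ToyIndex` and its methods are hypothetical names made up for illustration. It assumes only that newly indexed documents wait in a buffer until a periodic refresh makes them searchable.

```python
class ToyIndex:
    """Toy model of near real-time search (hypothetical, not an Elasticsearch API)."""

    def __init__(self):
        self._buffer = []      # indexed, but not yet searchable
        self._searchable = []  # visible to searches

    def index(self, doc):
        # Newly indexed documents are buffered first.
        self._buffer.append(doc)

    def refresh(self):
        # A refresh makes everything buffered so far searchable.
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, field, value):
        # Only refreshed documents can be found.
        return [d for d in self._searchable if d.get(field) == value]


idx = ToyIndex()
idx.index({"name": "John Doe"})
before = idx.search("name", "John Doe")  # searched before the refresh
idx.refresh()                            # in this model, the periodic refresh
after = idx.search("name", "John Doe")   # searched after the refresh
print(len(before), len(after))
```

In this model the document is invisible to the first search and visible to the second, which is the small gap the "near" in near real-time refers to.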

Cluster
A cluster is a collection of one or more nodes (servers) that together hold your entire data and provide federated indexing and search capabilities across all nodes. A cluster is identified by a unique name, which defaults to "elasticsearch". This name is important because a node can only join a cluster if it is configured with that cluster's name.

Make sure that you do not reuse the same cluster name in different environments. Otherwise, a node may join the wrong cluster. For example, you can use logging-dev, logging-stage, and logging-prod as the cluster names for the development, staging, and production environments respectively.
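
As a small sketch of this naming scheme, the snippet below generates one cluster name per environment and the matching `cluster.name` line for a node's configuration file. The helper names (`cluster_name`, `config_line`) and the project name `logging` are assumptions for illustration; only the `cluster.name` setting itself comes from Elasticsearch.

```python
ENVIRONMENTS = ["dev", "stage", "prod"]


def cluster_name(project, env):
    """Build an environment-specific cluster name (hypothetical helper)."""
    return f"{project}-{env}"


def config_line(project, env):
    """Return the line that would go into that environment's elasticsearch.yml."""
    return f"cluster.name: {cluster_name(project, env)}"


names = [cluster_name("logging", env) for env in ENVIRONMENTS]

# Every environment must end up with a distinct cluster name.
assert len(set(names)) == len(ENVIRONMENTS)

for env in ENVIRONMENTS:
    print(config_line("logging", env))
```

Generating the names from one place makes it harder to copy a production name into a development configuration by accident.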

It is perfectly valid for a cluster to contain only a single node. You can also run multiple independent clusters, each with its own unique name.

Node
A node is a single server that is part of your cluster, stores your data, and participates in the cluster's indexing and search capabilities. Like a cluster, a node is identified by a name, which by default is a random UUID (Universally Unique Identifier) assigned to the node at startup. If you do not want the default, you can define any node name you like. This name matters for administration, because it lets you identify which servers in your network correspond to which nodes in your Elasticsearch cluster.

A node can be configured to join a specific cluster by the cluster name. By default, each node is set up to join a cluster named elasticsearch. That is to say, if you start several nodes on your network and they can discover each other, they will automatically form and join a single cluster named elasticsearch.

A single cluster can have as many nodes as you want. If no other Elasticsearch nodes are currently running on your network, starting a single node will by default form a new single-node cluster named elasticsearch.

Index
An index is a collection of documents that have somewhat similar characteristics. For example, you can have one index for customer data, another for a product catalog, and yet another for order data. An index is identified by a name (which must be all lowercase), and this name is used to refer to the index when indexing, searching, updating, or deleting the documents in it.
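
The lowercase rule is easy to check before sending a request. The function below is a minimal sketch that tests only the rule stated above. Elasticsearch applies further naming restrictions that are not covered here, and `is_lowercase_index_name` is a hypothetical helper, not part of any client library.

```python
def is_lowercase_index_name(name):
    """Check only the rule stated above: a non-empty, all-lowercase name."""
    return bool(name) and name == name.lower()


for candidate in ["customer", "Customer", "orders-2016"]:
    print(candidate, is_lowercase_index_name(candidate))
```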

In a single cluster, you can define as many indexes as you want.

Type
Within an index, you can define one or more types. A type is a logical category/partition of your index whose semantics are entirely up to you. In general, a type is defined for documents that share a set of common fields. For example, suppose you run a blogging platform and store all of its data in a single index. In this index, you could define one type for user data, another for blog data, and another for comment data.

Document
A document is the basic unit of information that can be indexed. For example, you can have one document for a single customer, another for a single product, and another for a single order. A document is expressed in JSON (JavaScript Object Notation), a ubiquitous data interchange format on the Internet.
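
For illustration, here is what a customer document might look like and how it serializes to JSON with Python's standard library. The field names and values are invented for this example and do not come from the official documentation.

```python
import json

# A hypothetical customer document; the fields are made up for illustration.
customer = {
    "name": "John Doe",
    "age": 30,
    "tags": ["vip", "newsletter"],
}

# Serialize to the JSON text that would be sent as the document body.
body = json.dumps(customer)
print(body)

# Parsing the JSON text gives back an equal Python object.
restored = json.loads(body)
```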

Within an index/type, you can store as many documents as you want. Note that although a document physically resides in an index, it must also be assigned to a type inside that index when it is indexed.

Shards & Replicas
An index can potentially store a large amount of data that exceeds the hardware limits of a single node. For example, a single index of one billion documents taking up 1 TB of disk space may not fit on the disk of a single node, or may be too slow to serve search requests from a single node alone.

To solve this problem, Elasticsearch lets you subdivide an index into multiple pieces called shards. When you create an index, you can simply define the number of shards you want. Each shard is in itself a fully functional and independent "index" that can be hosted on any node in the cluster.

Sharding is important for two main reasons:
1. It allows you to horizontally split/scale your content volume.
2. It allows you to distribute and parallelize operations across shards (potentially on multiple nodes), which increases performance/throughput.

How shards are distributed, and how their documents are aggregated back into search results, is managed entirely by Elasticsearch and is transparent to you as the user.
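
To make the idea concrete, the sketch below spreads documents over a fixed number of shards by hashing each document ID and taking the remainder, then merges the per-shard results again. It is a simplified illustration, not Elasticsearch's implementation: `zlib.crc32` stands in for whatever hash function Elasticsearch uses internally, and `pick_shard` is a hypothetical name.

```python
import zlib

NUM_PRIMARY_SHARDS = 5


def pick_shard(doc_id, num_primary_shards):
    """Map a document ID to a shard number (stand-in hash, for illustration)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


# Distribute 20 documents over the shards.
shards = {n: [] for n in range(NUM_PRIMARY_SHARDS)}
for i in range(20):
    doc_id = f"doc-{i}"
    shards[pick_shard(doc_id, NUM_PRIMARY_SHARDS)].append(doc_id)

# A search fans out to every shard and merges the partial results.
merged = sorted(doc for docs in shards.values() for doc in docs)

for n, docs in shards.items():
    print(n, len(docs))
```

Because the shard number depends on the shard count, changing the count would send existing IDs to different shards. That is one way to see why a scheme like this needs the number of primary shards to stay fixed after the index is created.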

In a network or cloud environment where failures can be expected at any time, it is very useful and highly recommended to have a failover mechanism in case a shard or node goes offline or disappears for whatever reason. To this end, Elasticsearch allows you to make one or more copies of your index's shards, called replica shards, or replicas for short.

Replication is important for two main reasons:
1. It provides high availability in case a shard or node fails. For this reason, a replica shard is never allocated on the same node as the original/primary shard it was copied from. (Note: if it were on the same node as the primary shard, then when that node failed the replica would be unavailable too, and the replica would serve no purpose.)
2. It lets you scale out your search volume/throughput, because searches can be executed on all replicas in parallel.
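
The placement rule in point 1 can be sketched with a toy allocator. This is an illustration of the constraint only, not Elasticsearch's allocation algorithm. The function `allocate` and the node names are hypothetical, and raising an error when there are too few nodes is a simplification chosen for this example.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Toy placement: returns {(shard, copy): node}; copy 0 is the primary."""
    if num_replicas >= len(nodes):
        raise ValueError("not enough nodes to keep copies of a shard apart")
    placement = {}
    for shard in range(num_primaries):
        for copy in range(num_replicas + 1):
            # Offsetting by the copy number puts each copy on a different node.
            placement[(shard, copy)] = nodes[(shard + copy) % len(nodes)]
    return placement


placement = allocate(5, 1, ["node-1", "node-2"])

# No replica shares a node with its primary.
for shard in range(5):
    assert placement[(shard, 0)] != placement[(shard, 1)]

for key in sorted(placement):
    print(key, placement[key])
```

If either node in this example fails, every shard still has one copy left on the other node.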

Summary: each index can be split into multiple shards. An index can also be replicated zero times (no replicas) or more. Once replicated, each index has primary shards (the original shards that were copied from) and replica shards (the copies of the primary shards). You can specify the number of primary and replica shards when you create an index. After the index is created, you can change the number of replica shards dynamically at any time, but you cannot change the number of primary shards.
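
As a sketch of how these two numbers are typically expressed, the snippet below builds the JSON bodies for creating an index and for later changing its replica count. It only constructs the request bodies and sends nothing. The index name `customer` and the helper names are assumptions for illustration, and the exact request shape should be checked against the documentation for your Elasticsearch version.

```python
import json


def create_index_body(primaries, replicas):
    """Body for creating an index, e.g. PUT /customer (hypothetical index name)."""
    return {
        "settings": {
            "number_of_shards": primaries,
            "number_of_replicas": replicas,
        }
    }


def update_replicas_body(replicas):
    """Body for changing replicas later, e.g. PUT /customer/_settings."""
    # Only the replica count appears here: the number of primary shards
    # is fixed once the index exists.
    return {"index": {"number_of_replicas": replicas}}


create_body = create_index_body(5, 1)
update_body = update_replicas_body(2)

print(json.dumps(create_body))
print(json.dumps(update_body))
```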

By default (in the Elasticsearch version these notes are based on), each index is allocated five primary shards and one replica. This means that if your cluster has at least two nodes, your index will have five primary shards and another five replica shards (one complete replica), for a total of 10 shards per index.
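
The arithmetic behind that total is simple: every primary shard exists once, plus once more per replica. A minimal sketch, with `total_shards` as a hypothetical helper name:

```python
def total_shards(primaries, replicas):
    """Total shard copies for one index: each primary plus its replicas."""
    return primaries * (1 + replicas)


print(total_shards(5, 1))  # five primaries + five replica shards = 10
print(total_shards(5, 0))  # no replicas: only the five primaries
```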

Each Elasticsearch shard is a Lucene index. A single Lucene index can hold only a limited number of documents. As of LUCENE-5843, the limit is 2,147,483,519 (= Integer.MAX_VALUE - 128) documents. You can monitor shard sizes using the _cat/shards API.
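
The limit quoted above can be reproduced from Java's `Integer.MAX_VALUE`, which is 2^31 - 1. The sketch below recomputes it and shows a hypothetical helper, `remaining_capacity`, for checking how much headroom a shard has left given a document count you have looked up yourself.

```python
JAVA_INTEGER_MAX_VALUE = 2**31 - 1            # 2,147,483,647
LUCENE_MAX_DOCS = JAVA_INTEGER_MAX_VALUE - 128


def remaining_capacity(doc_count):
    """Documents a shard can still take before hitting the limit (hypothetical helper)."""
    return LUCENE_MAX_DOCS - doc_count


print(LUCENE_MAX_DOCS)
print(remaining_capacity(1_000_000_000))
```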

Now that the basic concepts are covered, let's get started with the interesting parts...

This article is a translation of the official documentation. I would be honored if it helps anyone who wants to learn Elasticsearch.

Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/_basic_concepts.html
