ElasticSearch-Basic Concepts

Source: Internet
Author: User


This article is a translation. Mastering the following basic concepts is very important for learning Elasticsearch. You can try to map them onto the corresponding MySQL concepts (databases, tables, data rows, fields).

Basic Concepts

Elasticsearch has several core concepts. Understanding these concepts from the very beginning will be of great help to the entire learning process.

Near real-time (NRT)

Elasticsearch is a near real-time search platform. This means that there is a slight latency (usually 1 second) between the moment a document is indexed and the moment it becomes searchable.
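The delay described above can be pictured with a toy model. This is only a sketch, not Elasticsearch code: the class `ToyIndex` and its methods are hypothetical, and the assumption is that newly indexed documents sit in a buffer until a periodic refresh makes them visible to search.

```python
class ToyIndex:
    """Toy model of near real-time search: writes are buffered until a refresh."""

    def __init__(self):
        self._buffer = []      # indexed, but not yet searchable
        self._searchable = []  # visible to search

    def index(self, doc):
        # Accept the document; it is not searchable yet.
        self._buffer.append(doc)

    def refresh(self):
        # Assumption: the real system runs a step like this periodically
        # (about once per second), which explains the short delay.
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, term):
        return [d for d in self._searchable if term in d["title"]]


idx = ToyIndex()
idx.index({"title": "hello elasticsearch"})
before = idx.search("hello")  # searched before the refresh
idx.refresh()
after = idx.search("hello")   # searched after the refresh
print(len(before), len(after))
```

In this model, the document is missing from the first search and present in the second, which mirrors the short gap between indexing and searchability.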

Cluster

A cluster is a collection of one or more nodes that together hold all of your data and provide indexing and search capabilities. A cluster is identified by a unique name. The default name is "elasticsearch".

This name is important because a node can only join a cluster by specifying the cluster's name. It is good practice to set this name explicitly in a production environment, but the default is fine for testing and development.

Node

A node is a single server in your cluster. As part of the cluster, it stores your data and participates in the cluster's indexing and search capabilities. Like a cluster, a node is identified by a name. By default, this name is the name of a random Marvel comic character, assigned to the node at startup. The name matters for administration, because it lets you identify which servers in your network correspond to which nodes in the Elasticsearch cluster.

A node can be made to join a specific cluster by configuring the cluster name. By default, each node is set up to join a cluster called "elasticsearch". This means that if you start several nodes in your network, and they can discover each other, they will automatically form and join a single cluster called "elasticsearch". A cluster can contain as many nodes as you want. In addition, if no other Elasticsearch node is running in your network, starting a single node will by default create and join a new cluster named "elasticsearch".

Index

An index is a collection of documents that have somewhat similar characteristics. For example, you can have one index for customer data, another for a product catalog, and another for order data. An index is identified by a name (which must be all lowercase), and this name is used to refer to the index when you index, search, update, or delete the documents in it. In a cluster, you can define as many indexes as you want.
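The lowercase rule for index names can be checked before sending a request. The helper below is a minimal sketch; `is_lowercase_index_name` is a hypothetical name, and it only covers the rule stated above, not any further restrictions Elasticsearch may place on index names.

```python
def is_lowercase_index_name(name: str) -> bool:
    """Return True if the name is non-empty and contains no uppercase letters."""
    return bool(name) and name == name.lower()


for candidate in ["customers", "product_catalog", "Orders"]:
    print(candidate, is_lowercase_index_name(candidate))
```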

Type

In an index, you can define one or more types. A type is a logical category or partition of your index, and its semantics are entirely up to you. In general, a type is defined for documents that share a common set of fields. For example, suppose you run a blogging platform and store all your data in a single index. In that index, you could define one type for user data, another for blog data, and another for comment data.

Document

A document is the basic unit of information that can be indexed. For example, you can have a document for a single customer, another for a single product, and another for a single order. A document is expressed in JSON (JavaScript Object Notation), a ubiquitous data interchange format on the Internet.

In an index/type, you can store as many documents as you want. Note that although a document physically resides in an index, it must be indexed into, or assigned to, a type within that index.
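To make the index/type/document relationship concrete, here is a small sketch that serializes a document as JSON and builds the path it would be stored under. The index name `customer`, the type name `external`, the ID, and the helper `document_path` are all illustrative assumptions; the `/index/type/id` layout reflects how the type-based REST API addresses documents.

```python
import json


def document_path(index: str, doc_type: str, doc_id: str) -> str:
    """Build the REST path for a document: /<index>/<type>/<id>."""
    return f"/{index}/{doc_type}/{doc_id}"


# A document is just a JSON object.
customer = {"name": "John Doe", "age": 35}
body = json.dumps(customer, sort_keys=True)

# Hypothetical names: index "customer", type "external", document ID "1".
path = document_path("customer", "external", "1")
print("PUT", path)
print(body)
```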

Shards & replicas

An index can potentially store an amount of data that exceeds the hardware limits of a single node. For example, an index of 1 billion documents taking up 1 TB of disk space may not fit on the disk of any single node, or a single node may be too slow to serve search requests on its own.

To solve this problem, Elasticsearch provides the ability to subdivide an index into multiple pieces called shards. When you create an index, you can specify the number of shards you want. Each shard is itself a fully functional and independent "index" that can be hosted on any node in the cluster.

There are two main reasons for the importance of sharding:

- It allows you to horizontally split/scale your content volume.

- It allows you to distribute and parallelize operations across shards (potentially on multiple nodes), which increases performance/throughput.

How a shard is distributed, and how the results from its documents are aggregated back into search responses, is completely managed by Elasticsearch and is transparent to you as the user.
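One way to picture how documents are spread over shards is a hash of a routing key taken modulo the number of primary shards. The sketch below is an illustration under assumptions: `pick_shard` is a hypothetical helper, the document ID is used as the routing key, and `zlib.crc32` stands in for whatever hash function Elasticsearch uses internally.

```python
import zlib


def pick_shard(routing_key: str, number_of_primary_shards: int) -> int:
    """Map a routing key to a shard number in [0, number_of_primary_shards)."""
    # crc32 is only a stand-in hash chosen for this sketch.
    digest = zlib.crc32(routing_key.encode("utf-8"))
    return digest % number_of_primary_shards


# Group a few hypothetical document IDs by the shard they land on.
placement = {}
for doc_id in ["1", "2", "3", "4", "5", "6"]:
    placement.setdefault(pick_shard(doc_id, 5), []).append(doc_id)

for shard in sorted(placement):
    print("shard", shard, "->", placement[shard])
```

Because the result depends on the number of primary shards, the same key could map to a different shard if that number changed, which is one way to understand why the shard count is fixed when the index is created.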

In a network/cloud environment, failures can occur at any time. A failover mechanism for the case where a shard or node goes offline or disappears for whatever reason is very useful and strongly recommended. To this end, Elasticsearch allows you to make one or more copies of a shard. These copies are called replica shards, or replicas for short.

There are two main reasons why replication is important:

- It provides high availability when a shard or node fails. For this reason, it is important to note that a replica shard is never placed on the same node as the original/primary shard it was copied from.

- It allows you to scale out your search volume/throughput, because searches can run on all replicas in parallel.
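The placement rule in the first point can be expressed as a simple check. This is a sketch only: the `allocation` layout (node name mapped to a list of `(shard_number, role)` tuples) and the function `replicas_are_separated` are hypothetical and do not reflect how Elasticsearch represents or decides allocation internally.

```python
def replicas_are_separated(allocation):
    """Return True if no node holds both the primary and a replica of a shard.

    `allocation` maps a node name to a list of (shard_number, role) tuples,
    where role is either "primary" or "replica".
    """
    for shards in allocation.values():
        primaries = {number for number, role in shards if role == "primary"}
        replicas = {number for number, role in shards if role == "replica"}
        if primaries & replicas:
            return False
    return True


# Two nodes, each holding one primary and the replica of the other's primary.
good = {
    "node-1": [(0, "primary"), (1, "replica")],
    "node-2": [(1, "primary"), (0, "replica")],
}
# Primary and replica of shard 0 on the same node: losing the node loses both.
bad = {"node-1": [(0, "primary"), (0, "replica")]}

print(replicas_are_separated(good), replicas_are_separated(bad))
```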

In short, each index can be split into multiple shards. An index can also be replicated zero times (meaning no replicas) or more. Once replicated, each index has primary shards (the original shards that serve as the source of replication) and replica shards (copies of the primary shards). You can specify the number of shards and replicas when creating an index. After an index is created, you can change the number of replicas dynamically at any time, but you cannot change the number of shards afterwards.

By default, each index in Elasticsearch is allocated five primary shards and one replica. This means that if your cluster has at least two nodes, your index will have five primary shards and five replica shards (one complete copy), for a total of 10 shards per index.
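The shard arithmetic above can be written out as a short sketch. The function `total_shards` is a hypothetical helper; the settings body uses the `number_of_shards` and `number_of_replicas` index settings, with the default values described in this article.

```python
import json


def total_shards(primaries: int, replicas: int) -> int:
    """Each primary shard exists once, plus once more per replica."""
    return primaries * (1 + replicas)


# Settings body that could be sent when creating an index.
settings = {"settings": {"number_of_shards": 5, "number_of_replicas": 1}}
index_settings = settings["settings"]

print(json.dumps(settings))
print(total_shards(index_settings["number_of_shards"],
                   index_settings["number_of_replicas"]))  # 10
```

With zero replicas the total is just the number of primary shards; each additional replica adds one more full copy of all of them.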
