1.NRT (near real-time search)
Elasticsearch is a near real-time (NRT) search platform. This means that when you index a document, it becomes searchable after a slight delay (usually 1 second).
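The delay described above can be pictured with a small toy model. This is only a sketch of the idea, not how Elasticsearch is implemented: the `ToyIndex` class and its methods are hypothetical names, and the refresh is triggered by hand instead of by a timer.

```python
class ToyIndex:
    """Toy model of near real-time search: writes become visible only after a refresh."""

    def __init__(self):
        self._buffer = []      # indexed, but not yet searchable
        self._searchable = []  # visible to search

    def index(self, doc):
        # A newly indexed document lands in the buffer first.
        self._buffer.append(doc)

    def refresh(self):
        # Stand-in for the periodic refresh (roughly once per second in the text above).
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, term):
        return [doc for doc in self._searchable if term in doc]


idx = ToyIndex()
idx.index("quick brown fox")
before = idx.search("fox")  # empty: the document has not been refreshed yet
idx.refresh()
after = idx.search("fox")   # the document is now visible
```

Here `before` is empty and `after` contains the document, which is the "slight delay" in miniature.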
2.Cluster (cluster)
A cluster is a collection of one or more nodes (servers) that together hold all of your data and provide federated indexing and search capabilities across all nodes. Each cluster is identified by a unique name, which defaults to "elasticsearch". This name is important because a node can only be part of a cluster if it is configured to join that cluster by name.
Make sure that you do not reuse the same cluster name in different environments; otherwise, nodes may join the wrong cluster. For example, use logging-dev, logging-stage, and logging-prod for the development, staging, and production clusters respectively.
Note that a cluster with only a single node is perfectly valid. You may also run multiple independent clusters, each with its own unique name.
3.Node (node)
A node is a single server that is part of a cluster, stores data, and participates in the cluster's indexing and search capabilities. Like a cluster, each node is identified by a name, which by default is a random name assigned to the node at startup. If you do not want the default, you can define any node name you like. This name matters for cluster management, where you need to identify which server on your network corresponds to which node in the cluster.
A node can be configured to join a specific cluster by cluster name. By default, each node is set up to join a cluster named "elasticsearch".
A cluster can contain any number of nodes. Also, if no other Elasticsearch node is running on your network, starting a single node will by default form a new single-node cluster named "elasticsearch".
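As an illustration of naming a cluster and a node, the sketch below renders the two relevant settings, `cluster.name` and `node.name`, as they would appear in a node's `elasticsearch.yml`. The helper `render_node_config` and the node name `node-1` are hypothetical and exist only for this example.

```python
def render_node_config(cluster_name: str, node_name: str) -> str:
    """Return the elasticsearch.yml lines that name the cluster and the node."""
    lines = [
        f"cluster.name: {cluster_name}",  # the cluster this node will try to join
        f"node.name: {node_name}",        # replaces the random default node name
    ]
    return "\n".join(lines) + "\n"


# "node-1" is an illustrative name, not something Elasticsearch assigns.
config = render_node_config("logging-dev", "node-1")
print(config)
```

Nodes that share the same `cluster.name` form one cluster, which is why the name should differ between environments.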
4.Index (index)
An index is a collection of documents with somewhat similar characteristics. For example, you can have one index for customer data, another for a product catalog, and another for order data. Each index is identified by a unique name (which must be all lowercase), and this name is used to refer to the index when performing index, search, update, and delete operations against its documents.
A cluster can contain any number of indices.
5.Type (type)
Within an index, you can define one or more types. A type is a logical category or partition of an index. Typically, a type is defined for documents that share a common set of fields. For example, suppose you run a blogging platform and store all of its data in a single index. In this index, you can define one type for user data, another for blog data, and another for comment data.
6.Document (document)
A document is the basic unit of information that can be indexed. For example, you can have a document for a single customer, another for a single product, and another for a single order. Documents are expressed in JSON.
An index/type can contain any number of documents. Note that although a document physically resides in an index, it must actually be indexed into, or assigned to, a type inside that index.
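To make the index/type/document relationship concrete, here is a minimal sketch that builds the pieces of a REST request for indexing one JSON document. It assumes the type-based URL layout `/{index}/{type}/{id}` used by Elasticsearch versions that have types. The helper `build_index_request` and the sample names (`customer`, `external`) are illustrative, and no request is actually sent.

```python
import json


def build_index_request(index, doc_type, doc_id, document):
    """Return (method, path, body) for indexing one document into index/type."""
    if index != index.lower():
        raise ValueError("index names must be lowercase")
    path = f"/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document, sort_keys=True)  # documents are expressed in JSON
    return "PUT", path, body


method, path, body = build_index_request(
    "customer", "external", 1, {"name": "John Doe"}
)
print(method, path, body)
```

The path carries both the index and the type, reflecting that a document lives in an index but is assigned to a type within it.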
7.Shards & Replicas (shards and replicas)
An index can potentially store an amount of data that exceeds the hardware limits of a single node. For example, an index of 1 billion documents taking up 1TB of disk space may not fit on the disk of a single node, or may be too slow to serve search requests from a single node alone.
To solve this problem, Elasticsearch lets you subdivide an index into multiple pieces called shards. When you create an index, you can specify the number of shards. Each shard is in itself a fully functional and independent "index" that can be hosted on any node in the cluster.
Sharding is important for two main reasons:
- It allows you to scale out capacity horizontally
- It allows you to distribute and parallelize operations across shards (potentially on multiple nodes), which improves performance and throughput
The mechanics of how shards are distributed, and how their documents are aggregated back into search results, are managed entirely by Elasticsearch and are transparent to the user.
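Although shard placement is handled for you, the underlying idea is commonly described as "hash the routing key, then take the remainder modulo the number of primary shards". The sketch below illustrates that idea only. It uses `zlib.crc32` as a stand-in hash, which is an assumption made for this example and not the hash Elasticsearch uses. The function names are hypothetical as well.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (for example a document id) to a shard number."""
    # zlib.crc32 is a stand-in hash for illustration, NOT Elasticsearch's own.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def group_by_shard(doc_ids, num_primary_shards):
    """Group document ids by the shard they would be routed to."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


placement = group_by_shard([f"doc-{i}" for i in range(20)], 5)
for shard, ids in placement.items():
    print(shard, ids)
```

Because the shard number depends on the shard count, changing the count would send existing documents to different shards. A scheme like this is one way to see why the number of shards is fixed once an index exists.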
In a network or cloud environment, failures can be expected at any time, such as a shard or node going offline, so a failover mechanism is very useful and highly recommended. To this end, Elasticsearch allows you to make one or more copies of an index's shards, called replica shards, or replicas for short.
Replication is important for two main reasons:
- It provides high availability in case a shard or node fails. For this reason, a replica shard is never allocated on the same node as the primary shard it was copied from.
- It allows you to scale out search volume, because searches can be executed on all replicas in parallel
To summarize, each index can be split into multiple shards. An index can also be replicated zero (meaning no replicas) or more times. Once replicated, each index has primary shards (the original shards that were replicated from) and replica shards (the copies of the primary shards). The number of shards and replicas can be defined per index at index creation time. After the index is created, you can change the number of replicas dynamically at any time, but you cannot change the number of shards.
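The difference between creation-time and dynamic settings can be sketched as two request bodies. The setting names `number_of_shards` and `number_of_replicas` are Elasticsearch index settings. The index name `blog`, the helper functions, and the chosen counts are assumptions for illustration, and nothing is sent to a server.

```python
import json


def create_index_body(num_shards: int, num_replicas: int) -> dict:
    """Settings supplied once, when the index is created."""
    return {
        "settings": {
            "number_of_shards": num_shards,
            "number_of_replicas": num_replicas,
        }
    }


def update_replicas_body(num_replicas: int) -> dict:
    """Settings update that changes only the replica count of an existing index."""
    return {"index": {"number_of_replicas": num_replicas}}


# "blog" is a hypothetical index name.
create_request = ("PUT", "/blog", json.dumps(create_index_body(3, 1)))
update_request = ("PUT", "/blog/_settings", json.dumps(update_replicas_body(2)))
print(create_request)
print(update_request)
```

The update body deliberately contains no shard count, mirroring the rule that only the number of replicas can change after creation.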
By default, each index is allocated 5 primary shards and 1 replica, which means that if you have at least two nodes in your cluster, each index will have 10 shards in total: 5 primary shards and 5 replica shards (one complete replica).
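The arithmetic behind these numbers is simple enough to check in a few lines. The sketch below is a simplified model with hypothetical function names. It assumes that copies of the same shard are never placed on the same node, so a single-node cluster cannot host any replicas.

```python
def total_shards(primaries: int, replicas: int) -> int:
    """Total shard count: every primary plus `replicas` copies of each."""
    return primaries * (1 + replicas)


def assignable_shards(primaries: int, replicas: int, nodes: int) -> int:
    """How many of those shards can be placed, given one copy per node at most."""
    copies_per_primary = min(1 + replicas, nodes)
    return primaries * copies_per_primary


print(total_shards(5, 1))          # shards defined by the default settings
print(assignable_shards(5, 1, 1))  # single node: only the primaries fit
print(assignable_shards(5, 1, 2))  # two nodes: primaries and replicas fit
```

This is why the text specifies "at least two nodes": with a single node, the 5 replica shards have nowhere to go.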
Elasticsearch Basic Concepts