Source: https://goo.gl/T01ITO
Basic Concepts:
Elasticsearch has a number of core concepts, and understanding them from the start makes learning the rest of Elasticsearch much easier.
Near Realtime (NRT)
Elasticsearch is a near real-time search platform. This means there is a slight latency (usually about 1 second) between the time a document is indexed and the time it becomes searchable.
Cluster
A cluster is a collection of one or more nodes (servers) that together hold all of the data and provide federated indexing and search across all nodes.
A cluster is identified by a unique name, which by default is "elasticsearch".
Make sure you do not reuse the same cluster name in different environments, or nodes may join the wrong cluster. For example, the names logging-dev, logging-stage, and logging-prod can be used for the development, staging, and production environments respectively.
Note: A cluster with only a single node is perfectly valid. It is also possible to run multiple independent clusters, each with its own unique cluster name.
Node
A node is a single server that is part of the cluster. It stores data and participates in the cluster's indexing and search. A node is identified by a unique name, which by default is a random UUID assigned to the node at startup.
The node name matters for administration, when we need to identify which servers in the network correspond to which nodes in the cluster.
A node can be configured to join a specific cluster by cluster name. By default, every node joins a cluster named "elasticsearch". That is, if we start several nodes on a network where they can discover each other, they will all join the same "elasticsearch" cluster by default.
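As a minimal sketch of the naming advice above, the following Python helper renders the two settings involved, `cluster.name` and `node.name`, as they would appear in a node's `elasticsearch.yml`. The helper `render_node_config`, the environment list, and the node name `node-1` are illustrative assumptions, not part of Elasticsearch.

```python
# Hypothetical helper: builds the naming lines for a node's elasticsearch.yml.
# The environment suffixes follow the logging-dev / logging-stage / logging-prod
# example; the function itself is an illustration only.
VALID_ENVS = ("dev", "stage", "prod")


def render_node_config(env: str, node_name: str, prefix: str = "logging") -> str:
    """Return the cluster.name and node.name lines for one node."""
    if env not in VALID_ENVS:
        raise ValueError(f"unknown environment: {env!r}")
    cluster_name = f"{prefix}-{env}"
    return f"cluster.name: {cluster_name}\nnode.name: {node_name}\n"


if __name__ == "__main__":
    print(render_node_config("dev", "node-1"), end="")
```

Deriving the cluster name from the environment in one place makes it harder for a development node to be started with the production cluster name by accident.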
A single cluster can contain as many nodes as we want.
Index
An index is a collection of documents. For example, we can have one index for customer data, another for a product catalog, and another for order data. An index is identified by a unique name, which must be all lowercase. This name is used to refer to the index when indexing, searching, updating, and deleting the documents in it.
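To make the role of the index name concrete, here is a small sketch that checks the lowercase rule and builds the REST path of a search against one index. The helper names `is_lowercase_name` and `search_path` and the sample index names are hypothetical. The check covers only the lowercase rule; Elasticsearch enforces further naming restrictions that this sketch does not model.

```python
def is_lowercase_name(name: str) -> bool:
    """Check only the rule from the text: non-empty and all lowercase."""
    return bool(name) and name == name.lower()


def search_path(index: str) -> str:
    """Build the REST path of a search request against one index."""
    if not is_lowercase_name(index):
        raise ValueError(f"index name must be lowercase: {index!r}")
    return f"/{index}/_search"


if __name__ == "__main__":
    for name in ("customer", "product_catalog", "Orders"):
        try:
            print(name, "->", search_path(name))
        except ValueError as err:
            print(name, "-> rejected:", err)
```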
An index is roughly comparable to a database in a relational database system.
Type
Within an index, we can define one or more types. A type is a logical grouping of the documents in the index. In most cases, documents of the same type share the same data structure.
A type is roughly equivalent to a table in a relational database.
Document
A document is the basic unit of information that can be indexed. Documents are usually expressed in JSON.
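The following sketch shows what such a JSON document might look like and where it would live. It assumes the `/index/type/id` path layout used by Elasticsearch versions that have types. The index name `customer`, the type name `external`, the ID `1`, and the field names are made up for illustration, and the code only prints the request instead of sending it.

```python
import json

# Hypothetical customer document; the field names are illustrative only.
customer = {"name": "John Doe", "age": 32}


def document_path(index: str, doc_type: str, doc_id: str) -> str:
    """REST path of one document in the index/type/id layout."""
    return f"/{index}/{doc_type}/{doc_id}"


def to_json_body(document: dict) -> str:
    """Serialize a document to the JSON text sent as the request body."""
    return json.dumps(document, sort_keys=True)


if __name__ == "__main__":
    print("PUT", document_path("customer", "external", "1"))
    print(to_json_body(customer))
```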
Note: A document physically resides in an index, but it must also be assigned to a type within that index.
Shard & Replica
An index can store an amount of data that exceeds the hardware limits of a single node. For example, a single index of 1 billion documents that takes up 1TB of disk space may not fit on the disk of a single node, or may make searches served from that node alone too slow.
To solve this problem, Elasticsearch can subdivide an index into multiple pieces called shards. When we create an index, we can define the number of shards it will have.
Each shard is in itself a fully functional and independent "index" that can be hosted on any node in the cluster.
Sharding is important for two reasons:
(1) It allows us to split content horizontally.
(2) It allows us to distribute and parallelize operations across shards, which increases system throughput.
How shards are allocated, and how their results are aggregated for a query, is managed entirely by Elasticsearch and is transparent to the user.
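Horizontal splitting can be pictured with a simplified routing model in which each document ID is hashed and the result is taken modulo the number of primary shards. The sketch below is an assumption-laden illustration: `pick_shard` is a hypothetical name, and CRC32 is a stand-in hash chosen for determinism, not the hash function Elasticsearch uses internally.

```python
import zlib
from collections import Counter


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to a shard number in [0, num_primary_shards)."""
    # CRC32 is a stand-in hash (an assumption for this sketch).
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    doc_ids = [f"doc-{i}" for i in range(1000)]
    per_shard = Counter(pick_shard(doc_id, 5) for doc_id in doc_ids)
    for shard in sorted(per_shard):
        print(f"shard {shard}: {per_shard[shard]} documents")
```

In this model the shard count is the modulus, so changing it would map existing document IDs to different shards than the ones they were stored in.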
In a cloud environment, failures can happen at any time. Elasticsearch therefore allows us to make one or more copies of each shard, called replica shards, or replicas for short.
Replication is important for two reasons:
(1) It increases the availability of the cluster. For this reason, a replica shard is never stored on the same node as the primary shard it was copied from.
(2) It allows us to scale search volume, because searches can be executed on all replicas in parallel.
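The placement rule in point (1) can be sketched with a toy allocator that never puts two copies of the same shard on one node. The function `allocate`, the round-robin strategy, and the node names are assumptions for illustration; the real Elasticsearch allocator takes more factors into account.

```python
def allocate(nodes, num_primaries, num_replicas):
    """Toy allocation: every copy of a shard goes to a different node.

    Returns (placement, unassigned), where placement maps each node to the
    shard copies it holds and unassigned lists copies that found no node.
    """
    placement = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_primaries):
        # Copy 0 is the primary; copies 1..num_replicas are replicas.
        for copy in range(1 + num_replicas):
            label = f"P{shard}" if copy == 0 else f"R{shard}"
            if copy < len(nodes):
                # Offsetting by the copy number picks a distinct node per copy.
                node = nodes[(shard + copy) % len(nodes)]
                placement[node].append(label)
            else:
                # No node left that does not already hold this shard.
                unassigned.append(label)
    return placement, unassigned


if __name__ == "__main__":
    for cluster in (["node-1"], ["node-1", "node-2"]):
        placement, unassigned = allocate(cluster, num_primaries=5, num_replicas=1)
        print(placement, "unassigned:", unassigned)
```

With a single node the replicas stay unassigned in this sketch, because the only available node already holds every primary.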
To summarize, each index can be split into multiple shards and can have zero or more replicas. Once replicated, an index has primary shards (the original shards) and replica shards (copies of the primary shards). We can specify the number of shards and replicas when the index is created. The number of replicas can be changed at any time afterwards, but the number of shards cannot be changed after the index is created.
By default, each index in Elasticsearch is assigned 5 primary shards and 1 replica (each primary shard has one replica shard).
So if a cluster has at least two nodes, one index has 10 shards by default: 5 primary shards and 5 replica shards.
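The arithmetic behind that total can be written out as a short sketch. The function `total_shards` is a hypothetical helper. `number_of_shards` and `number_of_replicas` are the index settings that carry these two values, shown here as a request body that is only printed, not sent.

```python
import json


def total_shards(primaries: int = 5, replicas: int = 1) -> int:
    """Total shard copies for one index: each primary plus its replicas."""
    return primaries * (1 + replicas)


# Settings body that could be supplied at index creation time
# (values are the defaults described in the text).
index_settings = {
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 1,
    }
}

if __name__ == "__main__":
    settings = index_settings["settings"]
    total = total_shards(settings["number_of_shards"], settings["number_of_replicas"])
    print(json.dumps(index_settings))
    print(f"total shards: {total}")  # prints "total shards: 10"
```

Setting the replica count to 0 in this calculation leaves only the 5 primary shards, which matches the case of an index with no copies.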