The previous log-processing pipeline for the CDN was:
Logstash agent ==>> Redis ==>> Logstash indexer ==>> Elasticsearch ==>> Kibana 3
When building an Elasticsearch cluster, an index can be stored in partitions: it is divided into several shards, which are stored on different nodes in the cluster. Load balancing inside the cluster, replica allocation, and dynamic index rebalancing (as nodes are added or removed) are all handled internally by Elasticsearch itself, and shards are redistributed as soon as the set of nodes changes.
Here are a few key Elasticsearch (ES) terms.
Cluster
Represents a cluster. A cluster contains multiple nodes, one of which is the master node; the master is chosen by election. The master/slave distinction exists only inside the cluster. One of the core concepts of ES is decentralization: seen from outside, there is no central node, because the cluster behaves as a single logical whole, and communicating with any one node is equivalent to communicating with the entire cluster.
Shards
Represents an index shard. ES can divide a complete index into multiple shards; the advantage is that a large index can be split up and distributed across different nodes, which together form a distributed search. The number of shards can only be specified when the index is created and cannot be changed afterwards.
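Why the shard count is fixed becomes clearer when looking at how a document is assigned to a shard. The sketch below is an illustration only, not Elasticsearch's implementation: it assumes the simple routing rule shard = hash(routing) % number_of_primary_shards, and it uses zlib.crc32 as a stand-in for the hash function that Elasticsearch uses internally. The function name shard_for and the document IDs are hypothetical.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Map a routing value (by default the document ID) to a shard number."""
    # crc32 is a stand-in here; Elasticsearch uses its own hash function.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


doc_ids = [f"log-{i}" for i in range(1000)]

# Placement of every document with 5 primary shards, then with 6.
with_5 = {doc_id: shard_for(doc_id, 5) for doc_id in doc_ids}
with_6 = {doc_id: shard_for(doc_id, 6) for doc_id in doc_ids}

# Documents that would be looked up on the wrong shard if the count changed in place.
moved = [d for d in doc_ids if with_5[d] != with_6[d]]
print(f"{len(moved)} of {len(doc_ids)} documents would map to a different shard")
```

Under this assumption, changing the divisor sends most existing documents to a different shard number, so they could no longer be found where they were written. That is the reason the shard count is chosen up front.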
Replicas
Represents copies of an index. ES can keep multiple replicas of an index, and replicas serve two purposes. The first is to improve fault tolerance: when a shard on a node is corrupted or lost, it can be recovered from a replica. The second is to improve query efficiency: ES automatically load-balances search requests across the copies.
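Unlike the shard count, the replica count can be adjusted on a live index through the index settings API. The following sketch only builds the JSON body that would be sent with a PUT request to the index's _settings endpoint and works out how many shard copies result; it makes no HTTP call. The helper names and the example numbers (5 primaries, 2 replicas) are assumptions for illustration.

```python
import json


def replica_settings_body(number_of_replicas: int) -> str:
    """Build the JSON body for a settings update that changes the replica count."""
    if number_of_replicas < 0:
        raise ValueError("number_of_replicas must be >= 0")
    return json.dumps({"index": {"number_of_replicas": number_of_replicas}})


def total_shard_copies(primaries: int, replicas: int) -> int:
    """Each primary has `replicas` extra copies: total = primaries * (1 + replicas)."""
    return primaries * (1 + replicas)


body = replica_settings_body(2)
print(body)                                         # {"index": {"number_of_replicas": 2}}
print(total_shard_copies(primaries=5, replicas=2))  # 15
```

With 5 primary shards and 2 replicas, the cluster holds 15 shard copies in total, so every extra replica buys fault tolerance and search capacity at the cost of more disk and memory.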
Recovery
Represents data recovery, also called data redistribution. When a node joins or leaves the cluster, ES redistributes the index shards according to the load of the machines; data recovery also takes place when a failed node restarts.
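The effect of redistribution can be pictured with a toy model. The sketch below is not Elasticsearch's allocator, which weighs machine load and other factors; it simply assumes round-robin placement to show how the number of shards per node drops when a node joins. The function allocate and the node and shard names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def allocate(shards: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Spread shards evenly over nodes in round-robin order (toy model)."""
    if not nodes:
        raise ValueError("at least one node is required")
    placement = defaultdict(list)
    for i, shard in enumerate(shards):
        placement[nodes[i % len(nodes)]].append(shard)
    return {node: placement[node] for node in nodes}


shards = [f"shard-{i}" for i in range(6)]

before = allocate(shards, ["node-a", "node-b"])
after = allocate(shards, ["node-a", "node-b", "node-c"])  # node-c joins

print({n: len(s) for n, s in before.items()})  # {'node-a': 3, 'node-b': 3}
print({n: len(s) for n, s in after.items()})   # {'node-a': 2, 'node-b': 2, 'node-c': 2}
```

In this simplified picture, the six shards move from three per node to two per node once the third node is available.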
River
Represents a data source for ES, and is also a way to synchronize data from other storage systems (such as databases) into ES. A river is an ES service that exists as a plug-in; it reads data from the source and indexes it into ES. The official rivers cover CouchDB, RabbitMQ, Twitter, and Wikipedia. The river feature will be covered in detail in a later document.
Gateway
Represents the persistent storage of ES indexes. By default, ES keeps the index in memory first and persists it to disk when memory is full. When the ES cluster is shut down and restarted, the index data is read back from the gateway. ES supports several gateway types: the local file system (the default), distributed file systems, Hadoop HDFS, and Amazon's S3 cloud storage service.
Discovery.zen
Represents the automatic node discovery mechanism of ES. ES is a peer-to-peer (P2P) system: it first looks for existing nodes by broadcasting, then communicates between nodes over multicast, and it also supports point-to-point interaction.
Transport
Represents the way ES nodes interact with each other inside the cluster and with clients. By default, TCP is used for this interaction; in addition, ES supports HTTP (JSON format), Thrift, Servlet, memcached, ZeroMQ, and other transport protocols, which are integrated via plug-ins.
Several common concepts of processing large-scale log streams in Elasticsearch clusters