I. Basic ideas for server deployment
1. Add one or two servers to act as load-balancing nodes
The Elasticsearch configuration file has two parameters, node.master and node.data, which, used in combination, help improve server performance.
1.1> node.master: false, node.data: true
This node serves only as a data node and stores only index data. Keeping its role narrow, handling data storage and data queries only, reduces its resource consumption.
1.2> node.master: true, node.data: false
This node acts only as a master node and stores no index data. It uses its free resources to coordinate indexing and query requests, distributing them sensibly to the relevant data nodes.
1.3> node.master: false, node.data: false
This node is neither eligible to be elected master nor stores any index data. It is used mainly for query load balancing: a query usually has to fetch data from several nodes, so this node distributes the request to the specified data nodes, aggregates the results each of them returns, and finally returns the combined result to the client.
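Taken together, the three roles map onto elasticsearch.yml settings roughly as follows (a minimal sketch; which machines get which role is up to your cluster layout):

# Data-only node (1.1): stores and queries index data
node.master: false
node.data: true

# Master-only node (1.2): coordinates the cluster, holds no data
node.master: true
node.data: false

# Load-balancing "client" node (1.3): neither master-eligible nor a data holder
node.master: false
node.data: false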
2. Turn off HTTP on the data nodes
None of the data nodes in an Elasticsearch cluster needs the HTTP service. Set the configuration parameter http.enabled: false on them, and do not install monitoring plug-ins such as head, bigdesk or Marvel, so that the data nodes handle only create/update/delete/query operations on index data.
HTTP can be enabled on the non-data nodes, and the monitoring plug-ins can be installed on those servers to watch information such as the Elasticsearch cluster status.
This is done partly for data security and partly for the performance of the service.
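A minimal sketch of a data-node configuration that combines the two points above (http.enabled as a per-node switch is the 1.x-era setting this text refers to):

# elasticsearch.yml on a data node
node.master: false
node.data: true
http.enabled: false    # clients reach the cluster through the non-data nodes instead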
3. Deploy only one node per server
A single physical server can run multiple Elasticsearch node instances (by giving each a different port), but the CPU, memory, disk and other resources of one server are limited, so from a performance standpoint running multiple nodes on one server is not recommended.
II. Server configuration
1. Configure the index and search thread pool sizes
An Elasticsearch server has several configurable thread pools, mainly: index, search, suggest, get, bulk, percolate, snapshot, snapshot_data, warmer and refresh.
Here only the index and search pools are adjusted. The index pool handles create/update/delete operations on index data; the search pool handles the users' various search requests.
The specific configuration is as follows:
threadpool:
  index:
    type: fixed
    size: 100
  search:
    type: fixed
    size: 1000
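Whether these sizes are adequate can be checked at runtime; for example, the _cat API (available in 1.x-era Elasticsearch and later) reports active, queued and rejected tasks per pool:

curl -XGET 'http://localhost:9200/_cat/thread_pool?v'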
2. Use the same analyzer for indexing and searching
The index servers use the IK Chinese analyzer, so data added to the search servers is tokenized with it (for example, the orgName field of the OrgGlobal object uses the IK Chinese analyzer). Search keywords must be analyzed with the same Chinese analyzer when queries run; if nothing is specified, the server falls back to the default standard analyzer, and querying Chinese text with standard performs poorly. With IK set as the default analyzer, query efficiency is two to three times that of standard.
The configuration is as follows:
index:
  analysis:
    analyzer:
      ik:
        alias: [news_analyzer_ik, ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider

index.analysis.analyzer.default.type: ik
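Once the node has restarted with this configuration, the _analyze API can be used to verify which analyzer is actually applied (index name and sample text below are illustrative):

curl -XGET 'http://localhost:9200/indexName/_analyze?analyzer=ik' -d '中华人民共和国'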
3. Decide the number of shards and replicas
When creating index data, Elasticsearch should be told explicitly how many shards and replicas to use; otherwise the server defaults of shards=5 and replicas=1 apply. The settings of these two properties directly affect how indexing and search operations execute across the cluster.
Assuming you have enough machines to hold the shards and replicas, the two values can be set along these lines:
1) More shards improves indexing performance and lets a large index be spread across machines.
2) More replicas improves search performance and cluster availability.
For a given index, number_of_shards can be set only once, while number_of_replicas can be increased or decreased at any time through the index update-settings API.
In the configuration file these parameters are set as follows:
index.number_of_shards: 5
index.number_of_replicas: 1
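They can also be set per index at creation time, and number_of_replicas adjusted later through the update-settings API mentioned above; a sketch with an illustrative index name:

curl -XPUT 'http://localhost:9200/indexName' -d '{
  "settings": { "number_of_shards": 5, "number_of_replicas": 1 }
}'

curl -XPUT 'http://localhost:9200/indexName/_settings' -d '{
  "index": { "number_of_replicas": 2 }
}'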
4. Slow query log configuration
In practice, logging slow queries and slow indexing operations provides a basis for ongoing performance optimization. The specific configuration is as follows:
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms
index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms
index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: 5s
index.indexing.slowlog.threshold.index.debug: 2s
index.indexing.slowlog.threshold.index.trace: 500ms
III. Optimization of data structure
1. Minimize unwanted fields
The data stored in Elasticsearch serves the search function, so fields that do not need to be searched are best kept out of ES; for the same amount of data this saves space and improves search performance.
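One common pattern, sketched below with illustrative index, type and field names, is to keep only a record ID plus the searchable fields in ES and fetch the full business object from the primary datastore by that ID:

curl -XPUT 'http://localhost:9200/org/_mapping/global' -d '{
  "global": {
    "properties": {
      "orgId":   { "type": "string", "index": "not_analyzed" },
      "orgName": { "type": "string", "analyzer": "ik" }
    }
  }
}'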
2. Setting the routing value
Normally, when adding index data to an Elasticsearch server you do not need to specify a routing value; Elasticsearch stores the document into a shard based on its ID. If the routing value is set to accountId (the user ID), Elasticsearch stores all documents with the same accountId in the same shard. When the same routing value is specified on subsequent queries, Elasticsearch only needs to query that one shard to get all the required data instead of querying every shard, which greatly improves search performance.
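A sketch of what this looks like with the 1.x document and search APIs (index, type, field names and values are illustrative):

# index a document, routed by accountId
curl -XPUT 'http://localhost:9200/org/global/1?routing=10001' -d '{
  "accountId": "10001",
  "orgName": "example org"
}'

# search with the same routing value, so only one shard is queried
curl -XPOST 'http://localhost:9200/org/global/_search?routing=10001' -d '{
  "query": { "term": { "accountId": "10001" } }
}'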
IV. Runtime optimization
1. Optimize
Over time, each shard in Elasticsearch accumulates more and more data, the index grows, and the number of segments (each shard is actually composed of multiple segment files) keeps increasing. The more segments there are, the worse query performance becomes, so query performance can be improved by calling the optimize command to merge multiple segments into a smaller number of segments (down to a minimum of one).
When invoking the command you can set several parameters, with the following meanings:
1.1> max_num_segments
The number of segments to merge down to. For a full optimization, set it to 1. By default the call simply checks whether a merge is needed and performs it only if necessary. (Testing shows that the smaller this value, the faster queries become.)
1.2> only_expunge_deletes
Whether the optimize operation should only purge index records marked as deleted. In Lucene, a delete does not remove the record from its segment directly; the record is only marked as deleted. When several segments are merged, the new segment produced no longer contains the deleted records. This parameter lets the optimization run only on the segments that contain deleted records.
1.3> flush
Whether to perform a flush after the optimize operation completes. The default value is true.
1.4> wait_for_merge
When this parameter is true, the request does not respond until the segment merge has finished. Because optimization is very time- and resource-consuming and the user submitting the request cannot tolerate such a long wait, this parameter is best set to false.
The specific invocation looks like this:
curl -XPOST 'http://localhost:9200/indexName/_optimize?only_expunge_deletes=true&wait_for_merge=false'
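For a full merge down to a single segment, as discussed under max_num_segments, the call would look like the sketch below (same illustrative index name); since it is expensive, it is best run during off-peak hours:

curl -XPOST 'http://localhost:9200/indexName/_optimize?max_num_segments=1&wait_for_merge=false'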
2. Warmers
When an Elasticsearch server starts, the index data the business system will need has not yet been loaded into memory, so the first search a user runs has to pull that data in, which is slow and badly hurts the user experience. To solve this, the warmer facility can be used. Elasticsearch provides APIs to register, delete and get warmers by name. Typically, a warmer contains a request that loads a large amount of index data (for example a sort on a particular field, or a query using aggregation functions such as sum, min or max) so that the data is warmed up ahead of time.
A specific example follows (it registers a warmer named warmer_1 on the index named test):
curl -XPUT 'http://localhost:9200/test/_warmer/warmer_1' -d '{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "aggs_1": {
      "terms": {
        "field": "field"
      }
    }
  }
}'
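The registered warmer can later be fetched or removed by name, matching the register/get/delete operations mentioned above:

curl -XGET 'http://localhost:9200/test/_warmer/warmer_1'
curl -XDELETE 'http://localhost:9200/test/_warmer/warmer_1'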
V. Other parameters to note and modify
# Zen discovery fault-detection settings
discovery.zen.fd.ping_timeout: 120s
discovery.zen.fd.ping_retries: 6
discovery.zen.fd.ping_interval: 30s

# Field data cache limits
index.cache.field.max_size: 50000
index.cache.field.expire: 10m
index.cache.field.type: soft

# Heap size, set via environment variable
export ES_HEAP_SIZE=10g
# or passed directly at startup
./bin/elasticsearch -Xmx10g -Xms10g

# Index refresh interval, passed at startup
./bin/elasticsearch -Des.index.refresh_interval=10s

# Avoid swapping the Elasticsearch process
bootstrap.mlockall: true
vm.swappiness = 1
sudo swapoff -a