1.1. Basic Configuration
The Elasticsearch config folder contains two configuration files: elasticsearch.yml and logging.yml. The first is the main ES configuration file; the second configures logging. ES uses log4j for logging, so the settings in logging.yml follow the usual log4j configuration conventions. The following briefly explains the settings available in elasticsearch.yml.
cluster.name: elasticsearch
Sets the ES cluster name; the default is elasticsearch. ES automatically discovers other ES nodes on the same network segment, so if several clusters share a segment, use this setting to tell them apart.
node.name: "Franz Kafka"
Sets the node name. By default a name is picked at random from a name list, the names.txt file in the config folder inside the ES jar, which contains many interesting names added by the author.
node.master: true
Specifies whether the node is eligible to be elected master; the default is true. By default the first machine in the cluster becomes master, and a new master is elected if that machine goes down.
node.data: true
Specifies whether the node stores index data; the default is true.
index.number_of_shards: 5
Sets the default number of shards per index; the default is 5.
index.number_of_replicas: 1
Sets the default number of replicas per index; the default is 1.
path.conf: /path/to/conf
Sets the storage path for configuration files; the default is the config folder under the ES root directory.
path.data: /path/to/data
Sets the storage path for index data; the default is the data folder under the ES root directory. Multiple paths can be given, separated by commas, for example:
path.data: /path/to/data1,/path/to/data2
path.work: /path/to/work
Sets the storage path for temporary files; the default is the work folder under the ES root directory.
path.logs: /path/to/logs
Sets the storage path for log files; the default is the logs folder under the ES root directory.
path.plugins: /path/to/plugins
Sets the storage path for plug-ins; the default is the plugins folder under the ES root directory.
bootstrap.mlockall: true
Set to true to lock the process memory. ES performance degrades once the JVM starts swapping, so set the ES_MIN_MEM and ES_MAX_MEM environment variables to the same value and make sure the machine has enough memory to allocate to ES. The Elasticsearch process must also be allowed to lock memory; on Linux this can be done with the 'ulimit -l unlimited' command.
network.bind_host: 192.168.0.1
Sets the IP address to bind to, which can be IPv4 or IPv6; the default is 0.0.0.0.
network.publish_host: 192.168.0.1
Sets the IP address that other nodes use to communicate with this node. If not set, it is determined automatically; the value must be a real IP address.
network.host: 192.168.0.1
Sets both of the settings above, bind_host and publish_host, at once.
transport.tcp.port: 9300
Sets the TCP port for communication between nodes; the default is 9300.
transport.tcp.compress: true
Sets whether data sent over TCP is compressed; the default is false (no compression).
http.port: 9200
Sets the HTTP port for serving external requests; the default is 9200.
http.max_content_length: 100mb
Sets the maximum size of HTTP request content; the default is 100mb.
http.enabled: false
Sets whether requests are served over HTTP; the default is true (enabled).
gateway.type: local
Sets the gateway type; the default is local (the local file system). It can be set to the local file system, a distributed file system, Hadoop HDFS, or Amazon S3.
gateway.recover_after_nodes: 1
Sets the number of nodes (N) that must have started in the cluster before data recovery begins; the default is 1.
gateway.recover_after_time: 5m
Sets the timeout for starting the data recovery process; the default is 5 minutes.
gateway.expected_nodes: 2
Sets the number of nodes expected in the cluster; the default is 2. Once these N nodes have started, data recovery begins immediately.
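How the three gateway settings interact can be easier to see in code. The following Python sketch is only a simplified model of the behavior described above, not Elasticsearch source code; the function name and its parameters are hypothetical, and the wait period is assumed to apply only when fewer than the expected number of nodes are present.

```python
def should_start_recovery(joined_nodes, waited_seconds,
                          recover_after_nodes=1,
                          recover_after_seconds=300,
                          expected_nodes=2):
    """Simplified model of the gateway recovery settings (an assumption,
    not the actual Elasticsearch implementation)."""
    # All expected nodes have started: recover immediately.
    if joined_nodes >= expected_nodes:
        return True
    # Otherwise require the minimum node count AND the waiting period.
    return (joined_nodes >= recover_after_nodes
            and waited_seconds >= recover_after_seconds)
```

With the defaults shown, two started nodes trigger recovery at once, while a single node triggers it only after the 5 minute wait has elapsed.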
cluster.routing.allocation.node_initial_primaries_recoveries: 4
Sets the number of concurrent recovery threads during initial data recovery; the default is 4.
cluster.routing.allocation.node_concurrent_recoveries: 2
Sets the number of concurrent recovery threads when nodes are added or removed or when load is rebalanced; the default is 2.
indices.recovery.max_size_per_sec: 0
Limits the bandwidth used during data recovery, for example 100mb; the default is 0, meaning unlimited.
indices.recovery.concurrent_streams: 5
Limits the maximum number of concurrent streams opened when recovering data from other shards; the default is 5.
discovery.zen.minimum_master_nodes: 1
Ensures that each node in the cluster can see N other master-eligible nodes. The default is 1; for larger clusters, set a larger value (2-4).
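A commonly cited rule of thumb, which the text above does not state, is to set this value to a majority of the master-eligible nodes so that two halves of a partitioned cluster cannot each elect their own master. The Python sketch below computes that majority; the function name is hypothetical and the rule is offered as an assumption, not as the author's recommendation.

```python
def majority_of_master_nodes(master_eligible_nodes):
    """Return the majority quorum for the given number of
    master-eligible nodes (rule of thumb, stated as an assumption)."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    # Integer division: 3 nodes -> 2, 4 nodes -> 3, 5 nodes -> 3.
    return master_eligible_nodes // 2 + 1
```

For a cluster with three master-eligible nodes this gives 2, which falls inside the 2-4 range suggested above.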
discovery.zen.ping.timeout: 3s
Sets the ping timeout used when automatically discovering other nodes in the cluster; the default is 3 seconds. On a poor network, a higher value helps prevent discovery errors.
discovery.zen.ping.multicast.enabled: false
Sets whether multicast node discovery is enabled; the default is true.
discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3[portX-portY]"]
Sets the initial list of master nodes in the cluster; nodes joining the cluster are discovered automatically through these hosts.
The following are the slow log settings for search queries:
index.search.slowlog.level: TRACE
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms
index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms
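To illustrate how such thresholds map a query duration to a log level, here is a small Python sketch. It assumes a query is logged at the most severe level whose threshold it exceeds; the function and constant names are hypothetical, and the values are the query thresholds listed above converted to milliseconds.

```python
# Query thresholds in milliseconds, ordered from most to least severe.
QUERY_THRESHOLDS_MS = [
    ("warn", 10_000),
    ("info", 5_000),
    ("debug", 2_000),
    ("trace", 500),
]


def slowlog_level(took_ms, thresholds=QUERY_THRESHOLDS_MS):
    """Return the most severe level whose threshold is exceeded,
    or None if the query is too fast to be logged at all."""
    for level, limit_ms in thresholds:
        if took_ms > limit_ms:
            return level
    return None
```

Under these assumptions a query that takes 6 seconds would be logged at info, and one that takes 100 ms would not appear in the slow log.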
1.2. Advanced Configuration (thread pool)
An Elasticsearch node has multiple thread pools, but the following four are the most important:
index: mainly for indexing and deleting data (default type is cached)
search: mainly for get, count and search operations (default type is cached)
bulk: for bulk operations on indexes (default type is cached)
refresh: mainly for refresh operations (default type is cached)
You can change the thread pool type (type) through settings. For example, to change the index thread pool to the blocking type:
threadpool:
    index:
        type: blocking
        min: 1
        size: 30
        wait_time: 30s
The following are the three thread pool types you can set:
cached
The cached thread pool is unbounded in size: if there are many requests, many threads are created. For example:
threadpool:
    index:
        type: cached
fixed
The fixed thread pool maintains a constant number of threads to process the request queue.
The size parameter sets the number of threads; the default is 5 times the number of CPU cores.
queue_size controls the size of the queue of pending requests. The default is -1, meaning unbounded. When a request arrives and the queue is full, the reject_policy parameter controls what happens. The default is abort, which makes the request fail. Setting it to caller makes the request execute in the IO thread.
threadpool:
    index:
        type: fixed
        size: 30
        queue_size: 1000
        reject_policy: caller
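The difference between the abort and caller policies can be sketched in a few lines of Python. This is an illustration of the idea only, using the standard library queue rather than anything from Elasticsearch; the submit function and its return values are hypothetical.

```python
import queue


def submit(pending, task, reject_policy="abort"):
    """Try to queue a task; apply the reject policy if the queue is full.

    Any policy other than "caller" is treated as "abort" in this sketch.
    """
    try:
        pending.put_nowait(task)
        return "queued"
    except queue.Full:
        if reject_policy == "caller":
            # Run the task in the submitting thread instead of failing.
            task()
            return "ran_in_caller"
        raise RuntimeError("request rejected: queue is full")
```

A bounded queue such as queue.Queue(maxsize=1000) plays the role of queue_size here. With caller, a full queue slows down the submitting thread instead of failing the request.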
blocking
The blocking thread pool lets you set a minimum number of threads (min, default 1) and a pool size (size, default 5 times the number of CPU cores). It also has a wait queue whose size (queue_size) defaults to 1000. When the queue is full, the calling IO thread blocks for up to the configured wait time (wait_time, default 60 seconds), and the request fails if it still cannot be queued by then.
threadpool:
    index:
        type: blocking
        min: 1
        size: 30
        wait_time: 30s
In the author's own work, a program generated a large number of requests at startup, which overflowed the queue and caused query requests to fail. There are two ways to deal with this:
1. Increase the queue length, at the cost of higher CPU consumption.
2. Optimize the program so that its volume of concurrent requests stays under control.
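The second option can be sketched on the client side with a bounded worker pool. The Python example below is a minimal sketch, not the author's actual fix: run_limited is a hypothetical helper, and send stands for whatever function issues a single request to ES.

```python
from concurrent.futures import ThreadPoolExecutor


def run_limited(requests, send, max_concurrent=4):
    """Send every request, keeping at most max_concurrent in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # map preserves input order and waits for all calls to finish.
        return list(pool.map(send, requests))
```

Capping the number of in-flight requests this way keeps a burst at program startup from filling the server-side queue.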
1.3. Operating System Configuration
1. File handle limits: while indexing, especially with many shards and replicas, ES creates a large number of files. The operating system's limit on open files should therefore not be lower than 32000. On Linux servers, the limit can be changed in /etc/security/limits.conf, and the ulimit command shows the current value.
2. Node memory: the ES default of 1024M of memory per node may not be enough. If out-of-memory errors appear in the log file, set the ES_HEAP_SIZE environment variable to a value greater than 1024. Note that this value should not exceed 50% of the total available physical memory; the remaining memory serves as disk cache, which greatly improves search performance.
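Both checks can be expressed as a short Python sketch. The helper names are hypothetical. The sketch assumes a Unix-like system for reading the current limit, and a heap ceiling of half the physical memory so that the rest stays free for the disk cache.

```python
def current_open_file_limit():
    """Return the soft limit on open files for this process (Unix only)."""
    import resource  # not available on Windows
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unlimited setting is reported as RLIM_INFINITY.
    return float("inf") if soft == resource.RLIM_INFINITY else soft


def open_file_limit_ok(soft_limit, minimum=32000):
    """True if the open-file limit is at least the suggested minimum."""
    return soft_limit >= minimum


def max_heap_mb(total_memory_mb):
    """Upper bound for the ES heap: half of physical memory, in MB."""
    return total_memory_mb // 2
```

For example, open_file_limit_ok(current_open_file_limit()) reports whether the current process meets the 32000 guideline, and max_heap_mb(8192) suggests a heap of at most 4096 MB on a machine with 8 GB of memory.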