Elasticsearch Configuration in Detail

Source: Internet
Author: User

Most of the default parameters of Elasticsearch (ES) should be left unchanged. When you run into performance problems, the first things to consider are adding nodes and optimizing the data structure.
Below are some configuration items to be aware of, starting with the ones that are generally well known:

cluster.name: elasticsearch_production
The cluster name. Change it rather than keeping the default, so that an ES instance started on someone's test laptop on the same intranet does not automatically join your cluster.

node.name: elasticsearch_005_data
The node name.

path.data: /path/to/data1
The data storage path.

path.logs: /path/to/logs
The log path.

path.plugins: /path/to/plugins
The plugin path.

By default these three directories are under the installation directory. The main reason to move them is so that reinstalling ES does not overwrite them and lose data.

discovery.zen.minimum_master_nodes
This setting is mainly used to prevent the "split brain" problem. For details, see the article: How to prevent split brain in an Elasticsearch cluster
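As a rough illustration of how a value for this setting is usually chosen, the sketch below applies the commonly recommended quorum rule, (number of master-eligible nodes / 2) + 1. The function name is hypothetical and exists only for this example; it is not part of Elasticsearch.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Return the quorum size (N / 2) + 1 for N master-eligible nodes.

    Hypothetical helper for illustration; it only does the arithmetic
    and does not talk to a cluster.
    """
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    # Integer division, so 3 nodes -> 2 and 5 nodes -> 3.
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, "master-eligible nodes ->", minimum_master_nodes(n))
```

With a quorum of this size, two halves of a partitioned cluster cannot both elect a master at the same time.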

gateway.recover_after_nodes: n

This setting mainly prevents unnecessary data shuffling. Suppose the whole cluster restarts and one machine is slow to come back. The other machines form a cluster, elect a master, and start rebuilding the missing shards from the copies they hold, after which the cluster looks normal again. Then the slow machine finishes starting, data is synchronized to it, and the now-redundant copies are deleted. This setting holds back recovery until n nodes are available.

gateway.expected_nodes: 10

gateway.recover_after_time: 5m
How long to wait before recovery starts, here 5 minutes.

Together, these three settings mean: first wait until n nodes are up, then start data recovery after 5 minutes have passed or as soon as 10 nodes have joined the cluster, whichever comes first.
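The way the three gateway settings combine can be sketched as a small decision function. This is only a model of the rule described above, not Elasticsearch code; the function and parameter names are made up for the example, and seconds_waited is assumed to be measured from the moment recover_after_nodes was first satisfied.

```python
def should_start_recovery(nodes_up: int,
                          seconds_waited: float,
                          recover_after_nodes: int,
                          expected_nodes: int,
                          recover_after_time_s: float) -> bool:
    """Model of the recovery gate described in the text (illustrative only)."""
    # Gate 1: nothing happens until enough nodes are up.
    if nodes_up < recover_after_nodes:
        return False
    # Gate 2a: all expected nodes have joined, so start right away.
    if nodes_up >= expected_nodes:
        return True
    # Gate 2b: otherwise start once the waiting time has elapsed.
    return seconds_waited >= recover_after_time_s


if __name__ == "__main__":
    # n = 3, expected_nodes = 10, recover_after_time = 5m (300 seconds)
    print(should_start_recovery(2, 999, 3, 10, 300))  # too few nodes
    print(should_start_recovery(5, 100, 3, 10, 300))  # still waiting
    print(should_start_recovery(5, 300, 3, 10, 300))  # timer expired
    print(should_start_recovery(10, 0, 3, 10, 300))   # everyone is here
```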

discovery.zen.ping.multicast.enabled: false

discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]

ES can discover nodes automatically using multicast over UDP; it is recommended to turn this mechanism off. Of the two settings above, the first disables multicast and the second lists the nodes to try to connect to. If you have dedicated master nodes, list those.

Do not change Java's garbage collection (GC) settings.

Set thread pool sizes to the number of cores; for example, on an eight-core machine set them to 8. Many of the blocking operations, such as disk reads and writes, are carried out by Lucene. The search thread pool can be set to three times the number of cores.

Heap Size Setting

The heap size can be configured with the command export ES_HEAP_SIZE=10g, or at startup with ./bin/elasticsearch -Xmx10g -Xms10g, keeping Xmx and Xms equal. You can also edit the file ./bin/elasticsearch.in.sh and change the relevant settings in the shell code.

The heap is generally set to half of the machine's memory, because memory must also be left for Lucene to use. However much memory you have, do not set the heap above 32 GB; this limit comes from the way the JVM handles object pointers (compressed pointers stop working for larger heaps).
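The sizing rule above (half of the RAM, but never above the 32 GB limit) can be written down as a short calculation. The helper below is a hypothetical sketch; the default cap of 31 GB is an assumption chosen to stay safely under the limit, and it can be changed through the cap_gb parameter.

```python
def recommended_heap_gb(total_ram_gb: int, cap_gb: int = 31) -> int:
    """Half of the machine's RAM, capped below the 32 GB limit.

    Illustrative helper; cap_gb = 31 is an assumption, not a value
    taken from the article.
    """
    if total_ram_gb < 2:
        raise ValueError("expected at least 2 GB of RAM")
    return min(total_ram_gb // 2, cap_gb)


if __name__ == "__main__":
    for ram in (16, 20, 64, 128):
        heap = recommended_heap_gb(ram)
        # The result can be used as: export ES_HEAP_SIZE=<heap>g
        print(f"{ram} GB RAM -> ES_HEAP_SIZE={heap}g")
```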

If you start ES as a service, you need to make this change in the service's configuration file, ./bin/service/elasticsearch.conf.

Swapping

Swapping is fatal to performance. You can turn it off temporarily with the command sudo swapoff -a; to turn it off permanently, edit the file /etc/fstab.

You can also add bootstrap.mlockall: true to the configuration file so that the JVM locks its memory and is never swapped out to disk.

File descriptors

File descriptors are also called file handles. Lucene opens a lot of files and Elasticsearch opens a lot of sockets, and Linux treats all of these as files while limiting how many a single process may open. A related kernel limit is vm.max_map_count, which caps the number of memory-mapped areas per process: you can run sysctl -w vm.max_map_count=262144 to change it temporarily, or set vm.max_map_count in /etc/sysctl.conf and then run sysctl -p for the setting to take effect.

I start ES through elasticsearch-servicewrapper, so I also needed to edit the configuration file ./bin/service/elasticsearch: uncomment the ULIMIT_N setting and give it a value. The recommended value is 32000; I configured 64000.

To check whether the setting took effect, open http://localhost:9200/_nodes/process?pretty and look for the max_file_descriptors value.
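The same check can be scripted. The sketch below assumes the response has the shape {"nodes": {<node id>: {"name": ..., "process": {"max_file_descriptors": ...}}}}; that layout is an assumption and may differ between ES versions, so adjust the keys if your output looks different. Both function names are hypothetical.

```python
import json
import urllib.request


def fetch_nodes_process(url: str = "http://localhost:9200/_nodes/process") -> dict:
    """Fetch and decode the node process info (needs a running ES node)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def max_file_descriptors(nodes_info: dict) -> dict:
    """Map node name (or id) to its max_file_descriptors value.

    Assumes the response layout described in the lead-in; nodes that
    do not report the value are skipped.
    """
    result = {}
    for node_id, node in nodes_info.get("nodes", {}).items():
        value = node.get("process", {}).get("max_file_descriptors")
        if value is not None:
            result[node.get("name", node_id)] = value
    return result
```

Against a live node this would be called as max_file_descriptors(fetch_nodes_process()).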

The configuration file is %ES_HOME%/config/elasticsearch.yml; you can edit it with a text editor such as EditPlus.
Any setting can take its value from an environment variable, for example:
node.rack: ${RACK_ENV_VAR}
This uses the value of the RACK_ENV_VAR environment variable.
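To make the ${...} placeholder idea concrete, here is a minimal sketch of that kind of substitution. It is not how Elasticsearch implements it; the function name and the decision to raise an error for a missing variable are assumptions made for this example.

```python
import re

# Matches ${NAME}, where NAME is a typical environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value: str, env: dict) -> str:
    """Replace every ${NAME} in value with env[NAME] (illustrative only)."""
    def replace(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(replace, value)


if __name__ == "__main__":
    import os
    # Falls back to a sample value when the variable is not exported.
    env = dict(os.environ)
    env.setdefault("RACK_ENV_VAR", "rack314")
    print("node.rack:", substitute_env("${RACK_ENV_VAR}", env))
```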
The following is a list of the configurable Elasticsearch settings:
1. Cluster name; the default is elasticsearch:
cluster.name: elasticsearch
2. Node name. A name is generated automatically when ES starts, but you can also set it yourself:
node.name: "Franz Kafka"
3. Whether the node can be elected master. Any node can be made master-eligible; the default is true:
node.master: true
4. Whether the node stores data, that is, holds index shards; the default is true:
node.data: true
Combining the master and data settings gives several distinct node roles:
1) With master false and data true, the node is a workhorse that holds data and does the heavy lifting but is never elected master;
2) With master true and data false, the node acts as a coordinator and holds no data;
3) With master false and data false, the node becomes a load balancer.
You can query http://localhost:9200/_cluster/health or http://localhost:9200/_cluster/nodes, or use the plugins http://github.com/lukas-vlcek/bigdesk or http://mobz.github.com/elasticsearch-head, to view the cluster status.
5. Each node can define generic attributes, which can later be used as filters when shards are allocated across the cluster:
node.rack: rack314
6. By default, several nodes can be started from the same installation path. If you want only one node to start, set:
node.max_local_storage_nodes: 1
7. Set the number of shards of an index; the default is 5:
index.number_of_shards: 5
8. Set the number of replicas of an index; the default is 1:
index.number_of_replicas: 1
To disable the distributed features, for example for development on a local machine, you can set:
index.number_of_shards: 1
index.number_of_replicas: 0
These two settings directly affect how index and search operations perform in the cluster. Assuming you have enough machines to hold the shards and replicas, keep the following in mind when choosing values:
1) More shards improve indexing performance and allow a large index to be distributed across machines;
2) More replicas improve search performance and cluster availability.
For a given index, number_of_shards can only be set once, when the index is created, while number_of_replicas can be increased or decreased at any time using the Index Update Settings API.
Elasticsearch takes care of load balancing, relocating shards, gathering results from nodes, and so on. You can experiment with different settings to tune your setup.
You can query http://localhost:9200/A/_status to check the status of an index (here an index named A).
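When judging whether you have "enough machines", it helps to count how many shard copies an index produces in total: every primary shard plus each of its replicas. The helper below is a hypothetical back-of-the-envelope calculation, not an Elasticsearch API.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Primaries plus replicas for one index (illustrative arithmetic)."""
    if number_of_shards < 1 or number_of_replicas < 0:
        raise ValueError("need at least 1 shard and 0 or more replicas")
    # Each primary shard exists once, plus number_of_replicas copies.
    return number_of_shards * (1 + number_of_replicas)


if __name__ == "__main__":
    print(total_shard_copies(5, 1))  # the defaults: 5 primaries + 5 replicas
    print(total_shard_copies(1, 0))  # distributed features disabled
```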
9. Location of the configuration files, that is, where elasticsearch.yml and logging.yml are stored:
path.conf: /path/to/conf
10. Location of the index data allocated to this node:
path.data: /path/to/data
You can optionally list more than one location. The data is then spread across them at the file level, favoring locations with more free space at creation time, for example:
path.data: /path/to/data1,/path/to/data2
11. Temporary file location:
path.work: /path/to/work
12. Log file location:
path.logs: /path/to/logs
13. Plugin installation location:
path.plugins: /path/to/plugins
14. Mandatory plugins. If any plugin in this list is not installed, the node will not start:
plugin.mandatory: mapper-attachments,lang-groovy
15. Elasticsearch performs poorly once the JVM starts swapping, so you need to protect the JVM from being swapped out. Set bootstrap.mlockall to true to prevent swapping:
bootstrap.mlockall: true
Make sure that ES_MIN_MEM and ES_MAX_MEM are set to the same value, and that you give Elasticsearch enough memory while leaving enough for the operating system.
16. By default, Elasticsearch binds to the address 0.0.0.0 and uses a port in the range 9200-9300 for HTTP traffic and a port in the range 9300-9400 for node-to-node communication. You can also set the bind address yourself:
network.bind_host: 192.168.0.1
17. publish_host sets the address that other nodes use to connect to this node. If it is not set, it is derived automatically. It must be a real IP address:
network.publish_host: 192.168.0.1
18. bind_host and publish_host can be set together:
network.host: 192.168.0.1
19. Set a custom port for communication between nodes:
transport.tcp.port: 9300
20. Set whether traffic between nodes is compressed; the default is no compression:
transport.tcp.compress: true
21. Set a custom port to listen on for HTTP traffic:
http.port: 9200
22. Set the maximum allowed content length:
http.max_content_length: 100mb
23. Disable HTTP completely:
http.enabled: false
24. The gateway persists the cluster state across full cluster restarts. Every change to the cluster state is stored, and when the cluster first starts up it reads its state from the gateway. The default gateway type (also the recommended one) is local:
gateway.type: local
25. Allow the recovery process to start once N nodes have started:
gateway.recover_after_nodes: 1
26. Set how long to wait before starting the recovery process once those N nodes are up:
gateway.recover_after_time: 5m
27. Set how many nodes are expected in the cluster. Once this many nodes are up, recovery starts immediately without waiting for recover_after_time:
gateway.expected_nodes: 2
28. Set the number of concurrent recoveries on a node, for two cases. One is during initial recovery:
cluster.routing.allocation.node_initial_primaries_recoveries: 4
The other is when nodes are added or removed and shards are rebalanced:
cluster.routing.allocation.node_concurrent_recoveries: 2
29. Set the recovery throughput limit; it is unlimited by default:
indices.recovery.max_size_per_sec: 0
30. Set the maximum number of streams opened at the same time when recovering a shard from a peer node:
indices.recovery.concurrent_streams: 5
31. Set how many master-eligible nodes a node must see for the cluster to be considered operational. For clusters with more than three nodes, a value between 2 and 4 is suggested:
discovery.zen.minimum_master_nodes: 1
32. Set the timeout for pinging other nodes during discovery; use a larger value when the network is slow:
discovery.zen.ping.timeout: 3s
More discovery settings are described at http://elasticsearch.org/guide/reference/modules/discovery/zen.html.
33. Disable multicast discovery of other cluster nodes; multicast is enabled by default:
discovery.zen.ping.multicast.enabled: false
34. Set the initial list of master nodes that a newly started node contacts for discovery (mainly for connecting machines on different network segments):
discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3[portX-portY]"]
35. Set whether indices can be deleted or closed using wildcards or _all:
action.destructive_requires_name
The default, false, allows it; set it to true to disallow it.
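The three forms accepted in the unicast host list of item 34 (a bare host, host:port, and host[portX-portY]) can be illustrated with a small parser. This is a sketch for understanding the notation, not the parser Elasticsearch uses; the function name is hypothetical, and treating a bare host as port 9300 is an assumption based on the transport port shown in item 19.

```python
import re

# host, optionally followed by ":port" or "[low-high]"
_ENTRY = re.compile(
    r"^(?P<host>[^:\[\]]+)"
    r"(?::(?P<port>\d+)|\[(?P<low>\d+)-(?P<high>\d+)\])?$"
)


def parse_unicast_host(entry: str, default_port: int = 9300):
    """Return (host, [ports]) for one unicast hosts entry (illustrative)."""
    match = _ENTRY.match(entry.strip())
    if match is None:
        raise ValueError(f"unrecognized host entry: {entry!r}")
    host = match.group("host")
    if match.group("port"):
        return host, [int(match.group("port"))]
    if match.group("low"):
        low, high = int(match.group("low")), int(match.group("high"))
        if low > high:
            raise ValueError(f"empty port range in {entry!r}")
        return host, list(range(low, high + 1))
    # Bare host name: assume the default transport port.
    return host, [default_port]


if __name__ == "__main__":
    for item in ("host1", "host2:9301", "host3[9300-9302]"):
        print(item, "->", parse_unicast_host(item))
```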
