Elasticsearch Configuration in Detail

Source: Internet
Author: User
Tags log4j
1.1. Basic Configuration

The Elasticsearch config folder contains two configuration files: elasticsearch.yml and logging.yml. The first is the main ES configuration file; the second configures logging. ES uses log4j for logging, so logging.yml is written like an ordinary log4j configuration file. The following briefly explains the settings available in elasticsearch.yml.

cluster.name: elasticsearch

Sets the ES cluster name; the default is elasticsearch. ES automatically discovers other ES nodes on the same network segment, so if several clusters share a segment, use this setting to tell them apart.

node.name: "Franz Kafka"

Sets the node name. By default a name is picked at random from a list stored in the names.txt file in the config folder inside the ES jar, which contains many interesting names added by the author.

node.master: true

Specifies whether the node is eligible to be elected master; the default is true. By default the first machine in the cluster becomes the master, and if that machine goes down a new master is elected.

node.data: true

Specifies whether the node stores index data; the default is true.

index.number_of_shards: 5

Sets the default number of shards per index; the default is 5.

index.number_of_replicas: 1

Sets the default number of replicas per index; the default is 1.

path.conf: /path/to/conf

Sets the path where configuration files are stored; the default is the config folder under the ES root directory.

path.data: /path/to/data

Sets the path where index data is stored; the default is the data folder under the ES root directory. Multiple paths can be given, separated by commas, for example:

path.data: /path/to/data1,/path/to/data2

path.work: /path/to/work

Sets the path where temporary files are stored; the default is the work folder under the ES root directory.

path.logs: /path/to/logs

Sets the path where log files are stored; the default is the logs folder under the ES root directory.

path.plugins: /path/to/plugins

Sets the path where plugins are stored; the default is the plugins folder under the ES root directory.

bootstrap.mlockall: true

Set to true to lock the process memory. ES performance drops once the JVM starts swapping, so swapping should be avoided: set the ES_MIN_MEM and ES_MAX_MEM environment variables to the same value and make sure the machine has enough memory to allocate to ES. The Elasticsearch process must also be allowed to lock memory; on Linux this can be done with the 'ulimit -l unlimited' command.

network.bind_host: 192.168.0.1

Sets the IP address to bind to, which can be IPv4 or IPv6; the default is 0.0.0.0.

network.publish_host: 192.168.0.1

Sets the IP address that other nodes use to communicate with this node. If it is not set, it is determined automatically. The value must be a real IP address.

network.host: 192.168.0.1

Sets both bind_host and publish_host above at the same time.

transport.tcp.port: 9300

Sets the TCP port used for communication between nodes; the default is 9300.

transport.tcp.compress: true

Sets whether data sent over the TCP transport is compressed; the default is false (no compression).

http.port: 9200

Sets the HTTP port on which ES serves external requests; the default is 9200.

http.max_content_length: 100mb

Sets the maximum size of HTTP request content; the default is 100mb.

http.enabled: false

Sets whether ES serves external requests over HTTP; the default is true (enabled).

gateway.type: local

Sets the gateway type; the default is local, meaning the local file system. It can be set to the local file system, a distributed file system, Hadoop HDFS, or Amazon S3.

gateway.recover_after_nodes: 1

Data recovery starts only after N nodes in the cluster have started; the default is 1.

gateway.recover_after_time: 5m

Sets how long to wait before the data recovery process is started once the recover_after_nodes count has been reached; the default is 5 minutes.

gateway.expected_nodes: 2

Sets the number of nodes expected in the cluster; the default is 2. Once these N nodes have started, data recovery begins immediately.
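The way the three gateway settings combine can be sketched as a small decision function. This is only an illustration of the rule described above, not ES code; the function name, its parameters, and the idea of measuring the wait from the moment recover_after_nodes is reached are assumptions made for the sketch.

```python
def should_start_recovery(nodes_up, seconds_waited,
                          recover_after_nodes=1,
                          recover_after_seconds=300,
                          expected_nodes=2):
    """Decide whether data recovery may begin (hypothetical helper).

    nodes_up        -- number of nodes that have started so far
    seconds_waited  -- seconds since recover_after_nodes was reached
    """
    # Too few nodes: recovery must not start yet.
    if nodes_up < recover_after_nodes:
        return False
    # All expected nodes are present: start immediately.
    if nodes_up >= expected_nodes:
        return True
    # Otherwise start only once the waiting period has elapsed.
    return seconds_waited >= recover_after_seconds
```

With the defaults shown, a cluster with one node up waits the full 5 minutes, while a cluster with both expected nodes up starts recovering straight away.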

cluster.routing.allocation.node_initial_primaries_recoveries: 4

Sets the number of concurrent recovery threads used during initial data recovery; the default is 4.

cluster.routing.allocation.node_concurrent_recoveries: 2

Sets the number of concurrent recovery threads used when nodes are added or removed or when shards are rebalanced; the default is 2.

indices.recovery.max_size_per_sec: 0

Limits the bandwidth used during data recovery, for example 100mb; the default is 0, meaning unlimited.

indices.recovery.concurrent_streams: 5

Limits the maximum number of concurrent streams opened when recovering data from other shards; the default is 5.

discovery.zen.minimum_master_nodes: 1

Ensures that a node in the cluster can see N other master-eligible nodes. The default is 1; for larger clusters, set a higher value (2-4).
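One common way to pick a value for this setting is the majority (quorum) rule: more than half of the master-eligible nodes. The text above only suggests 2-4 for larger clusters, so the formula below is a widely used rule of thumb rather than something stated by the source, and the function name is hypothetical.

```python
def quorum(master_eligible_nodes):
    """Return a strict majority of the master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    # Integer division, then add one: 3 -> 2, 4 -> 3, 5 -> 3.
    return master_eligible_nodes // 2 + 1
```

For a cluster with three master-eligible nodes this gives 2, which falls inside the 2-4 range suggested above.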

discovery.zen.ping.timeout: 3s

Sets the ping timeout used when automatically discovering other nodes in the cluster; the default is 3 seconds. On a poor network, a higher value helps prevent discovery errors.

discovery.zen.ping.multicast.enabled: false

Sets whether multicast node discovery is enabled; the default is true.

discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3[portX-portY]"]

Sets the initial list of master nodes in the cluster; new nodes that join the cluster are discovered through these hosts.
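The three entry forms in the host list (bare host, host with a port, host with a port range) can be illustrated with a small parser. This is a sketch for reading the notation only, not how ES parses the setting; the function name is hypothetical, falling back to port 9300 for a bare host is an assumption based on the transport.tcp.port default above, and IPv6 literals are not handled.

```python
import re

# Matches entries such as "host3[9300-9302]".
_PORT_RANGE = re.compile(r"^(?P<host>[^\[\]:]+)\[(?P<lo>\d+)-(?P<hi>\d+)\]$")


def parse_unicast_host(entry, default_port=9300):
    """Split one unicast host entry into (host, list_of_ports)."""
    match = _PORT_RANGE.match(entry)
    if match:
        low, high = int(match.group("lo")), int(match.group("hi"))
        return match.group("host"), list(range(low, high + 1))
    host, separator, port = entry.partition(":")
    if separator:
        ports = [int(port)]
    else:
        # Assumption: a bare host name uses the default transport port.
        ports = [default_port]
    return host, ports
```

For example, parse_unicast_host("host3[9300-9302]") yields the host name together with the three ports in that range.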

The following are the slow log settings for queries:

index.search.slowlog.level: TRACE
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms
index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms
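To make the threshold ladder concrete, the sketch below maps a query duration to the level at which it would be logged, using the query thresholds listed above. It is an illustration rather than ES code: the names are hypothetical, and treating a duration exactly equal to a threshold as reaching that level is an assumption.

```python
# Query-phase thresholds from the settings above, in milliseconds,
# ordered from the most severe level to the least severe.
QUERY_THRESHOLDS_MS = [
    ("warn", 10_000),
    ("info", 5_000),
    ("debug", 2_000),
    ("trace", 500),
]


def slowlog_level(took_ms, thresholds=QUERY_THRESHOLDS_MS):
    """Return the most severe level whose threshold took_ms reaches."""
    for level, limit_ms in thresholds:
        if took_ms >= limit_ms:  # assumption: the comparison is inclusive
            return level
    return None  # faster than every threshold: not logged
```

A query that takes 600 ms reaches only the trace threshold, while one that takes 12 seconds is logged at warn.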


1.2. Advanced Configuration (thread pool)

An Elasticsearch node has several thread pools; the following four are the most important:

index: index and delete operations (default type is cached)

search: get, count, and search operations (default type is cached)

bulk: bulk operations on indexes (default type is cached)

refresh: refresh operations (default type is cached)

You can change a thread pool's type with the type setting. For example, to change the index thread pool to the blocking type:

threadpool:
    index:
        type: blocking
        min: 1
        size: 30
        wait_time: 30s


The following are the three thread pool types you can set:

cached

The cached thread pool is unbounded in size: if there are many pending requests, many threads are created. Here is an example:

threadpool:
    index:
        type: cached

fixed

The fixed thread pool keeps a constant number of threads to process the request queue.

The size parameter sets the number of threads; the default is 5 times the number of CPU cores.

The queue_size parameter controls the size of the queue of pending requests. The default is -1, meaning unbounded. When a request arrives and the queue is full, the reject_policy parameter controls what happens. The default is abort, which makes the request fail. Setting it to caller makes the request execute on the IO thread.

threadpool:
    index:
        type: fixed
        size: 30
        queue: 1000
        reject_policy: caller
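The difference between the two reject policies can be shown with a toy model built on Python's standard bounded queue. This is a simulation for illustration only and not how ES implements its pools; the function and exception names are hypothetical.

```python
import queue


class RejectedError(Exception):
    """Raised when a task is refused under the abort policy."""


def submit(pending, task, reject_policy="abort"):
    """Enqueue task, applying reject_policy when the queue is full."""
    try:
        pending.put_nowait(task)
        return "queued"
    except queue.Full:
        if reject_policy == "abort":
            # The request fails immediately.
            raise RejectedError("pending queue is full")
        if reject_policy == "caller":
            # The submitting thread runs the task itself.
            task()
            return "ran in caller"
        raise ValueError("unknown reject_policy: %r" % reject_policy)
```

With pending = queue.Queue(maxsize=1000), the 1001st waiting task is either rejected (abort) or executed by the thread that tried to submit it (caller), which slows the submitter down instead of dropping work.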


blocking

The blocking thread pool lets you set a minimum number of threads (min, default 1) and a pool size (size, default 5 times the number of CPU cores). It also has a wait queue whose size (queue_size) defaults to 1000. When the queue is full, the IO thread waits for up to the configured waiting time (wait_time, default 60 seconds), and the request fails if it has still not been executed when the timeout expires.

threadpool:
    index:
        type: blocking
        min: 1
        size: 30
        wait_time: 30s


In practice, the author has seen a program generate a large burst of requests at startup, overflowing the queue and causing query requests to fail. There are two ways to address this:

1. Increase the queue length, at the cost of higher CPU consumption.

2. Optimize the program so that it limits the number of concurrent requests it sends.
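The second option can be sketched on the client side with a bounded worker pool, so that no more than a fixed number of requests are in flight at once. This is a minimal sketch under stated assumptions: run_limited is a hypothetical helper, and each task stands in for whatever call the program uses to send a query to ES.

```python
from concurrent.futures import ThreadPoolExecutor


def run_limited(tasks, max_in_flight):
    """Run callables with at most max_in_flight executing at once.

    Results are returned in the same order as the tasks.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]
```

Tasks beyond the limit wait inside the client instead of piling up in the server-side queue, so a startup burst is smoothed out before it reaches ES.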


1.3. Operating System Configuration

1. File handle limits: while indexing, especially with many shards and replicas, ES creates a large number of files. The operating system therefore must not limit the number of open files to fewer than 32,000. On Linux servers, the limit can be changed in /etc/security/limits.conf, and the ulimit command shows the current value.

2. Node memory: the ES default of 1024M of memory per node may not be enough. If out-of-memory errors appear in the log file, set the ES_HEAP_SIZE environment variable to a value greater than 1024. Note that this value should not exceed 50% of the total available physical memory, so that the remaining memory can serve as disk cache, which greatly improves search performance.
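Both checks can be scripted before starting a node. The sketch below uses Python's standard resource module (available on Unix-like systems) to read the open-file limit and applies the 32,000 and 50% figures from the two points above; the function names are hypothetical and the heap calculation is just the stated rule of thumb.

```python
import resource

MIN_OPEN_FILES = 32000  # minimum open-file limit named above


def open_file_limit_ok(soft_limit, minimum=MIN_OPEN_FILES):
    """Return True if the soft open-file limit is high enough."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= minimum


def current_open_file_limit():
    """Return the soft open-file limit of the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def max_heap_mb(total_ram_mb):
    """Return the largest heap size (MB) within 50% of physical memory."""
    return total_ram_mb // 2
```

For example, open_file_limit_ok(current_open_file_limit()) reports whether the shell that will launch ES has a sufficient limit, and max_heap_mb(8192) gives 4096 as the upper bound for ES_HEAP_SIZE on an 8 GB machine.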

