Elasticsearch is an open-source, distributed, real-time search and analytics engine built for cloud environments. It is built on the Apache Lucene search library and offers full-text search, multi-language support, a dedicated query language, geolocation support, context-aware search suggestions, autocomplete, and search snippets. Elasticsearch exposes a RESTful API, so its functions can be invoked over HTTP with JSON. These functions include search, analysis, and monitoring. The following describes the meaning of each parameter in the Elasticsearch configuration files.
The Elasticsearch config folder contains two configuration files: elasticsearch.yml and logging.yml. The first is the main ES configuration file, and the second configures logging. ES uses log4j for logging, so logging.yml follows the usual log4j configuration conventions. The rest of this article explains what can be configured in elasticsearch.yml.
Configures the cluster name, which is elasticsearch by default. ES automatically discovers other ES nodes on the same network segment. If several clusters share a segment, use this attribute to tell them apart.
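Here is a minimal sketch in elasticsearch.yml. It assumes the cluster.name key of the early (0.x/1.x) template, and the cluster name shown is hypothetical:

```yaml
# Nodes only join a cluster whose cluster.name matches their own.
# "my-search-cluster" is a hypothetical name; the default is "elasticsearch".
cluster.name: my-search-cluster
```

If each cluster on a shared network segment has a distinct name, nodes will not join the wrong cluster.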
node.name: "Franz Kafka"
Node name. By default, a name is chosen at random from the list in name.txt, which is in the config folder inside the ES jar package. The author included many amusing names in that list.
Specifies whether the node is eligible to be elected master. The default is true. By default, the first machine in the cluster becomes master, and if it goes down, a new master is elected.
Specifies whether the node stores index data, which is true by default.
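The two role flags can be combined. This sketch assumes the node.master and node.data keys of the early (0.x/1.x) template; later releases replaced these settings. The combination shown is a hypothetical dedicated master node that stores no data:

```yaml
# Eligible to be elected master (default: true).
node.master: true
# Holds no index data (default: true); false makes this a dedicated master.
node.data: false
```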
Sets the default number of shards per index. The default is 5.
Sets the default number of replicas per index. The default is 1.
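For illustration, here are the index defaults described above, written out explicitly. The key names come from the early (0.x/1.x) template. Newer releases reject index-level settings in elasticsearch.yml and expect them per index or through index templates instead.

```yaml
# Defaults applied to newly created indices.
index.number_of_shards: 5
index.number_of_replicas: 1
```

With these values, each index has 5 primary shards and 5 replica shards, for 10 shard copies in total.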
Sets the storage path for configuration files. The default is the config folder under the ES root directory.
Sets the storage path for index data. The default is the data folder under the ES root directory. You can set multiple storage paths, separated by commas.
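Here is a sketch with two data directories. It assumes the path.data key of the early (0.x/1.x) template, and the paths are placeholders:

```yaml
# Index data is stored across both directories (placeholder paths).
path.data: /path/to/data1,/path/to/data2
```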
Sets the storage path for temporary files. The default is the work folder under the ES root directory.
Sets the storage path for log files. The default is the logs folder under the ES root directory.
Sets the storage path for plugins. The default is the plugins folder under the ES root directory.
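The remaining path settings can be overridden the same way. This sketch uses placeholder paths and assumes the path.conf, path.work, path.logs and path.plugins keys of the early (0.x/1.x) template. Some of these keys were removed in later releases.

```yaml
# Placeholder paths; each defaults to a folder under the ES root directory.
path.conf: /path/to/conf
path.work: /path/to/work
path.logs: /path/to/logs
path.plugins: /path/to/plugins
```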
Set to true to lock memory. ES performs poorly once the JVM starts swapping, so make sure it never swaps. To do this, set the ES_MIN_MEM and ES_MAX_MEM environment variables to the same value and make sure the machine has enough memory to allocate to ES. The Elasticsearch process must also be allowed to lock memory. On Linux, you can allow this with the 'ulimit -l unlimited' command.
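Here is a minimal sketch. It assumes the bootstrap.mlockall key of the early (0.x/1.x) template; later releases renamed it to bootstrap.memory_lock.

```yaml
# Lock the process memory so the JVM heap is never swapped out.
# This only takes effect if the OS allows it, e.g. after running
#   ulimit -l unlimited
# in the shell that starts Elasticsearch (Linux).
bootstrap.mlockall: true
```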
Sets the IP address to bind to, which can be IPv4 or IPv6. The default is 0.0.0.0.
Sets the IP address that other nodes use to communicate with this node. If it is not set, it is determined automatically. The value must be a real IP address.
This parameter sets both bind_host and publish_host at once.
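Here is a sketch with a hypothetical address. It assumes the network.* keys of the early (0.x/1.x) template:

```yaml
# Hypothetical address. Either set the two values separately...
network.bind_host: 192.168.0.1
network.publish_host: 192.168.0.1
# ...or set both at once with:
# network.host: 192.168.0.1
```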
Sets the TCP port used for communication between nodes. The default is 9300.
Sets whether data is compressed during TCP transport. The default is false, meaning no compression.
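This sketch enables compression and keeps the default port. The key names come from the early (0.x/1.x) template, and later releases renamed the transport settings.

```yaml
transport.tcp.port: 9300
# Compress inter-node traffic (default: false).
transport.tcp.compress: true
```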
Sets the HTTP port for external services. The default is 9200.
Sets the maximum size of request content. The default is 100MB.
Sets whether to provide HTTP service externally. The default is true, meaning enabled.
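Here is a sketch of the HTTP settings with the default values from the text. It assumes the http.* keys of the early (0.x/1.x) template:

```yaml
http.port: 9200
# Reject request bodies larger than 100MB.
http.max_content_length: 100mb
# Set to false to turn off the HTTP interface on this node.
http.enabled: true
```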
Sets the gateway type. The default is the local file system. Other options are a distributed file system, Hadoop HDFS, and Amazon S3. Setup for these other file systems will be covered another time.
Data recovery starts only once N nodes in the cluster have started. The default is 1.
Sets the timeout for initializing the data recovery process. The default is 5 minutes.
Sets the expected number of nodes in the cluster. The default is 2. Once these N nodes have started, data recovery begins immediately.
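Here is a sketch of the gateway block with the default values given in the text. It assumes the gateway.* keys of the early (0.x/1.x) template, and some of these keys were removed in later releases.

```yaml
gateway.type: local
# Do not start recovery until at least this many nodes have joined.
gateway.recover_after_nodes: 1
# Once that minimum is met, wait this long for more nodes before recovering...
gateway.recover_after_time: 5m
# ...unless this many nodes are already up, in which case recover immediately.
gateway.expected_nodes: 2
```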
Sets the number of concurrent recovery threads during initial data recovery. The default is 4.
Sets the number of concurrent recovery threads when nodes are added or removed, or during rebalancing. The default is 4.
Sets the bandwidth limit for data recovery, such as 100MB. The default is 0, meaning unlimited.
Limits the number of concurrent streams opened when recovering data from other shards. The default is 5.
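Here is a sketch of the recovery throttling settings. It assumes the key names of the early (0.x) template. The bandwidth key, indices.recovery.max_size_per_sec, is an assumption on my part, and later versions use indices.recovery.max_bytes_per_sec instead. The 100mb cap is a hypothetical value.

```yaml
# Concurrent recoveries per node during initial primary recovery.
cluster.routing.allocation.node_initial_primaries_recoveries: 4
# Concurrent recoveries per node when adding/removing nodes or rebalancing.
cluster.routing.allocation.node_concurrent_recoveries: 4
# Recovery bandwidth cap; 0 means unlimited (hypothetical value shown).
indices.recovery.max_size_per_sec: 100mb
# Concurrent streams opened when recovering from other shards.
indices.recovery.concurrent_streams: 5
```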
Ensures that each node in the cluster knows about N other master-eligible nodes. The default is 1. For large clusters, you can set a larger value (2-4).
Sets the ping timeout used when automatically discovering other nodes in the cluster. The default is 3 seconds. In a poor network environment, set a higher value to prevent discovery errors.
Sets whether multicast node discovery is enabled. The default is true.
discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3[portX-portY]"]
Sets the initial list of master nodes in the cluster. This list is used to discover nodes that newly join the cluster.
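Here is a sketch of unicast-only discovery with hypothetical addresses. It assumes the discovery.zen.* keys of the early (0.x/1.x) template; Zen discovery was replaced in Elasticsearch 7.0. The minimum_master_nodes value follows the common guideline of that era, which is a majority of the master-eligible nodes.

```yaml
# Disable multicast and list the nodes to contact explicitly (hypothetical hosts).
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2:9300", "10.0.0.3[9300-9400]"]
# Allow more time for pings on a slow network (default: 3s).
discovery.zen.ping.timeout: 10s
# Master-eligible nodes a node must see; with 3 master-eligible
# nodes, a majority is 3 / 2 + 1 = 2 (integer division).
discovery.zen.minimum_master_nodes: 2
```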
The following are the slow log settings for queries:
index.search.slowlog.level: TRACE
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms
index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms