Installing Elasticsearch 6.1.2 on a Mac and Optimizing Its Configuration in Practice

Source: Internet
Author: User

1, Installation on a Mac (Java 8 required)

Install Java 8, then add the following to ~/.bash_profile (vim ~/.bash_profile):

    export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home
    export PATH=$JAVA_HOME/bin:$PATH
    export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

Reload the profile and check the result:

    source ~/.bash_profile
    echo $JAVA_HOME

Install Elasticsearch.

Install the Chinese analyzer (IK). Its version must match the installed Elasticsearch version:

    elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.1.2/elasticsearch-analysis-ik-6.1.2.zip
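The IK release must match the Elasticsearch version exactly, so its download URL can be derived from the version string. The sketch below makes these assumptions:

- bash.
- `elasticsearch-plugin` is on PATH, as with a Homebrew install. Otherwise use bin/elasticsearch-plugin from the Elasticsearch directory.
- The IK project keeps the URL pattern shown above for other versions.

The function names are hypothetical. `/usr/libexec/java_home` is the standard macOS tool for locating an installed JDK.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; nothing runs until a function is called.

# Build the IK analyzer release URL for an Elasticsearch version,
# following the v6.1.2 URL pattern (assumed to hold for other versions).
ik_plugin_url() {
  version="$1"
  printf 'https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v%s/elasticsearch-analysis-ik-%s.zip\n' \
    "$version" "$version"
}

# Point JAVA_HOME at the installed Java 8 JDK using the macOS java_home tool.
use_java8() {
  JAVA_HOME="$(/usr/libexec/java_home -v 1.8)" || return 1
  export JAVA_HOME
  export PATH="$JAVA_HOME/bin:$PATH"
}

# Install the IK plugin matching the given Elasticsearch version.
# Restart Elasticsearch afterwards so the plugin is loaded.
install_ik() {
  elasticsearch-plugin install "$(ik_plugin_url "$1")"
}
```

Typical use:

1. Run `use_java8 && java -version`.
2. Run `install_ik 6.1.2`.
3. Restart Elasticsearch.

The installer may ask you to confirm extra permissions requested by the plugin.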

2, Configuration tuning

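32000 true
JVM heap size (set in jvm.options): -Xms1g sets the initial heap and -Xmx1g the maximum heap, and the two should be equal. Give the heap about 60% of total memory, but never 32 GB or more, so the JVM can keep using compressed object pointers. (Elastic's own guideline is at most 50% of physical RAM, leaving the rest to the filesystem cache.)
Configuration file path on a Mac: /usr/local/etc/elasticsearch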
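The heap rule can be turned into concrete numbers for a given machine. The sketch below makes these assumptions:

- bash on macOS, where `sysctl -n hw.memsize` reports physical memory in bytes.
- A 31 GB cap as a safety margin below 32 GB.

The function names are hypothetical. PERCENT defaults to the 60% used in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sizing the Elasticsearch heap.

# Usage: heap_mb TOTAL_RAM_MB [PERCENT]
# Prints PERCENT of total RAM in MB, capped at 31 GB (assumed safety margin
# below the ~32 GB limit for compressed object pointers).
heap_mb() {
  total_mb="$1"
  percent="${2:-60}"
  cap_mb=$((31 * 1024))
  heap=$((total_mb * percent / 100))
  if [ "$heap" -gt "$cap_mb" ]; then
    heap=$cap_mb
  fi
  echo "$heap"
}

# Print jvm.options lines with equal initial (-Xms) and maximum (-Xmx) heap.
jvm_heap_options() {
  h="$(heap_mb "$@")"
  printf '%s\n' "-Xms${h}m" "-Xmx${h}m"
}

# Total physical memory on macOS in MB (hw.memsize is in bytes).
mac_total_mb() {
  echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 ))
}
```

For example, `jvm_heap_options "$(mac_total_mb)"` prints an -Xms/-Xmx pair. Copy it over the existing -Xms/-Xmx lines in /usr/local/etc/elasticsearch/jvm.options rather than appending it, then restart Elasticsearch.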

3, Elasticsearch index configuration

Create a new index with mappings:

    curl -X PUT 'localhost:9200/kline_test' -H 'Content-Type: application/json' -d '{
      "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
      },
      "mappings": {
        "kline_data": {
          "properties": {
            "code": { "type": "text" },
            "name": {
              "type": "text",
              "analyzer": "ik_max_word",
              "search_analyzer": "ik_max_word"
            },
            "market": { "type": "byte" },
            "time": {
              "type": "date",
              "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
            },
            "mongo_id": { "type": "text", "index": false }
          }
        }
      }
    }'

      • number_of_shards: the number of primary shards.
      • number_of_replicas: the number of replicas. Replicas are not recommended while bulk-importing data.
      • analyzer / search_analyzer "ik_max_word": the IK mode that splits text into the largest possible number of terms.
      • format: the date formats accepted when indexing and querying the time field.
      • "index": false on mongo_id: the field is not indexed, so it cannot be used as a query condition.
      • The original mapping also set "include_in_all": false on every field to keep it out of _all queries. Indices created in 6.x have _all disabled and do not accept include_in_all, so it is omitted here.
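Creating the index without replicas implies a follow-up step once the bulk import is done. It is also worth checking that the IK analyzer is active. The sketch below assumes bash and a node at http://localhost:9200, which you can override with ES_URL. The function names are hypothetical, while _settings and _analyze are standard Elasticsearch endpoints.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the kline_test index.
ES_URL="${ES_URL:-http://localhost:9200}"   # assumed local node

# JSON body that changes the replica count of an existing index.
replicas_body() {
  printf '{"index": {"number_of_replicas": %d}}\n' "$1"
}

# Add replicas back after the bulk import, e.g. set_replicas kline_test 1
set_replicas() {
  curl -s -X PUT "$ES_URL/$1/_settings" \
    -H 'Content-Type: application/json' -d "$(replicas_body "$2")"
}

# JSON body asking _analyze how ik_max_word splits a text
# (the text must not contain double quotes).
analyze_body() {
  printf '{"analyzer": "ik_max_word", "text": "%s"}\n' "$1"
}

# Show the terms ik_max_word produces for a text on a given index.
analyze_ik() {
  curl -s -X POST "$ES_URL/$1/_analyze?pretty" \
    -H 'Content-Type: application/json' -d "$(analyze_body "$2")"
}
```

For example, run `set_replicas kline_test 1` after the import. Run `analyze_ik kline_test '中华人民共和国'` to list the terms the analyzer produces.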

※ Special note: in 6.1, the index mapping parameter of a field takes true or false instead of the earlier values not_analyzed and no. The change dates from 5.0, which also turned not_analyzed strings into the keyword type.
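The sketch below shows the new forms side by side in a 6.x mapping. It assumes bash and a node at http://localhost:9200. The following names are hypothetical:

- the index name
- the type name doc
- the field names
- the function names

The comments show the older definition each field replaces.

```shell
#!/usr/bin/env bash
ES_URL="${ES_URL:-http://localhost:9200}"   # assumed local node

# Older field definitions and their 6.x replacements:
#   {"type": "string", "index": "not_analyzed"}  ->  {"type": "keyword"}
#   {"type": "string", "index": "no"}            ->  {"index": false}
# Print a 6.x mapping body using the new forms (hypothetical fields).
mapping_6x() {
  cat <<'JSON'
{
  "mappings": {
    "doc": {
      "properties": {
        "code":     { "type": "keyword" },
        "mongo_id": { "type": "text", "index": false }
      }
    }
  }
}
JSON
}

# Create a (hypothetical) index with that mapping, e.g. create_demo_index demo_index
create_demo_index() {
  curl -s -X PUT "$ES_URL/$1" \
    -H 'Content-Type: application/json' -d "$(mapping_6x)"
}
```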

Below is a description of each field mapping property (adapted from http://blog.csdn.net/ntc10095/article/details/73730772). It was written for an older release, so it still uses the string type and not_analyzed:

    "status": {
        "type": "string", // string type
        "index": true, // whether the field is indexed; true: it can be queried, false: it cannot be queried
        "analyzer": "ik", // the analyzer to use
        "boost": 1.23, // field-level score weighting
        "doc_values": false, // on by default for not_analyzed fields and unavailable for analyzed ones; speeds up sorting and aggregation and saves memory
        "fielddata": {"format": "disabled"}, // for analyzed fields, can improve performance in sorting or aggregation; doc_values is recommended for non-analyzed fields
        "fields": {"raw": {"type": "string", "index": "not_analyzed"}}, // multi-fields: index the same value in several ways, e.g. once analyzed and once not analyzed
        "ignore_above": 100, // text longer than 100 characters is ignored and not indexed
        "include_in_all": true, // whether the field is included in _all; true by default unless index is set to no
        "index_options": "docs", // docs (doc numbers), freqs (doc numbers + term frequencies), positions (+ positions, used for proximity queries), offsets (+ offsets, used for highlighting); default positions for analyzed fields, docs otherwise
        "norms": {"enabled": true, "loading": "lazy"}, // default for analyzed fields ({"enabled": false} for non-analyzed ones); stores the length factor and index-time boost; recommended for fields used in scoring; costs extra memory
        "null_value": "NULL", // value indexed in place of missing (null) values; must be a string here, and for analyzed fields it is analyzed too
        "position_increment_gap": 0, // gap between the values of a multi-valued analyzed field; affects phrase and proximity queries, which can specify a slop
        "store": false, // whether the field is stored separately from _source; default false, so the field is searchable but not returned as a stored field (its value is still in _source)
        "search_analyzer": "ik", // search-time analyzer, same as analyzer by default; e.g. index with standard+ngram and search with standard to build autocomplete
        "similarity": "BM25", // scoring algorithm for the field, for analyzed string fields only; the default here is TF/IDF (BM25 is the default from 5.0 on)
        "term_vector": "no" // term vectors are not stored by default; options yes (terms), with_positions, with_offsets, with_positions_offsets; speeds up the fast vector highlighter but enlarges the index, so avoid on large data volumes
    }
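In 6.x terms, several of these properties combine naturally into a multi-field: an analyzed text field for search plus a keyword sub-field for sorting and aggregation. The sketch below assumes bash, the IK plugin, and the kline_test index from section 3. The field name status and the function names are hypothetical.

```shell
#!/usr/bin/env bash
ES_URL="${ES_URL:-http://localhost:9200}"   # assumed local node

# 6.x field mapping: analyzed "status" for full-text search, plus a
# keyword sub-field "status.raw" for sorting/aggregation. Values longer
# than 100 characters are not indexed in status.raw, and nulls are
# indexed there as "NULL".
status_field_mapping() {
  cat <<'JSON'
{
  "properties": {
    "status": {
      "type": "text",
      "analyzer": "ik_max_word",
      "search_analyzer": "ik_max_word",
      "fields": {
        "raw": { "type": "keyword", "ignore_above": 100, "null_value": "NULL" }
      }
    }
  }
}
JSON
}

# Add the field to an existing index and type (6.x syntax), e.g.
#   put_status_mapping kline_test kline_data
put_status_mapping() {
  curl -s -X PUT "$ES_URL/$1/_mapping/$2" \
    -H 'Content-Type: application/json' -d "$(status_field_mapping)"
}
```

Queries then search on status and sort or aggregate on status.raw.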

4, Plan shards carefully

Once an index has been created in an Elasticsearch cluster, its number of primary shards cannot be changed while the cluster is running. If you later need a different number of shards, you have to create a new index and reindex the data into it. Reindexing can be time-consuming, but at least it avoids downtime. Meanwhile, every extra shard has a cost:
      • Each shard is essentially a Lucene index, so it consumes file handles, memory, and CPU.
      • Every search request is dispatched to every shard in the index. This is not a problem while the shards are spread across different nodes, but once shards compete for the same hardware, performance gradually degrades.
      • ES uses term frequency statistics to compute relevance, and these statistics are kept per shard. If only a little data is spread over a large number of shards, relevance scoring suffers.
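Each shard consumes resources, and shards on the same node compete for the same hardware. It therefore helps to check how an index's shards are spread over the nodes. The sketch below makes these assumptions:

- bash.
- A node at http://localhost:9200.
- Node names without spaces.

The function names are hypothetical. It relies on the _cat/shards API with the h=node column filter.

```shell
#!/usr/bin/env bash
ES_URL="${ES_URL:-http://localhost:9200}"   # assumed local node

# Count shard copies per node from `_cat/shards?h=node` output on stdin.
# Blank lines (shards not assigned to any node) are counted as UNASSIGNED.
count_by_node() {
  awk '{ n = (NF ? $1 : "UNASSIGNED"); c[n]++ } END { for (n in c) print n, c[n] }' \
    | LC_ALL=C sort
}

# Show how the shards of one index are spread over the nodes.
shards_per_node() {
  curl -s "$ES_URL/_cat/shards/$1?h=node" | count_by_node
}
```

`shards_per_node kline_test` prints one line per node with its shard count. Many shards of the same index piling up on one node is the contention that degrades search performance.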
Deciding the number of shards. Rule of thumb: each shard should preferably stay under 30 GB. To calculate the count, estimate at the outset how much data you expect in the future and divide it by that limit. For example, if the data may grow to 300 GB, you need at least 10-11 shards.
※ To keep query quality up, do not create too many shards. Advice found online varies: some say no more than 20 shards, others no more than 100, and suggested shard size limits range from 20 GB to 30 GB.
※ In my view, tune this to the actual situation. If the data volume is large and efficiency still does not improve after adding nodes (ES automatically balances shards across nodes), you can create another index and route access between the indices in application middleware.
※ In addition, middleware can steer specific data to a specific shard so related data is stored together, which improves query efficiency. Custom routing (the routing parameter) decides which shard a document is written to, and a search can be restricted to particular shards with preference=_shards:0,1,2.
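The shard estimate is a ceiling division of the expected volume by the per-shard limit. The sketch below assumes bash and sizes in whole gigabytes. estimate_shards is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: minimum primary shard count for an expected data
# volume, given a per-shard size limit (default 30 GB).
# Usage: estimate_shards EXPECTED_GB [MAX_SHARD_GB]
estimate_shards() {
  expected_gb="$1"
  max_gb="${2:-30}"
  # Integer ceiling division: (a + b - 1) / b
  echo $(( (expected_gb + max_gb - 1) / max_gb ))
}
```

With the default 30 GB limit, `estimate_shards 300` prints 10 and `estimate_shards 301` prints 11. Under the stricter 20 GB limit, `estimate_shards 300 20` prints 15.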
Initial number of nodes: number of nodes <= number of shards × (number of replicas + 1). That product is the total number of shard copies, so any additional node would hold none of the index's data.

5, Merging segments with optimize

- Over time, each shard in Elasticsearch holds more data and the index grows. The number of segments grows with it, because each shard is made up of multiple segment files. The more segments there are, the worse query performance becomes. You can improve it by merging many segments into fewer (down to a single one) with the optimize command. From 5.0 on, including 6.1.2, this API is called _forcemerge; _optimize no longer exists:
      • curl -XPOST 'http://localhost:9200/shb01/_forcemerge?max_num_segments=1'
- Deleting a document in ES does not remove it from disk immediately. It is only marked as deleted (Lucene records this in a .del file), and deleted documents are filtered out only at the end of a search, which hurts efficiency. You can purge them periodically with curl, the same way you merge index segments:
      • curl -XPOST 'http://localhost:9200/_forcemerge?only_expunge_deletes=true'
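These two maintenance calls can be wrapped so they are easy to run periodically and to check afterwards. The sketch below assumes bash and a node at http://localhost:9200, which you can override with ES_URL. The function names are hypothetical, while _forcemerge and _cat/segments are standard endpoints in 6.x.

```shell
#!/usr/bin/env bash
ES_URL="${ES_URL:-http://localhost:9200}"   # assumed local node

# Build a _forcemerge URL for one index, or for all indices when the
# index argument is empty.
forcemerge_url() {
  index="$1"
  query="$2"
  if [ -n "$index" ]; then
    echo "$ES_URL/$index/_forcemerge?$query"
  else
    echo "$ES_URL/_forcemerge?$query"
  fi
}

# Merge an index down to N segments (default 1); best run on indices
# that are no longer being written to.
merge_segments() {
  curl -s -X POST "$(forcemerge_url "$1" "max_num_segments=${2:-1}")"
}

# Purge documents marked as deleted, across all indices.
expunge_deletes() {
  curl -s -X POST "$(forcemerge_url "" "only_expunge_deletes=true")"
}

# List segments per shard, to compare before and after a merge.
show_segments() {
  curl -s "$ES_URL/_cat/segments/$1?v"
}
```

For example:

1. Run `show_segments shb01`.
2. Run `merge_segments shb01 1`.
3. Run `show_segments shb01` again and confirm the segment count dropped.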
