ES Learning 2

Last Update: 2016-01-13   Source: Internet

Author: User

Tags solr server memory


1: Paging. Search engines generally do not support paging deep into a result set, because the deeper the page, the less efficient the query. Example: suppose we search an index that has 5 primary shards. When we request the first page of results, each shard produces its own top 10 and returns them to the requesting (coordinating) node, which re-sorts the 50 results to produce the overall top 10. Now suppose we want a deep page, results 10,001 to 10,010 (from=10000, size=10). In the same way, each shard must first produce its own top 10,010 hits, and the coordinating node must then sort all 50,050 of them and discard 50,040!

Now you can see that in a distributed system, the resources consumed by a deep-page request grow with the page depth multiplied by the number of shards. That is why web search engines don't return more than about 1,000 results for any query.

2: Timeout. If the index holds a lot of data and queries are slow, you can set a timeout. When a query reaches the specified timeout, ES returns the results it has collected so far, so the user is not kept waiting. The returned results may be incomplete, however. Usage (return within roughly 10 milliseconds):

    curl -XGET 'http://localhost:9200/_search?timeout=10ms'

In Java, add the timeout where the query is built:

    client.prepareSearch("crxy").setTimeout("10ms")

3: Multi-index and multi-type queries: refer to the code in the PPT.
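The arithmetic in the paging example can be checked with a few lines of plain Java. This is an illustrative sketch (the class and method names are made up); it assumes every shard has at least from + size matching documents and counts only the hits shipped to the coordinating node, not the sorting work on each shard.

```java
final class DeepPagingCost {
    // Each shard must send its own top (from + size) hits to the coordinating node.
    static long hitsSentToCoordinator(int shards, int from, int size) {
        return (long) shards * (from + size);
    }

    // After the global sort, the coordinating node keeps only `size` hits.
    static long hitsDiscarded(int shards, int from, int size) {
        return hitsSentToCoordinator(shards, from, size) - size;
    }

    public static void main(String[] args) {
        // First page on 5 shards: from=0, size=10.
        System.out.println(hitsSentToCoordinator(5, 0, 10));     // prints 50
        // Results 10,001 to 10,010: from=10000, size=10.
        System.out.println(hitsSentToCoordinator(5, 10000, 10)); // prints 50050
        System.out.println(hitsDiscarded(5, 10000, 10));         // prints 50040
    }
}
```

The cost is linear in both the shard count and the page depth, which is why deep paging becomes expensive quickly on a sharded index.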
4: Integrating the IK Chinese analyzer (word breaker) into ES
1) Download the ES IK plugin from https://github.com/medcl/elasticsearch-analysis-ik. The downloaded archive is elasticsearch-analysis-ik-master.zip.
2) Compile the plugin source on a local Windows machine. Unzip elasticsearch-analysis-ik-master.zip, then run:

    cd elasticsearch-analysis-ik-master
    mvn clean package -DskipTests

The package command produces the plugin archive elasticsearch-analysis-ik-1.2.9.zip under elasticsearch-analysis-ik-master\target\releases. Upload this zip to the plugin directory on the ES server (/usr/local/elasticsearch-1.4.4/plugins/analysis-ik), then unzip it there and delete the archive:

    cd /usr/local/elasticsearch-1.4.4/plugins/analysis-ik
    unzip elasticsearch-analysis-ik-1.2.9.zip
    rm -f elasticsearch-analysis-ik-1.2.9.zip

3) Upload the plugin's configuration directory to /usr/local/elasticsearch-1.4.4/config. Note: the configuration directory of the IK plugin is the ik directory under elasticsearch-analysis-ik-master\config; this directory must be uploaded into the config directory of ES_HOME.
4) Edit elasticsearch.yml:

    cd /usr/local/elasticsearch-1.4.4/config
    vi elasticsearch.yml

and add the following line:

    index.analysis.analyzer.default.type: ik

5) Test the analyzer. The crxy index must be created first:

    curl 'http://localhost:9200/crxy/_analyze?analyzer=ik&pretty=true' -d '{"text": "We Are Chinese"}'

5: Settings and mappings in ES
Settings specify the number of shards and replicas of an index. View the settings:

    curl -XGET 'http://localhost:9200/crxy/_settings?pretty'

Example (the index does not exist yet):

    curl -XPUT 'localhost:9200/crxy/' -d '{"settings": {"number_of_shards": 3, "number_of_replicas": 2}}'

Example (the index already exists, so only the replica count is changed):

    curl -XPUT 'localhost:9200/crxy/_settings' -d '{"index": {"number_of_replicas": 2}}'

For the Java version, refer to EsTest.java.

Mappings are equivalent to the schema.xml file in Solr, or to the table structure in MySQL: they let you specify the basic properties of fields in ES. By default ES also maps fields automatically, so you don't need to set basic properties for unknown fields. View the mappings:

    curl -XGET 'http://localhost:9200/crxy/emp/_mapping?pretty'

Note: the index-time analyzer can be given as either indexAnalyzer or index_analyzer (likewise searchAnalyzer or search_analyzer for search time). For an index that does not exist yet:

    curl -XPUT 'localhost:9200/crxy1' -d '{"mappings": {"emp": {"properties": {"name": {"type": "string", "indexAnalyzer": "ik", "searchAnalyzer": "ik"}}}}}'

For an index that already exists:

    curl -XPOST 'http://localhost:9200/crxy/emp/_mapping' -d '{"properties": {"name": {"type": "string", "indexAnalyzer": "ik", "searchAnalyzer": "ik"}}}'

6: Building ES from source (this runs on the local Windows machine; nothing needs to be uploaded to the server)
1) Download the source package elasticsearch-1.4.zip from https://github.com/elastic/elasticsearch/tree/1.4.
2) Unzip the source into the current directory.
3) Build the package:

    cd elasticsearch-1.4
    mvn clean package -DskipTests

4) The elasticsearch-1.4\target\releases\ directory will then contain these two files:

    elasticsearch-1.4.6-SNAPSHOT.tar.gz
    elasticsearch-1.4.6-SNAPSHOT.zip

They are the same kind of package as the ES distribution downloaded from the official website.

7: Search preference (which shard copies a query runs on)
By default ES randomizes across shard copies, i.e. each shard is served by a randomly chosen copy. The preference parameter changes this:
_local: the query runs on shard copies on the local node when possible, and on other nodes otherwise.
_primary: the query runs only on primary shards.
_primary_first: the query runs on primary shards first; if a primary is unavailable (for example its node is down), the replica is queried instead.
_only_node:nodeId: the query runs only on the node with the specified ID. If that node holds only some of the index's shards, only those shards are searched, so the results may be incomplete. For example, _only_node:123 queries only the node whose ID is 123.
_prefer_node:nodeId: the query runs on the specified node first.
_shards:0,1,2,3,4: the query runs only on the specified shards.
Custom preference: you can let users choose a query method that targets several nodes, for example a new _only_nodes option. Customizing the query method requires modifying the source code, so first import the source into Eclipse. The ES source is a Maven project, so import it directly as a Maven project. After the import the pom file reports errors, but only in some plugin configuration near the bottom, and these can be ignored. After importing into Eclipse a window pops up; just click Cancel.
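The custom _only_nodes preference amounts to a simple selection rule over the copies of one shard: walk the requested node IDs in order and, for each node, take its active copies first and then its initializing copies, skipping copies on any other node. The plain-Java simulation below illustrates that rule outside Elasticsearch; it is not ES code, and ShardCopy is a hypothetical stand-in for ES's ShardRouting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OnlyNodesPreference {
    // Hypothetical stand-in for ShardRouting: one copy of a single shard.
    static final class ShardCopy {
        final String nodeId;  // node currently holding this copy
        final boolean active; // false means the copy is still initializing

        ShardCopy(String nodeId, boolean active) {
            this.nodeId = nodeId;
            this.active = active;
        }

        @Override
        public String toString() {
            return active ? nodeId : nodeId + "(init)";
        }
    }

    // For each requested node, in order: its active copies, then its initializing copies.
    // Copies on nodes that were not requested are never selected.
    static List<ShardCopy> select(List<ShardCopy> copies, String nodeIds) {
        List<ShardCopy> ordered = new ArrayList<>();
        for (String nodeId : nodeIds.split(",")) {
            for (ShardCopy copy : copies) {
                if (copy.active && copy.nodeId.equals(nodeId)) {
                    ordered.add(copy);
                }
            }
            for (ShardCopy copy : copies) {
                if (!copy.active && copy.nodeId.equals(nodeId)) {
                    ordered.add(copy);
                }
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<ShardCopy> copies = Arrays.asList(
                new ShardCopy("n1", true),
                new ShardCopy("n2", true),
                new ShardCopy("n3", false));
        // n2 is not requested, so its copy is never searched.
        System.out.println(select(copies, "n3,n1")); // prints [n3(init), n1]
    }
}
```

If none of the requested nodes holds a copy of the shard, the result is empty, which is why this kind of preference can return incomplete results, just like _only_node.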
Next, modify the source. Find this class:

    org.elasticsearch.cluster.routing.operation.plain.PlainOperationRouting

Press Ctrl+O to list all methods in the class and find preferenceActiveShardIterator. Click through to the parse method called at line 171, and add this case to the switch statement in that parse method:

    case "_only_nodes":
        return ONLY_NODES;

Near line 61 of the same class (the one containing the parse method), also add an enum constant:

    ONLY_NODES("_only_nodes"),

Go back to org.elasticsearch.cluster.routing.operation.plain.PlainOperationRouting. Below line 207, where the different preference types are handled, add the following case to the switch statement:

    case ONLY_NODES: {
        String nodeIds = preference.substring(Preference.ONLY_NODES.type().length() + 1);
        String[] split = nodeIds.split(",");
        for (String node : split) {
            ensureNodeIdExists(nodes, node);
        }
        return indexShard.onlyNodesActiveInitializingShardsIt(nodeIds);
    }

Then open org.elasticsearch.cluster.routing.IndexShardRoutingTable and, below line 366, add this method:

    public ShardIterator onlyNodesActiveInitializingShardsIt(String nodeIds) {
        String[] split = nodeIds.split(",");
        ArrayList<ShardRouting> ordered = new ArrayList<>(activeShards.size() + allInitializingShards.size());
        for (String nodeId : split) {
            // collect the active copies that live on this node
            for (int i = 0; i < activeShards.size(); i++) {
                ShardRouting shardRouting = activeShards.get(i);
                if (nodeId.equals(shardRouting.currentNodeId())) {
                    ordered.add(shardRouting);
                }
            }
            // then the initializing copies on the same node
            for (int i = 0; i < allInitializingShards.size(); i++) {
                ShardRouting shardRouting = allInitializingShards.get(i);
                if (nodeId.equals(shardRouting.currentNodeId())) {
                    ordered.add(shardRouting);
                }
            }
        }
        return new PlainShardIterator(shardId, ordered);
    }

That completes the change. Follow the build steps in section 6 to compile and package the source, then upload the package to the server and start it. For the underlying idea, see http://www.cnblogs.com/cxzdy/p/5128778.html

8: Split brain in an ES cluster. The so-called split-brain problem (similar to schizophrenia) occurs when different nodes in the same cluster hold different views of the cluster's state. See http://bbs.superwu.cn/forum.php?mod=viewthread&tid=1161&extra=

For more detail, see http://blog.csdn.net/cnweike/article/details/39083089. The discovery.zen.minimum_master_nodes setting controls the minimum number of master-eligible nodes a node must see before a master election can take place. It is recommended to set it to a value greater than 1, because electing a master only becomes meaningful in clusters of more than 2 nodes. Split-brain example: in the same cluster, node A sees 2 available nodes while node B sees 4. The number of available nodes is inconsistent, whereas every node should see the same set.
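A commonly recommended value for discovery.zen.minimum_master_nodes (from the Elasticsearch documentation of this era, not from the original post) is a quorum of the master-eligible nodes, (N / 2) + 1. The sketch below, with illustrative class and method names, computes that value and shows why it prevents split brain: two disjoint groups of nodes can never both reach a strict majority, so at most one side of a network partition can elect a master.

```java
final class SplitBrainQuorum {
    // Quorum of master-eligible nodes: strictly more than half of them.
    static int minimumMasterNodes(int masterEligibleNodes) {
        return masterEligibleNodes / 2 + 1;
    }

    // A group of nodes may elect a master only if it sees at least the quorum.
    static boolean canElectMaster(int visibleNodes, int minimumMasterNodes) {
        return visibleNodes >= minimumMasterNodes;
    }

    public static void main(String[] args) {
        int total = 5;
        int quorum = minimumMasterNodes(total);
        // A network split into 2 + 3 nodes: only the larger side may elect a master.
        System.out.println("quorum=" + quorum
                + ", side of 2 elects: " + canElectMaster(2, quorum)
                + ", side of 3 elects: " + canElectMaster(3, quorum));
        // prints quorum=3, side of 2 elects: false, side of 3 elects: true
    }
}
```

The trade-off is availability: with only 2 master-eligible nodes the quorum is 2, so if the two nodes lose contact neither side can elect a master.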

————————————————————————————————————————————————————————————————————

1: ES optimization
1) Maximum number of open files. For large systems, 32K or even 64K is recommended:

    ulimit -a          (view)
    ulimit -n 32000    (set)

2) Adjust the JVM memory size of ES in the configuration files:
  a. In bin/elasticsearch.in.sh, set ES_MIN_MEM and ES_MAX_MEM. Setting them to the same value is recommended, to avoid frequent reallocation of memory; size them according to the server's physical memory (the default is 256M).
  b. If ES is started through the servicewrapper plugin, edit bin/service/elasticsearch.conf instead (default 1024M).
3) Set mlockall to lock the process's memory in physical RAM so that it is not swapped out, which improves performance. In config/elasticsearch.yml:

    bootstrap.mlockall: true

4) More shards can increase indexing throughput; 5-20 shards is usually appropriate. Too few or too many shards both slow down search. Too many shards means more files are opened during a search and more communication between servers; too few means each shard's index is very large, which also makes searching slow. It is recommended that a single shard hold at most about 20G of index data, so number of shards = total data / 20G. Replicas increase search capacity, but many replicas also put extra load on the servers because the data has to be synchronized, so 2-3 replicas are recommended.
5) Optimize the index periodically. When an index has many segments, query performance suffers. If the index is not very large, you can merge it down to a single segment:

    curl -XPOST 'http://localhost:9200/crxy/_optimize?max_num_segments=1'

Java code:

    client.admin().indices().prepareOptimize("crxy").setMaxNumSegments(1).get();

6) Deleted documents. When Lucene deletes a document, the data is not immediately removed from disk; instead a .del file is written in the Lucene index. The deleted data still takes part in searches: during a search Lucene checks whether each document has been deleted and, if so, filters it out. This also reduces search efficiency, so you can purge deleted documents:

    curl -XPOST 'http://localhost:9200/crxy/_optimize?only_expunge_deletes=true'

    client.admin().indices().prepareOptimize("crxy").setOnlyExpungeDeletes(true).get();

7) If a large amount of data must be indexed at the start of a project, set the number of replicas to 0 first. While indexing, ES synchronizes data to any replicas immediately, which increases the load on ES. When the bulk load is complete, change the replica count back as needed. This improves indexing efficiency.
8) Disable the _all field in the mapping. By default every index has an _all field (similar to a catch-all copy field such as text in the Solr configuration file), which makes querying convenient but increases indexing time and index size:

    "_all": {"enabled": false}

9) The slow-log output level defaults to TRACE, so queries slower than 500ms are logged as slow queries, which drives up CPU, memory and I/O load. Changing the log level to INFO reduces the load on the server. Edit ES_HOME/config/logging.yml, or ES_HOME/config/elasticsearch.yml.

2: Obtaining the ES client through reflection

    import java.lang.reflect.Constructor;

    import org.elasticsearch.client.transport.TransportClient;
    import org.elasticsearch.common.settings.ImmutableSettings;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.common.transport.InetSocketTransportAddress;

    public class EsUtil {
        // Setting client.transport.sniff to true lets the client sniff the whole cluster state
        // and add the addresses of the other machines in the cluster to the client.
        static Settings settings = ImmutableSettings.settingsBuilder()
                .put("cluster.name", "elasticsearch")
                .put("client.transport.sniff", true)
                .build();

        // Single shared client instance
        private static TransportClient client;

        static {
            try {
                // Look up TransportClient's (Settings) constructor reflectively and invoke it
                Class<?> clazz = Class.forName(TransportClient.class.getName());
                Constructor<?> constructor = clazz.getDeclaredConstructor(Settings.class);
                constructor.setAccessible(true);
                client = (TransportClient) constructor.newInstance(settings);
                client.addTransportAddress(new InetSocketTransportAddress("192.168.1.170", 9300));
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        public static synchronized TransportClient getTransportClient() {
            return client;
        }
    }

3: Points to note with ES
1) When operating ES from Java code, make sure the ES dependency version used locally matches the ES version running in the cluster.
2) Keep the ES version and configuration consistent on every node of the cluster, and use the same JDK on all of them.

4: How ES assigns documents to shards
See the shardId method of org.elasticsearch.cluster.routing.operation.plain.PlainOperationRouting, which returns the ID of the shard a document is stored in. The source shows that if no routing is specified, the shard is computed from the hash of the document ID modulo the total number of shards, after which the absolute value is taken. So with 5 shards the result is always between 0 and 4. By specifying routing you can store documents of the same category in the same shard, and then query only that shard with the shard preference described earlier. Example:

    curl -XPOST 'localhost:9200/crxy/emp?routing=test' -d '{"name": "zs", "age": +, "flag": "test"}'

For the Java implementation, refer to the PPT or EsTest.java.

5: ES + HBase example: refer to <es+hbase project steps.txt>
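The "hash, modulo, absolute value" shard rule from section 4 can be sketched in plain Java. This is an illustration under an explicit assumption: it uses String.hashCode() as a stand-in hash, whereas ES 1.x applies its own hash function to the routing value or document ID, so the concrete shard numbers here will not match a real cluster. What carries over is the shape of the rule: the result always falls in [0, number of shards), and the same routing value always maps to the same shard.

```java
final class ShardRoutingSketch {
    // Stand-in hash (String.hashCode), NOT the hash function ES actually uses.
    // Java's % can return a negative remainder for a negative hash; its magnitude
    // is below numberOfShards, so Math.abs brings it into [0, numberOfShards).
    static int shardId(String routingOrId, int numberOfShards) {
        int hash = routingOrId.hashCode();
        return Math.abs(hash % numberOfShards);
    }

    public static void main(String[] args) {
        int shards = 5;
        for (String id : new String[] {"1", "2", "emp-42"}) {
            System.out.println(id + " -> shard " + shardId(id, shards));
        }
        // Every document indexed with routing=test lands on the same shard.
        System.out.println("routing=test -> shard " + shardId("test", shards));
    }
}
```

Because the shard depends on the number of shards, changing number_of_shards would send existing documents to different shards, which is why the shard count of an existing index cannot be changed.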


This article is an English version of an article originally written in Chinese on aliyun.com.
