12. SolrCloud Principles and SolrCloud Shard Principles
Reprinted from http://blog.csdn.net/u011026968/article/details/50336709
It covers the principles of SolrCloud, including basic knowledge, architecture, index creation and update, querying, fault recovery, load balancing, and leader election.
I. Relationship between SolrCloud, Solr, and Lucene
1. Relationship between Solr and Lucene
There is a metaphor on the Internet:
(1) If Lucene is a database, Solr is JDBC.
(2) Lucene is a set of jar files, and Solr is a search server built on top of those jars. Solr is an application that can be used directly, while Lucene is only a programming library.
2. Relationship between Solr and SolrCloud
SolrCloud is a distributed search solution based on Solr and ZooKeeper, introduced in Solr 4.0. Put another way, SolrCloud is one deployment mode of Solr. Solr can be deployed in several ways, such as standalone mode and multi-host master-slave mode; Solr deployed in those ways does not have the special features of SolrCloud.
II. SolrCloud Configuration
There are two methods:
(1) Deploy Solr to a Tomcat container as a web application and associate Tomcat with ZooKeeper.
(2) Starting with Solr 5, Solr ships with an embedded Jetty container and no longer needs to be deployed to Tomcat, which is easier.
You can refer to the following Tutorial:
http://blog.csdn.net/wanghui2008123/article/details/37813525
III. SolrCloud Basic Knowledge
1. Concept
·Collection: The complete logical index of a SolrCloud cluster. It is usually divided into one or more shards, which all use the same config set. If there is more than one shard, the index is distributed. SolrCloud lets you reference the index by collection name without worrying about the shard-related parameters normally needed for distributed retrieval.
·Config Set: The set of configuration files a Solr Core needs in order to provide service. Each config set has a name. At minimum it includes solrconfig.xml (SolrConfigXml) and schema.xml (SchemaXml); depending on the contents of those two files, other files may also be required. Config sets are stored in ZooKeeper. They can be re-uploaded or updated with the upconfig command, or initialized or updated via the Solr startup parameter bootstrap_confdir.
·Core: A Solr Core. A single Solr instance contains one or more Solr Cores, each of which can independently provide indexing and query functionality. Each Solr Core corresponds to one index, or to one shard of a collection. The Solr Core concept was introduced to increase management flexibility and resource sharing. The difference in SolrCloud is that a core uses configuration stored in ZooKeeper, whereas a traditional Solr core reads its configuration from a directory on local disk.
·Leader: The shard replica that won an election. Each shard has multiple replicas, and an election determines which of them becomes the leader. Elections can occur at any time, but they are normally triggered only when a Solr instance fails. When documents are indexed, SolrCloud passes them to the leader of the corresponding shard, and the leader then distributes them to all of the shard's replicas.
·Replica: A copy of a shard. Each replica lives in a Solr Core. For example, a collection named "test" created with numShards=1 and replicationFactor=2 produces two replicas, that is, two cores, each on a different machine or Solr instance. One is named test_shard1_replica1 and the other test_shard1_replica2; one of them is elected leader (see the collection-creation sketch after this list).
·Shard: A logical slice of a collection. Each shard is materialized as one or more replicas, and its leader is determined by election.
·ZooKeeper: ZooKeeper provides the distributed locking that SolrCloud requires and handles leader election. Solr can run with its embedded ZooKeeper, but an independent ensemble of at least three hosts is recommended.
·SolrCore: On a single machine, an individual index is called a SolrCore. If multiple indexes are created, multiple SolrCores can be created.
·Index: An index can span different Solr services; that is, one index can be composed of SolrCores on different machines. The SolrCores on different machines together form a logical index, and such an index is called a collection. The SolrCores that make up a collection include both the data indexes and their backups.
·Relationship between SolrCloud, collection, and shard: a SolrCloud cluster contains multiple collections, each of which can be divided into multiple shards. Each shard can have multiple backups (replicas), and these replicas elect a leader among themselves.
·Optimization: A process that compresses the index and merges segments. Optimization runs only on the master node.
·Leader and Replica:
(1) The leader is responsible for ensuring that the replicas hold the same up-to-date information as the leader.
(2) Replicas are assigned to shards in round-robin order according to the order in which nodes first start up in the cluster, unless a new node is manually assigned to a shard with the shardId parameter (for example, -DshardId=1).
On subsequent restarts, each node rejoins the shard it was assigned to the first time. A node that is a replica becomes the leader if the previous leader no longer exists.
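To make the collection/shard/replica terms above concrete, here is a minimal sketch (my own example, not from the original post) that creates the "test" collection from the Replica example with one shard and two replicas through the Collections API. The host/port and the config set name are assumptions; adjust them for your cluster.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class CreateTestCollection {
    public static void main(String[] args) throws Exception {
        // Assumed Solr node address and config set name; replace with your own.
        String url = "http://localhost:8983/solr/admin/collections"
                + "?action=CREATE&name=test"
                + "&numShards=1&replicationFactor=2"
                + "&collection.configName=test_conf";

        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");

        // Print the response so the result of the CREATE call can be inspected.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

If the call succeeds, the Cloud view of the Solr admin UI should show shard1 of "test" with two replicas, one of them marked as leader.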
2. Architecture
·(Figure) Index (collection) structure.
·(Figure) Comparison chart of an index and the corresponding Solr entities.
A classic example:
SolrCloud is a distributed search solution based on Solr and ZooKeeper and is one of the core features of Solr 4.0. Its main idea is to use ZooKeeper as the cluster's configuration information center. It has several characteristics: 1) centralized configuration information; 2) automatic fault tolerance; 3) near-real-time search; 4) automatic load balancing of queries.
The preceding figure shows a cluster with four Solr nodes. The index is distributed across two shards, and each shard contains two Solr nodes: one is the leader and the other is a replica. The cluster also has an Overseer node responsible for maintaining cluster state information; it acts as the overall controller. All cluster state is kept in the ZooKeeper cluster. Note also that any node can receive an index update request; the request is then forwarded to the leader of the shard the document belongs to. After the leader finishes the update, it forwards the version number and the document to the replicas of the same shard.
3. Several roles
(1) zookeeper: The following information is stored in ZooKeeper. A ZooKeeper directory entry is called a znode.
Solr's znodes in ZooKeeper:
1. aliases.json: stores aliases for collections. Aliases are also useful in their own right (for example, to separate the build and search sides of a SolrCloud setup); I will write a separate blog post about them later.
2. clusterstate.json: an important information file containing the detailed descriptions of collections, shards, and replicas.
3. live_nodes: ephemeral znodes representing the SolrCloud nodes that are currently alive.
4. overseer: an important role in SolrCloud. It holds three important distributed queues representing the pending SolrCloud tasks that require ZooKeeper operations.
4.1 collection-queue-work stores collection-specific operations, such as createcollection, reloadcollection, createalias, deletealias, and splitshard.
4.2 queue stores all operations that are not collection-specific, such as deletecore, removecollection, removeshard, leader, createshard, updateshardstate, and changes in node state (down, active, recovering).
4.3 queue-work is a temporary queue for the messages currently being processed. An operation is first saved to /overseer/queue, moved to /overseer/queue-work while the Overseer processes it, and deleted from /overseer/queue-work once processing is done. If the Overseer dies partway through, the newly elected Overseer first finishes the operations in /overseer/queue-work and then processes those in /overseer/queue. Note: all child nodes of the queues above are of type PERSISTENT_SEQUENTIAL.
5. overseer_elect: used for the Overseer election.
6. collection: stores some basic information about each collection (the main information is in clusterstate.json). The leader_elect node beneath it is used for the leader election of the replicas within each shard of the collection.
(2) overseer: The Overseer is a role that is often overlooked. In my testing, every time a new machine is added, SolrCloud gains one more Solr node and one more Overseer candidate (which may not become the active one). There is only one active Overseer in SolrCloud, elected from among all candidates. The Overseer is elected in the same way as a shard leader; see the leader election section for details.
Overseer ZK write process: The official SolrCloud documentation says very little about the Overseer role, and I suspect many developers who have successfully configured SolrCloud do not even realize it exists. As its name implies, the Overseer is a role with a global view that performs overall control. In the code and in the ZooKeeper-related operations, this means that most write operations to ZooKeeper are handled by the Overseer, which maintains the contents of the two znodes clusterstate.json and aliases.json. This differs from the usual practice of "whoever creates it modifies it". Operations initiated by individual Solr nodes are published to the corresponding queue under the /overseer node; the Overseer takes the operation information from that distributed queue, makes the corresponding ZooKeeper modification, writes the updated SolrCloud state into clusterstate.json, and finally deletes the operation from the queue to indicate that it is complete. Take a Solr node marking its state as down as an example: the node publishes the information for this "state" operation to /overseer/queue; the Overseer picks it up and writes the node's down state into clusterstate.json; finally the entry is deleted from the queue. As noted, the Overseer role itself is elected through ZooKeeper.
Ordinary ZK read operations: Solr puts the most important and most comprehensive information in clusterstate.json, which reduces the number of znodes an ordinary Solr node needs to watch. Apart from clusterstate.json, when an ordinary Solr node needs the overall state of the current collection, it also reads /live_nodes in ZooKeeper. From live_nodes it learns which nodes of the collection are alive, and it then obtains the node details from clusterstate.json. This design makes sense: if a Solr node terminates abnormally, clusterstate.json may not change, but the corresponding znode under /live_nodes disappears (because it is ephemeral).
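As a rough illustration of the "ordinary ZK read" path described above, the sketch below uses SolrJ's CloudSolrServer and its ZkStateReader to read the live nodes and the per-shard replica layout that SolrCloud keeps in /live_nodes and clusterstate.json. This is my own example, not from the original post; the ZooKeeper address and collection name are assumptions, and the exact classes are from SolrJ 4.x/5.x (later releases use CloudSolrClient instead of CloudSolrServer).

```java
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.Slice;
import org.apache.solr.common.cloud.ZkStateReader;

public class ClusterStateDump {
    public static void main(String[] args) throws Exception {
        // Assumed ZooKeeper ensemble address and collection name.
        CloudSolrServer server = new CloudSolrServer("localhost:2181");
        server.connect();  // forces the client to read cluster state from ZooKeeper

        ZkStateReader reader = server.getZkStateReader();
        ClusterState state = reader.getClusterState();

        // Nodes currently registered under /live_nodes (ephemeral znodes).
        System.out.println("live nodes: " + state.getLiveNodes());

        // Shard and replica layout recorded in clusterstate.json.
        for (Slice slice : state.getSlices("test")) {
            System.out.println("shard: " + slice.getName()
                    + ", leader: " + slice.getLeader().getName());
            for (Replica replica : slice.getReplicas()) {
                System.out.println("  replica: " + replica.getName()
                        + " on " + replica.getNodeName());
            }
        }
        server.shutdown();
    }
}
```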
IV. Index creation process (index update)
1. Details:
(1) SolrJ is the client, and CloudSolrServer is the SolrJ class that communicates with the SolrCloud cluster.
(2) Document ID
From the source code, the document ID can either be configured by yourself or left to the default configuration, in which case the document ID is produced by a Java UUID generator (which generates a globally unique ID).
(3) watch
The client (watcher) sets a watch on ZooKeeper data. A watch is a one-time trigger: it fires when the data changes and sends a notification to the client that set it.
(4) A Lucene index is a directory of immutable files: once written, index files are never modified in place. Documents are always inserted into newly created files; deleted documents are not physically removed from the files but only marked as deleted until the index is optimized. An update is a combination of a delete and an add.
(5) document routing
Solr 4.1 added document routing (co-locating related documents in the same shard) to improve query performance.
Solr 4.5 added a router.name parameter to specify which router implementation to use. If you use the "compositeId" router, you can add a prefix to the ID of each document sent to Solr for indexing. The prefix is used to calculate a hash value, which Solr uses to determine the shard the document is sent to for indexing. There is no restriction on the value of the prefix (it does not, for example, have to be a shard name), but it must be used consistently so that Solr's results are consistent. For example, if you need to co-locate documents for different customers, you might use the customer's name or ID as the prefix. If your customer is "IBM" and you have a document with ID "12345", you would insert the prefix into the document's id field, making it "IBM!12345"; the exclamation mark is the separator, and "IBM" determines the specific shard this document goes to.
Then, at query time, you include the prefix in the _route_ parameter (for example, q=solr&_route_=IBM!) to direct the query to the relevant shard. In some cases this improves query performance because it saves the network round-trips of querying every shard. _route_ replaces shard.keys; the shard.keys parameter is deprecated and will be removed in a future Solr version. If you do not want to influence how documents are stored, you do not need to add a prefix to the document ID. If you created the collection with the "implicit" router, you can additionally define a router.field parameter, which uses a field in each document to determine the shard the document belongs to. However, if that field is missing from a document, Solr will reject it. You can also use the _route_ parameter to name a specific shard.
My understanding: the benefit of adding such a routing prefix is that, by declaring _route_=IBM! at query time, you reduce the number of shards that have to be accessed. (See the sketch below.)
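A minimal SolrJ sketch of the compositeId behavior described above (my own example, not from the original post; the ZooKeeper address, collection name, and field names are assumptions): the document is indexed with the "IBM!" prefix in its id, and the query passes the same prefix in _route_ so that only the shard holding IBM's documents is searched.

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CompositeIdRoutingExample {
    public static void main(String[] args) throws Exception {
        // Assumed ZooKeeper address and collection name.
        CloudSolrServer server = new CloudSolrServer("localhost:2181");
        server.setDefaultCollection("test");

        // Index a document whose id carries the "IBM!" routing prefix.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "IBM!12345");
        doc.addField("title", "solr routing example");
        server.add(doc);
        server.commit();

        // Query only the shard(s) that hold documents with the "IBM!" prefix.
        SolrQuery query = new SolrQuery("title:solr");
        query.set("_route_", "IBM!");
        System.out.println(server.query(query).getResults());

        server.shutdown();
    }
}
```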
The value of router.name can be implicit or compositeId. The implicit router does not automatically route documents to different shards; it follows whatever you indicate in the indexing request. That is, when indexing with the implicit router, you need to specify the shard the document belongs to yourself. For example, a collection with explicitly named shards can be created like this:
Curl "http: // 10.1.30.220: 8081/solr/admin/collections? Action = CREATE & name = paper & collection. configName = paper_conf & router. name = implicit & shards = shard1, shard2 & createNodeSet = 10.1.30.220: 8081_solr, 10.1.30.220: 8084_solr" |
2. Specific process
Document addition Process:
(1) When SolrJ sends an update request to CloudSolrServer, CloudSolrServer connects to ZooKeeper to obtain the current cluster state of SolrCloud and registers watchers on /clusterstate.json and /live_nodes so that it can monitor ZooKeeper and SolrCloud. This provides the following benefits:
After obtaining the SolrCloud state, CloudSolrServer can send documents directly to the SolrCloud leaders, reducing network forwarding overhead.
Registering watchers helps with load balancing during indexing. For example, if a leader node goes offline, CloudSolrServer learns of it immediately and stops sending documents to that leader.
(2) Route the document to the correct shard. CloudSolrServer needs to know which shard each document should be sent to. Routing a single document is very simple, but SolrCloud supports batch adds; in the normal case N documents are routed at the same time. SolrCloud therefore groups the documents by their routing destination and then sends each group concurrently to the corresponding shard, which requires high concurrency capability.
(3) After the leader receives the update request, it first stores the update information in its local update log. At the same time, the leader assigns a new version to the document. For an existing document, the leader compares the newly assigned version with the existing version; if the new version is higher, the old version is discarded. The update is finally sent to the replicas.
(4) When only one replica remains, that replica enters the recovering state and waits for a period of time for the leader to come back online. If the leader does not come back within that period, the replica is promoted to leader, with some possible loss of documents.
(5) The final step is commit. There are two kinds of commit. A soft commit generates the segment in memory, making the documents visible (queryable) but not writing them to disk, so the data is lost on power failure. A hard commit writes the data directly to disk and also makes it visible. The former is cheap; the latter is expensive. (A commit sketch follows below.)
A ulog (update log) is written for each commit; when the server fails and in-memory data is lost, the data can be recovered from the ulog.
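A small SolrJ sketch of the two commit types described in step (5) (my own example, not from the original post; the ZooKeeper address and collection name are assumptions). SolrJ exposes both kinds through the commit(waitFlush, waitSearcher, softCommit) overload.

```java
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CommitTypesExample {
    public static void main(String[] args) throws Exception {
        // Assumed ZooKeeper address and collection name.
        CloudSolrServer server = new CloudSolrServer("localhost:2181");
        server.setDefaultCollection("test");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "commit example");
        server.add(doc);

        // Soft commit: open a new searcher so the document becomes visible,
        // but do not flush segments to disk (cheap, not durable).
        server.commit(true, true, true);

        // Hard commit: flush segments to disk (durable, more expensive).
        server.commit(true, true, false);

        server.shutdown();
    }
}
```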
V. Query
NRT (near-real-time) search
SolrCloud supports near-real-time search. Near-real-time search means that a document becomes visible and queryable within a short period of time after it is added. This is mainly based on the soft commit mechanism (Lucene itself has no soft commit, only hard commit).
When a soft commit happens, Solr opens a new searcher to make the new documents visible, and it also warms the caches and auto-warms queries so that cached data remains usable. You must therefore ensure that the cache warming and query warming time is shorter than the commit interval; otherwise commits will fail because too many searchers are open at once.
Finally, a word about near-real-time search in practice. "Near real time" is relative: for some customers one minute is near real time, for others three minutes is. For Solr, the more frequent the soft commits, the more real-time the search, but also the heavier the load on Solr (more frequent commits produce more small segments, so merges happen more often). Our company currently soft commits every 3 minutes; we previously used 1 minute, but Solr then spent so many resources on indexing that queries were badly affected. Near-real-time search has therefore been a headache for us, because customers keep asking for more real-time behavior. At present the company uses a caching mechanism to compensate.
VI. Shard splitting
The following configuration parameters are available:
· path: the path(s) where the split core0 index is stored. Multiple paths are supported, for example cores?action=SPLIT&core=core0&path=path1&path=path2.
· targetCore: splits the core0 index into the given targetCore (the target core must already exist). Multiple values are also supported. Note that at least one of path and targetCore must be provided.
· split.key: the key on which to split; the default is the unique id field.
· ranges: hash ranges; by default the hash range is divided evenly by the number of target shards.
· As you can see, the core SPLIT API is a lower-level interface that can split one core into any number of indexes (or cores); a sketch of calling it follows below.
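A hedged sketch of invoking the core admin SPLIT call with the parameters listed above (my own example; the host, port, and core names are assumptions, and in a SolrCloud setup the higher-level Collections API SPLITSHARD action is usually preferred).

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class CoreSplitExample {
    public static void main(String[] args) throws Exception {
        // Assumed host/port and core names; the target cores must already exist.
        String url = "http://localhost:8983/solr/admin/cores"
                + "?action=SPLIT&core=core0"
                + "&targetCore=core0_0&targetCore=core0_1";

        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");

        // Print the response so the outcome of the split can be checked.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```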
VII. Load balancing
Load balancing of queries has to be done by yourself. As for which shard a document goes to, that is determined by its id; if the collection is configured with router.name=implicit, you can specify the target shard yourself.
VIII. Fault recovery
1. Fault recovery
Recovery can occur in the following situations:
(1) A replica is offline
Offline replicas are not taken into account when the index is updated; when they come back online, a recovery process catches them up. If the replica an update is forwarded to is in the recovering state, that replica puts the update into its update transaction log.
(2) A shard (as I understand it) has only one replica left
When there is only one replica, it enters the recovering state and waits for a period of time for the leader to come back online. If the leader does not come back within that period, the replica is promoted to leader, with some possible loss of documents.
(3) While SolrCloud is processing an update, the leader fails to forward the update to a replica for some reason; that replica is then forced into recovery to resynchronize its data.
2. Recovery policies
Let us look at the recovery strategies used in the third case above:
(1) Peer sync: if the interruption was short and the recovering node missed only a small number of update requests, they can be replayed from the leader's update log. The threshold is 100 update requests; if more than 100 were missed, the entire index snapshot is copied from the leader.
(2) Snapshot replication: if the node has been offline too long to catch up from the leader's update log, it uses Solr's HTTP-based index replication to restore the whole index snapshot. When a new replica is added to a shard, it also performs a full snapshot replication.
3. The specific process of the two policies
(1) Overall process
Solr sends getVersion requests to the replicas to obtain and sort their latest N update versions (100 by default), and it also obtains the latest 100 versions of the current shard.
It then compares the replica's versions with the leader's versions to see whether they intersect:
a) If they intersect, peer sync performs a partial update (in units of documents).
b) If they do not intersect, the gap is too large, and replication is performed (in units of files).
(2) Specific replication Process
(A) When replication starts, a commitOnLeader operation is performed first, that is, a commit command is sent to the leader. This flushes the data in the leader's update log into the index files so that the snapshot (snap) is complete.
(B) After various checks, the index data is downloaded for replication.
(C) During replication, the shard's state is recovering: it can accept indexing but cannot serve queries. While synchronizing, new data goes into the ulog but, judging from the source code, does not go into the index files. After replication completes, a replay process re-executes the requests recorded in the ulog; in this way everything missed during the copy is written as well.
4. Fault tolerance
(1) read
Every search request is executed by all shards of a collection; if some shard does not return results, the whole query fails. For this case there is the shards.tolerant parameter: if it is set to true, partial results are returned (a query sketch follows below).
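A small SolrJ sketch of the shards.tolerant option mentioned above (my own example, not from the original post; the ZooKeeper address and collection name are assumptions).

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class TolerantQueryExample {
    public static void main(String[] args) throws Exception {
        // Assumed ZooKeeper address and collection name.
        CloudSolrServer server = new CloudSolrServer("localhost:2181");
        server.setDefaultCollection("test");

        SolrQuery query = new SolrQuery("*:*");
        // Return partial results even if some shards fail to respond.
        query.set("shards.tolerant", true);

        System.out.println(server.query(query).getResults().getNumFound());
        server.shutdown();
    }
}
```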
(2) Write
Changes to each node's content and organization are written to a transaction log. The log is used to determine which content the node's replica should contain. When a new replica is created, the leader and the transaction log are used to decide which content it should include. The log can also be used for recovery. The TransactionLog consists of a record of a sequence of update operations; it increases the robustness of index operations, because a node that is unexpectedly interrupted during indexing can redo all uncommitted update operations.
If a leader goes down, it may have sent requests to some replicas but not to others. Therefore, before a new leader is elected, it runs a synchronization operation with the other replicas. If this succeeds, the data on all nodes is consistent; the leader then registers itself as an active node, and normal operations are processed. If a replica's data has diverged too far to be synchronized this way, the system performs a full replication-based recovery.
The cluster's Overseer monitors the leader of each shard. If a leader fails, the automatic fault-tolerance mechanism kicks in and a new leader is elected from the other replicas of the same shard. Even if the Overseer node itself crashes, a new Overseer is automatically started on another node, which ensures the high availability of the cluster.
IX. Election strategy
SolrCloud has no fixed master or slave. The leader is elected automatically, initially on a first-come-first-served basis, and afterwards according to the ZooKeeper election recipe:
http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection
The ZooKeeper election recipe works as follows: ZooKeeper gives each server a sequence number, and a newly joined machine gets a number larger than all previous ones. If the leader goes down, the nodes look at the current smallest number; each follower sets a watch on the node whose sequence number is immediately below its own (the largest number that is still smaller than its own). Only the follower whose watch is triggered performs a leader election, and it usually becomes the next leader of the cluster. Clearly this makes leader election fast, because each election involves almost only a single follower operation. A sketch of this recipe follows.
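To illustrate the recipe described above, here is a generic sketch of the standard ZooKeeper leader-election pattern (my own example, not SolrCloud's actual implementation; the ZooKeeper address and the /election root path are assumptions, and the /election node is assumed to already exist).

```java
import java.util.Collections;
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class LeaderElectionSketch {
    public static void main(String[] args) throws Exception {
        // Assumed ZooKeeper address and election root.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });
        String root = "/election";

        // Each participant creates an ephemeral sequential node; the sequence
        // number plays the role of the "server id" in the description above.
        String me = zk.create(root + "/n_", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        String myName = me.substring(me.lastIndexOf('/') + 1);

        List<String> children = zk.getChildren(root, false);
        Collections.sort(children);

        if (myName.equals(children.get(0))) {
            // Smallest sequence number: this participant is the leader.
            System.out.println("I am the leader: " + myName);
        } else {
            // Watch only the node immediately below my own sequence number;
            // when it disappears, the election should be re-run.
            String predecessor = children.get(children.indexOf(myName) - 1);
            zk.exists(root + "/" + predecessor, (WatchedEvent event) -> {
                System.out.println("Predecessor gone, re-running election");
            });
            System.out.println("Watching predecessor: " + predecessor);
        }
        Thread.sleep(Long.MAX_VALUE);  // keep the session (and ephemeral node) alive
    }
}
```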