Solr Nodes, Cores, Clusters and Leaders, Shards and Indexing Data

Source: Internet
Author: User
Tags: failover, solr

https://cwiki.apache.org/confluence/display/solr/Nodes%2C+Cores%2C+Clusters+and+Leaders

Nodes and Cores

In SolrCloud, a node is a Java Virtual Machine instance running Solr, commonly called a server. Each Solr core can also be considered a node. Any node can contain both an instance of Solr and various kinds of data.

A Solr core is basically an index of the text and fields found in documents. A single Solr instance can contain multiple "cores", which are separate from each other based on local criteria. It might be that they are going to provide different search interfaces to users (customers in the US and customers in Canada, for example), or they have security concerns (some users cannot have access to some documents), or the documents are really different and just won't mix well in the same index (a shoe database and a DVD database).

When you start a new core in SolrCloud mode, it registers itself with ZooKeeper. This involves creating an ephemeral node that will go away if the Solr instance goes down, as well as registering information about the core and how to contact it (such as the base Solr URL, core name, etc.). Smart clients and nodes in the cluster can use this information to determine who they need to talk to in order to fulfill a request.

New Solr cores may also be created and associated with a collection via CoreAdmin. Additional cloud-related parameters are discussed in the Parameter Reference page. Terms used for the CREATE action are:

    • collection: The name of the collection to which this core belongs. Default is the name of the core.
    • shard: The shard ID this core represents. (Optional: normally you will want it to be auto-assigned a shard ID.)
    • collection.<param>=<value>: Causes a property of <param>=<value> to be set if a new collection is being created. For example, use collection.configName=<configname> to point to the config for a new collection.

For example:

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore&collection=my_collection&shard=shard2'
Clusters

A cluster is the set of Solr nodes managed by ZooKeeper as a single unit. If you have a cluster, you can always make requests to the cluster, and if the request is acknowledged, you can be sure that it will be managed as a unit and be durable, i.e., you won't lose data. Updates can be seen right after they are made, and the cluster can be expanded or contracted.

Creating a Cluster

A cluster is created as soon as you have more than one Solr instance registered with ZooKeeper. The section Getting Started with SolrCloud reviews how to set up a simple cluster.

Resizing a Cluster

Clusters contain a settable number of shards. You set the number of shards for a new cluster by passing a system property, numShards, when you start up Solr. The numShards parameter must be passed on the first startup of any Solr node, and is used to auto-assign which shard each instance should be part of. Once you have started up more Solr nodes than numShards, the nodes will create replicas for each shard, distributing them evenly across the nodes, as long as they all belong to the same collection.
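For illustration, a minimal sketch of passing numShards on first startup, assuming the old example start.jar launcher and an external ZooKeeper at localhost:2181 (the launcher and the ZooKeeper address are assumptions, not from the original text):

# First node started for the cluster: defines the shard layout (2 shards)
java -DzkHost=localhost:2181 -DnumShards=2 -jar start.jar

# Subsequent nodes: point at the same ZooKeeper; they join the existing layout
java -DzkHost=localhost:2181 -jar start.jar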

To add more cores to your collection, simply start the new core. You can do this at any time, and the new core will sync its data with the current replicas in the shard before becoming active.

You can also avoid numShards and manually assign a core a shard ID if you choose.

The number of shards determines how the data in your index is broken up, so you cannot change the number of shards of the index after initially setting up the cluster.

However, you do have the option of breaking your index into multiple shards to start with, even if you are only using a single machine. You can then expand to multiple machines later. To do so, follow these steps:

    1. Set up your collection by hosting multiple cores on a single physical machine (or group of machines). Each of these shards will be a leader for that shard.
    2. When you're ready, you can migrate shards onto new machines by starting up a new replica for a given shard on each new machine (a sketch follows these steps).
    3. Remove the shard from the original machine. ZooKeeper will promote the replica to the leader for that shard.
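As a rough sketch of steps 2 and 3, using the CoreAdmin API with hypothetical host and core names (newhost, oldhost, my_collection and the replica core names are placeholders, not from the original text):

# Step 2: create a replica of shard1 on the new machine
curl 'http://newhost:8983/solr/admin/cores?action=CREATE&name=my_collection_shard1_replica2&collection=my_collection&shard=shard1'

# Step 3: once the new replica is active, unload the core on the original machine
curl 'http://oldhost:8983/solr/admin/cores?action=UNLOAD&core=my_collection_shard1_replica1'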
Leaders and replicas

The concept of a leader is similar to that of master when thinking of traditional Solr replication. The leader is responsible for making sure the replicas are up to date with the same information stored in the leader.

However, with SolrCloud, you don't simply have one master and one or more "slaves"; instead you have likely distributed your search and index traffic to multiple machines. If you have bootstrapped Solr with numShards=2, for example, your indexes are split across both shards. In this case, both shards are considered leaders. If you start more Solr nodes after the initial two, these will be automatically assigned as replicas for the leaders.

Replicas are assigned to shards in the order they are started the first time they join the cluster. This is done in a round-robin manner, unless the new node is manually assigned to a shard with the shardId parameter during startup. This parameter is used as a system property, as in -DshardId=1, the value of which is the ID number of the shard the new node should be attached to.
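For example, a sketch of pinning a new node to shard 1 at startup, again assuming the start.jar launcher and a ZooKeeper at localhost:2181 (the exact property wiring depends on how your solr.xml reads the shard assignment):

# Start a node whose core is manually assigned to shard 1
java -DzkHost=localhost:2181 -DshardId=1 -jar start.jar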

On subsequent restarts, each node joins the same shard that it was assigned to the first time the node was started (whether that assignment happened manually or automatically). A node that was previously a replica, however, may become the leader if the previously assigned leader is not available.

Consider this example:

    • Node A is started with the bootstrap parameters, pointing to a stand-alone ZooKeeper, with the numShards parameter set to 2.
    • Node B is started and pointed to the stand-alone ZooKeeper.

Nodes A and B are both shards, and have fulfilled the 2 shard slots we defined when we started Node A. If we look in the Solr Admin UI, we'll see that both nodes are considered leaders (indicated with a solid black circle).

    • Node C is started and pointed to the stand-alone ZooKeeper.

Node C will automatically become a replica of Node A because we didn't specify any other shard for it to belong to, and it cannot become a new shard because we only defined two shards and those have both been taken.

    • Node D is started and pointed to the stand-alone ZooKeeper.

Node D will automatically become a replica of Node B, for the same reasons why Node C is a replica of Node A.

Upon restart, suppose that Node C starts before Node A. What happens? Node C will become the leader, while Node A becomes a replica of Node C.

Shards and Indexing Data in SolrCloud

When your data is too large for one node, you can break it up and store it in sections by creating one or more shards. Each is a portion of the logical index, or core, and it's the set of all nodes containing that section of the index.

A shard is a way of splitting a core over a number of "servers", or nodes. For example, you might have a shard for the data that represents each state, or different categories that are likely to be searched independently, but are often combined.

Before SolrCloud, Solr supported Distributed Search, which allowed one query to be executed across multiple shards, so the query was executed against the entire Solr index and no documents would be missed from the search results. So splitting the core across shards is not exclusively a SolrCloud concept. There were, however, several problems with the distributed approach that necessitated improvement with SolrCloud:

    1. Splitting of the core into shards was somewhat manual.
    2. There was no support for distributed indexing, which meant that you needed to explicitly send documents to a specific shard; Solr couldn't figure out on its own what shards to send documents to.
    3. There was no load balancing or failover, so if you got a high number of queries, you needed to figure out where to send them, and if one shard died it was just gone.

SolrCloud fixes all those problems. There is support for distributing both the index process and the queries automatically, and ZooKeeper provides failover and load balancing. Additionally, every shard can also have multiple replicas for additional robustness.

In SolrCloud there are no masters or slaves. Instead, there are leaders and replicas. Leaders are automatically elected, initially on a first-come-first-served basis, and then based on the ZooKeeper process described at http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection.

If a leader goes down, one of its replicas is automatically elected as the new leader. As each node is started, it's assigned to the shard with the fewest replicas. When there's a tie, it's assigned to the shard with the lowest shard ID.

When a document is sent to a machine for indexing, the system first determines whether the machine is a replica or a leader.

    • If the machine is a replica, the document is forwarded to the leader for processing.
    • If the machine is a leader, SolrCloud determines which shard the document should go to, forwards the document to the leader for that shard, indexes the document for this shard, and forwards the index notation to itself and any replicas.
Document Routing

Solr offers the ability to specify the router implementation used by a collection by specifying the router.name parameter when creating your collection. If you use the "compositeId" router, you can send documents with a prefix in the document ID which will be used to calculate the hash Solr uses to determine the shard a document is sent to for indexing. The prefix can be anything you'd like it to be (it doesn't have to be the shard name, for example), but it must be consistent so Solr behaves consistently. For example, if you wanted to co-locate documents for a customer, you could use the customer name or ID as the prefix. If your customer is "IBM", for example, with a document with the ID "12345", you would insert the prefix into the document ID field: "IBM!12345". The exclamation mark ('!') is critical here, as it distinguishes the prefix used to determine which shard to direct the document to.
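For example, a sketch of indexing such a document over HTTP, assuming a collection named my_collection that was created with the compositeId router (the collection name and the name field are placeholders):

# The 'IBM!' prefix determines which shard receives the document (hypothetical collection and field)
curl 'http://localhost:8983/solr/my_collection/update' -H 'Content-Type: application/json' -d '[{"id":"IBM!12345","name":"example document co-located by customer"}]'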

Then at query time, you include the prefix(es) in your query with the _route_ parameter (i.e., q=solr&_route_=IBM!) to direct queries to specific shards. In some situations, this may improve query performance because it overcomes network latency when querying all the shards.
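A matching query sketch, again assuming the hypothetical my_collection collection:

# Route the query only to the shard(s) holding the 'IBM!' prefix
curl 'http://localhost:8983/solr/my_collection/select?q=solr&_route_=IBM!'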

The _route_ parameter replaces shard.keys, which has been deprecated and will be removed in a future Solr release.

The compositeId router supports prefixes containing up to 2 levels of routing. For example: a prefix routing first by region, then by customer: "usa!ibm!12345".

Another use case could be if the customer "IBM" has a lot of documents and you want to spread them across multiple shards. The syntax for such a use case would be: "shard_key/num!document_id", where the /num is the number of bits from the shard key to use in the composite hash.

So "ibm/3!12345" would take 3 bits from the Shard key and the bits from the unique doc ID, spreading the tenant over 1/8th o f the shards in the collection. Likewise if the NUM value was 2 it would spread the documents across 1/4th the number of shards. at query time, you include the prefix (es) along with the number of bits to your query _route_ with the parameter (i.e., q=solr&_route_=IBM/3! ) to direct queries to specific shards.

If you do not want to influence how documents are stored, you don't need to specify a prefix in your document ID.

If you created the collection and defined the "implicit" router at the time of creation, you can additionally define a router.field parameter to use a field from each document to identify the shard where the document belongs. If the field specified is missing in the document, however, the document will be rejected. You can also use the _route_ parameter to name a specific shard.
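For example, a sketch of creating a collection with the implicit router and a routing field, assuming the Collections API; the shard names and the region field are hypothetical:

# Documents are routed to the shard named in their 'region' field (hypothetical names)
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=my_collection&router.name=implicit&shards=shardA,shardB&router.field=region'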

Shard splitting

When you create a collection in SolrCloud, you decide on the initial number of shards to be used. But it can be difficult to know in advance the number of shards that you need, particularly when organizational requirements can change at a moment's notice, and the cost of finding out later that you chose wrongly can be high, involving creating new cores and re-indexing all of your data.

The ability to split shards is in the Collections API. It currently allows splitting a shard into two pieces. The existing shard is left as-is, so the split action effectively makes two copies of the data as new shards. You can delete the old shard at a later time when you're ready.
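For example, a sketch of splitting a shard through the Collections API (the collection and shard names are placeholders):

# Split shard1 of my_collection into two new sub-shards; the original shard is left as-is
curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=my_collection&shard=shard1'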

More details on how to use shard splitting are in the section on the Collections API.

Ignoring commits from client applications in SolrCloud

In most cases, when running in SolrCloud mode, indexing client applications should not send explicit commit requests. Rather, you should configure auto commits with openSearcher=false and auto soft-commits to make recent updates visible in search requests. This ensures that auto commits occur on a regular schedule in the cluster. To enforce a policy where client applications should not send explicit commits, you should update all client applications that index data into SolrCloud. However, that is not always feasible, so Solr provides the IgnoreCommitOptimizeUpdateProcessorFactory, which allows you to ignore explicit commits and/or optimize requests from client applications without having to refactor your client application code. To activate this request processor you'll need to add the following to your solrconfig.xml:

<updateRequestProcessorChain name="ignore-commit-from-client" default="true">  <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">    <int name="statusCode">200</int>  </processor>  <processor class="solr.LogUpdateProcessorFactory" />  <processor class="solr.DistributedUpdateProcessorFactory" />  <processor class="solr.RunUpdateProcessorFactory" /></updateRequestProcessorChain>

As shown in the example above, the processor will return 200 to the client but will ignore the commit/optimize request. Notice that you need to wire in the implicit processors needed by SolrCloud as well, since this custom chain is taking the place of the default chain.

In the following example, the processor will raise an exception with a 403 code and a customized error message:

<updateRequestProcessorChain name="ignore-commit-from-client" default="true">  <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">    <int name="statusCode">403</int>    <str name="responseMessage">Thou shall not issue a commit!</str>  </processor>  <processor class="solr.LogUpdateProcessorFactory" />  <processor class="solr.DistributedUpdateProcessorFactory" />  <processor class="solr.RunUpdateProcessorFactory" /></updateRequestProcessorChain>

Lastly, you can also configure it to just ignore optimize requests and let commits pass through by doing:

<updateRequestProcessorChain name="ignore-optimize-only-from-client-403">    <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">      <str name="responseMessage">Thou shall not issue an optimize, but commits are OK!</str>      <bool name="ignoreOptimizeOnly">true</bool>    </processor>    <processor class="solr.RunUpdateProcessorFactory" /> </updateRequestProcessorChain>
