Solr's approach to sharding (SolrCloud routing)

Source: Internet
Author: User
Tags: curl, solr

Since Solr 4.4, Solr provides SolrCloud, a distributed cluster mode. Its main benefits are:

(1) Higher performance under large data volume
(2) Better scalability
(3) Higher reliability
(4) Simpler and easier to use


When should you use SolrCloud (sharding)?

(1) The amount of data is large
(2) The index volume is large
(3) You want to index and query in parallel
(4) You want to customize how the data is partitioned


Types of SolrCloud routing

A: Explicit routing (compositeId) => the number of shards is specified explicitly when the collection is created. Shards cannot be added or removed later, but an individual shard can be split.

Creating a collection: it is recommended to first upload the conf configuration to ZooKeeper from Linux, and then create the collection in the Solr admin UI. Collections can also be created dynamically with curl or the Java API.
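As a rough illustration of the dynamic creation path, the sketch below only builds the Collections API request URL that curl or any HTTP client could send. The host, the collection name "articles", and the config name "articles_conf" are assumptions for the example, not values from this article; it also assumes the configuration is already in ZooKeeper.

```python
from urllib.parse import urlencode

# Assumed address of a local Solr node; replace with a real node of the cluster.
SOLR_BASE = "http://localhost:8983/solr"


def create_collection_url(name, num_shards, config_name, replication_factor=1):
    """Build a Collections API CREATE request for a compositeId-routed collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,              # fixed at creation time
        "replicationFactor": replication_factor,
        "collection.configName": config_name,  # config set previously uploaded to ZooKeeper
        "router.name": "compositeId",
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


# Hypothetical collection and config names, for illustration only.
create_url = create_collection_url("articles", 4, "articles_conf")
print(create_url)
```

The printed URL can be passed to curl as-is; nothing is sent to Solr by the sketch itself.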

Distinguishing feature: shard splitting is supported; dynamically adding or deleting shards is not.
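Splitting a shard goes through the same Collections API, using the SPLITSHARD action. The following is a minimal sketch that only constructs the request URL; the host, the collection name "articles", and the shard name "shard1" are assumed for illustration.

```python
from urllib.parse import urlencode

# Assumed address of a local Solr node.
SOLR_BASE = "http://localhost:8983/solr"


def split_shard_url(collection, shard):
    """Build a Collections API SPLITSHARD request for one shard of a collection."""
    params = {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


# Hypothetical names; shard1 would be split into two sub-shards by Solr.
split_url = split_shard_url("articles", "shard1")
print(split_url)
```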

Characteristics:
(1) By default, the shard a document belongs to is determined by a hash of its document ID.
(2) A routing prefix can also be prepended to the primary key, with at most 2 levels. At query time, add the _route_ parameter to apply the routing strategy. Examples:
First-level routing:
China!1 usa!2
Second-level routing:
China!beijing!1 usa!newyork!2
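The prefix convention above can be sketched in a few lines. The helper names here are hypothetical, and the two-level form of the _route_ value is an assumption that simply mirrors the ID prefix; the code only builds strings and does not contact Solr.

```python
from urllib.parse import urlencode


def composite_id(doc_id, *route_keys):
    """Join up to two routing keys and the document ID with '!'."""
    if len(route_keys) > 2:
        raise ValueError("at most two routing levels are supported")
    return "!".join([*route_keys, str(doc_id)])


def routed_query_params(query, *route_keys):
    """Build query parameters that restrict the search to one routing prefix."""
    # The _route_ value is the routing prefix including its trailing '!'.
    route = "!".join(route_keys) + "!"
    return urlencode({"q": query, "_route_": route})


first_level = composite_id(1, "China")              # "China!1"
second_level = composite_id(1, "China", "beijing")  # "China!beijing!1"
params = routed_query_params("*:*", "China")
print(first_level, second_level, params)
```

Documents indexed with the same prefix hash to the same shard, which is why a query carrying the matching _route_ value can skip the other shards.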
If the data is unevenly distributed after routing, a skewed routing key can be spread over several shards to rebalance it. The number after the slash is the number of hash bits taken from the routing key, so /3 spreads the key over 1/2^3 = 1/8 of the shards and /2 over 1/2^2 = 1/4. Examples:
China/3!1 spreads the data over 1/8 of the shards; if there are 24 shards in total, 3 shards store the China data.
China!henan/2!1 spreads the data over 1/4 of the shards; if there are 24 shards in total, 6 shards store the China!henan data.
Currently only the fixed ratios 1/8 and 1/4 are available, that is, only the values 3 and 2 are supported.
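The arithmetic in these examples can be checked with a small sketch. The function names are hypothetical, and the shard count is an approximation: Solr works on hash ranges, so the result is exact only when the ranges line up as they do in the 24-shard examples.

```python
def parse_route_key(doc_id):
    """Split an ID such as 'China!henan/2!1' into its routing levels and local ID.

    Each level is returned as (key, bits), where bits is None when no '/n' is given.
    """
    *prefixes, local_id = doc_id.split("!")
    levels = []
    for prefix in prefixes:
        key, _, bits = prefix.partition("/")
        levels.append((key, int(bits) if bits else None))
    return levels, local_id


def shards_spanned(total_shards, bits):
    """Approximate number of shards a key occupies when spread over 1 / 2**bits of them."""
    return max(1, total_shards // (2 ** bits))


print(parse_route_key("China/3!1"))        # ([('China', 3)], '1')
print(parse_route_key("China!henan/2!1"))  # ([('China', None), ('henan', 2)], '1')
print(shards_spanned(24, 3))               # 3
print(shards_spanned(24, 2))               # 6
```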

B: Implicit routing (implicit) => the number of shards is specified explicitly when the collection is created. Shards can be added or removed dynamically later, but an individual shard cannot be split.

Creating a collection: it is recommended to first upload the conf configuration to ZooKeeper from Linux, and then create the collection in the Solr admin UI. Collections can also be created dynamically with curl or the Java API.

Distinguishing feature: shard splitting is not supported; dynamically adding and deleting shards is supported.
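A possible sequence of Collections API calls for an implicitly routed collection is sketched here: create it with named shards, then add and delete shards later. Only the request URLs are built; the host, the collection name "orders", the config name, and the shard names are assumptions for the example.

```python
from urllib.parse import urlencode

# Assumed address of a local Solr node.
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(action, params):
    """Build a Collections API request URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{SOLR_BASE}/admin/collections?{query}"


# With the implicit router, shards are listed by name instead of by count.
create_url = collections_api_url("CREATE", {
    "name": "orders",
    "router.name": "implicit",
    "shards": "shard_cn,shard_us",
    "collection.configName": "orders_conf",
})
# Shards can be added and removed after creation.
add_url = collections_api_url("CREATESHARD", {"collection": "orders", "shard": "shard_eu"})
drop_url = collections_api_url("DELETESHARD", {"collection": "orders", "shard": "shard_us"})

for url in (create_url, add_url, drop_url):
    print(url)
```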

Characteristics:
Routing is 100% manual: documents are assigned to shards according to business rules. Shards can also be added and deleted dynamically, so routing can be controlled freely, without relying on middleware as MySQL does. At query time, add the _route_ parameter to apply the routing strategy.
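One way to picture rule-based routing is a lookup from a business field to a shard name, which is then passed as the _route_ parameter on both update and select requests. The region field, the shard names, the "orders" collection, and the host below are all hypothetical; the sketch builds URLs only.

```python
from urllib.parse import urlencode

# Assumed address of a local Solr node.
SOLR_BASE = "http://localhost:8983/solr"

# Hypothetical business rule: one shard per sales region.
REGION_TO_SHARD = {"cn": "shard_cn", "us": "shard_us"}


def shard_for(doc):
    """Pick the target shard for a document from its 'region' field."""
    region = doc.get("region")
    if region not in REGION_TO_SHARD:
        raise ValueError(f"no shard configured for region {region!r}")
    return REGION_TO_SHARD[region]


def update_url(collection, doc):
    """URL for indexing a document into the shard chosen by the business rule."""
    query = urlencode({"_route_": shard_for(doc)})
    return f"{SOLR_BASE}/{collection}/update?{query}"


def select_url(collection, query_text, shard):
    """URL for querying only the named shard."""
    query = urlencode({"q": query_text, "_route_": shard})
    return f"{SOLR_BASE}/{collection}/select?{query}"


index_url = update_url("orders", {"id": "1", "region": "cn"})
search_url = select_url("orders", "id:1", "shard_cn")
print(index_url)
print(search_url)
```

Keeping the rule in one function means the same mapping decides where a document is written and where it is later searched.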


Summary:
This article briefly introduced the benefits of SolrCloud, when it should be used, and the types and characteristics of its routing. Routing is an advanced topic in distributed systems. Sharding, which follows the ideas of divide and conquer and on-demand access, is not unique to Solr or Elasticsearch; the concept can exist in any database or storage system. In real application scenarios, choose the partitioning dimension and the routing according to the specific characteristics of the business. Used properly, they greatly improve efficiency for both writes and queries. When designing the dimension or the route, fields such as category, grade, or region are usually worth considering. Of course, each business needs its own analysis, and no single rule applies everywhere.



Official documentation:
(1) Document routing
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud#ShardsandIndexingDatainSolrCloud-DocumentRouting
(2) Collections API
https://cwiki.apache.org/confluence/display/solr/Collections+API
