Solr 1.4.0 source code analysis II: correct URL usage and principles of Solr distributed search


http://guoyunsky.iteye.com/blog/761308

I recently used Solr for distributed search, starting from information I had collected on the Internet. I found that there is a common misunderstanding about Solr distributed search that leads to incorrect search results. For example, I have two shards:
1) http://localhost:8080/solr1.4/core0/
2) http://localhost:8080/solr1.4/core1/

I want the 30 top-ranked results for the query 110, so I used the following URL:
1. http://localhost:8080/solr1.4/core0/select?q=110&shards=localhost:8080/solr1.4/core0,localhost:8080/solr1.4/core1&shards.rows=30

But only 10 results came back. I had set shards.rows=30, so why were only 10 returned? I then added a start parameter. The URL is as follows:
2. http://localhost:8080/solr1.4/core0/select?q=110&shards=localhost:8080/solr1.4/core0,localhost:8080/solr1.4/core1&shards.rows=30&shards.start=30
The results changed, but there were still only 10. I then traced the source code to find the cause. The URL should be:

3. http://localhost:8080/solr1.4/core0/select?q=110&shards=localhost:8080/solr1.4/core0,localhost:8080/solr1.4/core1&shards.rows=60&start=30&rows=30
This returns 30 results starting at position 30, that is, the results ranked 30 to 60. Paging is set with start=30&rows=30, not with shards.start=30&shards.rows=30. The shard-level parameters should be shards.start=0&shards.rows=60 (shards.start=0 can be omitted, since Solr starts from 0 by default), where shards.rows = start + rows.
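
The rule shards.rows = start + rows can be sketched as a small helper. This is only an illustration in Python, not part of Solr; the function name build_distributed_url and its arguments are hypothetical, and the base URL and shard addresses are the ones from the example above.

```python
from urllib.parse import urlencode


def build_distributed_url(base, query, shards, start=0, rows=10):
    """Build a distributed select URL whose shards.rows equals start + rows.

    Hypothetical helper for illustration; the default start=0 and rows=10
    mirror the Solr defaults described in this article.
    """
    params = {
        "q": query,
        "shards": ",".join(shards),
        # Each shard must return enough hits to cover the requested page.
        "shards.rows": start + rows,
        "start": start,
        "rows": rows,
    }
    # Keep ':', '/' and ',' readable inside the shards value.
    return base + "select?" + urlencode(params, safe=":/,")


url = build_distributed_url(
    "http://localhost:8080/solr1.4/core0/",
    "110",
    ["localhost:8080/solr1.4/core0", "localhost:8080/solr1.4/core1"],
    start=30,
    rows=30,
)
print(url)
```

Called with start=30 and rows=30, the helper produces the same parameters as URL 3.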

The settings above make a distributed search return the results ranked 30 to 60. The principle behind them is as follows:

Solr constructs a ShardFieldSortedHitQueue to collect the results returned by the individual shards. This class extends Lucene's PriorityQueue (I have written a simulation of this class; for details, see my blog: http://guoyunsky.iteye.com/blog/723963) and needs a sort field (SortField) and a size. Solr manages the sort field and the size with its own SortSpec class, which the query component QueryComponent builds from the start and rows parameters when it initializes. If the URL does not contain these two parameters, Solr uses the default values, start=0&rows=10. That is why I got only 10 results at first.
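
The defaulting described above can be mimicked in a few lines. This is a simplified Python sketch, not Solr's Java code; resolve_paging and the two constants are hypothetical names, and only the default values (start=0, rows=10) come from the text.

```python
DEFAULT_START = 0   # default offset described in this article
DEFAULT_ROWS = 10   # default page size described in this article


def resolve_paging(params):
    """Return (start, rows), falling back to the defaults when absent.

    Note that shards.start and shards.rows are deliberately ignored here:
    they do not control the size of the final merged result.
    """
    start = int(params.get("start", DEFAULT_START))
    rows = int(params.get("rows", DEFAULT_ROWS))
    return start, rows


# URL 1 from this article: only shards.rows is given, so rows stays at 10.
first_try = resolve_paging({"q": "110", "shards.rows": "30"})
# URL 3 from this article: start and rows are given explicitly.
fixed = resolve_paging({"q": "110", "shards.rows": "60", "start": "30", "rows": "30"})
print(first_try, fixed)
```

The first call shows why URL 1 returned only 10 results even though shards.rows was 30.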
There is another, more serious problem. To get the results ranked 30 to 60 across several shards, we must fetch the top 60 from each shard, merge them, and keep the top 60 of the merged list. The last 30 of those are the results ranked 30 to 60. So if you set shards.rows=30 as I did at first, only the top 30 of each shard is fetched, not the top 60. This is also why adding shards.start=30 gave results different from the first query: each shard then returned a different slice of its own ranking.
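
A small simulation makes the problem concrete. It is a sketch in Python with made-up scores, not Solr's implementation: the function names are hypothetical, heapq.nlargest stands in for the priority queue, and the two shards are constructed so that every top-ranked document lives in the first one.

```python
import heapq


def shard_top(scores, n):
    """Return the n best scores of one shard, best first."""
    return sorted(scores, reverse=True)[:n]


def distributed_page(shards, start, rows, shard_rows):
    """Merge the top shard_rows hits of every shard, then cut out one page."""
    per_shard = [shard_top(scores, shard_rows) for scores in shards]
    all_hits = (score for hits in per_shard for score in hits)
    # Keep only the best start + rows hits overall, best first.
    merged = heapq.nlargest(start + rows, all_hits)
    return merged[start:start + rows]


# Made-up data: shard 0 holds scores 101..200, shard 1 holds scores 1..100.
shards = [list(range(101, 201)), list(range(1, 101))]

correct = distributed_page(shards, start=30, rows=30, shard_rows=60)
wrong = distributed_page(shards, start=30, rows=30, shard_rows=30)
print(correct[0], correct[-1])  # 170 141
print(wrong[0], wrong[-1])      # 100 71
```

With 60 rows per shard the page holds the scores 170 down to 141, which are the true ranks 30 to 60. With only 30 rows per shard, the first shard never sends those documents, so the page is filled with the second shard's scores 100 down to 71.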

However, some parts of Solr still seem questionable to me. For example, solrconfig.xml lets you set <queryResultWindowSize>20</queryResultWindowSize>, which, as I understand it, indicates the number of results to display. Solr should therefore use this value as the default instead of its own hard-coded 10. The same applies to the Solr web admin interface, where rows is also set to 10.
