Solr distributed search


Source: http://blog.chenlb.com/category/solr-search/page/4

Solr distributed search is a feature of Solr 1.3. A large index may be split into N smaller indexes for many reasons, and each small index can be placed on a different machine. But what if you don't have that many machines? Solr 1.3 has multicore (see my earlier article for basic multicore usage). Each core is independent and can be indexed separately, so the index can be distributed across the cores.
Now let's look at the effect of distributed search. Open: http://localhost:8080/solr-cores/core0/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&shards=localhost:8080/solr-cores/core0,localhost:8080/solr-cores/core1 and you can see three records, coming from both core0 and core1.
Then there is a problem: many existing programs already call Solr with URLs such as localhost:8080/solr-cores/select/?q=*%3A*, and I don't want to change the calling code. Can the distribution be made transparent? Let's explore...
Remember from the Solr documentation that default query parameters can be defined for a request handler; the shards parameter can be written into that configuration as well. So add the parameter to the standard request handler in core0's solrconfig.xml, for example:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="shards">localhost:8080/solr-cores/core0,localhost:8080/solr-cores/core1</str>
  </lst>
</requestHandler>
Run it: there is no result for a long time and CPU usage is very high. Thinking it over, it is most likely an infinite loop: after parsing shards, Solr calls each shard with the default request handler, and one of the shards is core0 itself, so the request recurses and never returns. This approach does not work. To avoid the endless loop, the merging must not be done on core0 itself; something else has to do it. So I added another Tomcat instance, for example at localhost:8080/solr, as a proxy (result merger), and then it ran successfully.
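For reference, a minimal sketch of what the proxy instance's standard handler could look like (the webapp path above is illustrative; the key point is that the merging instance is not itself listed as a shard, so there is no recursion):

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <!-- the proxy only merges; it does not appear in the shards list -->
    <str name="shards">localhost:8080/solr-cores/core0,localhost:8080/solr-cores/core1</str>
  </lst>
</requestHandler>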
But starting another Tomcat instance just for this seems heavy; why not simply use another core instead? Continue exploring...

I named the new core simply "core" and created it by copying core0. Remove the shards setting from the original core0/solrconfig.xml (the copied core keeps it and does the merging), and register the new core in solr1.3/example/multicore/solr.xml, for example:
<?xml version="1.0" encoding="UTF-8"?>
<solr persistent="false">

  <cores adminPath="/admin/cores">
    <core name="core" instanceDir="core"/>
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>
This runs successfully: core0 and core1 hold the data, while core holds no data and only performs the merging. So the problem can be solved reasonably well.
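With this layout, a query against the merging core, following the URL pattern used earlier, would look like http://localhost:8080/solr-cores/core/select/?q=*%3A* and return the records from both core0 and core1.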
But there is another problem: the original program's Solr URLs cannot be changed, and routing through the new core would change the URL. Reading the source code, you can find that the core name attribute can define aliases, separated by commas. Change it to the following:
<?xml version="1.0" encoding="UTF-8"?>
<solr persistent="false">

  <cores adminPath="/admin/cores">
    <core name="core," instanceDir="core"/>
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>
"Core," Why are two ?. "Core," There are no alias for parsing. "Core,", parses two names, one is: "core, the other is:" "empty string. With an empty string, the original URL can reach the core (merged core ).

As for the endless-loop issue, a colleague read the source code to see whether the extra merging core could be avoided altogether. He found that the shards.qt parameter solves the problem: it makes all shard sub-requests carry a qt parameter, so they go to a different request handler instead of the default one.

So the final scheme changes. Add a request handler to the solrconfig.xml of both core0 and core1, for example:
<requestHandler name="shard" class="solr.SearchHandler"/>
Then add the shards parameter to the default request handler in core0's solrconfig.xml and set shards.qt to shard (the shard request handler), for example:
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="shards.qt">shard</str>
    <str name="shards">localhost:8080/solr-cores/core0,localhost:8080/solr-cores/core1</str>
  </lst>
</requestHandler>
Then, in solr.xml, give core0 (whose data stays in place) an empty alias, for example:
<core name="core0," instanceDir="core0"/>
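With this in place, a client request to the standard handler is merged on core0, while the sub-requests it sends to each shard use the shard handler via qt. A rough sketch of the requests involved (the exact parameters Solr adds will vary):

http://localhost:8080/solr-cores/select/?q=*%3A*                  (client request; the empty alias reaches core0, whose standard handler merges)
http://localhost:8080/solr-cores/core0/select/?q=*%3A*&qt=shard   (sub-request to shard core0)
http://localhost:8080/solr-cores/core1/select/?q=*%3A*&qt=shard   (sub-request to shard core1)

Because the sub-requests go to the shard handler rather than the standard handler with its shards default, the recursion problem disappears.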

Of course, you can add the same parameters to core1, so that core0 and core1 play the same role: all data can be found through either search URL. I think that when every node carries the same configuration, this is even more useful once the index is distributed across other machines (in that case the multicore form isn't needed; each machine runs Solr in the original, single-core form). From the outside you cannot tell how many indexes there are, and the merging work is spread evenly.
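As a sketch of that multi-machine case (the host names here are purely illustrative), each machine's default handler would carry the same defaults, only with shards pointing at the other hosts instead of local cores:

<lst name="defaults">
  <str name="shards.qt">shard</str>
  <!-- hypothetical hosts; every machine lists the same complete set of shards -->
  <str name="shards">host1:8080/solr,host2:8080/solr</str>
</lst>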

Solr distributed search also has overhead: the merging core sends two requests to each shard core, the first to fetch the matching IDs and the second to fetch the documents by ID. With n shards there are 2n + 1 requests in total, the extra 1 being the merge request itself; the 2n requests sent to the shards use a binary protocol, which performs better than the XML protocol.
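For example, with the two shards above (core0 and core1), a single query turns into 2 × 2 + 1 = 5 requests: the merge request itself, plus an ID-retrieval request and a document-retrieval request to each of the two shards.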
