Solr distributed search source code analysis

Source: Internet
Author: User

The master logic of distributed search is implemented in the SearchHandler.handleRequestBody method; see its distributed-request branch for details.
The distributed search process is divided into stages. Stage control is implemented in the distributedProcess method of each SearchComponent, and the shard requests a component produces for each stage are queued via outgoing.add(sreq).
Each component sets and processes the parameters for its stage and adds the resulting requests to outgoing. When the while loop finds that outgoing is non-empty, a distributed call is required: the call submits the current stage's query to every shard and asynchronously collects the results returned by all shards.
After each stage completes, the results are processed:
for (SearchComponent c : components) {
    c.handleResponses(rb, srsp.getShardRequest());
}

After all stages are completed, component.finishStage is called for post-processing:
for (SearchComponent c : components) {
    c.finishStage(rb);
}
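The overall stage-driving mechanism can be sketched as a small, self-contained simulation. Everything below except the idea of distributedProcess() and the outgoing queue is invented for illustration; the real Solr loop also fans the queued ShardRequests out over HTTP and dispatches the responses to handleResponses().

```java
// A minimal, hypothetical simulation of the stage loop in
// SearchHandler.handleRequestBody (names other than distributedProcess
// are invented for illustration).
import java.util.ArrayList;
import java.util.List;

class StageLoopSketch {
    static final int STAGE_DONE = Integer.MAX_VALUE;

    // Stand-in for SearchComponent.distributedProcess: inspects the current
    // stage, may queue shard requests, and returns the next stage it needs
    // (or STAGE_DONE when it is finished).
    interface Component {
        int distributedProcess(int currentStage, List<String> outgoing);
    }

    static List<String> run(List<Component> components) {
        List<String> log = new ArrayList<>();
        int nextStage = 0;
        do {
            int stage = nextStage;
            nextStage = STAGE_DONE;
            List<String> outgoing = new ArrayList<>();
            for (Component c : components) {
                // advance to the smallest stage any component still asks for
                nextStage = Math.min(nextStage, c.distributedProcess(stage, outgoing));
            }
            // if any component queued requests, this is where the real code
            // would submit them to the shards and collect the responses
            for (String sreq : outgoing) {
                log.add("stage " + stage + " -> " + sreq);
            }
        } while (nextStage != STAGE_DONE);
        return log;
    }

    public static void main(String[] args) {
        // one component that wants two stages: 100 (top ids), then 200 (fields)
        Component query = (stage, outgoing) -> {
            if (stage < 100) return 100;
            if (stage == 100) { outgoing.add("GET_TOP_IDS"); return 200; }
            if (stage == 200) { outgoing.add("GET_FIELDS"); return STAGE_DONE; }
            return STAGE_DONE;
        };
        run(List.of(query)).forEach(System.out::println);
    }
}
```

The key design point this illustrates: components do not call each other; each one independently reports the next stage it needs, and the handler advances to the minimum of those, so unrelated components can interleave their work across the same shard round-trips.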

The query is split into two main stages: makeQuery (fetch the top ids) and getFields (fetch the stored fields).
makeQuery: adds the parameter fl=id to the shard URL (plus score when needed). As we know, fl restricts which fields are returned, and Solr uses the uniqueKey field configured in the schema, so this request returns only the id values. After the ids are obtained, QueryComponent merges them; if different shards return the same id, only one copy is kept.
getFields: QueryComponent again encapsulates the request parameters; the most important one is ids, which carries the ids obtained in the previous request as a URL parameter. A second request is then sent to fetch the corresponding fields by id.
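To make the two phases concrete, here is a hypothetical sketch of the key URL parameters each shard receives in each phase. The parameter names (q, fl, ids, start, rows, isShard) follow Solr's HTTP API; the helper class itself is invented for illustration, and real Solr sends many more parameters.

```java
// Illustrative only: the two phases of a distributed query, shown as the
// key URL parameters each shard receives (simplified).
import java.util.List;

class TwoPhaseParams {
    // Phase 1 (makeQuery): fetch only the unique key (and score) from each
    // shard. Each shard is asked for start + rows docs from position 0,
    // because the proxy node re-sorts the merged result itself.
    static String topIdsRequest(String q, int start, int rows) {
        return "q=" + q + "&fl=id,score&start=0&rows=" + (start + rows)
             + "&isShard=true";
    }

    // Phase 2 (getFields): fetch the stored fields only for the winning ids.
    static String getFieldsRequest(String q, List<String> ids, String fl) {
        return "q=" + q + "&ids=" + String.join(",", ids)
             + "&fl=" + fl + "&isShard=true";
    }
}
```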
In fact, before these two stages there is another stage, STAGE_PARSE_QUERY, in which a distributed (global) IDF could be computed. Solr does not implement this: by default each shard computes only its own IDF rather than a global one. With a large amount of evenly distributed data, shard-level TF-IDF does not deviate much, but if the distributed index is very uneven, you may need to pay attention to relevance scoring.
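A small worked example of why uneven shards skew scoring. The formula idf = 1 + ln((docCount + 1) / (docFreq + 1)) is one classic Lucene-style variant, used here purely for illustration (exact formulas differ across Lucene versions and similarities):

```java
// Per-shard vs. global IDF for the same term (illustrative formula only).
class ShardIdfSketch {
    static double idf(long docFreq, long docCount) {
        return 1 + Math.log((docCount + 1.0) / (docFreq + 1.0));
    }

    public static void main(String[] args) {
        // a term occurs in 10 of shard A's 1000 docs, but in 900 of
        // shard B's 1000 docs
        System.out.println(idf(10, 1000));   // shard A: rare locally, high idf
        System.out.println(idf(900, 1000));  // shard B: common locally, low idf
        System.out.println(idf(910, 2000));  // the true global idf is in between
    }
}
```

Because each shard scores with its own IDF, the same term boosts shard A's hits far more than shard B's, and the merged ranking mixes scores computed on incompatible scales.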

After the makeQuery stage completes, a mergeIds operation is performed on all docs returned by the shards. In mergeIds, the docs are placed in a priority queue sized according to the page and ordered by the sort fields or score, yielding the doc list for the current page. The priority queue size is start + rows; only rows docs are returned to the client, but the top start + rows docs must be sorted, so for deep pages the memory overhead on the proxy node and the CPU cost of sorting can be significant.
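The merge itself can be sketched with a bounded priority queue. This is a simplified, score-only version: Solr's real merge also handles arbitrary sort fields and shard tie-breaking, and the ShardDoc record and method names here are illustrative, not Solr's actual API.

```java
// Simplified, score-only sketch of the mergeIds idea: keep at most
// start + rows docs in a bounded min-heap while scanning all shard
// results, then return only the `rows` docs for the requested page.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

class MergeIdsSketch {
    record ShardDoc(String id, float score) {}

    static List<ShardDoc> mergeIds(List<List<ShardDoc>> shardResults,
                                   int start, int rows) {
        int queueSize = start + rows;
        // min-heap on score: the head is the weakest doc currently retained
        PriorityQueue<ShardDoc> queue =
            new PriorityQueue<>(Comparator.comparingDouble(ShardDoc::score));
        Set<String> seen = new HashSet<>();
        for (List<ShardDoc> docs : shardResults) {
            for (ShardDoc d : docs) {
                if (!seen.add(d.id())) continue;  // same id on two shards: keep one
                queue.offer(d);
                if (queue.size() > queueSize) queue.poll();  // evict the weakest
            }
        }
        // drain into descending-score order, then skip the first `start` docs
        List<ShardDoc> sorted = new ArrayList<>(queue);
        sorted.sort(Comparator.comparingDouble(ShardDoc::score).reversed());
        return sorted.subList(Math.min(start, sorted.size()),
                              Math.min(start + rows, sorted.size()));
    }
}
```

Note how the heap bound grows with start, not rows: that is exactly why deep paging (large start) inflates memory and sort cost on the proxy node.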

For example, in QueryComponent:

private void handleRegularResponses(ResponseBuilder rb, ShardRequest sreq) {
    if ((sreq.purpose & ShardRequest.PURPOSE_GET_TOP_IDS) != 0) {
        mergeIds(rb, sreq);
    }

    if ((sreq.purpose & ShardRequest.PURPOSE_GET_FIELDS) != 0) {
        returnFields(rb, sreq);
    }
}
