Xapian Study Notes 3: Sorting by Relevance and by Field Values

Source: Internet
Author: User

In Xapian, matching documents are sorted in descending order of relevance by default. When two documents have the same relevance, they are sorted in ascending order of document ID. You can call enquire.set_docid_order(Xapian::Enquire::DESCENDING) to make that tie-break descending, or enquire.set_docid_order(Xapian::Enquire::DONT_CARE) if you do not care about document ID order. Results can also be sorted by other rules, or by combining other rules with relevance. The two approaches are analyzed below.
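
The default ordering described above can be sketched without Xapian at all. The following is a minimal illustration, not Xapian's implementation: Hit, DocidOrder and sort_hits are hypothetical names, and the "don't care" case is modelled simply by treating tied documents as equivalent.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for one matched document.
struct Hit {
    unsigned docid;
    double weight;  // relevance score
};

enum class DocidOrder { Ascending, Descending, DontCare };

// Sort by weight descending; break ties on docid according to `order`.
void sort_hits(std::vector<Hit>& hits, DocidOrder order) {
    std::sort(hits.begin(), hits.end(), [order](const Hit& a, const Hit& b) {
        if (a.weight != b.weight) return a.weight > b.weight;
        if (order == DocidOrder::Ascending) return a.docid < b.docid;
        if (order == DocidOrder::Descending) return a.docid > b.docid;
        return false;  // DontCare: tied documents are treated as equivalent
    });
}
```

With DocidOrder::DontCare the relative position of tied documents is unspecified, which is the freedom the "don't care" setting gives the search engine.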

1. Relevance sorting

By default, Xapian uses the BM25 algorithm to calculate the score of each document. Some of the parameter values that BM25 needs (http://xapian.org/docs/bm25.html) are computed from the index, while others can be set dynamically at query time.
In addition, other weighting schemes are available, including TradWeight and BoolWeight. TradWeight implements the original probabilistic model formula, which is a special case of BM25 with certain parameters fixed, such as k2 = 0, k3 = 0 and min_normlen = 0. BoolWeight assigns a score of 0 to every document, so document order is determined by other factors.
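
For orientation, the per-term BM25 score in its common textbook form can be sketched as below. This is an assumption-laden illustration and not Xapian's exact formula: it leaves out the k2, k3 and min_normlen terms mentioned above, the idf variant is just one of several in use, and Bm25Params, idf and bm25_term_score are hypothetical names with illustrative parameter values.

```cpp
#include <cmath>

// Illustrative parameter values; not claimed to be any library's defaults.
struct Bm25Params {
    double k1 = 1.0;  // how quickly repeated occurrences of a term saturate
    double b = 0.5;   // how strongly document length is normalised
};

// Inverse document frequency, in one common non-negative variant.
double idf(double total_docs, double docs_with_term) {
    return std::log(1.0 + (total_docs - docs_with_term + 0.5) /
                              (docs_with_term + 0.5));
}

// Contribution of one query term to one document's score.
double bm25_term_score(double term_freq, double doc_len, double avg_doc_len,
                       double total_docs, double docs_with_term,
                       const Bm25Params& p = Bm25Params{}) {
    double norm_len = doc_len / avg_doc_len;
    double tf_part = term_freq * (p.k1 + 1.0) /
                     (term_freq + p.k1 * (1.0 - p.b + p.b * norm_len));
    return idf(total_docs, docs_with_term) * tf_part;
}
```

The collection-wide inputs (total_docs, docs_with_term, avg_doc_len) are the kind of statistics that come from the index, while k1 and b are the kind of parameters that can be chosen at query time.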

You can also write your own weighting scheme; you only need to implement the Xapian::Weight abstract class. The following scheme implements coordinate matching, in which each matching term scores one point.

// Method signatures follow the source; they vary between Xapian releases.
class CoordinateWeight : public Xapian::Weight {
  public:
    CoordinateWeight * clone() const { return new CoordinateWeight; }
    CoordinateWeight() { }
    ~CoordinateWeight() { }

    std::string name() const { return "coord"; }
    std::string serialise() const { return ""; }
    CoordinateWeight * unserialise(const std::string &) const {
        return new CoordinateWeight;
    }

    // Every matching term contributes exactly 1, whatever its frequency.
    Xapian::weight get_sumpart(Xapian::termcount, Xapian::doclength) const {
        return 1;
    }
    Xapian::weight get_maxpart() const { return 1; }

    // No term-independent contribution.
    Xapian::weight get_sumextra(Xapian::doclength) const { return 0; }
    Xapian::weight get_maxextra() const { return 0; }

    bool get_sumpart_needs_doclength() const { return false; }
};

At query time, call the set_weighting_scheme method of Enquire to plug in your own weighting scheme.
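
To make the effect of the coordinate scheme above concrete: because each matching term adds 1 and the extra part adds 0, a document's score is simply the number of query terms it contains. The standalone sketch below computes that number directly. coordinate_score is a hypothetical helper, not part of Xapian, and it assumes that a term repeated in the query counts only once.

```cpp
#include <set>
#include <string>
#include <vector>

// Coordinate matching: one point per distinct query term found in the document.
int coordinate_score(const std::vector<std::string>& query_terms,
                     const std::set<std::string>& doc_terms) {
    // Assumption: duplicate query terms are collapsed before scoring.
    std::set<std::string> unique_query(query_terms.begin(), query_terms.end());
    int score = 0;
    for (const std::string& term : unique_query) {
        if (doc_terms.count(term) > 0) {
            score += 1;  // plays the role of get_sumpart() returning 1
        }
    }
    return score;  // nothing is added for the "extra" part, which returns 0
}
```

For a document containing "apple", "banana" and "cherry", the query "apple cherry kiwi" matches two terms and so scores 2.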

2. Sorting by Specific Field Values

Xapian can also sort by user-specified field values, such as price or date. Note that all sort comparisons are string comparisons. To sort by a numeric value such as price, therefore, encode it as a string with Xapian::sortable_serialise() and decode it with Xapian::sortable_unserialise(). The indexing code looks like this:

Xapian::Document doc;
doc.add_value(0, Xapian::sortable_serialise(price));

Every field value you want to sort on must be added to the document as a value. The first parameter is the slot number of the value.
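
The reason numbers need a special encoding is that their ordinary decimal strings do not sort numerically: "9" compares greater than "10". The sketch below shows one well-known way to build an order-preserving byte string from a double. It is only an illustration of the idea and is NOT the byte format that Xapian::sortable_serialise() produces; sketch_sortable_key is a hypothetical name, and the code assumes 64-bit IEEE 754 doubles and ignores NaN.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical order-preserving encoding; NOT Xapian's actual byte format.
std::string sketch_sortable_key(double value) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch expects a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    const std::uint64_t sign = std::uint64_t{1} << 63;
    // Negative numbers: flip every bit so that larger magnitudes sort first.
    // Non-negative numbers: set the top bit so they sort after all negatives.
    bits = (bits & sign) ? ~bits : (bits | sign);
    std::string key(8, '\0');
    for (int i = 0; i < 8; ++i) {
        // Most significant byte first, so byte-wise comparison matches numeric order.
        key[i] = static_cast<char>((bits >> (56 - 8 * i)) & 0xFF);
    }
    return key;
}
```

Comparing two such keys with std::string's operator< gives the same result as comparing the original numbers, which is the property a sortable value needs.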

There are three methods for setting the sort priority:

    • Enquire::set_sort_by_value() sorts by value only, regardless of relevance.
    • Enquire::set_sort_by_value_then_relevance() sorts by value first; documents with the same value are sorted by relevance.
    • Enquire::set_sort_by_relevance_then_value() sorts by relevance first, then by value.
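
The difference between the three priorities can be sketched as three comparators. This is a standalone illustration rather than Xapian code: ScoredDoc, SortMode and sort_docs are hypothetical names, values are assumed to be already serialised strings, and the sketch fixes the value order as ascending.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct ScoredDoc {
    unsigned docid;
    double weight;      // relevance score
    std::string value;  // already-serialised sort value
};

enum class SortMode { Value, ValueThenRelevance, RelevanceThenValue };

void sort_docs(std::vector<ScoredDoc>& docs, SortMode mode) {
    std::stable_sort(docs.begin(), docs.end(),
        [mode](const ScoredDoc& a, const ScoredDoc& b) {
            switch (mode) {
            case SortMode::Value:
                return a.value < b.value;             // relevance is ignored
            case SortMode::ValueThenRelevance:
                if (a.value != b.value) return a.value < b.value;
                return a.weight > b.weight;           // tie: higher relevance first
            case SortMode::RelevanceThenValue:
                if (a.weight != b.weight) return a.weight > b.weight;
                return a.value < b.value;             // tie: smaller value first
            }
            return false;
        });
}
```

The second criterion only matters when the first one ties, so relevance-then-value is mostly useful when many documents share exactly the same weight.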

Sorting on several values at query time looks like this:

// Create a key maker that builds a sort key from several value slots
Xapian::MultiValueKeyMaker *keymaker = new Xapian::MultiValueKeyMaker();
keymaker->add_value(0);  // slot 0, ascending
keymaker->add_value(1);  // slot 1, ascending; for descending use keymaker->add_value(1, true)
// Pass the key maker to the Enquire object; the second parameter says whether the key order is reversed (descending)
enquire.set_sort_by_key(keymaker, false);

A more flexible approach is to subclass Xapian::KeyMaker and write your own key maker. Keys are still compared as strings.
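
The ordering that a multi-slot key is meant to produce can be sketched as a comparison over several slots, each with its own direction. The code below illustrates that ordering only. It does not show how Xapian::MultiValueKeyMaker actually builds its key string; SlotSpec, SlotValues and multi_slot_less are hypothetical names, and treating a missing slot as an empty string is an assumption of the sketch.

```cpp
#include <map>
#include <string>
#include <vector>

struct SlotSpec {
    unsigned slot;
    bool reverse;  // true = descending order for this slot
};

// Slot number -> serialised value for one document.
using SlotValues = std::map<unsigned, std::string>;

// Returns true if document `a` should come before document `b`.
bool multi_slot_less(const SlotValues& a, const SlotValues& b,
                     const std::vector<SlotSpec>& specs) {
    for (const SlotSpec& spec : specs) {
        auto ia = a.find(spec.slot);
        auto ib = b.find(spec.slot);
        // Assumption: a slot the document lacks behaves like an empty value.
        const std::string va = (ia == a.end()) ? std::string() : ia->second;
        const std::string vb = (ib == b.end()) ? std::string() : ib->second;
        if (va == vb) continue;  // tie on this slot: move on to the next one
        return spec.reverse ? (va > vb) : (va < vb);
    }
    return false;  // equal on every listed slot
}
```

With the specs {0, ascending} and {1, descending}, two documents that share the same slot 0 value are ordered by slot 1 with the larger value first.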

3. Reference

http://xapian.org/docs/sorting.html
