Xapian Study Notes 4: Faceted Search
------------------------
1. What is faceted search?
Faceted search dynamically aggregates specific attributes of the documents matched by a user's query. It is useful in many places, especially on e-commerce sites: the user enters a query, and the server returns the matching documents together with classification information about them. For example, if the user searches for "computer", the server returns all documents that match the keyword "computer" along with the categories those documents fall into, such as tablets, laptops, and desktops. The grouping is multidimensional: the matching documents may also belong to different merchants, and they are grouped by merchant as well. In general, the goal of faceted search is to give users a basis for filtering, so that they can quickly find what they want.
It has the following advantages:
- High information integration: users see an integrated view of the results. The information is not flat but multidimensional.
- Predictable results: before you click a category, you already know how many results it contains.
- No selection level limit: You can add or delete different aggregation category limits with different
2. How to create a faceted search in Xapian
Every document in Xapian can carry a set of values. Store each field you want to facet on in one of these values, giving each field its own slot number, using the Xapian::Document::add_value() method. For example, in a database of books you could put "price" in slot 0, "author" in slot 1, "publisher" in slot 2, and "publication type" in slot 3. At query time you can then aggregate by these values, for example grouping the matching documents into the price ranges 100-200, 200-400, 400-500, and so on. Note that numeric fields must be encoded with Xapian::sortable_serialise() so that they sort correctly.
3. How to run a faceted query in Xapian
For example, to run a faceted query on price and author, use the Xapian::Enquire::add_matchspy() method to attach Xapian::ValueCountMatchSpy objects to the Enquire object. A ValueCountMatchSpy counts how often each value of a slot occurs among the matching documents. The general code is as follows:

    // db and query are assumed to have been set up already.
    Xapian::ValueCountMatchSpy spy0(0);  // slot 0: price
    Xapian::ValueCountMatchSpy spy1(1);  // slot 1: author
    Xapian::Enquire enq(db);
    enq.add_matchspy(&spy0);
    enq.add_matchspy(&spy1);
    enq.set_query(query);
    Xapian::MSet mset = enq.get_mset(0, 10, 10000);

The 10000 is the checkatleast argument: it tells Xapian to examine at least 10000 matching documents (or all of them, if fewer match), so the spies tally facet information from at least that many documents. The spy objects hold the resulting counts, which can be read as follows:

    Xapian::TermIterator i;
    // Facet counts for slot 0 (price)
    for (i = spy0.values_begin(); i != spy0.values_end(); ++i) {
        cout << *i << ": " << i.get_termfreq() << endl;
    }
    // Facet counts for slot 1 (author)
    for (i = spy1.values_begin(); i != spy1.values_end(); ++i) {
        cout << *i << ": " << i.get_termfreq() << endl;
    }

*i is the facet value, that is, the value stored with the add_value() method above. If it is numeric, it must be decoded with Xapian::sortable_unserialise(), otherwise the output will be garbled. get_termfreq() returns how many matching documents have that value, for example that 30 products fall within a given price range.
4. Reference
- http://xapian.org/docs/facets.html
- http://en.wikipedia.org/wiki/Faceted_search