Xapian Study Notes 1

Source: Internet
Author: User

Xapian Introduction
----------
1. Brief Introduction

Xapian is an open-source search engine library written in C++ and released under the GPL (http://www.opensource.org/licenses/gpl-license.php). It has bindings for Perl, Python, PHP, Java, and other languages.
Like Lucene, Xapian is only a search engine library, on top of which users build their own applications. It uses a probabilistic model as the basis for scoring query results, and it also provides a rich set of Boolean query operators.
If you want to use Xapian on your website, you can use Omega, a packaged search application built on Xapian. Omega meets most common needs and is also easy to extend.
At the time of writing, the current stable version of Xapian is 1.2.10.
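
The probabilistic scoring mentioned above can be illustrated with BM25, the weighting scheme Xapian uses by default. The following is a minimal sketch of the textbook BM25 term weight, not Xapian's actual implementation: the function name `bm25_term_weight` is hypothetical, and the parameter values `k1 = 1.2` and `b = 0.75` are common textbook choices assumed here for illustration.

```cpp
#include <cassert>
#include <cmath>

// Textbook BM25 weight of a single query term in a single document.
// Hypothetical helper for illustration; Xapian's own formula has
// additional parameters and differs in detail.
//   n_docs      : number of documents in the collection
//   doc_freq    : number of documents containing the term
//   tf          : occurrences of the term in this document
//   doc_len     : length of this document (in terms)
//   avg_doc_len : average document length in the collection
double bm25_term_weight(double n_docs, double doc_freq, double tf,
                        double doc_len, double avg_doc_len,
                        double k1 = 1.2, double b = 0.75) {
    assert(n_docs > 0 && doc_freq > 0 && doc_freq <= n_docs);
    assert(avg_doc_len > 0);
    // Rare terms get a larger inverse document frequency.
    double idf = std::log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0);
    // Length normalization: long documents are penalized when b > 0.
    double norm = 1.0 - b + b * (doc_len / avg_doc_len);
    // The term frequency contribution saturates as tf grows.
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * norm);
}
```

A document's score for a query is the sum of these weights over the query terms. The weight grows with term frequency, shrinks for terms that appear in many documents, and shrinks for documents longer than average.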

2. Features

The main features of Xapian are as follows:

  • Open source, released under the GPL.
  • Unicode is supported, and index data is stored in UTF-8.
  • Portable: it runs on Linux, Mac OS X, and Windows.
  • Bindings for multiple languages, including Perl, Python, Java, PHP, and C#.
  • Relevance scores are calculated with a probabilistic model.
  • Relevance feedback: Xapian can suggest terms for expanding a query, suggest related documents, and categorize documents.
  • Phrase and proximity search: a query can require words to appear as an exact phrase or within a given number of words of each other, in a specified order or in any order.
  • Boolean queries are supported, such as "a NOT b". Boolean query results are ranked using the probabilistic weights.
  • Stemming of search terms is supported.
  • Wildcard (prefix) search is supported, such as "xap*".
  • Synonyms can be searched.
  • Spelling correction is supported, based on the user's query.
  • Faceted search is supported: http://xapian.org/docs/facets
  • Database files larger than 2 GB are supported.
  • The index format is platform independent. You can build an index on Linux and copy the index files to a Windows machine for searching.
  • Updating and searching can happen at the same time. New documents become searchable right away.
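
Two of the features listed above, Boolean "a NOT b" queries and prefix queries such as "xap*", can be illustrated with a toy inverted index. This is a conceptual sketch using only the C++ standard library. It does not use or reproduce Xapian's API, and all names in it (`build_index`, `lookup`, `and_not`, `prefix_lookup`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>   // only needed if you add checks like those described below
#include <cstddef>
#include <iterator>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

using DocId = int;
using Postings = std::set<DocId>;               // documents containing a term
using Index = std::map<std::string, Postings>;  // sorted term -> postings

// Build the index by splitting each document on whitespace.
// Document ids start at 1, in input order.
Index build_index(const std::vector<std::string>& docs) {
    Index index;
    for (std::size_t i = 0; i < docs.size(); ++i) {
        std::istringstream words(docs[i]);
        std::string term;
        while (words >> term) index[term].insert(static_cast<DocId>(i + 1));
    }
    return index;
}

// Postings for one exact term (empty if the term is not indexed).
Postings lookup(const Index& index, const std::string& term) {
    auto it = index.find(term);
    return it == index.end() ? Postings{} : it->second;
}

// Boolean "a NOT b": documents in a that are not in b.
Postings and_not(const Postings& a, const Postings& b) {
    Postings out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.end()));
    return out;
}

// Prefix query: merge the postings of every term starting with `prefix`.
// Because the map is sorted, matching terms form one contiguous range.
Postings prefix_lookup(const Index& index, const std::string& prefix) {
    Postings out;
    for (auto it = index.lower_bound(prefix);
         it != index.end() &&
         it->first.compare(0, prefix.size(), prefix) == 0;
         ++it) {
        out.insert(it->second.begin(), it->second.end());
    }
    return out;
}
```

For the three documents "xapian search engine", "lucene search library", and "xapian python bindings", the query "xapian NOT python" expressed as `and_not(lookup(idx, "xapian"), lookup(idx, "python"))` yields document 1 only, and `prefix_lookup(idx, "xap")` yields documents 1 and 3.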

Xapian also provides a CGI query application, Omega, which has the following features:

  • The indexer supports HTML, PHP, PDF, and PostScript documents. You can also use filters to support additional formats.
  • Using the Perl DBI module, it can index data from SQL databases such as MySQL, PostgreSQL, SQLite, and Oracle.
  • The CGI search front end is highly customizable and can also output results as XML or CSV.

3. Example

3.1 Installation on Linux

On Ubuntu or Debian, you can install Xapian with:

    $ sudo apt-get install python-xapian
    $ sudo apt-get install libxapian-dev

Alternatively, compile and install from the source code with ./configure, make, and make install.

3.2 Index creation

    #include <xapian.h> // Xapian header file

    #include <iostream>
    #include <string>

    #include <cstdlib> // For exit().
    #include <cstring>

    using namespace std;

    int
    main(int argc, char **argv)
    try {
        if (argc != 2 || argv[1][0] == '-') {
            int rc = 1;
            if (argv[1]) {
                if (strcmp(argv[1], "--version") == 0) {
                    cout << "simpleindex" << endl;
                    exit(0);
                }
                if (strcmp(argv[1], "--help") == 0) {
                    rc = 0;
                }
            }
            cout << "Usage: " << argv[0] << " PATH_TO_DATABASE\n"
                    "Index each paragraph of a text file as a Xapian document."
                 << endl;
            exit(rc);
        }

        // Open the database for update, creating a new database if necessary.
        Xapian::WritableDatabase db(argv[1], Xapian::DB_CREATE_OR_OPEN);

        // The term generator splits text into index terms.
        Xapian::TermGenerator indexer;
        Xapian::Stem stemmer("english");
        indexer.set_stemmer(stemmer);

        string para;
        while (true) {
            string line;
            if (cin.eof()) {
                if (para.empty()) break;
            } else {
                getline(cin, line);
            }

            if (line.empty()) {
                if (!para.empty()) {
                    // We've reached the end of a paragraph, so index it.
                    Xapian::Document doc;

                    // The document data is opaque to Xapian. The application
                    // can store whatever it needs here, such as document
                    // attributes, a URI, or a path.
                    doc.set_data(para);

                    // Tell the term generator which document to fill, then
                    // tokenize and index the paragraph text.
                    indexer.set_document(doc);
                    indexer.index_text(para);

                    // Add the document to the database.
                    db.add_document(doc);

                    para.resize(0);
                }
            } else {
                if (!para.empty()) para += ' ';
                para += line;
            }
        }

        // Explicitly commit so that we get to see any errors.
        // WritableDatabase's destructor will commit implicitly (unless
        // we're in a transaction) but will swallow any exceptions produced.
        db.commit();
    } catch (const Xapian::Error &e) {
        cout << e.get_description() << endl;
        exit(1);
    }

3.3 Query

    #include <xapian.h>

    #include <iostream>
    #include <string>

    #include <cstdlib> // For exit().
    #include <cstring>

    using namespace std;

    int
    main(int argc, char **argv)
    try {
        // We require at least two command line arguments.
        if (argc < 3) {
            int rc = 1;
            if (argv[1]) {
                if (strcmp(argv[1], "--version") == 0) {
                    cout << "simplesearch" << endl;
                    exit(0);
                }
                if (strcmp(argv[1], "--help") == 0) {
                    rc = 0;
                }
            }
            cout << "Usage: " << argv[0] << " PATH_TO_DATABASE QUERY" << endl;
            exit(rc);
        }

        // Open the database for searching.
        Xapian::Database db(argv[1]);

        // Start an enquire session.
        Xapian::Enquire enquire(db);

        // Combine the rest of the command line arguments with spaces between
        // them, so that simple queries don't have to be quoted at the shell
        // level.
        string query_string(argv[2]);
        argv += 3;
        while (*argv) {
            query_string += ' ';
            query_string += *argv++;
        }

        // Parse the query string to produce a Xapian::Query object.
        Xapian::QueryParser qp;
        Xapian::Stem stemmer("english");
        qp.set_stemmer(stemmer);
        qp.set_database(db);
        qp.set_stemming_strategy(Xapian::QueryParser::STEM_SOME);
        Xapian::Query query = qp.parse_query(query_string);
        cout << "Parsed query is: " << query.get_description() << endl;

        // Pass the parsed query to the enquire session and fetch the
        // top 10 results.
        enquire.set_query(query);
        Xapian::MSet matches = enquire.get_mset(0, 10);

        // Display the results.
        cout << matches.get_matches_estimated() << " results found.\n";
        cout << "Matches 1-" << matches.size() << ":\n" << endl;

        for (Xapian::MSetIterator i = matches.begin(); i != matches.end(); ++i) {
            cout << i.get_rank() + 1 << ": " << i.get_percent() << "% docid=" << *i
                 << " [" << i.get_document().get_data() << "]\n";
        }
    } catch (const Xapian::Error &e) {
        cout << e.get_description() << endl;
        exit(1);
    }

4. Reference

http://xapian.org/

 
