Using Lucene Queries and Maintaining the Index Library


1 Using Lucene queries

Lucene executes a search through a Query object, and every Query object corresponds to a query syntax. For example, bookName:java searches for documents whose bookName field contains java.
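You can see the syntax that a Query object corresponds to by printing it, because Query.toString() renders the query in the same field:keyword form. This is a minimal sketch that assumes Lucene 4.10.x is on the classpath. The class and method names are hypothetical.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;

public class QuerySyntaxDemo {
    // Builds the query "bookName contains java" and returns its printed syntax
    static String termQuerySyntax() {
        TermQuery query = new TermQuery(new Term("bookName", "java"));
        // For a single-term query, toString() prints "field:text"
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(termQuerySyntax()); // prints "bookName:java"
    }
}
```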

1.1 Two ways to create a query object
1.1.1 Using Query subclass objects
1.1.1.1 Common Query subclasses
    • TermQuery: does not use the analyzer and searches for an exact match on the keyword. It suits values such as order numbers or social security numbers.
    • NumericRangeQuery: a numeric range query, for example books priced above 80 and below 100.
    • BooleanQuery: a Boolean query that combines several conditions. The possible combinations are:
        1. MUST and MUST: "AND", i.e. the intersection.
        2. MUST and MUST_NOT: matches the former and excludes the latter.
        3. MUST_NOT and MUST_NOT: meaningless, because the query matches nothing.
        4. SHOULD and MUST: behaves like MUST alone. The SHOULD clause no longer restricts the results and only affects scoring.
        5. SHOULD and MUST_NOT: equivalent to MUST and MUST_NOT.
        6. SHOULD and SHOULD: "OR", i.e. the union.
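You can check these combinations against a tiny in-memory index. This is a minimal sketch that assumes Lucene 4.10.x. It swaps IKAnalyzer for Lucene's StandardAnalyzer and the on-disk FSDirectory for RAMDirectory, so it runs without extra setup. The class name and book names are hypothetical.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class BooleanOccurDemo {
    // Hypothetical book names; StandardAnalyzer lowercases them and drops stop words such as "and"
    static final String[] NAMES = {"java basics", "lucene in action", "java and lucene"};

    static Directory buildIndex() throws IOException {
        Directory dir = new RAMDirectory(); // in-memory stand-in for FSDirectory
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_10_4,
                new StandardAnalyzer(Version.LUCENE_4_10_4));
        IndexWriter writer = new IndexWriter(dir, iwc);
        for (String name : NAMES) {
            Document doc = new Document();
            doc.add(new TextField("bookName", name, Store.YES));
            writer.addDocument(doc);
        }
        writer.close(); // close() also commits the added documents
        return dir;
    }

    // Combines two single-term clauses on bookName and returns the number of hits
    static int count(Directory dir, String wordA, Occur occurA, String wordB, Occur occurB)
            throws IOException {
        BooleanQuery bq = new BooleanQuery();
        bq.add(new TermQuery(new Term("bookName", wordA)), occurA);
        bq.add(new TermQuery(new Term("bookName", wordB)), occurB);
        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            return new IndexSearcher(reader).search(bq, 10).totalHits;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Directory dir = buildIndex();
        System.out.println(count(dir, "java", Occur.MUST, "lucene", Occur.MUST));         // 1
        System.out.println(count(dir, "java", Occur.MUST, "lucene", Occur.MUST_NOT));     // 1
        System.out.println(count(dir, "java", Occur.MUST_NOT, "lucene", Occur.MUST_NOT)); // 0
        System.out.println(count(dir, "java", Occur.SHOULD, "lucene", Occur.MUST));       // 2
        System.out.println(count(dir, "java", Occur.SHOULD, "lucene", Occur.MUST_NOT));   // 1
        System.out.println(count(dir, "java", Occur.SHOULD, "lucene", Occur.SHOULD));     // 3
    }
}
```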
1.1.1.2 Using the common Query subclasses
1.1.1.2.1 Using TermQuery
    • Requirement: find books whose name contains java.
/**
 * Search the index (encapsulated search method)
 */
private void seracher(Query query) throws Exception {
    // Print the query syntax
    System.out.println("Query syntax: " + query);

    // 1. Create the index directory object (Directory) that specifies the location of the index library
    Directory directory = FSDirectory.open(new File("/Users/healchow/Documents/index"));
    // 2. Create the index reader (IndexReader) used to read the index
    IndexReader reader = DirectoryReader.open(directory);
    // 3. Create the index searcher (IndexSearcher) used to execute searches
    IndexSearcher searcher = new IndexSearcher(reader);

    // 4. Search with the IndexSearcher and get the result set (TopDocs)
    // Parameter 1: the query object; parameter 2: return the top n results after sorting
    TopDocs topDocs = searcher.search(query, 10);

    // 5. Process the result set
    // 5.1 Print the number of results actually found
    System.out.println("Number of results actually found: " + topDocs.totalHits);
    // 5.2 Get the result array; each ScoreDoc holds a document ID and its score
    ScoreDoc[] scoreDocs = topDocs.scoreDocs;
    for (ScoreDoc scoreDoc : scoreDocs) {
        System.out.println("====================");
        // Get the document ID and score
        int docId = scoreDoc.doc;
        float score = scoreDoc.score;
        System.out.println("Document ID = " + docId + ", score = " + score);
        // Fetch the document by ID, like querying a relational database by primary key
        Document doc = searcher.doc(docId);
        System.out.println("Book ID: " + doc.get("bookId"));
        System.out.println("Book name: " + doc.get("bookName"));
        System.out.println("Book price: " + doc.get("bookPrice"));
        System.out.println("Book picture: " + doc.get("bookPic"));
        System.out.println("Book description: " + doc.get("bookDesc"));
    }
    // 6. Release resources
    reader.close();
}
    • Test using TermQuery:
/**
 * Test TermQuery. Requirement: find books whose name contains java
 */
@Test
public void testTermQuery() throws Exception {
    // 1. Create the TermQuery object
    TermQuery termQuery = new TermQuery(new Term("bookName", "java"));
    // 2. Execute the search
    this.seracher(termQuery);
}

1.1.1.2.2 Using NumericRangeQuery
    • Requirement: find books priced between 80 and 100 (excluding 80 and 100):
/**
 * Test NumericRangeQuery. Requirement: find books priced between 80 and 100
 */
@Test
public void testNumericRangeQuery() throws Exception {
    // 1. Create the NumericRangeQuery object. Parameters:
    // field: the field to search; min: lower bound of the range; max: upper bound of the range
    // minInclusive: whether the lower bound is included; maxInclusive: whether the upper bound is included
    NumericRangeQuery<Float> numQuery = NumericRangeQuery.newFloatRange("bookPrice", 80f, 100f, false, false);
    // 2. Execute the search
    this.seracher(numQuery);
}

    • To include 80 and 100:
// Include 80 and 100
NumericRangeQuery<Float> numQuery = NumericRangeQuery.newFloatRange("bookPrice", 80f, 100f, true, true);
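These range queries only match values that were indexed as numeric fields. A price indexed as a TextField or StringField is plain text, and NumericRangeQuery will not find it. The sketch below assumes Lucene 4.10.x. It indexes three hypothetical prices (80, 90, 100) with FloatField in an in-memory RAMDirectory, then counts the hits for the exclusive and the inclusive bounds. The class name is hypothetical.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.FloatField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class NumericRangeDemo {
    static Directory buildIndex() throws IOException {
        Directory dir = new RAMDirectory(); // in-memory stand-in for FSDirectory
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_4_10_4,
                new StandardAnalyzer(Version.LUCENE_4_10_4)));
        for (float price : new float[] {80f, 90f, 100f}) { // hypothetical prices
            Document doc = new Document();
            // FloatField indexes the value in the numeric form that NumericRangeQuery searches
            doc.add(new FloatField("bookPrice", price, Store.YES));
            writer.addDocument(doc);
        }
        writer.close();
        return dir;
    }

    static int countBetween80And100(Directory dir, boolean minInclusive, boolean maxInclusive)
            throws IOException {
        NumericRangeQuery<Float> query = NumericRangeQuery.newFloatRange(
                "bookPrice", 80f, 100f, minInclusive, maxInclusive);
        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            return new IndexSearcher(reader).search(query, 10).totalHits;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Directory dir = buildIndex();
        System.out.println(countBetween80And100(dir, false, false)); // 1: only 90
        System.out.println(countBetween80And100(dir, true, true));   // 3: 80, 90 and 100
    }
}
```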

1.1.1.2.3 Using BooleanQuery
    • Requirement: find books whose name contains lucene and whose price is between 80 and 100.
/**
 * Test BooleanQuery. Requirement: find books whose name contains lucene and whose price is between 80 and 100
 */
@Test
public void testBooleanQuery() throws Exception {
    // 1. Create the query conditions
    // 1.1 Create condition one
    TermQuery query1 = new TermQuery(new Term("bookName", "lucene"));
    // 1.2 Create condition two
    NumericRangeQuery<Float> query2 = NumericRangeQuery.newFloatRange("bookPrice", 80f, 100f, true, true);
    // 2. Create the combined query
    BooleanQuery bq = new BooleanQuery();
    // add method: adds a condition to the combination
    // query parameter: the condition object
    // occur parameter: how the condition is combined
    bq.add(query1, Occur.MUST);
    bq.add(query2, Occur.MUST);

    // 3. Execute the search
    this.seracher(bq);
}

In the printed query syntax, "+" marks a condition that must be met (AND), and "-" marks a condition whose matches are excluded.
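Printing a BooleanQuery shows this notation directly. This is a minimal sketch that assumes Lucene 4.10.x. It rebuilds the BooleanQuery from the test above, plus a hypothetical variant that excludes java. The class name is hypothetical.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.TermQuery;

public class BooleanSyntaxDemo {
    // Same conditions as testBooleanQuery: name contains lucene AND price in [80, 100]
    static String requiredSyntax() {
        BooleanQuery bq = new BooleanQuery();
        bq.add(new TermQuery(new Term("bookName", "lucene")), Occur.MUST);
        bq.add(NumericRangeQuery.newFloatRange("bookPrice", 80f, 100f, true, true), Occur.MUST);
        return bq.toString();
    }

    // Hypothetical variant: name contains lucene but NOT java
    static String excludedSyntax() {
        BooleanQuery bq = new BooleanQuery();
        bq.add(new TermQuery(new Term("bookName", "lucene")), Occur.MUST);
        bq.add(new TermQuery(new Term("bookName", "java")), Occur.MUST_NOT);
        return bq.toString();
    }

    public static void main(String[] args) {
        System.out.println(requiredSyntax()); // +bookName:lucene +bookPrice:[80.0 TO 100.0]
        System.out.println(excludedSyntax()); // +bookName:lucene -bookName:java
    }
}
```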

1.1.2 Using QueryParser

Description: QueryParser parses a query expression and creates the corresponding Query object.

1.1.2.1 QueryParser expression syntax
    • Basic keyword query: field name + ":" + keyword, e.g. bookName:lucene
    • Range query: field name + ":" + [minimum TO maximum], e.g. price:[80 TO 100]. Note that QueryParser does not support numeric range queries; its range queries compare strings only. If you need a numeric range query, use NumericRangeQuery.
    • Combination query:
        • Occur.MUST: the condition must be met; keyword AND; symbol +
        • Occur.SHOULD: the condition is optional; keyword OR; symbol: a space (no prefix)
        • Occur.MUST_NOT: the condition must not be met; keyword NOT; symbol -
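You can parse a few expressions and print the resulting Query to see how the symbols map onto Occur values. The same check shows that a bracketed range becomes a string TermRangeQuery, not a NumericRangeQuery. This is a minimal sketch that assumes Lucene 4.10.x with the lucene-queryparser module on the classpath. StandardAnalyzer stands in for IKAnalyzer, and the class name is hypothetical.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermRangeQuery;
import org.apache.lucene.util.Version;

public class QueryParserSyntaxDemo {
    // Parses an expression with "bookName" as the default field
    static Query parse(String expression) throws ParseException {
        QueryParser qp = new QueryParser("bookName", new StandardAnalyzer(Version.LUCENE_4_10_4));
        return qp.parse(expression);
    }

    public static void main(String[] args) throws ParseException {
        // "+" -> MUST, "-" -> MUST_NOT
        System.out.println(parse("+bookName:java -bookName:lucene")); // +bookName:java -bookName:lucene
        // A space between terms -> SHOULD (the default operator is OR)
        System.out.println(parse("bookName:java bookName:lucene"));   // bookName:java bookName:lucene
        // A bracketed range is parsed as a string range, not a numeric one
        System.out.println(parse("price:[80 TO 100]") instanceof TermRangeQuery); // true
    }
}
```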
1.1.2.2 Using QueryParser

Requirement: find books whose name contains both java and lucene.

/**
 * Test QueryParser. Requirement: find books whose name contains both lucene and java
 */
@Test
public void testQueryParser() throws Exception {
    // 1. Create the query object
    // 1.1 Create the analyzer object
    Analyzer analyzer = new IKAnalyzer();
    // 1.2 Create the query parser; the first argument is the default search field
    QueryParser qp = new QueryParser("bookName", analyzer);
    // 1.3 Parse the query expression with the QueryParser
    Query query = qp.parse("bookName:java AND bookName:lucene");

    // 2. Execute the search
    this.seracher(query);
}

Note: with QueryParser, the combination keywords AND, OR, and NOT in an expression must be upper case. Because a default search field is set (the first constructor argument), you can omit the field name for terms in that field, e.g. "java AND lucene".
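The sketch below illustrates both points. With bookName as the default field, "java AND lucene" parses to the same two required clauses as the fully qualified expression. A lowercase "and" is not treated as an operator. This sketch assumes Lucene 4.10.x and uses StandardAnalyzer in place of IKAnalyzer. StandardAnalyzer also drops "and" as a stop word, which IKAnalyzer may handle differently. The class name is hypothetical.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.util.Version;

public class QueryParserKeywordDemo {
    static String parsed(String expression) throws ParseException {
        // Default search field "bookName": terms without a field prefix are searched there
        QueryParser qp = new QueryParser("bookName", new StandardAnalyzer(Version.LUCENE_4_10_4));
        return qp.parse(expression).toString();
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(parsed("java AND lucene"));                   // +bookName:java +bookName:lucene
        System.out.println(parsed("bookName:java AND bookName:lucene")); // +bookName:java +bookName:lucene
        // No "+" markers: lowercase "and" is treated as an ordinary term, not as AND
        System.out.println(parsed("java and lucene"));
    }
}
```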

2 Maintaining the Lucene index library

Data stored in a relational database needs create, read, update, and delete operations. Likewise, the index stored in the index library needs add, delete, update, and query operations.

2.1 Adding indexes

For adding indexes, refer to the Lucene starter program and the basic usage of the Java API.
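For orientation, adding an index uses the same IndexWriter setup as the deletion and update code in this section, with IndexWriter.addDocument in place of the delete call. This is a minimal sketch that assumes Lucene 4.10.x. RAMDirectory, StandardAnalyzer, the class name, and the book values are stand-ins chosen for illustration. This is not the code from the starter program.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.FloatField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class AddIndexDemo {
    // Adds one hypothetical book and returns the number of live documents in the index
    static int addOneBook() throws IOException {
        Directory dir = new RAMDirectory(); // in-memory stand-in for FSDirectory.open(...)
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_4_10_4,
                new StandardAnalyzer(Version.LUCENE_4_10_4)));
        Document doc = new Document();
        doc.add(new StringField("bookId", "1", Store.YES));                // one untokenized term
        doc.add(new TextField("bookName", "Lucene in Action", Store.YES)); // tokenized for full-text search
        doc.add(new FloatField("bookPrice", 89.5f, Store.YES));            // numeric, for NumericRangeQuery
        writer.addDocument(doc);
        writer.close(); // commits the new document
        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            return reader.numDocs();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(addOneBook()); // 1
    }
}
```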

2.2 Deleting indexes
2.2.1 Deleting indexes by Term
    1. Create an analyzer object (Analyzer) for tokenization;
    2. Create an index configuration object (IndexWriterConfig) for configuring Lucene;
    3. Create an index directory object (Directory) that specifies the location of the index library;
    4. Create an index writer object (IndexWriter) for manipulating the index;
    5. Create the delete condition object (Term);
    6. Use the IndexWriter object to perform the deletion;
    7. Release resources.
/**
 * Delete index by Term
 */
@Test
public void deleteIndexByTerm() throws IOException {
    // 1. Create the analyzer (Analyzer) used for tokenization
    Analyzer analyzer = new IKAnalyzer();
    // 2. Create the index configuration object (IndexWriterConfig) used to configure Lucene
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_10_4, analyzer);
    // 3. Create the index directory object (Directory) that specifies the location of the index library
    Directory directory = FSDirectory.open(new File("/Users/healchow/Documents/index"));
    // 4. Create the index writer (IndexWriter) used to manipulate the index
    IndexWriter writer = new IndexWriter(directory, iwc);
    // 5. Create the delete condition (Term)
    // Delete the index entries whose bookName field contains "java"
    // Similar to: delete from table where name = 'java'
    // Parameter 1: name of the field; parameter 2: the term value to delete
    Term term = new Term("bookName", "java");
    // 6. Perform the deletion with the IndexWriter (varargs: several Terms can be passed)
    writer.deleteDocuments(term);
    // 7. Release resources
    writer.close();
}

When you delete with IndexWriter.deleteDocuments(term), the Term value is not analyzed. It must be a single term exactly as it appears in the index, and the field must be indexed. Lucene first finds all documents that contain the term and then deletes all of them. The deletion is logical: the deleted documents are only marked in a ".del" file and are not physically removed. For this reason, it is best to delete by a field that uniquely identifies each document.
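Three hypothetical books show the difference between deleting by a unique identifier and deleting by an analyzed field. This is a minimal sketch that assumes Lucene 4.10.x. It indexes bookId as a StringField (one untokenized term) and bookName as a TextField. It uses RAMDirectory and StandardAnalyzer in place of the article's FSDirectory and IKAnalyzer. The class name is hypothetical.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class DeleteByTermDemo {
    // A fresh config per writer: IndexWriterConfig instances must not be shared between writers
    static IndexWriterConfig config() {
        return new IndexWriterConfig(Version.LUCENE_4_10_4, new StandardAnalyzer(Version.LUCENE_4_10_4));
    }

    static Directory buildIndex() throws IOException {
        Directory dir = new RAMDirectory(); // in-memory stand-in for FSDirectory
        IndexWriter writer = new IndexWriter(dir, config());
        String[][] books = {{"1", "java basics"}, {"2", "java and lucene"}, {"3", "lucene in action"}};
        for (String[] book : books) {
            Document doc = new Document();
            doc.add(new StringField("bookId", book[0], Store.YES));
            doc.add(new TextField("bookName", book[1], Store.YES));
            writer.addDocument(doc);
        }
        writer.close();
        return dir;
    }

    // Deletes every document containing the term and returns the number of live documents left
    static int remainingAfterDelete(Term term) throws IOException {
        Directory dir = buildIndex();
        IndexWriter writer = new IndexWriter(dir, config());
        writer.deleteDocuments(term);
        writer.close();
        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            return reader.numDocs();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(remainingAfterDelete(new Term("bookId", "2")));     // 2: exactly one book removed
        System.out.println(remainingAfterDelete(new Term("bookName", "java"))); // 1: every book named *java* removed
    }
}
```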

Whether the index data itself is removed depends on the situation. Lucene organizes indexed content in segments. Suppose you delete documents by Term with IndexWriter.deleteDocuments(term), and the segment still contains index entries for other terms of the matching documents. In that case, the index data is kept, which degrades performance if updates are frequent. If the segment contains no such entries, the index data is deleted as well.

The index also records the number of deleted documents.
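The index reader exposes these counts. maxDoc() still includes logically deleted documents, and numDeletedDocs() reports how many documents are marked as deleted. IndexWriter.forceMergeDeletes() rewrites the segments so that the deleted documents are physically dropped. This is a minimal sketch that assumes Lucene 4.10.x with its default merge policy. The sketch also assumes that, for a single small segment, the merge policy does not merge on its own before forceMergeDeletes is called. The class name and ids are hypothetical.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class DeletedDocsDemo {
    // Returns {maxDoc before merging, deleted count before merging, maxDoc after forceMergeDeletes}
    static int[] counts() throws IOException {
        Directory dir = new RAMDirectory(); // in-memory stand-in for the on-disk index
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_4_10_4,
                new StandardAnalyzer(Version.LUCENE_4_10_4)));
        for (String id : new String[] {"1", "2", "3"}) {
            Document doc = new Document();
            doc.add(new StringField("bookId", id, Store.YES));
            writer.addDocument(doc);
        }
        writer.commit(); // writes one segment holding three documents

        writer.deleteDocuments(new Term("bookId", "2"));
        writer.commit(); // the deletion is recorded, but the document is only marked

        DirectoryReader before = DirectoryReader.open(dir);
        int maxDocBefore = before.maxDoc();          // still counts the deleted slot
        int deletedBefore = before.numDeletedDocs(); // documents marked as deleted
        before.close();

        writer.forceMergeDeletes(); // merges segments that contain deletions, dropping deleted docs
        writer.close();             // commits the merged segment

        DirectoryReader after = DirectoryReader.open(dir);
        int maxDocAfter = after.maxDoc();
        after.close();
        return new int[] {maxDocBefore, deletedBefore, maxDocAfter};
    }

    public static void main(String[] args) throws IOException {
        int[] c = counts();
        System.out.println("maxDoc=" + c[0] + ", deleted=" + c[1] + ", maxDoc after merge=" + c[2]);
    }
}
```

Forcing merges rewrites segment files, so on a large index Lucene usually leaves this work to the merge policy.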

2.2.2 Deleting all indexes (use with caution)
    1. Create an analyzer object (Analyzer) for tokenization;
    2. Create an index configuration object (IndexWriterConfig) for configuring Lucene;
    3. Create an index directory object (Directory) that specifies the location of the index library;
    4. Create an index writer object (IndexWriter) for manipulating the index library;
    5. Use the IndexWriter object to perform the deletion;
    6. Release resources.
/**
 * Delete all indexes
 */
@Test
public void deleteAllIndex() throws IOException {
    // 1. Create the analyzer (Analyzer) used for tokenization
    Analyzer analyzer = new IKAnalyzer();
    // 2. Create the index configuration object (IndexWriterConfig) used to configure Lucene
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_10_4, analyzer);
    // 3. Create the index directory object (Directory) that specifies the location of the index library
    Directory directory = FSDirectory.open(new File("/Users/healchow/Documents/index"));
    // 4. Create the index writer (IndexWriter) used to manipulate the index
    IndexWriter writer = new IndexWriter(directory, iwc);
    // 5. Delete everything with the IndexWriter
    writer.deleteAll();
    // 6. Release resources
    writer.close();
}

After the deletion, the index is empty. deleteAll() removes both the stored data of the document fields and the index data of the indexed fields.

This is similar to TRUNCATE in a relational database: the data is removed completely, including its storage structure, which makes it faster than deleting documents by condition.
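Checking maxDoc() after deleteAll() confirms that nothing is left behind, not even logically deleted slots. This is a minimal sketch that assumes Lucene 4.10.x. RAMDirectory and StandardAnalyzer stand in for the article's FSDirectory and IKAnalyzer. The class name and ids are hypothetical.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class DeleteAllDemo {
    static int maxDocAfterDeleteAll() throws IOException {
        Directory dir = new RAMDirectory(); // in-memory stand-in for FSDirectory
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_4_10_4,
                new StandardAnalyzer(Version.LUCENE_4_10_4)));
        for (int i = 1; i <= 3; i++) {
            Document doc = new Document();
            doc.add(new StringField("bookId", String.valueOf(i), Store.YES));
            writer.addDocument(doc);
        }
        writer.commit();    // three documents are now in the index
        writer.deleteAll(); // drops all segments instead of marking documents one by one
        writer.close();     // commits the empty index

        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            return reader.maxDoc(); // counts deleted slots too, so 0 means nothing is left
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(maxDocAfterDeleteAll()); // 0
    }
}
```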

2.3 Updating indexes

Lucene updates the index based on a Term object. It first searches for documents that contain the term. If any are found, they are deleted and the new document is added in their place (an update). If none are found, the new document is simply added.

    1. Create an analyzer object (Analyzer) for tokenization;
    2. Create an index configuration object (IndexWriterConfig) for configuring Lucene;
    3. Create an index directory object (Directory) that specifies the location of the index library;
    4. Create an index writer object (IndexWriter) for manipulating the index library;
    5. Create the Document object;
    6. Create the Term object;
    7. Use the IndexWriter object to perform the update;
    8. Release resources.
/**
 * Update index
 */
@Test
public void updateIndexByTerm() throws IOException {
    // 1. Create the analyzer (Analyzer) used for tokenization
    Analyzer analyzer = new IKAnalyzer();
    // 2. Create the index configuration object (IndexWriterConfig) used to configure Lucene
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_10_4, analyzer);
    // 3. Create the index directory object (Directory) that specifies the location of the index library
    Directory directory = FSDirectory.open(new File("/Users/healchow/Documents/index"));
    // 4. Create the index writer (IndexWriter) used to manipulate the index
    IndexWriter writer = new IndexWriter(directory, iwc);
    // 5. Create the document object (Document)
    Document doc = new Document();
    doc.add(new TextField("id", "1234", Store.YES));
    doc.add(new TextField("name", "MyBatis and SpringMVC", Store.YES));
    doc.add(new TextField("name", "MyBatis and Struts2", Store.YES));
    // 6. Create the Term object
    // The Term value is not analyzed, so it must match the lowercased token stored in the index
    Term term = new Term("name", "springmvc");
    // 7. Perform the update with the IndexWriter
    writer.updateDocument(term, doc);
    // 8. Release resources
    writer.close();
}

On the first execution, no matching index entry is found, so the document is simply added. The analyzer lowercases the indexed text, so the terms in the index are lowercase.

On the second execution, the index already contains the term springmvc in the name field, so an update is performed. The matching document is deleted and the new document is added, including the entire TextField content "MyBatis and Struts2". The index therefore holds the terms of both name values.

Suppose that for the second execution you change the value in the Term to one that has no match in the index. The operation then falls back to an add, and the entire TextField content is added to the index as a new document.
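updateDocument(term, doc) deletes every document that contains the term and then adds the new one. As a result, updating by a unique key always leaves exactly one copy. This is a minimal sketch that assumes Lucene 4.10.x. Unlike the example above, it indexes id as a StringField so that the whole value is a single term. It also uses RAMDirectory and StandardAnalyzer. The class name and values are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class UpdateDemo {
    static Document book(String id, String name) {
        Document doc = new Document();
        // StringField keeps the id as one untokenized term, so Term("id", ...) matches it exactly
        doc.add(new StringField("id", id, Store.YES));
        doc.add(new TextField("name", name, Store.YES));
        return doc;
    }

    // Runs two updates with the same key and returns the names of all live documents
    static List<String> namesAfterTwoUpdates() throws IOException {
        Directory dir = new RAMDirectory(); // in-memory stand-in for FSDirectory
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_4_10_4,
                new StandardAnalyzer(Version.LUCENE_4_10_4)));
        Term key = new Term("id", "1234");
        // No document has id 1234 yet, so this behaves like addDocument
        writer.updateDocument(key, book("1234", "MyBatis and SpringMVC"));
        // Now a match exists: it is deleted and the new document is added in its place
        writer.updateDocument(key, book("1234", "MyBatis and Struts2"));
        writer.close();

        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            IndexSearcher searcher = new IndexSearcher(reader);
            List<String> names = new ArrayList<String>();
            // MatchAllDocsQuery skips documents that are marked as deleted
            for (ScoreDoc sd : searcher.search(new MatchAllDocsQuery(), 10).scoreDocs) {
                names.add(searcher.doc(sd.doc).get("name"));
            }
            return names;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(namesAfterTwoUpdates()); // [MyBatis and Struts2]
    }
}
```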

Copyright Notice

Author: Ma_shoufeng (Ma Ching)

Source: Blog Park Ma Ching's Blog


The copyright of this article belongs to the blogger. Reprints are welcome, but unless the blogger agrees otherwise, this statement must be retained and a link to the original must be given in a prominent place on the page; otherwise the blogger reserves the right to pursue legal liability.
