Lucene Custom Filters

Source: Internet
Author: User

First, a word on how queries and filters differ and how they relate. The two are very similar: almost anything that can be done with a query can also be done with a filter, and one can be converted into the other. The biggest difference is that a filter does not score its result set, whereas the results returned by a query carry relevance scores. So when the business requirement has nothing to do with scoring, prefer a filter to get better performance. This is essentially the difference between the q parameter and the fq parameter in Solr.
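To make the distinction concrete, here is a toy model in plain Java. It is not the Lucene API: QueryVsFilterSketch, filter, and scoredSearch are hypothetical names invented for this illustration, and the "score" is just a word count standing in for real ranking. The point is only that a filter answers yes or no per document, while a query also attaches a score to every hit.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy model only: these methods are made up for illustration and are not Lucene classes.
final class QueryVsFilterSketch {

    // A filter decides membership only: one bit per document, no score.
    static BitSet filter(List<String> docs, Predicate<String> accept) {
        BitSet bits = new BitSet(docs.size());
        for (int docId = 0; docId < docs.size(); docId++) {
            if (accept.test(docs.get(docId))) {
                bits.set(docId);
            }
        }
        return bits;
    }

    // A query additionally computes a score for every hit among the allowed documents.
    // The score here is how often the word occurs, a stand-in for real relevance ranking.
    static Map<Integer, Float> scoredSearch(List<String> docs, String word, BitSet allowed) {
        Map<Integer, Float> hits = new LinkedHashMap<>();
        for (int docId = allowed.nextSetBit(0); docId >= 0; docId = allowed.nextSetBit(docId + 1)) {
            float score = 0f;
            for (String token : docs.get(docId).split(" ")) {
                if (token.equals(word)) {
                    score++;
                }
            }
            if (score > 0) {
                hits.put(docId, score);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("novel novel story", "technology book", "novel technology");
        BitSet onlyNovels = filter(docs, d -> d.contains("novel"));
        System.out.println(onlyNovels);                              // membership only
        System.out.println(scoredSearch(docs, "novel", onlyNovels)); // membership plus score
    }
}
```

When no caller ever reads the scores, the second step is wasted work, which is the intuition behind preferring a filter in that case.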


Now, to the point. Before that, as usual, let's first get a general picture of the filters that Lucene provides.



Now let's look at how this is done in code, starting with the test data.

    ID  score  bookname                  ename  type        price  date
    1   1      Ethereal Journey          pmzl   novel       52.23  201005
    2   1      Kingdoms                  sgyy   novel       36.13  201207
    3   1      Database Combat           sjksz  technology  77.13  200811
    4   1      Series Bible              bcbd   technology  100.3  200501
    5   1      Workplace Relations       zcgxl  career      36.59  200501
    6   1      Healthy Living            jksh   life        20.47  200008
    7   1      See the Essence           kqbz   society     10.37  201004
    8   1      Programming, Programming  bcbc   society     10.37  201004


Core Code

Java code
    // The second-to-last argument controls whether the lower bound is included,
    // the last argument whether the upper bound is included.
    TermRangeFilter filter = new TermRangeFilter("ename", new BytesRef("h"), new BytesRef("n"), true, true);
    TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), filter, 10000); // default order, no sorting


Output Results

    6   1      Healthy Living            jksh   life        20.47  200008
    7   1      See the Essence           kqbz   society     10.37  201004
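The effect of the two boolean flags can be mimicked in plain Java. The sketch below is not Lucene code: TermRangeSketch, inTermRange, and select are hypothetical names, and strings are compared with String.compareTo, which for the ASCII values in the sample data gives the same order as a byte-wise comparison of terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of inclusive/exclusive range bounds; not part of Lucene.
final class TermRangeSketch {

    // Returns true when value lies between lower and upper, honoring the two flags.
    static boolean inTermRange(String value, String lower, String upper,
                               boolean includeLower, boolean includeUpper) {
        int cmpLower = value.compareTo(lower);
        int cmpUpper = value.compareTo(upper);
        boolean aboveLower = includeLower ? cmpLower >= 0 : cmpLower > 0;
        boolean belowUpper = includeUpper ? cmpUpper <= 0 : cmpUpper < 0;
        return aboveLower && belowUpper;
    }

    // Keeps the values that fall inside the range, in their original order.
    static List<String> select(List<String> values, String lower, String upper,
                               boolean includeLower, boolean includeUpper) {
        List<String> out = new ArrayList<>();
        for (String v : values) {
            if (inTermRange(v, lower, upper, includeLower, includeUpper)) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The ename column of the sample data, in lower case.
        List<String> enames = Arrays.asList(
                "pmzl", "sgyy", "sjksz", "bcbd", "zcgxl", "jksh", "kqbz", "bcbc");
        System.out.println(select(enames, "h", "n", true, true));
    }
}
```

With the sample ename values and the range "h" to "n", only jksh and kqbz qualify, which agrees with the two rows returned by the filter.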


Core Code

Java code
    // Price in [10, 40): the lower bound is included, the upper bound is not.
    NumericRangeFilter<Double> filter = NumericRangeFilter.newDoubleRange("price", 10d, 40d, true, false);
    TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), filter, 10000); // default order, no sorting


Output Results

    2   1      Kingdoms                  sgyy   novel       36.13  201207
    5   1      Workplace Relations       zcgxl  career      36.59  200501
    6   1      Healthy Living            jksh   life        20.47  200008
    7   1      See the Essence           kqbz   society     10.37  201004
    8   1      Programming, Programming  bcbc   society     10.37  201004



Core Code

Java code
    // Range filter backed by the field cache; price in [20, 50], both bounds included.
    Filter filter = FieldCacheRangeFilter.newDoubleRange("price", 20d, 50d, true, true);
    TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), filter, 10000); // default order, no sorting


Output Results

    2   1      Kingdoms                  sgyy   novel       36.13  201207
    5   1      Workplace Relations       zcgxl  career      36.59  200501
    6   1      Healthy Living            jksh   life        20.47  200008
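Conceptually, a field-cache filter reads each document's value from an in-memory array indexed by document number instead of walking the term index. The following is a toy model of that idea in plain Java with the price column hard-coded. CachedRangeSketch, PRICES, and range are hypothetical names; real Lucene fills such an array through its FieldCache.

```java
import java.util.BitSet;

// Toy model of a cached, per-document value array; not the Lucene FieldCache API.
final class CachedRangeSketch {

    // Price of each sample row, indexed by position (the row with ID 1 is at index 0).
    static final double[] PRICES = {52.23, 36.13, 77.13, 100.3, 36.59, 20.47, 10.37, 10.37};

    // Marks every position whose cached value falls inside the requested range.
    static BitSet range(double[] cache, double lower, double upper,
                        boolean includeLower, boolean includeUpper) {
        BitSet bits = new BitSet(cache.length);
        for (int doc = 0; doc < cache.length; doc++) {
            double v = cache[doc];
            boolean aboveLower = includeLower ? v >= lower : v > lower;
            boolean belowUpper = includeUpper ? v <= upper : v < upper;
            if (aboveLower && belowUpper) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // Closed range [20, 50], as in the FieldCacheRangeFilter example.
        System.out.println(range(PRICES, 20d, 50d, true, true));
        // Half-open range [10, 40), as in the NumericRangeFilter example.
        System.out.println(range(PRICES, 10d, 40d, true, false));
    }
}
```

Positions 1, 4, and 5 are set for the closed range, corresponding to the rows with ID 2, 5, and 6. The trade-off of this approach is memory: the whole column is held in an array so that each check is a single lookup.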


Core Code

Java code
    // Field-cache filter that keeps only documents whose type is one of the given terms.
    Filter filter = new FieldCacheTermsFilter("type", new String[]{"technology", "society"});
    TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), filter, 10000); // default order, no sorting


Output Results

    3   1      Database Combat           sjksz  technology  77.13  200811
    4   1      Series Bible              bcbd   technology  100.3  200501
    7   1      See the Essence           kqbz   society     10.37  201004
    8   1      Programming, Programming  bcbc   society     10.37  201004


Core Code

Java code
    // Use the QueryWrapperFilter class to wrap a query as a filter.
    QueryWrapperFilter filter = new QueryWrapperFilter(new TermQuery(new Term("type", "technology")));
    TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), filter, 10000); // default order, no sorting


Output Results

    3   1      Database Combat           sjksz  technology  77.13  200811
    4   1      Series Bible              bcbd   technology  100.3  200501



Finally, let's see how to extend the Filter base class to write our own filter. A custom filter can be very powerful and flexible, but the one shown here has two limitations to be aware of. (1) The field it matches on must hold non-repeating values, such as a primary key; if a value is repeated, only the first matching document is returned. (2) The field must not be tokenized; if it is split into terms by an analyzer, incorrect results may occur.
Custom Filter Class

Java code
    package com.sanjiesanxian.test;

    import java.io.IOException;

    import org.apache.lucene.index.AtomicReaderContext;
    import org.apache.lucene.index.DocsEnum;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.DocIdSet;
    import org.apache.lucene.search.DocIdSetIterator;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.util.Bits;
    import org.apache.lucene.util.FixedBitSet;

    /**
     * Custom filter that keeps only the documents whose "id" is one of the given terms.
     */
    public class MyCustomFilter extends Filter {

        private final String[] terms; // id values of the documents to keep

        public MyCustomFilter(String... terms) {
            this.terms = terms;
        }

        // Called once per segment; acceptDocs is not consulted in this simple example.
        @Override
        public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs)
                throws IOException {
            // One bit per document in this segment, deleted documents included.
            // Doc IDs here are segment-relative, so context.docBase is not needed.
            FixedBitSet bits = new FixedBitSet(context.reader().maxDoc());
            for (String s : terms) {
                // The "id" field must be unique and must not be tokenized.
                DocsEnum docs = context.reader().termDocsEnum(new Term("id", s));
                // termDocsEnum returns null when the term is absent from this segment.
                // Only the first matching document is taken, so duplicates are ignored.
                if (docs != null && docs.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
                    bits.set(docs.docID());
                }
            }
            return bits;
        }
    }
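Because getDocIdSet is called once per index segment, the bit set it returns is indexed by segment-local document numbers, and docBase is only needed to translate them into index-wide numbers. The plain-Java sketch below models this under stated assumptions: Segment, docIdSet, and globalHits are hypothetical names (not Lucene types), and the second segment contains a repeated id on purpose to show the "first match wins" behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Toy model of per-segment filtering; Segment is a made-up class, not a Lucene type.
final class SegmentFilterSketch {

    static final class Segment {
        final int docBase;   // index-wide number of this segment's first document
        final String[] ids;  // id value of each document, by segment-local number

        Segment(int docBase, String... ids) {
            this.docBase = docBase;
            this.ids = ids;
        }

        // Segment-local number of the first document carrying this id, or -1 if absent.
        int firstDoc(String id) {
            for (int doc = 0; doc < ids.length; doc++) {
                if (ids[doc].equals(id)) {
                    return doc;
                }
            }
            return -1;
        }
    }

    // Mirrors the custom filter: one bit set per segment, sized to that segment,
    // with at most one bit set per wanted id.
    static BitSet docIdSet(Segment segment, String... wanted) {
        BitSet bits = new BitSet(segment.ids.length);
        for (String id : wanted) {
            int doc = segment.firstDoc(id);
            if (doc >= 0) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Translates the segment-local hits into index-wide document numbers.
    static List<Integer> globalHits(List<Segment> segments, String... wanted) {
        List<Integer> hits = new ArrayList<>();
        for (Segment s : segments) {
            BitSet bits = docIdSet(s, wanted);
            for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
                hits.add(s.docBase + doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Segment> index = Arrays.asList(
                new Segment(0, "1", "2", "3", "4"),
                new Segment(4, "5", "6", "5", "8")); // id "5" is duplicated on purpose
        System.out.println(globalHits(index, "3", "5", "2"));
    }
}
```

In this model the second document with id "5" is never reported, which is the same limitation the custom filter has when the field is not unique.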


Test query code

Java code
    MyCustomFilter filter = new MyCustomFilter("3", "5", "2"); // one or more id values to keep
    TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), filter, 10000);


Output Results

    2   1      Kingdoms                  sgyy   novel       36.13  201207
    3   1      Database Combat           sjksz  technology  77.13  200811
    5   1      Workplace Relations       zcgxl  career      36.59  200501



Although this custom filter has its limitations, it can be very flexible in some scenarios, particularly for fields that are not tokenized.

