Lucene Series: DocValues

Last Update:2018-07-26 Source: Internet

Author: User

Tags numeric ord


DocValues Introduction

Lucene stores its index primarily as an inverted index (term to documents). Search-related features such as sorting, highlighting, and summary retrieval, however, need to go from a document's docID to the corresponding forward (document-to-value) information. Lucene 4.0 introduced a new field type, DocValues, for this purpose: a column-oriented field whose document-to-value mapping is built at index time. This relieves part of the memory pressure of the field cache and makes sorting, faceting, grouping, and function queries faster.

DocValues advantages and disadvantages:

Near-real-time indexing: each index segment has its own DocValues data structure, which is built together with the index and can be updated and take effect quickly.

Basic query and filtering support: basic term and range queries are possible, but they do not contribute to scoring and they are slow. If query speed and score-based ranking matter, also set the field to indexed="true".

Better compression ratio: DocValues fields compress better than the FieldCache, although maximum compression is not the goal.

Memory savings: a fieldType can be defined with docValuesFormat="Disk" so that only a small portion of the data is loaded into memory and the rest stays on disk.

DocValues Implementation Principle
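Before looking at the read path, the difference between the two layouts can be pictured with a small sketch. It uses plain Java collections rather than Lucene classes, and the class name, method names, and sample data are assumptions made only for illustration. The inverted structure answers "which documents contain this term", while the column answers "what is the value for this docID", which is the lookup that sorting needs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ColumnVsInverted {
    // Inverted layout: term -> list of docIDs that contain it.
    public static Map<String, List<Integer>> buildInverted(String[] termPerDoc) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        for (int docId = 0; docId < termPerDoc.length; docId++) {
            postings.computeIfAbsent(termPerDoc[docId], t -> new ArrayList<>()).add(docId);
        }
        return postings;
    }

    // Column layout: the array index is the docID, the element is the value.
    // Sorting documents by value needs only one array read per document.
    public static Integer[] sortByColumn(long[] column) {
        Integer[] docIds = new Integer[column.length];
        for (int i = 0; i < docIds.length; i++) {
            docIds[i] = i;
        }
        Arrays.sort(docIds, Comparator.comparingLong(d -> column[d]));
        return docIds;
    }

    public static void main(String[] args) {
        String[] termPerDoc = {"lucene", "search", "lucene", "facet", "search"};
        long[] column = {12, 13, 0, 100, 19};
        System.out.println(buildInverted(termPerDoc));
        // docIDs ordered by ascending column value: [2, 0, 1, 4, 3]
        System.out.println(Arrays.toString(sortByColumn(column)));
    }
}
```

Answering the same sort from the inverted structure alone would mean walking every term's postings to find each document's value, which is the cost the column avoids.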

During indexing, data in fields configured as DocValues is stored directly, in RAM or on disk, as forward (document-to-value) information. On the read path (SegmentReader.java), the reader first looks in the per-thread map held in a CloseableThreadLocal and returns the cached instance if one exists. If none exists, it reads the values through the DocValuesProducer and caches them.

@Override
public NumericDocValues getNumericDocValues(String field) throws IOException {
  ensureOpen();
  // Per-thread cache of DocValues instances that were already loaded
  Map<String,Object> dvFields = docValuesLocal.get();

  Object previous = dvFields.get(field);
  if (previous != null && previous instanceof NumericDocValues) {
    return (NumericDocValues) previous;
  } else {
    FieldInfo fi = getDVField(field, DocValuesType.NUMERIC);
    if (fi == null) {
      // The field does not exist or is not a numeric DocValues field
      return null;
    }
    DocValuesProducer dvProducer = dvProducersByField.get(field);
    assert dvProducer != null;
    NumericDocValues dv = dvProducer.getNumeric(fi);
    dvFields.put(field, dv);
    return dv;
  }
}
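The lookup-then-load pattern in this method can be tried out without Lucene. The following is a minimal sketch under stated assumptions: a plain java.lang.ThreadLocal stands in for CloseableThreadLocal, a long[] stands in for NumericDocValues, and loadFromProducer is a hypothetical placeholder for the DocValuesProducer call. None of these names come from Lucene.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class PerThreadDocValuesCache {
    // Counts how often the simulated producer is actually invoked.
    public static final AtomicInteger producerCalls = new AtomicInteger();

    // One cache map per thread (assumption: stands in for CloseableThreadLocal).
    private static final ThreadLocal<Map<String, Object>> LOCAL =
        ThreadLocal.withInitial(HashMap::new);

    // Hypothetical stand-in for DocValuesProducer.getNumeric(fieldInfo).
    static long[] loadFromProducer(String field) {
        producerCalls.incrementAndGet();
        return new long[] {12, 13, 0, 100, 19};
    }

    public static long[] getNumeric(String field) {
        Map<String, Object> dvFields = LOCAL.get();
        Object previous = dvFields.get(field);
        if (previous instanceof long[]) {
            return (long[]) previous;       // cache hit for this thread
        }
        long[] dv = loadFromProducer(field); // cache miss: load and remember
        dvFields.put(field, dv);
        return dv;
    }

    public static void main(String[] args) throws InterruptedException {
        getNumeric("NUMERIC");
        getNumeric("NUMERIC");
        System.out.println(producerCalls.get()); // prints 1: second call was cached

        Thread other = new Thread(() -> getNumeric("NUMERIC"));
        other.start();
        other.join();
        System.out.println(producerCalls.get()); // prints 2: the new thread has its own map
    }
}
```

Because each thread has its own map, lookups need no locking. The trade-off in this sketch is that every thread that touches a field triggers its own load.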

DocValues for a single segment can be obtained directly from that segment's AtomicReader, through its per-type getters such as AtomicReader.getNumericDocValues(String).

Test Code Implementation

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.SortedSetDocValuesField;
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.index.SortedSetDocValues;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;
public class DocValuesTest {
  static final String NUMERIC_FIELD = "NUMERIC";
  static final String BINARY_FIELD = "BINARY";
  static final String SORTED_FIELD = "SORTED";
  static final String SORTEDSET_FIELD = "SORTEDSET";

  // One entry per document (docIDs 0 to 4)
  static long[] numericVals = new long[] {12, 13, 0, 100, 19};
  static String[] binary = new String[] {"lucene", "doc", "value", "test", "example"};
  static String[] sortedVals = new String[] {"lucene", "facet", "abacus", "search", null};
  static String[][] sortedSetVals = new String[][] {{"lucene", "search"}, {"search"}, {"facet", "abacus", "search"}, {}, {}};

  static IndexReader topReader;
  static AtomicReader atomicReader;

  public static void main(String[] args) throws IOException {
    RAMDirectory dir = new RAMDirectory();
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_0, new StandardAnalyzer());
    IndexWriter writer = new IndexWriter(dir, config);
    for (int i = 0; i < numericVals.length; ++i) {
      Document doc = new Document();
      doc.add(new NumericDocValuesField(NUMERIC_FIELD, numericVals[i]));
      doc.add(new BinaryDocValuesField(BINARY_FIELD, new BytesRef(binary[i])));
      if (sortedVals[i] != null) {
        doc.add(new SortedDocValuesField(SORTED_FIELD, new BytesRef(sortedVals[i])));
      }
      for (String value : sortedSetVals[i]) {
        doc.add(new SortedSetDocValuesField(SORTEDSET_FIELD, new BytesRef(value)));
      }
      writer.addDocument(doc);
    }
    // Merge down to a single segment so that one AtomicReader sees all documents
    writer.forceMerge(1);
    writer.commit();
    writer.close();

    topReader = DirectoryReader.open(dir);
    atomicReader = topReader.leaves().get(0).reader();

    // Numeric: value of document 0
    NumericDocValues docVals1 = atomicReader.getNumericDocValues(NUMERIC_FIELD);
    System.out.println(docVals1.get(0));

    // Binary: value of document 0
    BinaryDocValues docVals2 = atomicReader.getBinaryDocValues(BINARY_FIELD);
    BytesRef bytesRef = docVals2.get(0);
    System.out.println(bytesRef.utf8ToString());

    // Sorted: ordinal and value for every document
    SortedDocValues docVals3 = atomicReader.getSortedDocValues(SORTED_FIELD);
    String ordInfo = "", values = "";
    for (int i = 0; i < atomicReader.maxDoc(); ++i) {
      ordInfo += docVals3.getOrd(i) + ":";
      bytesRef = docVals3.get(i);
      values += bytesRef.utf8ToString() + ":";
    }
    // 2:1:0:3:-1:
    System.out.println(ordInfo);
    // lucene:facet:abacus:search::
    System.out.println(values);

    // Sorted set: all ordinals and values for every document
    SortedSetDocValues docVals = atomicReader.getSortedSetDocValues(SORTEDSET_FIELD);
    String info = "";
    for (int i = 0; i < atomicReader.maxDoc(); ++i) {
      docVals.setDocument(i);
      long ord;
      info += "doc " + i;
      while ((ord = docVals.nextOrd()) != SortedSetDocValues.NO_MORE_ORDS) {
        info += ", " + ord + "/";
        bytesRef = docVals.lookupOrd(ord);
        info += bytesRef.utf8ToString();
      }
      info += ";";
    }
    // doc 0, 2/lucene, 3/search;doc 1, 3/search;doc 2, 0/abacus, 1/facet, 3/search;doc 3;doc 4;
    System.out.println(info);
  }
}
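The ordinals printed by the sorted and sorted-set parts of the test follow from a simple rule: the distinct values of the segment are sorted, and each value's position in that sorted list is its ordinal, with -1 marking a document that has no value. The sketch below reproduces that rule with plain Java collections so it can be run without Lucene. The class and method names are assumptions for illustration, and it sorts with String.compareTo, which matches Lucene's byte-wise order for the ASCII values used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class OrdinalSketch {
    // Dictionary: distinct non-null values in sorted order; list index == ordinal.
    public static List<String> dictionary(String[][] perDocValues) {
        TreeSet<String> distinct = new TreeSet<>();
        for (String[] values : perDocValues) {
            for (String v : values) {
                if (v != null) {
                    distinct.add(v);
                }
            }
        }
        return new ArrayList<>(distinct);
    }

    // Single-valued case: one ordinal per document, -1 when the value is missing.
    public static int[] sortedOrds(String[] values) {
        String[][] perDoc = new String[values.length][];
        for (int i = 0; i < values.length; i++) {
            perDoc[i] = new String[] {values[i]};
        }
        List<String> dict = dictionary(perDoc);
        int[] ords = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            ords[i] = values[i] == null ? -1 : dict.indexOf(values[i]);
        }
        return ords;
    }

    // Multi-valued case: each document lists its ordinals in ascending order.
    public static String sortedSetInfo(String[][] perDocValues) {
        List<String> dict = dictionary(perDocValues);
        StringBuilder info = new StringBuilder();
        for (int doc = 0; doc < perDocValues.length; doc++) {
            TreeSet<Integer> ords = new TreeSet<>();
            for (String v : perDocValues[doc]) {
                if (v != null) {
                    ords.add(dict.indexOf(v));
                }
            }
            info.append("doc ").append(doc);
            for (int ord : ords) {
                info.append(", ").append(ord).append('/').append(dict.get(ord));
            }
            info.append(';');
        }
        return info.toString();
    }

    public static void main(String[] args) {
        String[] sortedVals = {"lucene", "facet", "abacus", "search", null};
        String[][] sortedSetVals = {{"lucene", "search"}, {"search"}, {"facet", "abacus", "search"}, {}, {}};
        // [2, 1, 0, 3, -1]
        System.out.println(Arrays.toString(sortedOrds(sortedVals)));
        // doc 0, 2/lucene, 3/search;doc 1, 3/search;doc 2, 0/abacus, 1/facet, 3/search;doc 3;doc 4;
        System.out.println(sortedSetInfo(sortedSetVals));
    }
}
```

Storing a small integer per document and each distinct string only once is what makes sorting and faceting on such fields cheap: ordinals compare in the same order as the values they stand for.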

This article is an English version of an article originally published in Chinese on aliyun.com.
