Lucene Series: DocValues

Source: Internet
Author: User
Tags: numeric, ord
DocValues Introduction

Lucene stores its index mainly as an inverted index (term to documents). Many search-related features, however, such as sorting, highlighting, and fetching summary information, need to go the other way: from a document's docID to that document's forward (per-document) data. Lucene 4.0 introduced a new field type, DocValues, for this purpose. It is a column-oriented field holding a document-to-value mapping that is built at index time. This relieves some of the memory pressure of the field cache and makes sorting, faceting, grouping, and function queries faster.

DocValues advantages and disadvantages:

Near-real-time indexing: every index segment has its own DocValues data structure, which is built together with the index and can be updated and take effect quickly.
Basic query and filtering support: basic term and range queries work, but they do not take part in scoring and they are slow. If you need fast queries or score-based sorting, also set the field to indexed="true".
Better compression ratio: DocValues fields compress better than the field cache, although maximum compression is not the goal.
Memory savings: you can define a fieldType with docValuesFormat="Disk" so that only a small part of the data is loaded into memory and the rest stays on disk.

DocValues Implementation Principle

During indexing, the data of fields declared as DocValues is stored directly, in RAM or on disk, as forward (per-document) information. On the read path (SegmentReader.java), the reader first looks in a per-thread map held in a CloseableThreadLocal and returns the cached instance if one exists. If none exists, it reads the values from disk through a DocValuesProducer and caches the result.

  @Override
  public NumericDocValues getNumericDocValues(String field) throws IOException {
    ensureOpen();
    Map<String,Object> dvFields = docValuesLocal.get();

    Object previous = dvFields.get(field);
    if (previous != null && previous instanceof NumericDocValues) {
      return (NumericDocValues) previous;
    } else {
      FieldInfo fi = getDVField(field, DocValuesType.NUMERIC);
      if (fi == null) {
        return null;
      }
      DocValuesProducer dvProducer = dvProducersByField.get(field);
      assert dvProducer != null;
      NumericDocValues dv = dvProducer.getNumeric(fi);
      dvFields.put(field, dv);
      return dv;
    }
  }
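
The lookup pattern in the method above (check a per-thread map, fall back to a loader, cache the result) can be sketched without any Lucene classes. This is only an illustration under stated assumptions: ThreadLocalFieldCache, its producer function, and the loads counter are hypothetical names, and a plain ThreadLocal stands in for Lucene's CloseableThreadLocal.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for the per-thread doc-values cache; not Lucene code.
class ThreadLocalFieldCache {
    // Each thread gets its own map, so lookups need no locking.
    private final ThreadLocal<Map<String, long[]>> perThread =
            ThreadLocal.withInitial(HashMap::new);
    // Assumed loader playing the role of the on-disk producer.
    private final Function<String, long[]> producer;
    // Counts how often the producer is actually consulted.
    final AtomicInteger loads = new AtomicInteger();

    ThreadLocalFieldCache(Function<String, long[]> producer) {
        this.producer = producer;
    }

    long[] get(String field) {
        Map<String, long[]> cache = perThread.get();
        long[] previous = cache.get(field);
        if (previous != null) {
            return previous;              // cache hit: no producer access
        }
        loads.incrementAndGet();
        long[] loaded = producer.apply(field);
        if (loaded == null) {
            return null;                  // unknown field: nothing to cache
        }
        cache.put(field, loaded);
        return loaded;
    }

    public static void main(String[] args) {
        ThreadLocalFieldCache cache = new ThreadLocalFieldCache(
                field -> field.equals("NUMERIC") ? new long[] {12, 13, 0, 100, 19} : null);
        long[] first = cache.get("NUMERIC");
        long[] second = cache.get("NUMERIC");
        System.out.println(first == second);   // prints "true": same cached array
        System.out.println(cache.loads.get()); // prints "1": producer hit once
    }
}
```

Because the map is per thread, a second thread asking for the same field would trigger its own load; that trade-off buys lock-free reads.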

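The DocValues of a single segment can be read directly through AtomicReader: AtomicReader.docValues(String) in Lucene 4.0, and the typed getters such as getNumericDocValues(String) in the 4.10 test code below.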
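
As a rough mental model of why a document-to-value column helps sorting, the toy below contrasts the two layouts using the sample data from this article. It is a plain-Java sketch, not Lucene's storage format; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: an inverted index next to a docID-indexed column.
class ColumnVersusInverted {
    // Inverted layout: term -> list of docIDs that contain the term.
    static Map<String, List<Integer>> invert(String[][] termsPerDoc) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        for (int docId = 0; docId < termsPerDoc.length; docId++) {
            for (String term : termsPerDoc[docId]) {
                postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
            }
        }
        return postings;
    }

    // Column layout: column[docId] is the value, so sorting documents
    // needs only one array read per comparison operand.
    static Integer[] sortDocsByColumn(long[] column) {
        Integer[] docIds = new Integer[column.length];
        for (int i = 0; i < docIds.length; i++) {
            docIds[i] = i;
        }
        Arrays.sort(docIds, Comparator.comparingLong(d -> column[d]));
        return docIds;
    }

    public static void main(String[] args) {
        String[][] terms = {{"lucene", "search"}, {"search"},
                            {"facet", "abacus", "search"}, {}, {}};
        long[] numericVals = {12, 13, 0, 100, 19};
        System.out.println(invert(terms).get("search"));                    // prints [0, 1, 2]
        System.out.println(Arrays.toString(sortDocsByColumn(numericVals))); // prints [2, 0, 1, 4, 3]
    }
}
```

The inverted map answers "which documents contain this term", while the column answers "what is the value of this document", which is the question sorting and faceting ask.
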
Test Code implementation

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.SortedSetDocValuesField;
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.index.SortedSetDocValues;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class DocValuesTest {
  static final String NUMERIC_FIELD = "NUMERIC";
  static final String BINARY_FIELD = "BINARY";
  static final String SORTED_FIELD = "SORTED";
  static final String SORTEDSET_FIELD = "SORTEDSET";

  // One entry per document; index i holds the values of doc i.
  static long[] numericVals = new long[] {12, 13, 0, 100, 19};
  static String[] binary = new String[] {"lucene", "doc", "value", "test", "example"};
  static String[] sortedVals = new String[] {"lucene", "facet", "abacus", "search", null};
  static String[][] sortedSetVals = new String[][] {{"lucene", "search"}, {"search"}, {"facet", "abacus", "search"}, {}, {}};

  static IndexReader topReader;
  static AtomicReader atomicReader;

  public static void main(String[] args) throws IOException {
    RAMDirectory dir = new RAMDirectory();
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_0, new StandardAnalyzer());
    IndexWriter writer = new IndexWriter(dir, config);
    for (int i = 0; i < numericVals.length; ++i) {
      Document doc = new Document();
      doc.add(new NumericDocValuesField(NUMERIC_FIELD, numericVals[i]));
      doc.add(new BinaryDocValuesField(BINARY_FIELD, new BytesRef(binary[i])));
      if (sortedVals[i] != null) {
        doc.add(new SortedDocValuesField(SORTED_FIELD, new BytesRef(sortedVals[i])));
      }
      for (String value : sortedSetVals[i]) {
        doc.add(new SortedSetDocValuesField(SORTEDSET_FIELD, new BytesRef(value)));
      }
      writer.addDocument(doc);
    }
    // Merge down to a single segment so that leaves().get(0) sees every document.
    writer.forceMerge(1);
    writer.commit();
    writer.close();

    topReader = DirectoryReader.open(dir);
    atomicReader = topReader.leaves().get(0).reader();

    // Numeric value of doc 0.
    NumericDocValues docVals1 = atomicReader.getNumericDocValues(NUMERIC_FIELD);
    System.out.println(docVals1.get(0));

    // Binary value of doc 0.
    BinaryDocValues docVals2 = atomicReader.getBinaryDocValues(BINARY_FIELD);
    BytesRef bytesRef = docVals2.get(0);
    System.out.println(bytesRef.utf8ToString());

    // Sorted values: the ordinal and the value of every document.
    SortedDocValues docVals3 = atomicReader.getSortedDocValues(SORTED_FIELD);
    String ordInfo = "", values = "";
    for (int i = 0; i < atomicReader.maxDoc(); ++i) {
      ordInfo += docVals3.getOrd(i) + ":";
      bytesRef = docVals3.get(i);
      values += bytesRef.utf8ToString() + ":";
    }
    // 2:1:0:3:-1
    System.out.println(ordInfo);
    // lucene:facet:abacus:search::
    System.out.println(values);

    // Sorted set values: every ordinal/value pair of every document.
    SortedSetDocValues docVals = atomicReader.getSortedSetDocValues(SORTEDSET_FIELD);
    String info = "";
    for (int i = 0; i < atomicReader.maxDoc(); ++i) {
      docVals.setDocument(i);
      long ord;
      info += "doc " + i;
      while ((ord = docVals.nextOrd()) != SortedSetDocValues.NO_MORE_ORDS) {
        info += ", " + ord + "/";
        bytesRef = docVals.lookupOrd(ord);
        info += bytesRef.utf8ToString();
      }
      info += ";";
    }
    // doc 0, 2/lucene, 3/search;doc 1, 3/search;doc 2, 0/abacus, 1/facet, 3/search;doc 3;doc 4;
    System.out.println(info);
  }
}
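
The ordinals in the output comments above (2:1:0:3:-1) follow from sorting the distinct values and numbering them from 0, with -1 marking a document that has no value. The sketch below reproduces that numbering in plain Java for the same sample data. It is an illustration, not Lucene's implementation: OrdinalSketch and its methods are hypothetical names, and it relies on Java's String ordering, which matches Lucene's byte-wise ordering only for ASCII values like these.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical helper that mimics how sorted doc values assign ordinals.
class OrdinalSketch {
    // Sorted dictionary of the distinct, non-null values.
    static String[] dictionary(String[] valuePerDoc) {
        TreeSet<String> distinct = new TreeSet<>();
        for (String value : valuePerDoc) {
            if (value != null) {
                distinct.add(value);
            }
        }
        return distinct.toArray(new String[0]);
    }

    // Ordinal of each document's value in the dictionary; -1 if missing.
    static int[] ords(String[] valuePerDoc) {
        String[] dict = dictionary(valuePerDoc);
        int[] ords = new int[valuePerDoc.length];
        for (int doc = 0; doc < valuePerDoc.length; doc++) {
            ords[doc] = valuePerDoc[doc] == null
                    ? -1
                    : Arrays.binarySearch(dict, valuePerDoc[doc]);
        }
        return ords;
    }

    public static void main(String[] args) {
        String[] sortedVals = {"lucene", "facet", "abacus", "search", null};
        System.out.println(Arrays.toString(dictionary(sortedVals))); // prints [abacus, facet, lucene, search]
        System.out.println(Arrays.toString(ords(sortedVals)));       // prints [2, 1, 0, 3, -1]
    }
}
```

Storing one small integer per document plus a shared dictionary is what lets sorting compare ordinals instead of strings.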
