Learn Lucene with Me, Step by Step (6): Index Optimization via Multi-threaded Index Creation


Work has been busy these past two days, so this blog update is late; my apologies.

In the previous posts we saw that when Lucene builds an index, each IndexWriter holds a write lock on its index directory, which becomes inefficient when indexing a large volume of data.

Looking back at the previous post, Learn Lucene with Me, Step by Step (5): The Index Construction Principle of Lucene, we can see that the cost of building a Lucene index is dominated by the following factors:

    1. Disk space. Running out of disk space directly affects index construction, and can even make an index write appear to complete while the data is never actually synchronized to disk;
    2. The choice of merge policy. Index merging is similar to a bulk operation in SQL: the batch size has a direct impact on execution efficiency. Lucene buffers documents in memory before merging them, so choosing an appropriate merge policy also improves indexing efficiency;
    3. The choice of term used as a unique key. Lucene updates a document by deleting every document containing the given term from the index and then re-adding the new document. If too many documents match the term, the deletion consumes disk I/O, the IndexWriter holds its write lock longer, and execution efficiency drops accordingly.

To sum up: ensure sufficient disk space, and choose a term that is guaranteed unique (an ID field, for example), so that the first and third risks are avoided.

The purpose of this post is to improve indexing efficiency through the merge policy and by creating the index with multiple threads.

For the multi-threaded version I also use one index directory per thread. This avoids the segment merges and segment re-allocations that occur when the data volume in a single directory grows too large.
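Before the Lucene code, the thread-coordination pattern used throughout this post is worth seeing in isolation: one CountDownLatch acts as a start gate that the main thread opens to release all workers at once, and a second acts as a finish gate that the main thread waits on. Below is a minimal, self-contained sketch of that pattern using only the JDK; the class and method names (LatchDemo, runWorkers) are mine for illustration, not part of the indexing code.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LatchDemo {
    // Runs `threads` workers that all wait on a start gate, then signal a
    // finish gate when done; returns how many workers actually ran.
    public static int runWorkers(int threads) throws InterruptedException {
        CountDownLatch startGate = new CountDownLatch(1);        // opened once by main
        CountDownLatch finishGate = new CountDownLatch(threads); // counted down by workers
        List<Integer> done = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            final int id = i;
            pool.execute(() -> {
                try {
                    startGate.await();  // block until main releases all workers at once
                    done.add(id);       // stand-in for the real indexing work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    finishGate.countDown(); // tell main this worker is finished
                }
            });
        }
        startGate.countDown(); // release every worker simultaneously
        finishGate.await();    // wait for all workers to finish
        pool.shutdown();
        return done.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWorkers(4)); // prints 4
    }
}
```

The same two latches appear below as countDownLatch1 (start gate) and countDownLatch2 (finish gate).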

Enough talk; here is the code. The example reads the folder obtained by downloading and unzipping the Lucene release from the official site, and indexes the information of each file.

First, define a FileBean to hold the file information:

package com.lucene.bean;

public class FileBean {
    // file path
    private String path;
    // last modification time
    private long modified;
    // file content
    private String content;

    public String getPath() { return path; }
    public void setPath(String path) { this.path = path; }
    public long getModified() { return modified; }
    public void setModified(long modified) { this.modified = modified; }
    public String getContent() { return content; }
    public void setContent(String content) { this.content = content; }
}

Next is a utility class that walks a folder and its subfolders recursively and turns the files it finds into a list of FileBean objects:

package com.lucene.index.util;

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedList;
import java.util.List;

import com.lucene.bean.FileBean;

public class FileUtil {
    /**
     * Read file information from a folder and its subfolders
     * @param folder
     * @return
     * @throws IOException
     */
    public static List<FileBean> getFolderFiles(String folder) throws IOException {
        List<FileBean> fileBeans = new LinkedList<FileBean>();
        File file = new File(folder);
        if (file.isDirectory()) {
            File[] files = file.listFiles();
            if (files != null) {
                for (File file2 : files) {
                    fileBeans.addAll(getFolderFiles(file2.getAbsolutePath()));
                }
            }
        } else {
            FileBean bean = new FileBean();
            bean.setPath(file.getAbsolutePath());
            bean.setModified(file.lastModified());
            bean.setContent(new String(Files.readAllBytes(Paths.get(folder))));
            fileBeans.add(bean);
        }
        return fileBeans;
    }
}
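As an aside, on Java 8+ the same recursive traversal can be written more concisely with Files.walk. This is a minimal sketch of that alternative, collecting only paths rather than full FileBean objects; the class and method names (WalkDemo, listFiles) are mine, not part of the original code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WalkDemo {
    // Collects the absolute paths of all regular files under `folder`,
    // mirroring what getFolderFiles does recursively by hand.
    public static List<String> listFiles(String folder) throws IOException {
        try (Stream<Path> stream = Files.walk(Paths.get(folder))) {
            return stream.filter(Files::isRegularFile)
                         .map(p -> p.toAbsolutePath().toString())
                         .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small temporary tree just to demonstrate the traversal.
        Path dir = Files.createTempDirectory("walkdemo");
        Files.createFile(dir.resolve("a.txt"));
        Path sub = Files.createDirectory(dir.resolve("sub"));
        Files.createFile(sub.resolve("b.txt"));
        System.out.println(listFiles(dir.toString()).size()); // prints 2
    }
}
```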

Then define an abstract base class shared by all index workers:

package com.lucene.index;

import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.apache.lucene.index.IndexWriter;

public abstract class BaseIndex<T> implements Runnable {
    /** Parent index path */
    private String parentIndexPath;
    /** Index writer */
    private IndexWriter writer;
    private int subIndex;
    /** Start gate opened by the main thread */
    private final CountDownLatch countDownLatch1;
    /** Finish gate counted down by worker threads */
    private final CountDownLatch countDownLatch2;
    /** Object list */
    private List<T> list;

    public BaseIndex(IndexWriter writer, CountDownLatch countDownLatch1,
            CountDownLatch countDownLatch2, List<T> list) {
        super();
        this.writer = writer;
        this.countDownLatch1 = countDownLatch1;
        this.countDownLatch2 = countDownLatch2;
        this.list = list;
    }

    public BaseIndex(String parentIndexPath, int subIndex, CountDownLatch countDownLatch1,
            CountDownLatch countDownLatch2, List<T> list) {
        super();
        this.parentIndexPath = parentIndexPath;
        this.subIndex = subIndex;
        try {
            // multi-directory index: each worker writes to its own sub-directory
            File file = new File(parentIndexPath + "/index" + subIndex);
            if (!file.exists()) {
                file.mkdir();
            }
            this.writer = IndexUtil.getIndexWriter(parentIndexPath + "/index" + subIndex, true);
        } catch (IOException e) {
            e.printStackTrace();
        }
        this.countDownLatch1 = countDownLatch1;
        this.countDownLatch2 = countDownLatch2;
        this.list = list;
    }

    public BaseIndex(String path, CountDownLatch countDownLatch1,
            CountDownLatch countDownLatch2, List<T> list) {
        super();
        try {
            // single-directory index
            File file = new File(path);
            if (!file.exists()) {
                file.mkdir();
            }
            this.writer = IndexUtil.getIndexWriter(path, true);
        } catch (IOException e) {
            e.printStackTrace();
        }
        this.countDownLatch1 = countDownLatch1;
        this.countDownLatch2 = countDownLatch2;
        this.list = list;
    }

    /**
     * Index a single document
     * @param writer
     * @param t
     * @throws Exception
     */
    public abstract void indexDoc(IndexWriter writer, T t) throws Exception;

    /**
     * Index a batch of documents
     * @param writer
     * @param t
     * @throws Exception
     */
    public void indexDocs(IndexWriter writer, List<T> t) throws Exception {
        for (T t2 : t) {
            indexDoc(writer, t2);
        }
    }

    @Override
    public void run() {
        try {
            countDownLatch1.await(); // wait for the main thread to open the start gate
            System.out.println(writer);
            indexDocs(writer, list);
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            countDownLatch2.countDown(); // signal the finish gate
            try {
                writer.commit();
                writer.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

The FileBeanIndex class handles index creation for FileBean objects:

package com.lucene.index;

import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;

import com.lucene.bean.FileBean;

public class FileBeanIndex extends BaseIndex<FileBean> {

    public FileBeanIndex(IndexWriter writer, CountDownLatch countDownLatch1,
            CountDownLatch countDownLatch2, List<FileBean> list) {
        super(writer, countDownLatch1, countDownLatch2, list);
    }

    public FileBeanIndex(String parentIndexPath, int subIndex, CountDownLatch countDownLatch1,
            CountDownLatch countDownLatch2, List<FileBean> list) {
        super(parentIndexPath, subIndex, countDownLatch1, countDownLatch2, list);
    }

    @Override
    public void indexDoc(IndexWriter writer, FileBean t) throws Exception {
        Document doc = new Document();
        System.out.println(t.getPath());
        doc.add(new StringField("path", t.getPath(), Field.Store.YES));
        doc.add(new LongField("modified", t.getModified(), Field.Store.YES));
        doc.add(new TextField("content", t.getContent(), Field.Store.YES));
        if (writer.getConfig().getOpenMode() == IndexWriterConfig.OpenMode.CREATE) {
            writer.addDocument(doc);
        } else {
            // the "path" term uniquely identifies the document being replaced
            writer.updateDocument(new Term("path", t.getPath()), doc);
        }
    }
}

The IndexUtil tool class sets the index merge policy:

package com.lucene.index;

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.index.LogMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class IndexUtil {
    /**
     * Create an index writer
     * @param indexPath
     * @param create
     * @return
     * @throws IOException
     */
    public static IndexWriter getIndexWriter(String indexPath, boolean create) throws IOException {
        Directory dir = FSDirectory.open(Paths.get(indexPath));
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
        LogMergePolicy mergePolicy = new LogByteSizeMergePolicy();
        // How many segments accumulate before a merge is triggered:
        // a small value slows indexing; a large value speeds it up,
        // and values above 10 suit batch indexing.
        mergePolicy.setMergeFactor(50);
        // Maximum number of documents per merged segment:
        // a small value favors indexing speed; a large value suits
        // batch indexing and faster searches.
        mergePolicy.setMaxMergeDocs(5000);
        // Note: the original listing configured the policy but never attached
        // it to the config; without this call the settings above have no effect.
        iwc.setMergePolicy(mergePolicy);
        if (create) {
            iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
        } else {
            iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
        }
        IndexWriter writer = new IndexWriter(dir, iwc);
        return writer;
    }
}


The TestIndex class runs the test program:

package com.lucene.index.test;

import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.lucene.bean.FileBean;
import com.lucene.index.FileBeanIndex;
import com.lucene.index.util.FileUtil;

public class TestIndex {

    public static void main(String[] args) {
        try {
            List<FileBean> fileBeans = FileUtil.getFolderFiles("C:\\Users\\lenovo\\Desktop\\lucene\\lucene-5.1.0");
            int totalCount = fileBeans.size();
            int perThreadCount = 3000;
            System.out.println("Total number of files found: " + fileBeans.size());
            int threadCount = totalCount / perThreadCount + (totalCount % perThreadCount == 0 ? 0 : 1);
            ExecutorService pool = Executors.newFixedThreadPool(threadCount);
            CountDownLatch countDownLatch1 = new CountDownLatch(1);
            CountDownLatch countDownLatch2 = new CountDownLatch(threadCount);
            for (int i = 0; i < threadCount; i++) {
                int start = i * perThreadCount;
                int end = (i + 1) * perThreadCount < totalCount ? (i + 1) * perThreadCount : totalCount;
                List<FileBean> subList = fileBeans.subList(start, end);
                Runnable runnable = new FileBeanIndex("index", i, countDownLatch1, countDownLatch2, subList);
                // hand each worker over to the thread pool
                pool.execute(runnable);
            }
            countDownLatch1.countDown();
            System.out.println("Start index creation");
            // wait for all worker threads to finish
            countDownLatch2.await();
            System.out.println("All threads finished creating the index");
            // release thread pool resources
            pool.shutdown();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
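The slicing arithmetic in the loop above (threadCount, start, end) is easy to get wrong at the boundaries, so here is the same calculation pulled out into a tiny standalone sketch. The class and method names (PartitionDemo, threadCount, bounds) are mine, for illustration only.

```java
public class PartitionDemo {
    // Number of worker threads needed for `total` items at `perThread`
    // items each: one extra thread when the division leaves a remainder.
    public static int threadCount(int total, int perThread) {
        return total / perThread + (total % perThread == 0 ? 0 : 1);
    }

    // [start, end) bounds of the i-th slice, clamped to `total` so the
    // final slice may be shorter than the others.
    public static int[] bounds(int i, int perThread, int total) {
        int start = i * perThread;
        int end = Math.min((i + 1) * perThread, total);
        return new int[] { start, end };
    }

    public static void main(String[] args) {
        int total = 7000, per = 3000;
        int threads = threadCount(total, per); // 3 slices: 3000, 3000, 1000
        for (int i = 0; i < threads; i++) {
            int[] b = bounds(i, per, total);
            System.out.println("thread " + i + ": [" + b[0] + ", " + b[1] + ")");
        }
    }
}
```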


That is multi-threaded, multi-directory index creation. Questions and discussion are welcome.

Learn Lucene with Me, Step by Step is a summary of my recent work on Lucene indexing. If you have questions, contact me on QQ: 891922381, or join my new QQ group: 106570134 (Lucene, Solr, Netty, Hadoop). I will try to post daily; I hope you keep following along.




