Grouping introduction
When searching with Lucene we sometimes need statistics over a field, such as counting how many provinces appear in the results. In SQL we could use DISTINCT or GROUP BY for this. How do we achieve the same thing in Lucene? A naive approach is to run the query, fetch every matching result, read the province field of each document, and collect the values into a set. This is clearly inefficient and should be avoided. To solve this problem, Lucene provides a grouping module. Its main job is to group documents that share the same value in a given field and compute statistics per group.
Grouping accepts the following parameters:
- groupField: the field to group by. For example, to group by province we pass "province". Note that documents that do not contain groupField fall into a null group;
- groupSort: how the groups themselves are sorted; this determines the order in which groups are displayed;
- topNGroups: how many groups to return; only groups 0 through topNGroups are kept;
- groupOffset: which group to start from. For example, with groupOffset = 3, groups 3 through topNGroups are returned; this value can be used for paging through groups;
- withinGroupSort: how documents are sorted within each group;
- maxDocsPerGroup: how many documents to keep per group;
- withinGroupOffset: the offset of the first document displayed within each group;
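As a rough sketch, these parameters map onto the GroupingSearch API as follows (the "province" and "city" field names follow the example later in this article; the concrete offsets and limits are illustrative assumptions):

```java
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.grouping.GroupingSearch;

public class GroupingParams {
    public static GroupingSearch configure() {
        // groupField is passed to the constructor
        GroupingSearch gs = new GroupingSearch("province");
        // groupSort: order of the groups themselves
        gs.setGroupSort(new Sort(new SortField("city", SortField.Type.STRING)));
        // withinGroupSort: order of documents inside each group
        gs.setSortWithinGroup(Sort.RELEVANCE);
        // withinGroupOffset and maxDocsPerGroup
        gs.setGroupDocsOffset(0);
        gs.setGroupDocsLimit(10);
        // groupOffset and topNGroups are not setters; they are the last two
        // arguments of search(searcher, query, groupOffset, groupLimit)
        return gs;
    }
}
```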
Grouping is implemented in two passes:
- Step one: use TermFirstPassGroupingCollector to collect the top groups;
- Step two: use TermSecondPassGroupingCollector to process the documents of each group.
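The two passes above can be sketched with the term-based collectors. This is a minimal sketch against the Lucene 4.x/5.x API used elsewhere in this article; the query, field name, and parameter values are assumptions for illustration:

```java
import java.util.Collection;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.grouping.SearchGroup;
import org.apache.lucene.search.grouping.TopGroups;
import org.apache.lucene.search.grouping.term.TermFirstPassGroupingCollector;
import org.apache.lucene.search.grouping.term.TermSecondPassGroupingCollector;
import org.apache.lucene.util.BytesRef;

public class TwoPassGrouping {
    public static TopGroups<BytesRef> group(IndexSearcher searcher) throws Exception {
        Query query = new MatchAllDocsQuery();
        Sort groupSort = Sort.RELEVANCE;
        int topNGroups = 10;

        // Pass 1: collect the top N group values for the "province" field.
        TermFirstPassGroupingCollector firstPass =
                new TermFirstPassGroupingCollector("province", groupSort, topNGroups);
        searcher.search(query, firstPass);
        Collection<SearchGroup<BytesRef>> topGroups = firstPass.getTopGroups(0, true);
        if (topGroups == null) {
            return null; // no group matched the query
        }

        // Pass 2: collect the top documents inside each of those groups.
        TermSecondPassGroupingCollector secondPass =
                new TermSecondPassGroupingCollector("province", topGroups, groupSort,
                        Sort.RELEVANCE /* withinGroupSort */, 5 /* maxDocsPerGroup */,
                        true /* getScores */, true /* getMaxScores */, true /* fillSortFields */);
        searcher.search(query, secondPass);
        return secondPass.getTopGroups(0 /* withinGroupOffset */);
    }
}
```

Note that the GroupingSearch class used in the examples below wraps exactly this two-pass flow behind a single search() call.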
The grouping module defines how groups and the documents in each group are collected; all grouping collectors are abstract classes, and term-based implementations are provided.
Prerequisite for grouping: the field to group on must be indexed as a SortedDocValuesField.
Although Solr also offers its own "group by" methods, the underlying grouping abstraction is implemented by this module.
Sharding is not supported yet: we must merge the groups, and the documents of each group, ourselves.
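Because of the SortedDocValuesField prerequisite, the grouping field typically has to be added twice at index time: once as doc values (for grouping) and once as a regular field (if you also want to search or retrieve it). A minimal sketch, where the field names follow this article's example and the values are made up:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.util.BytesRef;

public class GroupFieldDoc {
    public static Document create(String province, String city) {
        Document doc = new Document();
        // doc values copy: required so "province" can be grouped on
        doc.add(new SortedDocValuesField("province", new BytesRef(province)));
        // regular indexed + stored copy, so the value can be searched and retrieved
        doc.add(new StringField("province", province, Field.Store.YES));
        doc.add(new SortedDocValuesField("city", new BytesRef(city)));
        doc.add(new StringField("city", city, Field.Store.YES));
        return doc;
    }
}
```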
Group Example
package com.lucene.search;

import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.grouping.GroupDocs;
import org.apache.lucene.search.grouping.GroupingSearch;
import org.apache.lucene.search.grouping.TopGroups;
import org.apache.lucene.util.BytesRef;

public class GroupSearchTest {
	public static void main(String[] args) {
		GroupingSearch groupingSearch = new GroupingSearch("province");
		SortField sortField = new SortField("city", SortField.Type.STRING_VAL);
		Sort sort = new Sort(sortField);
		groupingSearch.setGroupSort(sort);
		groupingSearch.setFillSortFields(true);
		groupingSearch.setCachingInMB(4.0, true);
		groupingSearch.setAllGroups(true);
		IndexSearcher searcher;
		try {
			searcher = SearchUtil.getIndexSearcherByIndexPath("index", null);
			Query query = new MatchAllDocsQuery();
			TopGroups<BytesRef> result = groupingSearch.search(searcher, query, 0,
					searcher.getIndexReader().maxDoc());
			// render the groups
			GroupDocs<BytesRef>[] docs = result.groups;
			for (GroupDocs<BytesRef> groupDocs : docs) {
				System.out.println(groupDocs.groupValue.utf8ToString());
			}
			int totalGroupCount = result.totalGroupCount;
			System.out.println(totalGroupCount);
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
}
Using BlockGroupingCollector
Sometimes we want to record the grouping at index time to speed up searching. We can do this by indexing each group's documents as one contiguous block: first add all the documents that share a group value, then insert a marker grouping field into the last document of the block. For example:
/**
 * Index creation with a group-end marker (block indexing)
 * @param writer
 * @param docs
 * @throws IOException
 */
public void indexDocsWithGroup(IndexWriter writer, String groupFieldName, String groupFieldValue,
		List<Document> docs) throws IOException {
	Field groupEndField = new Field(groupFieldName, groupFieldValue, Field.Store.NO, Field.Index.NOT_ANALYZED);
	// mark the last document of the block as the end of the group
	docs.get(docs.size() - 1).add(groupEndField);
	writer.updateDocuments(new Term(groupFieldName, groupFieldValue), docs);
	writer.commit();
	writer.close();
}
We can then run the group query against such a block-indexed field like this:
/**
 * Group query, for the case where the group field was block-indexed with an end marker
 * @param searcher
 * @param groupEndQuery
 * @param query
 * @param sort
 * @param withinGroupSort
 * @param groupOffset
 * @param topNGroups
 * @param needsScores
 * @param docOffset
 * @param docsPerGroup
 * @param fillFields
 * @return
 * @throws IOException
 */
public static TopGroups<BytesRef> getTopGroupsByGroupTerm(IndexSearcher searcher, Query groupEndQuery,
		Query query, Sort sort, Sort withinGroupSort, int groupOffset, int topNGroups, boolean needsScores,
		int docOffset, int docsPerGroup, boolean fillFields) throws IOException {
	@SuppressWarnings("deprecation")
	Filter groupEndDocs = new CachingWrapperFilter(new QueryWrapperFilter(groupEndQuery));
	BlockGroupingCollector c = new BlockGroupingCollector(sort, groupOffset + topNGroups, needsScores, groupEndDocs);
	searcher.search(query, c);
	@SuppressWarnings("unchecked")
	TopGroups<BytesRef> groupsResult = (TopGroups<BytesRef>) c.getTopGroups(withinGroupSort, groupOffset,
			docOffset, docOffset + docsPerGroup, fillFields);
	return groupsResult;
}
We can also run the group query directly with GroupingSearch; this is the more common implementation.
Query method
/**
 * @param searcher
 * @param query
 * @param groupFieldName
 * @param sort
 * @param maxCacheRAMMB
 * @param page
 * @param perPage
 * @return
 * @throws IOException
 */
public static TopGroups<BytesRef> getTopGroups(IndexSearcher searcher, Query query, String groupFieldName,
		Sort sort, double maxCacheRAMMB, int page, int perPage) throws IOException {
	GroupingSearch groupingSearch = new GroupingSearch(groupFieldName);
	groupingSearch.setGroupSort(sort);
	groupingSearch.setFillSortFields(true);
	groupingSearch.setCachingInMB(maxCacheRAMMB, true);
	groupingSearch.setAllGroups(true);
	TopGroups<BytesRef> result = groupingSearch.search(searcher, query, (page - 1) * perPage, page * perPage);
	return result;
}
The following is the utility class used by the queries above.
Query Tool class
package com.lucene.search;

import java.io.File;
import java.io.IOException;
import java.nio.file.Paths;
import java.util.Set;
import java.util.concurrent.ExecutorService;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.CachingWrapperFilter;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.grouping.BlockGroupingCollector;
import org.apache.lucene.search.grouping.GroupingSearch;
import org.apache.lucene.search.grouping.TopGroups;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.InvalidTokenOffsetsException;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleFragmenter;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

/**
 * Lucene index query utility class
 * @author lenovo
 */
public class SearchUtil {
	/**
	 * Get an IndexSearcher over all index directories under a parent path
	 * @param parentPath
	 * @param service
	 * @return
	 * @throws IOException
	 */
	public static IndexSearcher getIndexSearcherByParentPath(String parentPath, ExecutorService service)
			throws IOException {
		MultiReader reader = null;
		try {
			File[] files = new File(parentPath).listFiles();
			IndexReader[] readers = new IndexReader[files.length];
			for (int i = 0; i < files.length; i++) {
				readers[i] = DirectoryReader.open(FSDirectory.open(Paths.get(files[i].getPath())));
			}
			reader = new MultiReader(readers);
		} catch (IOException e) {
			e.printStackTrace();
		}
		return new IndexSearcher(reader, service);
	}

	/**
	 * Multi-threaded searcher over multiple sub-indexes
	 * @param parentPath parent index directory
	 * @param service executor used for parallel search
	 * @return
	 * @throws IOException
	 */
	public static IndexSearcher getMultiSearcher(String parentPath, ExecutorService service) throws IOException {
		File file = new File(parentPath);
		File[] files = file.listFiles();
		IndexReader[] readers = new IndexReader[files.length];
		for (int i = 0; i < files.length; i++) {
			readers[i] = DirectoryReader.open(FSDirectory.open(Paths.get(files[i].getPath())));
		}
		MultiReader multiReader = new MultiReader(readers);
		return new IndexSearcher(multiReader, service);
	}

	/**
	 * Get an IndexReader for an index path
	 * @param indexPath
	 * @return
	 * @throws IOException
	 */
	public static DirectoryReader getIndexReader(String indexPath) throws IOException {
		return DirectoryReader.open(FSDirectory.open(Paths.get(indexPath)));
	}

	/**
	 * Get an IndexSearcher for an index path
	 * @param indexPath
	 * @param service
	 * @return
	 * @throws IOException
	 */
	public static IndexSearcher getIndexSearcherByIndexPath(String indexPath, ExecutorService service)
			throws IOException {
		IndexReader reader = getIndexReader(indexPath);
		return new IndexSearcher(reader, service);
	}

	/**
	 * If the index directory has changed, obtain a new IndexSearcher from the old one;
	 * this reuses unchanged segments and consumes fewer resources
	 * @param oldSearcher
	 * @param service
	 * @return
	 * @throws IOException
	 */
	public static IndexSearcher getIndexSearcherOpenIfChanged(IndexSearcher oldSearcher, ExecutorService service)
			throws IOException {
		DirectoryReader reader = (DirectoryReader) oldSearcher.getIndexReader();
		DirectoryReader newReader = DirectoryReader.openIfChanged(reader);
		if (newReader == null) {
			// nothing changed, keep using the old searcher
			return oldSearcher;
		}
		return new IndexSearcher(newReader, service);
	}

	/** Multi-condition query, similar to SQL IN */
	public static Query getMultiQueryLikeSqlIn(Query... querys) {
		BooleanQuery query = new BooleanQuery();
		for (Query subQuery : querys) {
			query.add(subQuery, Occur.SHOULD);
		}
		return query;
	}

	/** Multi-condition query, similar to SQL AND */
	public static Query getMultiQueryLikeSqlAnd(Query... querys) {
		BooleanQuery query = new BooleanQuery();
		for (Query subQuery : querys) {
			query.add(subQuery, Occur.MUST);
		}
		return query;
	}

	/**
	 * Build a query from a configuration item
	 * @param field field name
	 * @param fieldType field type
	 * @param queryStr query condition
	 * @param range whether this is a range query ("min|max")
	 * @return
	 */
	public static Query getQuery(String field, String fieldType, String queryStr, boolean range) {
		Query q = null;
		try {
			if (queryStr != null && !"".equals(queryStr)) {
				if (range) {
					String[] strs = queryStr.split("\\|");
					if ("int".equals(fieldType)) {
						int min = new Integer(strs[0]);
						int max = new Integer(strs[1]);
						q = NumericRangeQuery.newIntRange(field, min, max, true, true);
					} else if ("double".equals(fieldType)) {
						double min = new Double(strs[0]);
						double max = new Double(strs[1]);
						q = NumericRangeQuery.newDoubleRange(field, min, max, true, true);
					} else if ("float".equals(fieldType)) {
						float min = new Float(strs[0]);
						float max = new Float(strs[1]);
						q = NumericRangeQuery.newFloatRange(field, min, max, true, true);
					} else if ("long".equals(fieldType)) {
						long min = new Long(strs[0]);
						long max = new Long(strs[1]);
						q = NumericRangeQuery.newLongRange(field, min, max, true, true);
					}
				} else {
					if ("int".equals(fieldType)) {
						q = NumericRangeQuery.newIntRange(field, new Integer(queryStr), new Integer(queryStr), true, true);
					} else if ("double".equals(fieldType)) {
						q = NumericRangeQuery.newDoubleRange(field, new Double(queryStr), new Double(queryStr), true, true);
					} else if ("float".equals(fieldType)) {
						q = NumericRangeQuery.newFloatRange(field, new Float(queryStr), new Float(queryStr), true, true);
					} else {
						Analyzer analyzer = new StandardAnalyzer();
						q = new QueryParser(field, analyzer).parse(queryStr);
					}
				}
			} else {
				q = new MatchAllDocsQuery();
			}
			System.out.println(q);
		} catch (ParseException e) {
			e.printStackTrace();
		}
		return q;
	}

	/**
	 * Build a TermQuery from a field name and value
	 * @param fieldName
	 * @param fieldValue
	 * @return
	 */
	public static Query getQuery(String fieldName, Object fieldValue) {
		Term term = new Term(fieldName, new BytesRef(fieldValue.toString()));
		return new TermQuery(term);
	}

	/** Get the full document by IndexSearcher and docID */
	public static Document getDefaultFullDocument(IndexSearcher searcher, int docID) throws IOException {
		return searcher.doc(docID);
	}

	/** Get only the listed fields of a document by IndexSearcher and docID */
	public static Document getDocumentByListField(IndexSearcher searcher, int docID, Set<String> listField)
			throws IOException {
		return searcher.doc(docID, listField);
	}

	/**
	 * Paged query
	 * @param page current page
	 * @param perPage number of hits per page
	 * @param searcher
	 * @param query
	 * @return
	 * @throws IOException
	 */
	public static TopDocs getScoreDocsByPerPage(int page, int perPage, IndexSearcher searcher, Query query)
			throws IOException {
		if (query == null) {
			System.out.println("query is null return null");
			return null;
		}
		ScoreDoc before = null;
		if (page != 1) {
			TopDocs docsBefore = searcher.search(query, (page - 1) * perPage);
			ScoreDoc[] scoreDocs = docsBefore.scoreDocs;
			if (scoreDocs.length > 0) {
				before = scoreDocs[scoreDocs.length - 1];
			}
		}
		return searcher.searchAfter(before, query, perPage);
	}

	public static TopDocs getScoreDocs(IndexSearcher searcher, Query query) throws IOException {
		return searcher.search(query, getMaxDocId(searcher));
	}

	/**
	 * Highlight a field; preTag/postTag define the highlight markup, e.g. <b>keyword</b>
	 * @param searcher
	 * @param field
	 * @param keyword
	 * @param preTag
	 * @param postTag
	 * @param fragmentSize
	 * @return
	 * @throws IOException
	 * @throws InvalidTokenOffsetsException
	 */
	public static String[] highlighter(IndexSearcher searcher, String field, String keyword, String preTag,
			String postTag, int fragmentSize) throws IOException, InvalidTokenOffsetsException {
		// build the query from the method parameters (the original hard-coded "content"/"Lucene" here)
		Term term = new Term(field, new BytesRef(keyword));
		TermQuery termQuery = new TermQuery(term);
		TopDocs docs = getScoreDocs(searcher, termQuery);
		ScoreDoc[] hits = docs.scoreDocs;
		QueryScorer scorer = new QueryScorer(termQuery);
		// set the highlight format, e.g. <b>keyword</b>
		SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter(preTag, postTag);
		Highlighter highlighter = new Highlighter(simpleHTMLFormatter, scorer);
		// set the number of characters per fragment
		highlighter.setTextFragmenter(new SimpleFragmenter(fragmentSize));
		Analyzer analyzer = new StandardAnalyzer();
		String[] result = new String[hits.length];
		for (int i = 0; i < result.length; i++) {
			Document doc = searcher.doc(hits[i].doc);
			result[i] = highlighter.getBestFragment(analyzer, field, doc.get(field));
		}
		return result;
	}

	/** Total number of documents; equivalent to the hit count of a MatchAllDocsQuery */
	public static int getMaxDocId(IndexSearcher searcher) {
		return searcher.getIndexReader().maxDoc();
	}

	/**
	 * Group query, for the case where the group field was block-indexed with an end marker
	 * @param searcher
	 * @param groupEndQuery
	 * @param query
	 * @param sort
	 * @param withinGroupSort
	 * @param groupOffset
	 * @param topNGroups
	 * @param needsScores
	 * @param docOffset
	 * @param docsPerGroup
	 * @param fillFields
	 * @return
	 * @throws IOException
	 */
	public static TopGroups<BytesRef> getTopGroupsByGroupTerm(IndexSearcher searcher, Query groupEndQuery,
			Query query, Sort sort, Sort withinGroupSort, int groupOffset, int topNGroups, boolean needsScores,
			int docOffset, int docsPerGroup, boolean fillFields) throws IOException {
		@SuppressWarnings("deprecation")
		Filter groupEndDocs = new CachingWrapperFilter(new QueryWrapperFilter(groupEndQuery));
		BlockGroupingCollector c = new BlockGroupingCollector(sort, groupOffset + topNGroups, needsScores, groupEndDocs);
		searcher.search(query, c);
		@SuppressWarnings("unchecked")
		TopGroups<BytesRef> groupsResult = (TopGroups<BytesRef>) c.getTopGroups(withinGroupSort, groupOffset,
				docOffset, docOffset + docsPerGroup, fillFields);
		return groupsResult;
	}

	/**
	 * Common group query
	 * @param searcher
	 * @param query
	 * @param groupFieldName
	 * @param sort
	 * @param maxCacheRAMMB
	 * @param page
	 * @param perPage
	 * @return
	 * @throws IOException
	 */
	public static TopGroups<BytesRef> getTopGroups(IndexSearcher searcher, Query query, String groupFieldName,
			Sort sort, double maxCacheRAMMB, int page, int perPage) throws IOException {
		GroupingSearch groupingSearch = new GroupingSearch(groupFieldName);
		groupingSearch.setGroupSort(sort);
		groupingSearch.setFillSortFields(true);
		groupingSearch.setCachingInMB(maxCacheRAMMB, true);
		groupingSearch.setAllGroups(true);
		return groupingSearch.search(searcher, query, (page - 1) * perPage, page * perPage);
	}
}
It is getting late, so I will stop here for now; I will upload the source code tomorrow.
"Learn Lucene with me step by step" is a summary of my recent work on Lucene indexing. If you have questions, contact me via QQ: 891922381, or join my new QQ group: 106570134 (lucene, solr, netty, hadoop). I will try to publish one post a day; please keep following, and I hope to bring you some surprises as we explore these topics together.
Learn Lucene with me step by step: Lucene search grouping (group queries)