Implementing a Full-Text Search Algorithm for a Search Engine (Lucene-Based)

Source: Internet
Author: User

When I built the turntable network, I published search code that was not full-text; readers who need it can find it on my blog. This article discusses how to implement full-text search. I spent a long time designing a new project, the viewpoint network, and its requirements for full-text search are quite demanding, so I put a lot of time into studying the topic. You can try the result first by clicking My search. Without further ado, here is the code:

  public Map<String, Object> articleSearchAlgorithms(SearchCondition condition, IndexSearcher searcher)
          throws ParseException, IOException {
      Map<String, Object> map = new HashMap<String, Object>();
      String[] filedsList = condition.getFiledsList();
      String keyWord = condition.getKeyWord();
      int currentPage = condition.getCurrentPage();
      int pageSize = condition.getPageSize();
      String sortField = condition.getSortField();
      boolean reverse = condition.isDesc(); // passed to SortField as its "reverse" flag
      String sDate = condition.getsDate();
      String eDate = condition.geteDate();
      String classify = condition.getClassify();

      // Escape special characters in the keyword
      keyWord = escapeExprSpecialWord(keyWord);

      BooleanQuery q1 = new BooleanQuery(); // filter clauses (type, date range)
      BooleanQuery q2 = new BooleanQuery(); // keyword clauses
      BooleanQuery booleanQuery = new BooleanQuery(); // combined Boolean query

      if (classify != null
              && (classify.equals("guanzhi") || classify.equals("opinion") || classify.equals("write"))) {
          String typeId = "1"; // default: remarks
          if (classify.equals("guanzhi")) {
              typeId = "2";
          }
          if (classify.equals("opinion")) {
              typeId = "3";
          }
          Query termQuery = new TermQuery(new Term("typeId", typeId));
          q1.add(termQuery, BooleanClause.Occur.MUST);
      }
      if (sDate != null && eDate != null) { // these two parameters decide whether a range query is added
          Query rangeQuery = new TermRangeQuery("writingTime", new BytesRef(sDate), new BytesRef(eDate), true, true);
          q1.add(rangeQuery, BooleanClause.Occur.MUST);
      }

      Sort sort = new Sort(); // sort by score unless a sort field is given
      sort.setSort(SortField.FIELD_SCORE);
      if (sortField != null) {
          sort.setSort(new SortField(sortField, SortField.Type.STRING, reverse));
      }

      int start = (currentPage - 1) * pageSize;
      int hm = start + pageSize; // number of top hits needed to cover the requested page
      TopFieldCollector res = TopFieldCollector.create(sort, hm, false, false, false, false);

      // Exact match query
      Term t0 = new Term(filedsList[1], keyWord);
      TermQuery termQuery = new TermQuery(t0); // exact and prefix: the two high-precision queries
      q2.add(termQuery, BooleanClause.Occur.SHOULD);
      // Prefix matching
      Term t1 = new Term(filedsList[1], keyWord);
      PrefixQuery prefixQuery = new PrefixQuery(t1);
      q2.add(prefixQuery, BooleanClause.Occur.SHOULD);

      // Phrase and similarity matching, suited to tokenized content
      for (int i = 0; i < filedsList.length; i++) { // multi-field term queries
          if (i != 1) {
              PhraseQuery phraseQuery = new PhraseQuery();
              Term ts0 = new Term(filedsList[i], keyWord);
              phraseQuery.add(ts0);
              FuzzyQuery fQuery = new FuzzyQuery(new Term(filedsList[i], keyWord), 2); // final similarity query
              q2.add(phraseQuery, BooleanClause.Occur.SHOULD);
              q2.add(fQuery, BooleanClause.Occur.SHOULD); // also matches terms with a similar suffix
          }
      }

      MultiFieldQueryParser queryParser = new MultiFieldQueryParser(Version.LUCENE_47, filedsList, analyzer);
      queryParser.setDefaultOperator(QueryParser.AND_OPERATOR);
      Query query = queryParser.parse(keyWord);
      q2.add(query, BooleanClause.Occur.SHOULD);

      // These checks are required; otherwise the results differ
      if (q1 != null && q1.toString().length() > 0) {
          booleanQuery.add(q1, BooleanClause.Occur.MUST);
      }
      if (q2 != null && q2.toString().length() > 0) {
          booleanQuery.add(q2, BooleanClause.Occur.MUST);
      }

      searcher.search(booleanQuery, res);
      long amount = res.getTotalHits();
      TopDocs tds = res.topDocs(start, pageSize);
      map.put("amount", amount);
      map.put("tds", tds);
      map.put("query", booleanQuery);
      return map;
  }
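The code above calls escapeExprSpecialWord, but that helper is not shown in this article. As a minimal sketch, assuming it simply backslash-escapes the characters that Lucene's classic query syntax treats as operators, it might look like the following. The class name KeywordEscaper and the method name escape are hypothetical; Lucene also ships a static QueryParser.escape(String) that serves the same purpose.

```java
/** Hypothetical stand-in for the escapeExprSpecialWord helper, which the article does not show. */
final class KeywordEscaper {
    // Characters that the classic Lucene query syntax treats as operators.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Prefixes every query-syntax character in the keyword with a backslash. */
    static String escape(String keyword) {
        if (keyword == null || keyword.isEmpty()) {
            return keyword;
        }
        StringBuilder sb = new StringBuilder(keyword.length() + 8);
        for (int i = 0; i < keyword.length(); i++) {
            char c = keyword.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("C++ (lucene)"));
        System.out.println(escape("title:foo*"));
    }
}
```

Escaping matters mostly for the text handed to the query parser; without it, user input such as an unbalanced parenthesis can make parse throw a ParseException.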

Note: The search condition object (SearchCondition) in the code above reflects the specific needs of the viewpoint network. Adapt it to your own search conditions; a single definition is unlikely to suit every reader.
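Since SearchCondition itself is not listed, here is a trimmed-down sketch of what such an object could look like, together with the paging arithmetic used in the search method: start = (currentPage - 1) * pageSize. The class name SearchConditionSketch, the helper methods start and hitsNeeded, and the clamping of bad page values are all assumptions added for illustration; the real class also carries the field list, sort field, dates, classification, reader, and URL.

```java
/** Hypothetical, trimmed-down SearchCondition; adapt the fields to your own search form. */
final class SearchConditionSketch {
    private final String keyWord;
    private final int currentPage; // 1-based page number
    private final int pageSize;

    SearchConditionSketch(String keyWord, int currentPage, int pageSize) {
        this.keyWord = keyWord;
        // Assumption: clamp bad request parameters so the offset can never be negative.
        this.currentPage = Math.max(1, currentPage);
        this.pageSize = Math.max(1, pageSize);
    }

    String getKeyWord() { return keyWord; }
    int getCurrentPage() { return currentPage; }
    int getPageSize() { return pageSize; }

    /** Index of the first hit on the requested page: (currentPage - 1) * pageSize. */
    int start() { return (currentPage - 1) * pageSize; }

    /** Number of top hits the collector must keep so that the requested page is covered. */
    int hitsNeeded() { return start() + pageSize; }

    public static void main(String[] args) {
        SearchConditionSketch c = new SearchConditionSketch("lucene", 3, 10);
        System.out.println(c.start());      // 20
        System.out.println(c.hitsNeeded()); // 30
    }
}
```

For page 3 with 10 results per page, the collector keeps the top 30 hits and the page is the slice that starts at hit 20. The cost of this approach grows with the page number, because every deeper page requires collecting all hits before it.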

  public Map<String, Object> searchArticle(SearchCondition condition) throws Exception {
      Map<String, Object> map = new HashMap<String, Object>();
      List<Write> list = new ArrayList<Write>();
      DirectoryReader reader = condition.getReader();
      String url = condition.getUrl();
      boolean isHighlight = condition.isHighlight();
      String keyWord = condition.getKeyWord();
      IndexSearcher searcher = getSearcher(reader, url);
      try {
          Map<String, Object> output = articleSearchAlgorithms(condition, searcher);
          if (output == null) {
              map.put("amount", 0L);
              map.put("source", null);
              return map;
          }
          map.put("amount", output.get("amount"));
          TopDocs tds = (TopDocs) output.get("tds");
          ScoreDoc[] sd = tds.scoreDocs;
          Query query = (Query) output.get("query");
          for (int i = 0; i < sd.length; i++) {
              Document doc = searcher.doc(sd[i].doc);
              String id = doc.get("id");

              /* ---------- start: title and content are handled together ---------- */
              String temp = doc.get("title");
              String title = temp; // not highlighted by default
              if (isHighlight) { // highlight the article title
                  Highlighter highlighterTitle = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));
                  highlighterTitle.setTextFragmenter(new SimpleFragmenter(40)); // fragment length
                  TokenStream ts = analyzer.tokenStream("title", new StringReader(temp));
                  title = highlighterTitle.getBestFragment(ts, temp);
                  if (title == null) {
                      // Works around a highlighter bug: fall back to a plain replacement
                      title = temp.replace(keyWord, "<span style='color:red'>" + keyWord + "</span>");
                  }
              }

              String temp1 = HtmlEnDecode.htmlEncode(doc.get("content")); // escaped with our own helper
              String content = temp1;
              if (isHighlight) { // highlight the content
                  Highlighter highlighterContent = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));
                  highlighterContent.setTextFragmenter(new SimpleFragmenter(Constant.HIGHLIGHT_CONTENT_LENGTH)); // fragment length
                  // temp1 = StringEscapeUtils.escapeHtml(temp1); // escaping Chinese characters causes highlighting to fail
                  TokenStream ts1 = analyzer.tokenStream("content", new StringReader(temp1));
                  content = highlighterContent.getBestFragment(ts1, temp1);
                  if (content == null) {
                      // Works around a highlighter bug: fall back to a plain replacement
                      content = temp1.replace(keyWord, "<span style='color:red'>" + keyWord + "</span>");
                      // Assuming this happens, the other highlights will automatically
                      content = subContent(content); // truncate
                      content = HtmlEnDecode.htmlDecode(content); // HTML decoding
                      content = SubStringHTML.sub(content, Constant.HIGHLIGHT_CONTENT_LENGTH);
                  }
              }

              /* ---------- frequently changing data comes from the database ---------- */
              Write write = writeDao.getArticle(Long.parseLong(id));
              if (write != null) {
                  write.setTitle(title);
                  write.setContent(content);
                  Date writingTime = write.getWritingTime();
                  String timeGap = DateUtil.dateGap(writingTime); // time gap
                  write.setTimeGap(timeGap);
                  list.add(write);
              }
          }
      } catch (Exception e) {
          e.printStackTrace();
      }
      map.put("source", list);
      return map;
  }

Note: The above is the concrete search code. Different application scenarios have different requirements, so wrap your objects, query your database, and so on according to your own needs. Nothing has been held back from the code, and it is fully usable.
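The search code falls back to a plain string replacement whenever the highlighter returns no fragment. A minimal sketch of that fallback as a standalone helper follows; the class name HighlightFallback and the method name orFallback are hypothetical, and the guard against an empty keyword is an assumption added here, not part of the original code.

```java
/** Hypothetical helper that isolates the "highlighter returned null" fallback. */
final class HighlightFallback {
    /**
     * Returns the highlighter's fragment when there is one; otherwise wraps each
     * literal occurrence of the keyword in the original text with a red span.
     */
    static String orFallback(String fragment, String original, String keyWord) {
        if (fragment != null) {
            return fragment;
        }
        // Assumption: skip the replacement for a missing or empty keyword, because
        // String.replace with an empty target inserts the markup between every character.
        if (original == null || keyWord == null || keyWord.isEmpty()) {
            return original;
        }
        return original.replace(keyWord, "<span style='color:red'>" + keyWord + "</span>");
    }

    public static void main(String[] args) {
        System.out.println(orFallback(null, "Lucene search", "search"));
        System.out.println(orFallback("<b>Lucene</b> search", "Lucene search", "Lucene"));
    }
}
```

Note that the replacement is case-sensitive and matches the keyword literally, so it highlights less than the analyzer-based highlighter would. If the keyword has already been escaped for the query parser, the unescaped form is probably the better one to pass in.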

If you have any questions, you can join QQ group 284205104. If the group is full, please go to the turntable network to find the latest group and join that one. Thank you for reading.
