Lucene 4.10.3 Getting Started

Source: Internet
Author: User

I have recently been looking into full-text indexing for search. After a first look I found Lucene very practical, so here are my getting-started notes to share with everyone.

One, Introduction to Lucene

Lucene is a Java-based full-text information retrieval toolkit. It is not a complete search application; instead, it provides indexing and search capabilities for your own application. Lucene is an open source project of the Apache Software Foundation (it started out in the Apache Jakarta family and is now a top-level Apache project), and it is the most popular Java-based open source full-text search toolkit.
Many applications are already built on Lucene, such as the search function of Eclipse's help system. Lucene indexes text data, so you can index and search any document as long as you can convert its content into text. For example, to index HTML or PDF documents, you first convert them to plain text, then hand the converted content to Lucene for indexing, then save the resulting index files to disk or keep them in memory, and finally run queries against the index based on the criteria entered by the user. Because Lucene does not prescribe the format of the documents to be indexed, it can be used in almost any search application.
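The "convert to text first" step can be illustrated with a minimal sketch. It uses only the Java standard library and is not part of Lucene; the class name NaiveHtmlToText and its method are hypothetical names chosen for this illustration. A regular expression is a deliberately naive way to strip markup, and a real application would more likely use a dedicated parser or text extraction library.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only: turns an HTML string into plain
// text that could then be handed to an indexer. Not part of Lucene.
final class NaiveHtmlToText {
    // Whole <script>...</script> and <style>...</style> blocks, including their content
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    // Any remaining tag
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String toText(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        // Decode a handful of common entities; "&amp;" goes last so that
        // "&amp;lt;" correctly becomes the literal text "&lt;".
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Lucene</h1><p>Full-text &amp; fast.</p></body></html>";
        System.out.println(toText(html)); // prints "Lucene Full-text & fast."
    }
}
```

The returned string is what would be passed on for indexing; the original HTML never reaches the index.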

Let's take a look at the official architecture diagram first:


Two, Official website demo

1, First, download lucene-4.10.3.zip from the official website.

Official website: http://lucene.apache.org/

2, Unzip the package. Among the demo files there is lucene-xml-query-demo.war; copy it into the webapps directory of your Tomcat installation.

3, Start the Tomcat server and open localhost:8080/lucene-xml-query-demo in a browser. The page appears, but clicking the query button reports the error java.lang.ClassNotFoundException: org.apache.lucene.xmlparser.webdemo.FormBasedXmlQueryDemo. This is because the package path of FormBasedXmlQueryDemo changed in the new version. To fix it, open the project's web.xml and change

<servlet-class>org.apache.lucene.xmlparser.webdemo.FormBasedXmlQueryDemo</servlet-class>

to

<servlet-class>org.apache.lucene.demo.xmlparser.FormBasedXmlQueryDemo</servlet-class>

Then, from the unpacked Lucene distribution, take analysis\common\lucene-analyzers-common-4.10.2.jar and sandbox\lucene-sandbox-4.10.2.jar (the version suffix in the file names follows the release you downloaded).

Copy these two files into the demo's WEB-INF\lib folder. After that, clicking the query button no longer raises the error. Entering "java" as the query gives the following results:


Three, Simple implementation

The basic working principle of Lucene can be understood as two steps: create an index, then answer queries by looking them up in that index.
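To make the "index first, query second" idea concrete, here is a toy inverted index written with the Java standard library only. It is a sketch under simplifying assumptions and not how Lucene is implemented: the class ToyInvertedIndex and its methods are hypothetical names, tokens are simply lowercased runs of letters and digits, and there is no scoring.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration only: maps each lowercased token to the ids of the
// documents that contain it. Lucene's real index is far more elaborate.
final class ToyInvertedIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Step 1: indexing. Split the text into tokens and record the document id per token.
    void add(String docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Step 2: querying. A lookup in the map, with no scan over the documents.
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("a.txt", "Java and Lucene");
        index.add("b.txt", "Lucene is a toolkit");
        System.out.println(index.search("lucene")); // prints [a.txt, b.txt]
        System.out.println(index.search("JAVA"));   // prints [a.txt]
    }
}
```

Both the indexed text and the query term are lowercased here, so the two sides agree on what a term looks like. The same concern shows up whenever a query term is compared against terms that were normalized at indexing time.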

Here is a simple example using the Lucene API.

The TxtFileIndexer class indexes all the .txt files in D:/lucenedata and stores the index in D:/luceneindex.

    package org.com.test;

    import java.io.File;
    import java.io.FileReader;
    import java.io.Reader;
    import java.util.Date;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class TxtFileIndexer {
        public static void main(String[] args) throws Exception {
            // indexDir is the directory that hosts Lucene's index files
            File indexDir = new File("D:\\luceneindex");
            // dataDir is the directory that hosts the text files to be indexed
            File dataDir = new File("D:\\lucenedata");
            // Older API: Analyzer luceneAnalyzer = new StandardAnalyzer(Version.LUCENE_4_10_2);
            // Analyzer (word breaker) for the documents
            Analyzer luceneAnalyzer = new StandardAnalyzer();
            File[] dataFiles = dataDir.listFiles();
            IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_4_10_3, luceneAnalyzer);
            // Create the index writer
            IndexWriter indexWriter = new IndexWriter(FSDirectory.open(indexDir), indexWriterConfig);
            // Older API: IndexWriter indexWriter = new IndexWriter(indexDir, luceneAnalyzer, true);
            long startTime = new Date().getTime();
            for (int i = 0; i < dataFiles.length; i++) {
                if (dataFiles[i].isFile() && dataFiles[i].getName().endsWith(".txt")) {
                    System.out.println("Indexing file " + dataFiles[i].getCanonicalPath());
                    // Wrap the file in a Document object
                    Document document = new Document();
                    Reader txtReader = new FileReader(dataFiles[i]);
                    // Field names are lowercase so that they match the ones used when searching
                    document.add(new TextField("path", dataFiles[i].getCanonicalPath(), Store.YES));
                    // Older API: document.add(Field.Text("contents", txtReader));
                    document.add(new TextField("contents", txtReader));
                    indexWriter.addDocument(document);
                }
            }
            indexWriter.commit();
            // Older API: indexWriter.optimize();
            indexWriter.close();
            long endTime = new Date().getTime();
            System.out.println("It takes " + (endTime - startTime)
                    + " milliseconds to create index for the files in directory " + dataDir.getPath());
        }
    }
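One detail of the indexer worth noting: the FileReader(File) constructor decodes the file with a default charset rather than one you choose, which can garble non-ASCII text if the files were saved in a different encoding. The sketch below is one possible alternative using only the Java standard library; the class name TxtFiles and its methods are hypothetical, and UTF-8 is an assumption about how the .txt files are encoded. A Reader opened this way can be used wherever the indexer uses txtReader.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration: finds the .txt files in a directory
// and opens them with an explicit charset instead of the default one.
final class TxtFiles {
    // Returns the regular files matching *.txt directly inside dataDir, sorted by path.
    static List<Path> listTxtFiles(Path dataDir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dataDir, "*.txt")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    result.add(p);
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    // Assumption: the files are UTF-8 encoded. Change the charset if they are not.
    static Reader openUtf8(Path file) throws IOException {
        return Files.newBufferedReader(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan: first argument, or the current directory if none is given
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : listTxtFiles(dir)) {
            System.out.println("Would index " + p.toAbsolutePath());
        }
    }
}
```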




The TxtFileSearcher class reads the index from D:/luceneindex and finds the .txt files that contain a given keyword.

    package org.com.test;

    import java.io.File;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class TxtFileSearcher {
        public static void main(String[] args) throws Exception {
            String queryStr = "Java";
            // This is the directory that hosts the Lucene index
            File indexDir = new File("D:\\luceneindex");
            // Check that the index exists before trying to open it
            if (!indexDir.exists()) {
                System.out.println("The Lucene index does not exist");
                return;
            }
            Directory directory = FSDirectory.open(indexDir);
            // DirectoryReader.open is used in place of the deprecated IndexReader.open(directory)
            IndexReader indexReader = DirectoryReader.open(directory);
            // Older API: FSDirectory directory = FSDirectory.getDirectory(indexDir, false);
            // Older API: IndexReader indexReader = IndexReader.open(fsDirectory);
            IndexSearcher indexSearcher = new IndexSearcher(indexReader);
            // StandardAnalyzer lowercases terms when indexing, so the query term is lowercased too
            Term term = new Term("contents", queryStr.toLowerCase());
            TermQuery luceneQuery = new TermQuery(term);
            // Older API: Hits hits = searcher.search(luceneQuery);
            TopDocs topDocs = indexSearcher.search(luceneQuery, 1000);
            ScoreDoc[] scoreDocs = topDocs.scoreDocs;
            if (scoreDocs == null || scoreDocs.length == 0) {
                System.out.println("No files match the query");
            } else {
                for (int i = 0; i < scoreDocs.length; i++) {
                    Document document = indexSearcher.doc(scoreDocs[i].doc);
                    System.out.println("File: " + document.get("path"));
                }
            }
            indexReader.close();
        }
    }

The results of running the two programs are as follows:

Indexing the data source in "D:\\lucenedata":


Search results for the keyword "Java":



Summary:

On the road from beginner to expert, one thing keeps happening: we struggle for N days and nights to get a result, and an expert finds a jar in a minute that implements the same feature, and implements it better. Whenever I run into this, I think of the famous passage from Xunzi's "Encouraging Learning":

I once spent a whole day in thought, but it was not as good as a moment of study. I once stood on tiptoe to look into the distance, but it was not as good as the broad view from a high place. Climb a height and wave, and your arm is no longer, yet people see you from far away. Shout with the wind, and your voice is no louder, yet people hear you clearly. One who borrows a carriage and horses does not have faster feet, yet travels a thousand li. One who borrows a boat and oars is not a better swimmer, yet crosses rivers. The gentleman is not different from others by birth; he is simply good at making use of things.

Thousands of years ago, the ancients were already teaching us that a gentleman should learn to stand on the shoulders of giants. Humans invented writing so that later generations could inherit what came before. Because we inherit, thousands of years of civilization have given us an incomparable speed of development, and that speed keeps being surpassed. Programming is no different!
