Simple Lucene instance

Last Update:2018-12-03 Source: Internet

Author: User

Whenever I write an article, I find the title the hardest part; sometimes I simply do not know what to call it. In any case, this one walks through some simple Lucene examples.

Lucene is actually quite simple. It mainly involves two things: creating indexes and searching them.
Let's first look at some of the terms used in Lucene. I will not introduce them in detail here, only touch on them briefly; for anything more, there is a wonderful thing in the world called search.

IndexWriter: one of the most important classes in Lucene. It is mainly used to add documents to the index and to control some parameters of the indexing process.

Analyzer: the analyzer, mainly used to analyze the various kinds of text a search engine encounters. Commonly used analyzers include StandardAnalyzer, StopAnalyzer, and WhitespaceAnalyzer.

Directory: the location where the index is stored. Lucene provides two kinds of index storage, disk and memory, through two corresponding classes: FSDirectory and RAMDirectory. Generally, indexes are stored on disk.

Document: a Document is the unit of indexing. Any file that is to be indexed must first be converted into a Document object.

Field: a field of a Document. Each Field has a name and a value, such as the "name" field used in the examples in this article.

IndexSearcher: the most basic search tool in Lucene. All searches go through IndexSearcher.

Query: a query. Lucene supports fuzzy, semantic, phrase, and combined queries, among others, through classes such as TermQuery, BooleanQuery, RangeQuery, and WildcardQuery.

QueryParser: a tool for parsing user input. It scans the user's input string and generates a Query object.

Hits: after a search completes, the results must be returned and displayed to the user; only then is the search truly finished. In Lucene, the set of search results is represented by an instance of the Hits class.
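
To make a few of the query kinds listed above more concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: the class ToyQueries and its methods are hypothetical names invented for illustration, and the index is just a map from each term to the ids of the documents that contain it. It only hints at what a term query, a boolean AND query, and a prefix-style wildcard query ask of an index.

```java
import java.util.*;

// Toy sketch only: hypothetical names, NOT Lucene's API.
class ToyQueries {
    // Inverted index: term -> ids of the documents containing that term.
    private final Map<String, Set<Integer>> index = new TreeMap<>();

    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Term query: documents containing exactly this term.
    Set<Integer> term(String t) {
        return new TreeSet<>(index.getOrDefault(t, Collections.emptySet()));
    }

    // Boolean AND query: documents matched by both sub-queries.
    Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Prefix-style wildcard query (like "jav*"): union over all terms with the prefix.
    Set<Integer> prefix(String p) {
        Set<Integer> result = new TreeSet<>();
        for (Map.Entry<String, Set<Integer>> e : index.entrySet()) {
            if (e.getKey().startsWith(p)) {
                result.addAll(e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ToyQueries q = new ToyQueries();
        q.add(1, "lighter javaeye com");
        q.add(2, "lighter blog");
        System.out.println(q.term("lighter"));                     // [1, 2]
        System.out.println(q.and(q.term("lighter"), q.term("blog"))); // [2]
        System.out.println(q.prefix("jav"));                       // [1]
    }
}
```

In Lucene itself these roles are played by the Query classes named above, and a real index stores far more than document ids; the sketch only shows the idea of matching terms against documents.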

That is a lot of terms. Now let's look at some simple examples:
1. A simple StandardAnalyzer test example

Java code

package lighter.javaeye.com;

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class StandardAnalyzerTest
{
    // Constructor
    public StandardAnalyzerTest()
    {
    }

    public static void main(String[] args)
    {
        // Create a StandardAnalyzer object
        Analyzer aAnalyzer = new StandardAnalyzer();
        // Test string
        StringReader sr = new StringReader("lighter javaeye COM is the are on");
        // Create a TokenStream object
        TokenStream ts = aAnalyzer.tokenStream("name", sr);
        try {
            int i = 0;
            Token t = ts.next();
            while (t != null)
            {
                // Row counter shown in the output
                i++;
                // Print the processed token
                System.out.println("Row " + i + ": " + t.termText());
                // Get the next token
                t = ts.next();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Display result:

Row 1: lighter
Row 2: javaeye
Row 3: com

Tip:
StandardAnalyzer is the "standard analyzer" built into Lucene. It provides the following functions:
1. It splits the original sentence into words at spaces.
2. It converts all uppercase letters to lowercase.
3. It removes some useless words, such as "is", "the", and "are", as well as all punctuation marks.
Compare the result with the string passed to new StringReader("lighter javaeye COM is the are on") and the effect is clear.
The API is not explained here; for details, see the official Lucene documentation. Note that the code here uses the Lucene 2 API, which differs significantly from version 1.4.3.
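
The three steps in the tip can be imitated in a few lines of plain Java. This is only a rough sketch of the idea, not how StandardAnalyzer is actually implemented: the class name ToyAnalyzer and its four-word stop list are assumptions made for this illustration, and Lucene's real tokenizer and stop-word list are more elaborate.

```java
import java.util.*;

// Toy sketch only: NOT Lucene's StandardAnalyzer, just the three steps from the tip.
class ToyAnalyzer {
    // A small stop-word list assumed for illustration.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("is", "the", "are", "on"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Step 1: split the sentence at whitespace.
        for (String raw : text.trim().split("\\s+")) {
            // Step 2: convert to lowercase; also strip punctuation (part of step 3).
            String token = raw.toLowerCase(Locale.ROOT).replaceAll("\\p{Punct}", "");
            // Step 3: drop stop words and tokens that ended up empty.
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        int i = 0;
        for (String token : analyze("lighter javaeye COM is the are on")) {
            i++;
            System.out.println("Row " + i + ": " + token);
        }
        // Prints:
        // Row 1: lighter
        // Row 2: javaeye
        // Row 3: com
    }
}
```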

2. Another example: create an index and search it

Java code

package lighter.javaeye.com;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.FSDirectory;

public class FSDirectoryTest {
    // Path where the index is created
    public static final String path = "c://index2";

    public static void main(String[] args) throws Exception {
        // Build two documents, each with a single stored, tokenized "name" field
        Document doc1 = new Document();
        doc1.add(new Field("name", "lighter javaeye com", Field.Store.YES, Field.Index.TOKENIZED));
        Document doc2 = new Document();
        doc2.add(new Field("name", "lighter blog", Field.Store.YES, Field.Index.TOKENIZED));

        // Create a new index on disk (the "true" arguments overwrite any existing index)
        IndexWriter writer = new IndexWriter(FSDirectory.getDirectory(path, true), new StandardAnalyzer(), true);
        // Index at most 3 terms per field
        writer.setMaxFieldLength(3);
        writer.addDocument(doc1);
        writer.setMaxFieldLength(3);
        writer.addDocument(doc2);
        writer.close();

        // Open the index and search the "name" field
        IndexSearcher searcher = new IndexSearcher(path);
        Hits hits = null;
        Query query = null;
        QueryParser qp = new QueryParser("name", new StandardAnalyzer());

        query = qp.parse("lighter");
        hits = searcher.search(query);
        System.out.println("Search for \"lighter\": " + hits.length() + " result(s) in total");

        query = qp.parse("javaeye");
        hits = searcher.search(query);
        System.out.println("Search for \"javaeye\": " + hits.length() + " result(s) in total");
    }
}

Running result:

Search for "lighter": 2 result(s) in total
Search for "javaeye": 1 result(s) in total
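
The counts above depend on setMaxFieldLength(3), which caps how many terms of a field get indexed. The following plain-Java sketch imitates that cap with a hypothetical in-memory index (ToyMaxFieldLength is an invented name, not a Lucene class). It assumes simple whitespace splitting rather than a real analyzer, and shows why "javaeye" is still found with a limit of 3 but would be missed with a limit of 1.

```java
import java.util.*;

// Toy sketch only: a hypothetical in-memory index, NOT Lucene's IndexWriter.
class ToyMaxFieldLength {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final int maxFieldLength;
    private int nextDocId = 0;

    ToyMaxFieldLength(int maxFieldLength) {
        this.maxFieldLength = maxFieldLength;
    }

    // Index at most maxFieldLength terms of the text; later terms are ignored.
    void addDocument(String text) {
        int docId = nextDocId++;
        String[] terms = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (int i = 0; i < terms.length && i < maxFieldLength; i++) {
            postings.computeIfAbsent(terms[i], k -> new HashSet<>()).add(docId);
        }
    }

    // Number of documents whose indexed terms include the given term.
    int hitCount(String term) {
        return postings.getOrDefault(term, Collections.emptySet()).size();
    }

    public static void main(String[] args) {
        ToyMaxFieldLength index = new ToyMaxFieldLength(3);
        index.addDocument("lighter javaeye com");
        index.addDocument("lighter blog");
        System.out.println(index.hitCount("lighter")); // 2
        System.out.println(index.hitCount("javaeye")); // 1

        // With a limit of 1, only the first term of each document is indexed.
        ToyMaxFieldLength small = new ToyMaxFieldLength(1);
        small.addDocument("lighter javaeye com");
        System.out.println(small.hitCount("javaeye")); // 0
    }
}
```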

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
