Lucene-2.0 learning documents

Last Update: 2018-12-06 Source: Internet

Author: User


Lucene is an open-source Apache project: a full-text search engine library implemented in Java.

It is very powerful, and its API is simple. In general, Lucene is used to create and search indexes, which is a bit like operating a database: a Document can be viewed as a row of records in a database, and a Field as a column (field) of that row. Using Lucene to build a search engine is about as simple as connecting to a database with JDBC.

This article covers Lucene 2.0, which is not compatible with version 1.4.3, the release that was previously widely used and documented.

Lucene 2.0 can be downloaded from http://apache.justdn.org/lucene/java/

Let's take a look at an example to give us a rough idea of Lucene.

The examples are written as JUnit test cases. To keep the code short and clear, all exceptions are simply thrown.

A) An example of indexing files

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public void testIndexHello() throws IOException {
    Date date1 = new Date();
    // Create a new index writer.
    // The first parameter is the directory in which the index is created.
    // The second parameter is the text analyzer; the standard one is used here, but you can write your own.
    // If the third parameter is true, any existing index in C:\index is overwritten.
    IndexWriter writer = new IndexWriter("C:\\index", new StandardAnalyzer(), true);
    // This is the data source folder.
    File file = new File("C:\\file");
    /*
     * In this example, the contents of the files in C:\file are indexed,
     * and each file's path is stored as an attachment to the searchable content.
     */
    if (file.isDirectory()) {
        String[] fileList = file.list();
        for (int i = 0; i < fileList.length; i++) {
            File f = new File(file, fileList[i]);
            if (!f.isFile()) {
                continue; // skip subdirectories
            }
            // Create a new document, which can be viewed as a row of database records
            Document doc = new Document();
            Reader reader = new BufferedReader(new FileReader(f));
            doc.add(new Field("file", reader)); // tokenized and indexed, but not stored
            doc.add(new Field("path", f.getAbsolutePath(), Field.Store.YES, Field.Index.NO));
            writer.addDocument(doc);
        }
    }
    writer.close(); // required: flushes buffered documents to the index directory
    Date date2 = new Date();
    System.out.println("time used: " + (date2.getTime() - date1.getTime()) + " milliseconds");
}
```

Note: Creating an index is time-consuming, so the elapsed time printed at the end can be long.

B) An example of full-text search using the index

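```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public void helloSearch() throws IOException, ParseException {
    IndexSearcher indexSearcher = new IndexSearcher("C:\\index"); // same directory as the IndexWriter above
    QueryParser queryParser = new QueryParser("file", // default field to search
            new StandardAnalyzer());                   // analyzer (tokenizer) for the query text
    BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
    Query query = queryParser.parse(br.readLine()); // Query is an abstract class
    Hits hits = indexSearcher.search(query);
    Document doc = null;
    System.out.println("Searching ................");
    for (int i = 0; i < hits.length(); i++) {
        doc = hits.doc(i);
        System.out.println("content: " + doc.get("file")); // note what is printed here
        System.out.println("file path: " + doc.get("path"));
    }
    indexSearcher.close();
}
```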

As these two examples show, using Lucene is fairly simple.

If you run them, you may wonder why doc.get("file") returns null. This is explained below.

The following describes how to create an index.

From the examples above, we can see that creating an index involves Document, IndexWriter, and Field.

The simplest procedure is:

1. Create an IndexWriter, a Document, and one or more Fields.
2. Add the fields to the document with Document.add().
3. Add the document to the index with IndexWriter.addDocument().
4. Finally, call IndexWriter.close(). This step is very important and is often overlooked by beginners: close() flushes the documents still buffered in memory to the index directory, so without it recently added documents may never be written.
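To make the word "index" more concrete, here is a stdlib-only toy model of an inverted index, the term-to-document mapping that a full-text index is built around. It is a sketch, not Lucene's implementation: the class ToyInvertedIndex, its crude lower-case-and-split tokenizer (standing in for an Analyzer such as StandardAnalyzer), and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only (hypothetical names); not Lucene's implementation.
class ToyInvertedIndex {
    // term -> sorted set of ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Crude stand-in for an analyzer: lower-case, then split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Plays the role of IndexWriter.addDocument(); returns the new document's id.
    int addDocument(String content) {
        int id = nextDocId++;
        for (String term : tokenize(content)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Ids of the documents containing the term (empty if none).
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("Lucene is a search library");
        index.addDocument("Java search engines");
        System.out.println(index.search("search")); // prints [0, 1]
        System.out.println(index.search("lucene")); // prints [0]
    }
}
```

The lookup goes from term to documents rather than scanning every document, which is why a search is fast once the (slow) indexing step is done.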

There is not much to say about Document: it can be regarded as a row of records in a database.

Field is both important and fairly complex.

Let's look at its five constructors:

Field (String name, byte[] value, Field.Store store)

Field (String name, Reader reader)

Field (String name, Reader reader, Field.TermVector termVector)

Field (String name, String value, Field.Store store, Field.Index index)

Field (String name, String value, Field.Store store, Field.Index index, Field.TermVector termVector)

Field has three nested classes, Field.Index, Field.Store, and Field.TermVector, which are used as constructor parameters.

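Note: Term vectors were introduced in Lucene 1.4. A term vector records the terms of a field for each document and is mainly useful for similarity-style queries. It is rarely used; by default no term vectors are stored (Field.TermVector.NO), and this does not affect ordinary queries.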

Different combinations of these options serve different purposes in full-text search, as the following table shows:

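| Field.Index | Field.Store | Description |
|---|---|---|
| `TOKENIZED` | `YES` | Searchable, and the original text is stored. Suitable for an article's title, or its content if the content is not too long. |
| `TOKENIZED` | `NO` | Searchable, but the original text is not stored. Suitable for an article's content, which can be very long. |
| `NO` | `YES` | Not searchable; the value is only stored as an attachment to the search results, such as a URL. |
| `UN_TOKENIZED` | `YES`/`NO` | Not tokenized: the value is indexed as a single term, so it matches only when searched as a whole; searching for part of it finds nothing. |
| `NO` | `NO` | Not allowed: Lucene rejects a field that is neither indexed nor stored. |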
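The difference between TOKENIZED and UN_TOKENIZED can be illustrated with a stdlib-only toy model. The enum IndexMode only mirrors the Field.Index constant names and is hypothetical, as are the class and method names; the whitespace split is a rough stand-in for an analyzer, not Lucene's code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model of how a field value turns into index terms (hypothetical names).
class TokenizationDemo {
    enum IndexMode { NO, TOKENIZED, UN_TOKENIZED }

    // The terms a field value would contribute to the index.
    static Set<String> termsFor(String value, IndexMode mode) {
        switch (mode) {
            case TOKENIZED:
                // split into lower-cased words, roughly like an analyzer would
                return new HashSet<>(Arrays.asList(value.toLowerCase().split("\\s+")));
            case UN_TOKENIZED:
                // the whole value becomes one term, unchanged
                return new HashSet<>(Arrays.asList(value));
            default:
                // NO: nothing is indexed, so nothing can match
                return new HashSet<>();
        }
    }

    static boolean matches(String value, IndexMode mode, String queryTerm) {
        return termsFor(value, mode).contains(queryTerm);
    }

    public static void main(String[] args) {
        String title = "Lucene in Action";
        System.out.println(matches(title, IndexMode.TOKENIZED, "lucene"));              // true
        System.out.println(matches(title, IndexMode.UN_TOKENIZED, "lucene"));           // false
        System.out.println(matches(title, IndexMode.UN_TOKENIZED, "Lucene in Action")); // true
        System.out.println(matches(title, IndexMode.NO, "lucene"));                     // false
    }
}
```

This is why UN_TOKENIZED suits values that are always looked up exactly, such as identifiers, while TOKENIZED suits free text.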

For Field(String name, Reader reader) and Field(String name, Reader reader, Field.TermVector termVector), the settings are Field.Index.TOKENIZED and Field.Store.NO. This is why the content in the example above is null: it is indexed but not stored. If you want to see the article's content, you can retrieve it through the article's path, which is returned with each hit as an attachment. In web development, large content is usually kept in a database rather than in the file system, let alone in the index directory, because storing that much data there would increase the load on the server.
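The null result can be reproduced with an equally small toy model of the Store flag. StoredFieldsDemo and its methods are hypothetical and only mimic the behavior described above: get() returns the value of a stored field and null otherwise. With the real Lucene 2.0 API, the usual way to make the content retrievable is to pass it as a String through Field(String name, String value, Field.Store store, Field.Index index) with Field.Store.YES and Field.Index.TOKENIZED, at the cost of a larger index.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the Store flag (hypothetical names); not Lucene's implementation.
class StoredFieldsDemo {
    // Only values of fields marked "stored" are kept for retrieval.
    private final Map<String, String> storedValues = new HashMap<>();

    void add(String name, String value, boolean store) {
        if (store) {
            storedValues.put(name, value);
        }
        // An indexed-but-unstored value would only contribute terms to the
        // inverted index (omitted here); its original text is discarded.
    }

    // Mimics Document.get(): null when the field was not stored.
    String get(String name) {
        return storedValues.get(name);
    }

    public static void main(String[] args) {
        StoredFieldsDemo doc = new StoredFieldsDemo();
        doc.add("file", "full text of the article", false); // like TOKENIZED + Store.NO
        doc.add("path", "C:\\file\\a.txt", true);            // like Index.NO + Store.YES
        System.out.println(doc.get("file")); // prints null
        System.out.println(doc.get("path")); // prints C:\file\a.txt
    }
}
```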

Next, let's look at IndexWriter.

As its name says, it writes the index, and its tasks are simple:

1. addDocument() adds the documents that are to be written to the index.

2. close() writes the remaining buffered documents to the index directory and closes the writer.
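Why close() matters can be sketched with a toy writer that buffers documents in memory and only copies them to its "directory" when the buffer fills up or when close() is called. This loosely mirrors IndexWriter's in-memory buffering; BufferedToyWriter, its maxBuffered threshold, and the list standing in for the index directory are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a buffering writer (hypothetical names); not Lucene's code.
class BufferedToyWriter implements AutoCloseable {
    private final List<String> directory;         // stands in for the on-disk index
    private final List<String> buffer = new ArrayList<>();
    private final int maxBuffered;                // hypothetical flush threshold

    BufferedToyWriter(List<String> directory, int maxBuffered) {
        this.directory = directory;
        this.maxBuffered = maxBuffered;
    }

    void addDocument(String doc) {
        buffer.add(doc);
        if (buffer.size() >= maxBuffered) {
            flush(); // buffer is full: write it out
        }
    }

    private void flush() {
        directory.addAll(buffer);
        buffer.clear();
    }

    @Override
    public void close() {
        flush(); // whatever is still buffered is written here
    }

    public static void main(String[] args) {
        List<String> dir = new ArrayList<>();
        BufferedToyWriter writer = new BufferedToyWriter(dir, 10);
        writer.addDocument("doc-1");
        writer.addDocument("doc-2");
        System.out.println(dir.size()); // prints 0: both documents are still buffered
        writer.close();
        System.out.println(dir.size()); // prints 2
    }
}
```

With the real IndexWriter, calling close() in a finally block ensures it runs even if indexing throws an exception.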

Let's take a look at its constructor:

IndexWriter (Directory d, Analyzer a, boolean create)

(Unfinished)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
