Lucene-2.0 learning documents

Source: Internet
Author: User

Lucene is an open-source Apache project for implementing full-text search engines in Java.

It is powerful, and its API is simple. In general, working with Lucene comes down to creating an index and searching it, which is a bit like operating a database: a Document can be viewed as a row of records in the database, and a Field as a column. Implementing a search engine with Lucene is about as simple as connecting to a database with JDBC.

Lucene 2.0 is not compatible with version 1.4.3, which was widely used and written about in the past.

Lucene 2.0 can be downloaded from http://apache.justdn.org/lucene/java/

Let's take a look at an example to give us a rough idea of Lucene.

A JUnit test case (to keep the code short and clear, all exceptions are simply thrown):

A) An example of creating a file index.

// Imports needed by this example (Lucene 2.0 package names).
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Date;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public void testIndexHello() throws IOException
{
    Date date1 = new Date();
    // Create a new index writer.
    // The first parameter is the directory in which the index is created.
    // The second parameter is the text analyzer. StandardAnalyzer is the standard one; you can also write your own.
    // If the third parameter is true, the c:\index directory is cleared before the index is created.
    IndexWriter writer = new IndexWriter("c:\\index", new StandardAnalyzer(), true);
    // This is the data source folder.
    File file = new File("c:\\file");
    /**
     * In this example, the content of the files in the c:\file directory is indexed,
     * and the file path is stored as an attachment to the searched content.
     */
    if (file.isDirectory())
    {
        String[] fileList = file.list();
        for (int i = 0; i < fileList.length; i++)
        {
            // Create a new document, which can be viewed as a row of database records.
            Document doc = new Document();
            File f = new File(file, fileList[i]);
            Reader reader = new BufferedReader(new FileReader(f));
            doc.add(new Field("file", reader)); // add a field to the document
            doc.add(new Field("path", f.getAbsolutePath(), Field.Store.YES, Field.Index.NO));
            writer.addDocument(doc);
        }
    }
    writer.close(); // This step is required. Only then is the data written into the index directory.
    Date date2 = new Date();
    System.out.println("time used " + (date2.getTime() - date1.getTime()) + " milliseconds");
}

Note: Creating an index is time-consuming, so the elapsed time printed at the end will be fairly large.

B) An example of full-text search using the index

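// Imports needed by this example (Lucene 2.0 package names).
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public void helloSearch() throws IOException, ParseException
{
    IndexSearcher indexSearcher = new IndexSearcher("c:\\index"); // same directory as the IndexWriter above
    QueryParser queryParser = new QueryParser("file", // the field to search by default
            new StandardAnalyzer()); // this is the tokenizer
    BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
    Query query = queryParser.parse(br.readLine()); // Query is an abstract class.
    Hits hits = indexSearcher.search(query);
    Document doc = null;
    System.out.print("Searching ................");
    for (int i = 0; i < hits.length(); i++)
    {
        doc = hits.doc(i);
        System.out.println("content: " + doc.get("file")); // note what is output here
        System.out.println("file path: " + doc.get("path"));
    }
}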

The two examples above show that Lucene is relatively simple to use.

If you run them, you may wonder why doc.get("file") returns null. This is explained further down.

The following describes how to create an index.

As the example above shows, creating an index involves Document, IndexWriter, and Field.

The simplest procedure is:

First, create a Document, an IndexWriter, and the Fields.

Then add each Field to the Document with Document.add().

Then add the Document to the index with IndexWriter.addDocument().

Finally, call IndexWriter.close() to close the index writer. This step is very important: only after this call is the index written into the index directory, and many beginners forget it.
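The four steps above can be modelled with a small stdlib-only sketch. ToyIndexWriter and ToyIndexWriterDemo are hypothetical names invented for illustration and are not part of Lucene; the sketch only mirrors the behaviour described in this article (documents are held back until close() is called), and the real IndexWriter's buffering is more involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy class, NOT Lucene's API: it only models the
// "add fields -> add document -> close" workflow described above.
class ToyIndexWriter {
    private final List<Map<String, String>> indexDirectory; // stands in for the index directory
    private final List<Map<String, String>> buffer = new ArrayList<>();
    private boolean closed = false;

    ToyIndexWriter(List<Map<String, String>> indexDirectory) {
        this.indexDirectory = indexDirectory;
    }

    // Step 3: the document is only buffered in memory here.
    void addDocument(Map<String, String> doc) {
        if (closed) {
            throw new IllegalStateException("writer is closed");
        }
        buffer.add(new LinkedHashMap<>(doc));
    }

    // Step 4: close() moves the buffered documents into the "index directory".
    void close() {
        if (closed) {
            return;
        }
        indexDirectory.addAll(buffer);
        buffer.clear();
        closed = true;
    }
}

class ToyIndexWriterDemo {
    public static void main(String[] args) {
        List<Map<String, String>> indexDirectory = new ArrayList<>();
        ToyIndexWriter writer = new ToyIndexWriter(indexDirectory);

        Map<String, String> doc = new LinkedHashMap<>(); // step 1: a new document
        doc.put("path", "c:\\file\\a.txt");               // step 2: add a field
        writer.addDocument(doc);                          // step 3: add the document

        System.out.println(indexDirectory.size()); // prints 0: nothing written yet
        writer.close();                             // step 4: close the writer
        System.out.println(indexDirectory.size()); // prints 1
    }
}
```

In this model, skipping close() leaves the "index directory" empty, which is the beginner mistake the last step warns about.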

There is not much to say about Document. It can be regarded as a row of records in the database.

Field is more important and more complex.

Let's look at its five constructors:

Field (String name, byte[] value, Field.Store store)

Field (String name, Reader reader)

Field (String name, Reader reader, Field.TermVector termVector)

Field (String name, String value, Field.Store store, Field.Index index)

Field (String name, String value, Field.Store store, Field.Index index, Field.TermVector termVector)

Field has three inner classes, Field.Index, Field.Store, and Field.TermVector, which are used as constructor arguments.

Note: termVector is new in Lucene 1.4. It provides a vector mechanism for fuzzy queries and is not commonly used. By default no term vector is stored (Field.TermVector.NO), which does not affect ordinary queries.

Different combinations of Field.Index and Field.Store play different roles in full-text search, as the following table shows:

Field.Index     Field.Store   Description
-----------     -----------   -----------
TOKENIZED       YES           Tokenized (word segmentation), indexed, and stored. Suits the title or content of an article if the content is not too long: it can be searched and read back from the index.
TOKENIZED       NO            Tokenized and indexed but not stored. Suits the title or content of an article when the content can be very long: it can be searched, but the text cannot be read back from the index.
NO              YES           Cannot be searched. It is only an attachment to the searched content, such as a URL.
UN_TOKENIZED    YES/NO        Not tokenized. It is searched as a whole; searching for only part of it finds nothing.
NO              NO            No such usage.
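To make the combinations in the table concrete, here is a minimal stdlib-only sketch. ToyFieldIndex, its Store and Index enums, and its methods are hypothetical stand-ins, not Lucene classes, and the whitespace-and-lowercase tokenizer is an assumption that only approximates what an analyzer does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy model, NOT Lucene's API: one field per document.
class ToyFieldIndex {
    enum Store { YES, NO }
    enum Index { TOKENIZED, UN_TOKENIZED, NO }

    private final Map<String, List<Integer>> postings = new HashMap<>(); // term -> document ids
    private final List<String> stored = new ArrayList<>();               // document id -> stored value or null

    // Adds a one-field document and returns its id.
    int add(String value, Store store, Index index) {
        if (store == Store.NO && index == Index.NO) {
            throw new IllegalArgumentException("a field must be indexed, stored, or both");
        }
        int docId = stored.size();
        stored.add(store == Store.YES ? value : null);
        if (index == Index.TOKENIZED) {
            // Assumed tokenizer: lowercase, split on whitespace.
            for (String term : value.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!term.isEmpty()) {
                    post(term, docId);
                }
            }
        } else if (index == Index.UN_TOKENIZED) {
            post(value, docId); // the whole value is a single term
        }
        return docId;
    }

    private void post(String term, int docId) {
        List<Integer> ids = postings.computeIfAbsent(term, k -> new ArrayList<>());
        if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) {
            ids.add(docId); // avoid listing the same document twice
        }
    }

    // Returns the ids of documents whose field contains the exact term.
    List<Integer> search(String term) {
        return postings.getOrDefault(term, new ArrayList<>());
    }

    // Returns the stored value, or null when the field was not stored.
    String get(int docId) {
        return stored.get(docId);
    }
}
```

In this model, a value added with Index.TOKENIZED and Store.NO is found by search("lucene") but get() returns null for it, while a value added with Index.UN_TOKENIZED is found only when the whole string is given as the term.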

The two Reader-based constructors,

Field (String name, Reader reader)

Field (String name, Reader reader, Field.TermVector termVector)

use Field.Index.TOKENIZED and Field.Store.NO. This is why the content in the example above is null: it is indexed but not stored. If you want to see the content of the article, you can fetch it through the article's path, which is returned as an attachment to the search results. In web development, large data is usually kept in a database or in the file system rather than in the index directory, because handling it there would increase the burden on the server.
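Fetching the content through the stored path can be sketched with the JDK alone. StoredPathReader and readContent are hypothetical names for illustration, and the sketch assumes the indexed files are UTF-8 text; the argument would be the value returned by doc.get("path") in the search example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Lucene.
class StoredPathReader {
    // Reads the whole file whose path was stored next to the indexed content.
    // Assumption: the file is UTF-8 encoded text.
    static String readContent(String storedPath) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(storedPath));
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

This keeps the index small: only the path is stored, and the full text is read from disk when a result is actually displayed.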

The following describes IndexWriter:

It is the index writer, and its tasks are relatively simple:

1. Use addDocument() to add the documents that are ready to be written to the index.

2. Call close() to write the index to the index directory.

Let's take a look at its constructor:

IndexWriter (Directory d, Analyzer a, boolean create)

(Unfinished)
