Lucene in Action (Chinese edition), Chapter 1: Meet Lucene

Last Update:2018-12-07 Source: Internet

Author: User


1.4.1 Creating an index
In this section, you will see a class named Indexer and its four static methods. Together they index all files in a file system directory that have a .txt extension. When Indexer finishes, it leaves behind a Lucene index for its sibling, Searcher (described in section 1.4.2).
We don't expect you to be familiar with the Lucene classes and methods used in this example; we explain them shortly. After the annotated code listing, we show you how to use Indexer. If you find it helpful to learn how Indexer is used before seeing the code, you can jump directly to the usage discussion that follows the code.

Using Indexer to index text files
Listing 1.1 shows the Indexer command-line program. It takes two arguments:
- A path to the directory where the Lucene index is stored
- A path to the directory that contains the text files to be indexed
Listing 1.1 Indexer: traverses a file system and indexes .txt files
package lia.meetlucene;

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

/**
 * This code was originally written for
 * Erik's Lucene intro java.net article
 */
public class Indexer {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      throw new Exception("Usage: java " + Indexer.class.getName()
        + " <index dir> <data dir>");
    }
    File indexDir = new File(args[0]);
    File dataDir = new File(args[1]);

    long start = new Date().getTime();
    int numIndexed = index(indexDir, dataDir);
    long end = new Date().getTime();

    System.out.println("Indexing " + numIndexed + " files took "
      + (end - start) + " milliseconds");
  }

  // Open an index and start file directory traversal
  public static int index(File indexDir, File dataDir)
    throws IOException {

    if (!dataDir.exists() || !dataDir.isDirectory()) {
      throw new IOException(dataDir
        + " does not exist or is not a directory");
    }

    // (1) Create a Lucene index; true means create a new index
    IndexWriter writer = new IndexWriter(indexDir,
      new StandardAnalyzer(), true);
    writer.setUseCompoundFile(false);

    indexDirectory(writer, dataDir);

    int numIndexed = writer.docCount();
    writer.optimize();
    writer.close();
    return numIndexed;
  }

  // Recursive method that calls itself when it finds a directory
  private static void indexDirectory(IndexWriter writer, File dir)
    throws IOException {

    File[] files = dir.listFiles();

    for (int i = 0; i < files.length; i++) {
      File f = files[i];
      if (f.isDirectory()) {
        indexDirectory(writer, f);              // (2) Recurse
      } else if (f.getName().endsWith(".txt")) {
        indexFile(writer, f);
      }
    }
  }

  // Method to actually index a file using Lucene
  private static void indexFile(IndexWriter writer, File f)
    throws IOException {

    // Skip files that are hidden, missing, or unreadable
    if (f.isHidden() || !f.exists() || !f.canRead()) {
      return;
    }

    System.out.println("Indexing " + f.getCanonicalPath());

    Document doc = new Document();
    doc.add(Field.Text("contents", new FileReader(f)));       // (3) Index file content
    doc.add(Field.Keyword("filename", f.getCanonicalPath())); // (4) Index file name
    writer.addDocument(doc);                                  // (5) Add document to Lucene index
  }
}
Interestingly, the bulk of the code performs recursive directory traversal (②). Only the IndexWriter handling (①) and four lines in the indexFile method (③, ④, ⑤) use the Lucene API: effectively six lines of code.
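Because the traversal is independent of Lucene, it can be sketched with the JDK alone. The class below is a hypothetical helper (TxtFileFinder, isTxt, and findTxtFiles are illustrative names, not part of the book's Indexer). It applies the same .txt filter and the same hidden/readable checks as the listing, and it adds one assumption of our own: a guard for the case where File.listFiles() returns null because the path is not a readable directory.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the book's Indexer.
final class TxtFileFinder {

    // Same filter as the listing: only names ending in ".txt" qualify.
    static boolean isTxt(String name) {
        return name.endsWith(".txt");
    }

    // Walks dir recursively and returns every readable, non-hidden .txt file.
    static List<File> findTxtFiles(File dir) {
        List<File> found = new ArrayList<>();
        collect(dir, found);
        return found;
    }

    private static void collect(File dir, List<File> found) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            // listFiles() returns null when dir is not a directory
            // or an I/O error occurs, so there is nothing to visit.
            return;
        }
        for (File f : entries) {
            if (f.isDirectory()) {
                collect(f, found); // recurse into subdirectories
            } else if (isTxt(f.getName()) && !f.isHidden() && f.canRead()) {
                found.add(f);
            }
        }
    }

    public static void main(String[] args) {
        // Defaults to the current directory when no argument is given.
        File root = new File(args.length > 0 ? args[0] : ".");
        for (File f : findTxtFiles(root)) {
            System.out.println(f.getPath());
        }
    }
}
```

Collecting the files first and indexing them afterwards is only one possible design; the listing instead indexes each file as soon as the traversal reaches it.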
This example deliberately handles only text files with a .txt extension, to keep things as simple as possible while demonstrating Lucene's usage. In chapter 7, we show you how to handle non-text files, and we develop a small, ready-to-use framework for parsing and indexing documents in several common formats.
Running Indexer
From the command line, we ran Indexer against a local working directory that contains the Lucene source files. We told Indexer to index the files under the /lucene directory and to store the Lucene index in the build/index directory.
% java lia.meetlucene.Indexer build/index /lucene
Indexing /lucene/build/test/testdoc/test.txt
Indexing /lucene/build/test/testdoc/test2.txt
Indexing /lucene/build.txt
Indexing /lucene/changes.txt
Indexing /lucene/license.txt
Indexing /lucene/readme.txt
Indexing /lucene/src/jsp/readme.txt
Indexing /lucene/src/test/org/apache/lucene/analysis/ru/stemsunicode.txt
Indexing /lucene/src/test/org/apache/lucene/analysis/ru/test1251.txt
Indexing /lucene/src/test/org/apache/lucene/analysis/ru/testkoi8.txt
Indexing /lucene/src/test/org/apache/lucene/analysis/ru/testunicode.txt
Indexing /lucene/src/test/org/apache/lucene/analysis/ru/wordsunicode.txt
Indexing /lucene/todo.txt
Indexing 13 files took 2205 milliseconds
Indexer prints the name of each file it indexes. You can see that it indexes only files with the .txt extension.
Note: If you run this program from a Windows command shell, you need to adjust the directory and path separators on the command line. The Windows command line is java lia.meetlucene.Indexer build\index c:\lucene.
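Inside Java code, as opposed to on the command line, the separator difference can be avoided by letting the JDK join path segments. The following is a minimal sketch; PortablePaths and indexDir are hypothetical names chosen for illustration, and the build/index location is taken from the example run above.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of the book's Indexer.
final class PortablePaths {

    // Joins the segments with the separator of the current platform,
    // so the same code yields build/index or build\index as appropriate.
    static Path indexDir() {
        return Paths.get("build", "index");
    }

    public static void main(String[] args) {
        // The printed values depend on the platform the program runs on.
        System.out.println("Index directory: " + indexDir());
        System.out.println("Separator: " + File.separator);
    }
}
```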
When it finishes indexing, Indexer prints the number of files it indexed and the time it took. Because the reported time includes both directory traversal and indexing, you should not treat it as a formal performance measurement. In our example, each indexed file was small, and indexing all 13 of them took only about 2 seconds.
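One way to get numbers that are easier to interpret is to time each phase on its own. The sketch below is an assumption about how that could be done with the JDK only; PhaseTimer and timeMillis are hypothetical names, and the two workloads in main are placeholders rather than real indexing code.

```java
import java.io.File;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of the book's Indexer.
final class PhaseTimer {

    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Placeholder workloads: a real run would substitute the directory
        // walk and the calls that add documents to the index.
        long traversal = timeMillis(() -> new File(".").listFiles());
        long indexing = timeMillis(() -> { /* indexing work would go here */ });
        System.out.println("Traversal: " + traversal
            + " ms, indexing: " + indexing + " ms");
    }
}
```

A single timed run is still only a rough indication; repeated runs over the same data give a steadier picture.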
Indexing speed matters, and we discuss it in chapter 2. Generally, however, searching is more important.

This article is an English version of an article originally published in Chinese on aliyun.com.
