Lucene in Action (Chinese edition), Chapter 1: Meet Lucene

Source: Internet
Author: User
1.4.1 Creating an index
In this section you will see a class named Indexer and its four static methods. Together they index every file with the .txt extension under a file system directory. When Indexer finishes, it leaves behind a Lucene index for its sibling, Searcher (described in section 1.4.2).
We don't expect you to be familiar with the Lucene classes and methods used in the example; we explain them shortly. After the annotated code listing, we show you how to use Indexer. If you would rather learn how to use Indexer before reading its code, jump directly to the usage discussion that follows the listing.

Using Indexer to index text files
Listing 1.1 shows the Indexer command-line program. It takes two arguments:
• The path of the directory where the Lucene index is stored
• The path of the directory that contains the text files to be indexed
Listing 1.1 Indexer: traverses a file system and indexes .txt files
package lia.meetlucene;

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

/**
 * This code was originally written for
 * Erik's Lucene intro java.net article
 */
public class Indexer {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      throw new Exception("Usage: java " + Indexer.class.getName()
        + " <index dir> <data dir>");
    }
    File indexDir = new File(args[0]);
    File dataDir = new File(args[1]);

    long start = new Date().getTime();
    int numIndexed = index(indexDir, dataDir);
    long end = new Date().getTime();

    System.out.println("Indexing " + numIndexed + " files took "
      + (end - start) + " milliseconds");
  }

  // Open an index and start file directory traversal
  public static int index(File indexDir, File dataDir)
    throws IOException {

    if (!dataDir.exists() || !dataDir.isDirectory()) {
      throw new IOException(dataDir
        + " does not exist or is not a directory");
    }

    IndexWriter writer = new IndexWriter(indexDir,   // ① Create a Lucene index
      new StandardAnalyzer(), true);
    writer.setUseCompoundFile(false);

    indexDirectory(writer, dataDir);

    int numIndexed = writer.docCount();
    writer.optimize();
    writer.close();
    return numIndexed;
  }

  // Recursive method that calls itself when it finds a directory
  private static void indexDirectory(IndexWriter writer, File dir)
    throws IOException {

    File[] files = dir.listFiles();

    for (int i = 0; i < files.length; i++) {
      File f = files[i];
      if (f.isDirectory()) {
        indexDirectory(writer, f);   // ② Recurse
      } else if (f.getName().endsWith(".txt")) {
        indexFile(writer, f);
      }
    }
  }

  // Method that actually indexes a file using Lucene
  private static void indexFile(IndexWriter writer, File f)
    throws IOException {

    // Skip files that are hidden, missing, or unreadable
    if (f.isHidden() || !f.exists() || !f.canRead()) {
      return;
    }

    System.out.println("Indexing " + f.getCanonicalPath());

    Document doc = new Document();
    doc.add(Field.Text("contents", new FileReader(f)));   // ③ Index file content
    doc.add(Field.Keyword("filename", f.getCanonicalPath()));   // ④ Index file name
    writer.addDocument(doc);   // ⑤ Add the document to the Lucene index
  }
}
Interestingly, the bulk of the code performs directory traversal (②). Only the creation and closing of the IndexWriter (①) and four lines in the indexFile method (③, ④, ⑤) use the Lucene API: effectively six lines of code.
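The traversal half of Indexer does not depend on Lucene at all, so it can be sketched on its own. The following is a minimal sketch, assuming Java 8 or later, that collects the same .txt files with java.nio.file.Files.walk instead of hand-written recursion. The class TxtFileFinder and its method findTextFiles are hypothetical names used only for illustration; they are not part of the book's code, and unlike indexFile this sketch does not skip hidden files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not from the book): finds the files that would be
// indexed, without calling any Lucene API.
final class TxtFileFinder {

    // Returns every readable regular file under dataDir whose name ends
    // with ".txt", in sorted order.
    static List<Path> findTextFiles(Path dataDir) throws IOException {
        if (!Files.isDirectory(dataDir)) {
            throw new IOException(dataDir + " does not exist or is not a directory");
        }
        // Files.walk visits the whole tree; the stream must be closed.
        try (Stream<Path> paths = Files.walk(dataDir)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".txt"))
                .filter(Files::isReadable)
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java TxtFileFinder <data dir>");
            return;
        }
        for (Path p : findTextFiles(Paths.get(args[0]))) {
            System.out.println("Would index " + p);
        }
    }
}
```

Each returned path could then be handed to a method such as indexFile, which keeps the Lucene-specific lines apart from the file system code.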
This example intentionally handles only text files with the .txt extension, to keep things as simple as possible while demonstrating how Lucene is used. In chapter 7 we show you how to handle non-text files, and we develop a small ready-to-use framework that parses and indexes documents in several common formats.
Running Indexer
From the command line, we ran Indexer against a local working directory that contains the Lucene source code. We told Indexer to index the files under the /lucene directory and to store the Lucene index in the build/index directory.
% java lia.meetlucene.Indexer build/index /lucene
Indexing /lucene/build/test/testdoc/test.txt
Indexing /lucene/build/test/testdoc/test2.txt
Indexing /lucene/build.txt
Indexing /lucene/changes.txt
Indexing /lucene/license.txt
Indexing /lucene/readme.txt
Indexing /lucene/src/jsp/readme.txt
Indexing /lucene/src/test/org/apache/lucene/analysis/ru/stemsunicode.txt
Indexing /lucene/src/test/org/apache/lucene/analysis/ru/test1251.txt
Indexing /lucene/src/test/org/apache/lucene/analysis/ru/testkoi8.txt
Indexing /lucene/src/test/org/apache/lucene/analysis/ru/testunicode.txt
Indexing /lucene/src/test/org/apache/lucene/analysis/ru/wordsunicode.txt
Indexing /lucene/todo.txt
Indexing 13 files took 2205 milliseconds
Indexer prints the name of each file it indexes, so you can see that it indexes only files with the .txt extension.
Note: If you run this program from a Windows command shell, you need to adjust the directories and path separators on the command line. The Windows command line is java lia.meetlucene.Indexer build\index c:\lucene.
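Inside a program, one way to avoid hard-coding either separator is to let the JDK assemble the path. The sketch below is an illustration only; PortablePaths and join are hypothetical names and are not part of Indexer.

```java
import java.nio.file.Paths;

// Hypothetical helper (not from the book): builds paths without
// hard-coding "/" or "\".
final class PortablePaths {

    // Joins the segments using the separator of the platform the JVM runs on.
    static String join(String first, String... more) {
        return Paths.get(first, more).toString();
    }

    public static void main(String[] args) {
        // Uses "/" on Unix-like systems and "\" on Windows.
        System.out.println(join("build", "index"));
    }
}
```

Arguments typed on the command line still have to use the conventions of the shell you are running in; the helper only covers paths that the program builds itself.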
When it finishes indexing, Indexer prints the number of files it indexed and the time it took. Because the reported time includes both directory traversal and indexing, you should not treat it as a formal performance measurement. In our example each indexed file was small, and indexing all 13 of them took only about 2 seconds.
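To get a rough idea of how the reported time splits between traversal and indexing, one option is to time the two phases separately. This is a minimal sketch, assuming the work can be divided into two Runnable tasks; PhaseTimer and timeMillis are hypothetical names, and the task bodies in main are placeholders. It still measures wall-clock time and is not a benchmark.

```java
// Hypothetical helper (not from the book): times one phase of work.
final class PhaseTimer {

    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Placeholder: walk the directory tree and collect the .txt files.
        long traversal = timeMillis(() -> { });
        // Placeholder: add the collected files to the index.
        long indexing = timeMillis(() -> { });
        System.out.println("Traversal took " + traversal + " milliseconds");
        System.out.println("Indexing took " + indexing + " milliseconds");
    }
}
```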
Indexing speed matters, and we discuss it in chapter 2. In general, though, searching is even more important.
