Lucene.Net development tutorial and summary (1)
Lucene is a well-known open-source search framework. Its word segmentation works well for English, but segmentation is a bottleneck for Japanese and Chinese, because those languages do not separate words with spaces. I first came into contact with the project almost a year ago. Because I like C# very much, I have always followed Lucene.Net and never found time to look at the Java version of Lucene. Of course, my Java is not good enough to show anyone anyway.
First, I would like to introduce how it stores data. Lucene stores data under index values; during a search it looks up the index location directly and then obtains the specific value.
I will use the following terms. As with ordinary database operations, you need to know: create, delete, edit, update (delete first, then add), merge, and optimize.
I will go into detail below as time permits. Because it is open source, I will paste the key code.
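The lookup-by-index idea can be illustrated with a toy inverted index. This is only a sketch of the concept in plain C#, not Lucene's actual storage format; the class name ToyInvertedIndex and its methods are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Toy inverted index: maps each term to the ids of the documents containing it.
// Hypothetical illustration only; Lucene's real index format is far more elaborate.
public class ToyInvertedIndex
{
    private readonly Dictionary<string, List<int>> postings = new Dictionary<string, List<int>>();
    private readonly List<string> documents = new List<string>();

    // Stores the text and indexes every space-separated word; returns the document id.
    public int Add(string text)
    {
        int id = documents.Count;
        documents.Add(text);
        string[] words = text.ToLowerInvariant().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            List<int> ids;
            if (!postings.TryGetValue(word, out ids))
            {
                ids = new List<int>();
                postings[word] = ids;
            }
            if (!ids.Contains(id)) ids.Add(id);
        }
        return id;
    }

    // Looks the term up in the index first, then fetches the stored values.
    public List<string> Search(string term)
    {
        var results = new List<string>();
        List<int> ids;
        if (postings.TryGetValue(term.ToLowerInvariant(), out ids))
        {
            foreach (int id in ids) results.Add(documents[id]);
        }
        return results;
    }
}
```

For example, after Add("open source search") and Add("search framework"), Search("search") returns both strings and Search("framework") returns only the second. Splitting on spaces is exactly the step that does not work for Chinese text, which is why a dedicated word segmenter is needed there.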
1: Create an index
First, create a directory to keep all files.
// private Directory directory = new RAMDirectory(FSDirectory.GetDirectory(@tutility.staticiodatapath, true));
// private Directory directory = new RAMDirectory(@tutility.staticiodatapath);
private Directory directory = FSDirectory.GetDirectory(@tutility.staticiodatapath.TrimEnd('/') + "/" + "wordtype", false); // true the first time, false afterwards
Note:
Both RAMDirectory (in memory) and FSDirectory can be used. Be aware that if the create parameter of FSDirectory.GetDirectory is true, the existing index files are deleted. If necessary, use the IndexReader.IndexExists() method to check first.
Open an existing index from the specified directory:
private Directory directory = FSDirectory.GetDirectory("C:\\Index", false);
Load the index into memory to speed up searching:
private Directory directory = new RAMDirectory(FSDirectory.GetDirectory(@"C:\Index", false));
// Or
// private Directory directory = new RAMDirectory(@"C:\Index");
The second step is to create an Analyzer, which performs word segmentation. For Chinese word segmentation there are options such as rain marks, shooot, SharpICTCLAS, and so on. There is also a very large one that relies mainly on a huge segmentation dictionary; I will cover it later if I have time.
private Analyzer analyzer = new StandardAnalyzer();
This is the standard analyzer that ships with Lucene.
Then create an IndexWriter, which is used much like XmlWriter, and write to it as follows:
IndexWriter writer = new IndexWriter(ufile, analyzer, true); // true the first time, false afterwards
// writer.SetMaxFieldLength(1000);
writer.SetMaxMergeDocs(2); // maximum number of documents merged into one segment
// writer.SetMergeFactor(1000); // merge factor
writer.SetUseCompoundFile(true); // use the compound file format; must be set before Close()
writer.AddDocument(getnewdocument(new string[] { initvalue, initvalue }));
writer.Optimize();
writer.Close(); // save
getnewdocument is my own method, defined as follows:
private Document getnewdocument(string[] args) {
    Document document = new Document();
    // Both fields are stored and indexed as a single untokenized term.
    document.Add(new Field("hashkey", args[0], Field.Store.YES, Field.Index.UN_TOKENIZED));
    document.Add(new Field("chinesename", args[1], Field.Store.YES, Field.Index.UN_TOKENIZED));
    return document;
}
Its main job is to create terms. A term is the smallest unit of the index, and because both fields are UN_TOKENIZED, each field value is recorded as a single term.
Note that IndexWriter.AddDocument does not detect duplicate values. To update a value, first find the existing entry (as described later in this tutorial series), delete it, and then add the new one.
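The delete-then-add update could look roughly like the sketch below. It assumes a Lucene.Net 2.x-era API consistent with the calls used in this post (IndexReader.DeleteDocuments, the three-argument IndexWriter constructor); ReplaceByHashKey is a hypothetical helper name, and using "hashkey" as the unique key is an assumption.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class IndexUpdateSketch
{
    // Replaces the document whose "hashkey" field equals hashKey.
    // Hypothetical helper, not part of Lucene.Net; assumes "hashkey" is unique.
    public static void ReplaceByHashKey(Directory directory, string hashKey, string chineseName)
    {
        // Step 1: delete every document whose "hashkey" term matches exactly.
        IndexReader reader = IndexReader.Open(directory);
        reader.DeleteDocuments(new Term("hashkey", hashKey));
        reader.Close(); // commits the deletions

        // Step 2: append the new version (create = false keeps the existing index).
        IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), false);
        Document document = new Document();
        document.Add(new Field("hashkey", hashKey, Field.Store.YES, Field.Index.UN_TOKENIZED));
        document.Add(new Field("chinesename", chineseName, Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.AddDocument(document);
        writer.Close();
    }
}
```

Deleting by an exact term works here because the field is indexed UN_TOKENIZED, so the whole value is one term. The reader is closed before the writer is opened so that the two do not compete for the index.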
The third parameter of IndexWriter controls how an existing segments file is handled: true creates a new index or overwrites an existing one, and false appends to an existing one. If it is false and there is no segments or CFS file, an error is raised. It is therefore important to determine whether this is the first run:
if (!IndexReader.IndexExists(directory)) {
    // Initialization is required the first time
    // makeinit(directory);
}
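One way to avoid hard-coding the flag is to derive it from the same existence check. The following is a minimal sketch assuming the Lucene.Net 2.x-era API used in this post; OpenWriter is a hypothetical helper name, not a library method.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class IndexOpenSketch
{
    // Opens a writer that creates the index on first use and appends afterwards.
    // Hypothetical helper, not part of Lucene.Net.
    public static IndexWriter OpenWriter(Directory directory, Analyzer analyzer)
    {
        // Create a fresh index only when none exists yet; otherwise keep the data.
        bool create = !IndexReader.IndexExists(directory);
        return new IndexWriter(directory, analyzer, create);
    }
}
```

With this, the same call works on the first run and on later runs, and an existing index is never overwritten by accident.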
That is all for creating an index file. For details, look up the specific class or method names yourself.
In the future I will write up all of my experience and put together a demo as well. A grand plan!