Tutorial and summary of Lucene.Net development (1)

Source: Internet
Author: User
Lucene is a well-known open-source search framework. Its word segmentation works well for English, but for Japanese and Chinese, word segmentation is a bottleneck, because words in those languages are not separated by spaces. I first came into contact with this project almost a year ago. Because I like C# very much, I have always followed Lucene.Net and have not had time to look at the Java version of Lucene. Of course, my Java skills are also not good enough to show in public.

First, I would like to introduce how it stores data. It stores the data keyed by index values; during a search it looks up the index location directly and then retrieves the specific value.
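
The lookup described above can be illustrated with a toy example. This is only a minimal sketch of the general idea (an inverted index built on a dictionary), not Lucene's actual storage format; the class name ToyInvertedIndex and its methods are hypothetical and do not come from Lucene.Net.

```csharp
using System;
using System.Collections.Generic;

// Toy inverted index: maps each term to the ids of the documents that contain it.
public class ToyInvertedIndex
{
    private readonly Dictionary<string, List<int>> postings = new Dictionary<string, List<int>>();
    private readonly List<string> documents = new List<string>();

    // Stores the text and records every space-separated term under the new document id.
    public int Add(string text)
    {
        int id = documents.Count;
        documents.Add(text);
        string[] terms = text.ToLowerInvariant().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string term in terms)
        {
            List<int> ids;
            if (!postings.TryGetValue(term, out ids))
            {
                ids = new List<int>();
                postings[term] = ids;
            }
            // Avoid listing the same document twice when a term repeats.
            if (ids.Count == 0 || ids[ids.Count - 1] != id)
            {
                ids.Add(id);
            }
        }
        return id;
    }

    // Finds the term in the index first, then fetches the stored text by document id.
    public List<string> Search(string term)
    {
        var results = new List<string>();
        List<int> ids;
        if (postings.TryGetValue(term.ToLowerInvariant(), out ids))
        {
            foreach (int id in ids)
            {
                results.Add(documents[id]);
            }
        }
        return results;
    }
}

public static class Program
{
    public static void Main()
    {
        var index = new ToyInvertedIndex();
        index.Add("open source search");
        index.Add("search framework");
        index.Add("word segmentation");
        foreach (string hit in index.Search("search"))
        {
            Console.WriteLine(hit);
        }
    }
}
```

Splitting on spaces is exactly the step that fails for Chinese and Japanese text, which is why a dedicated analyzer is needed there.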

I will use the following terms. As with ordinary database operations, you need to know: create, delete, edit, update (delete first, then add), merge, and optimize.

I will introduce these in detail below when I have time. Because it is open source, I will paste the key code here.

1: Create an index
First, create a directory to keep all files.
// private Directory directory = new RAMDirectory(FSDirectory.GetDirectory(TUtility.StaticIODataPath, true));
// private Directory directory = new RAMDirectory(TUtility.StaticIODataPath);
private Directory directory = FSDirectory.GetDirectory(TUtility.StaticIODataPath.TrimEnd('/') + "/" + "wordtype", false); // true for the first time, false afterwards

Note:
Both RAMDirectory (in memory) and FSDirectory can be used. Note that if the create parameter of FSDirectory.GetDirectory is true, the existing index files will be deleted. If necessary, use the IndexReader.IndexExists() method to check first.

Open an existing index library from the specified directory.
private Directory directory = FSDirectory.GetDirectory(@"C:\Index", false);

Load the index library into the memory to increase the search speed.
private Directory directory = new RAMDirectory(FSDirectory.GetDirectory(@"C:\Index", false));
// Or
// private Directory directory = new RAMDirectory(@"C:\Index");

The second step is to create an analyzer, which is used for word segmentation. For Chinese word segmentation there are the Rain Marks segmenter, Shooot, SharpICTCLAS, and so on. There is also a very large one that relies mainly on a huge word segmentation dictionary. If I have time in the future, I will cover these:

private Analyzer analyzer = new StandardAnalyzer();

This is the standard analyzer. It ships with Lucene.

Then create an IndexWriter, which is used much like an XmlWriter, and write to it as follows:
IndexWriter writer = new IndexWriter(ufile, analyzer, true); // true for the first time, false afterwards
// writer.SetMaxFieldLength(1000);
writer.SetMaxMergeDocs(2); // maximum number of documents per merged segment
// writer.SetMergeFactor(1000); // merge factor
writer.SetUseCompoundFile(true); // use compound files; must be set before the writer is closed
writer.AddDocument(GetNewDocument(new string[] { initvalue, initvalue }));
writer.Optimize();
writer.Close(); // save

GetNewDocument is a method I defined myself. Its definition is as follows:

private Document GetNewDocument(string[] args) {
    Document document = new Document();
    document.Add(new Field("hashkey", args[0], Field.Store.YES, Field.Index.UN_TOKENIZED));
    document.Add(new Field("chinesename", args[1], Field.Store.YES, Field.Index.UN_TOKENIZED));
    return document;
}

Its main purpose is to create terms. A term is the smallest unit of the index, and it records this object.
 
Note that the IndexWriter.AddDocument method does not detect duplicate values. To update a value, first find the existing one (as described later in this tutorial series), delete it, and then add the new one.
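
The delete-then-add pattern might look roughly like the sketch below. It assumes the Lucene.Net 2.x API (IndexReader.Open, IndexReader.DeleteDocuments(Term), and the three-argument IndexWriter constructor); method names differ in other versions. The class IndexUpdateSketch and the method ReplaceDocument are hypothetical names used only for illustration, and the field names are taken from GetNewDocument above.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;

public static class IndexUpdateSketch
{
    // Hypothetical helper: replaces the document whose "hashkey" equals hashKey.
    public static void ReplaceDocument(Lucene.Net.Store.Directory directory, string hashKey, string chineseName)
    {
        // Step 1: delete every document whose hashkey term matches exactly.
        // The field is UN_TOKENIZED, so the whole value is indexed as a single term.
        IndexReader reader = IndexReader.Open(directory);
        try
        {
            reader.DeleteDocuments(new Term("hashkey", hashKey));
        }
        finally
        {
            reader.Close();
        }

        // Step 2: append the new version. create = false keeps the existing index.
        IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), false);
        try
        {
            Document document = new Document();
            document.Add(new Field("hashkey", hashKey, Field.Store.YES, Field.Index.UN_TOKENIZED));
            document.Add(new Field("chinesename", chineseName, Field.Store.YES, Field.Index.UN_TOKENIZED));
            writer.AddDocument(document);
        }
        finally
        {
            writer.Close();
        }
    }
}
```

The reader is closed before the writer is opened because both need the index write lock in this version of the API.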

The third parameter of IndexWriter decides what happens depending on whether the segments file exists. True means creating a new index or overwriting an existing one. False means appending to an existing index; if it is false but there is no segments or CFS file, an error is reported. Therefore, it is important to determine whether this is the first run:
if (!IndexReader.IndexExists(directory)) {
    // Initialization is required for the first time
    // MakeInit(directory);
}
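
One way to combine this check with the create flag is sketched below. It again assumes the Lucene.Net 2.x API (IndexReader.IndexExists(string), FSDirectory.GetDirectory(string, bool)); IndexOpenSketch and OpenWriter are hypothetical names, not part of the library or of my project.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class IndexOpenSketch
{
    // Hypothetical helper: opens a writer, creating the index only when none exists yet.
    public static IndexWriter OpenWriter(string indexPath, Analyzer analyzer)
    {
        // A missing folder simply counts as "no index yet".
        bool firstRun = !System.IO.Directory.Exists(indexPath)
                        || !IndexReader.IndexExists(indexPath);

        // create = true initializes (or wipes) the index; create = false appends to it.
        Lucene.Net.Store.Directory directory = FSDirectory.GetDirectory(indexPath, firstRun);
        return new IndexWriter(directory, analyzer, firstRun);
    }
}
```

Deriving both create flags from the same check avoids accidentally passing true on a later run and deleting an existing index.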

That is all for creating an index file. For details, look up the specific class or method names yourself.

In the future I will write up all of my experience, along with a demo. A grand plan!
