Tutorial and summary of Lucene.net development (1)

Last Update:2018-12-06 Source: Internet

Author: User

Lucene is a famous open-source search framework. Its word segmentation works well for English, but for Japanese and Chinese, where words are not separated by spaces, word segmentation is a bottleneck. It has been almost a year since I first came into contact with the project. Because I like C# very much, I have always followed Lucene.net and have had no time to look at the Lucene (Java) version. Of course, my Java is not good enough to show anyone without making them laugh.

First, I would like to introduce how it stores data. It stores each piece of data under one of its index values. During a search it goes directly to the index location and then fetches the actual value.
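The idea can be sketched in plain C# without Lucene.net. This is only an illustration under my own assumptions: the class TinyIndex and its methods are hypothetical and are not part of the Lucene.net API. The sketch keeps a map from each index value to the positions of the values stored under it, so a search goes straight to the positions and then reads the stored values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; this is not the Lucene.net API.
public class TinyIndex
{
    // Stored values, addressed by position.
    private readonly List<string> stored = new List<string>();

    // Index value -> positions of the stored values filed under it.
    private readonly Dictionary<string, List<int>> index =
        new Dictionary<string, List<int>>();

    // Store a value and file its position under the given index value.
    public void Add(string key, string value)
    {
        stored.Add(value);
        List<int> positions;
        if (!index.TryGetValue(key, out positions))
        {
            positions = new List<int>();
            index[key] = positions;
        }
        positions.Add(stored.Count - 1);
    }

    // Look up the positions for the key, then fetch the stored values.
    public List<string> Search(string key)
    {
        List<string> result = new List<string>();
        List<int> positions;
        if (index.TryGetValue(key, out positions))
        {
            foreach (int p in positions)
            {
                result.Add(stored[p]);
            }
        }
        return result;
    }
}
```

A search for a key that was never added returns an empty list, because the lookup never touches the stored values at all.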

I will use the following operations. As with ordinary DB operations, you need to know: create, delete, edit, update (delete first, then add), merge, and optimize.

I will introduce them in detail below if I have time. Because it is open source, I will paste the key code here.

1: Create an index
First, create a directory to keep all files.
// private Directory directory = new RAMDirectory(FSDirectory.GetDirectory(@TUtility.StaticIODataPath, true));
// private Directory directory = new RAMDirectory(@TUtility.StaticIODataPath);
private Directory directory = FSDirectory.GetDirectory(@TUtility.StaticIODataPath.TrimEnd('/') + "/" + "wordtype", false); // true for the first time, false afterwards

Note:
Both RAMDirectory (in memory) and FSDirectory can be used. Be aware that if the create parameter of FSDirectory.GetDirectory is true, the existing index library files will be deleted. If necessary, you can check first with the IndexReader.IndexExists() method.

Open an existing index library from the specified directory.
private Directory directory = FSDirectory.GetDirectory(@"C:\Index", false);

Load the index library into the memory to increase the search speed.
private Directory directory = new RAMDirectory(FSDirectory.GetDirectory(@"C:\Index", false));
// Or
// private Directory directory = new RAMDirectory(@"C:\Index");

The second step is to create an Analyzer, which is used for word segmentation. For Chinese word segmentation there are Rain Marks, shooot, SharpICTCLAS, and so on. There is also the huge JB, which relies mainly on a huge word segmentation dictionary. I will talk about these in the future if I have time.

private Analyzer analyzer = new StandardAnalyzer();

This is the standard analyzer. It comes with Lucene.

Then create an IndexWriter, which is used much like XmlWriter, and write with it as follows:
IndexWriter writer = new IndexWriter(ufile, analyzer, true); // true for the first time, false afterwards
// writer.SetMaxFieldLength(1000);
writer.SetMaxMergeDocs(2); // maximum number of documents in a merged segment
// writer.SetMergeFactor(1000); // merge factor
writer.SetUseCompoundFile(true); // merge files; must be set before the writer is closed
writer.AddDocument(GetNewDocument(new string[] { initvalue, initvalue }));
writer.Optimize();
writer.Close(); // save

GetNewDocument is a method I defined myself. It looks like this:

private Document GetNewDocument(string[] args) {
    Document document = new Document();
    document.Add(new Field("hashkey", args[0], Field.Store.YES, Field.Index.UN_TOKENIZED));
    document.Add(new Field("chinesename", args[1], Field.Store.YES, Field.Index.UN_TOKENIZED));
    return document;
}

Its main job is to create terms. A term is the smallest unit of the index, and it records this object.

Note that the IndexWriter.AddDocument method does not detect duplicate values. If you need to update a value, first find the existing entry (as described later in this tutorial series), delete it, and then add the new one.
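The delete-then-add pattern can be sketched as follows. This is a plain C# model under stated assumptions, not Lucene.net code: the DocStore class and its methods are hypothetical, and they only mimic the behavior described above, where adding never checks for duplicates.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of an index whose Add never checks for duplicates.
// It is not the Lucene.net API.
public class DocStore
{
    // Each document is a (hashkey, chinesename) pair, as in GetNewDocument.
    private readonly List<KeyValuePair<string, string>> docs =
        new List<KeyValuePair<string, string>>();

    public int Count { get { return docs.Count; } }

    // Like AddDocument: appends blindly, so the same key can appear twice.
    public void Add(string hashKey, string name)
    {
        docs.Add(new KeyValuePair<string, string>(hashKey, name));
    }

    // Removes every document with the given key and returns how many were removed.
    public int Delete(string hashKey)
    {
        return docs.RemoveAll(d => d.Key == hashKey);
    }

    // Update = delete the existing entries first, then add the new one.
    public void Update(string hashKey, string name)
    {
        Delete(hashKey);
        Add(hashKey, name);
    }

    // Returns all names stored under the key.
    public List<string> Find(string hashKey)
    {
        List<string> names = new List<string>();
        foreach (KeyValuePair<string, string> d in docs)
        {
            if (d.Key == hashKey)
            {
                names.Add(d.Value);
            }
        }
        return names;
    }
}
```

In this model, calling Add twice with the same key leaves two entries, while calling Update leaves exactly one.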

The third parameter of IndexWriter decides what happens depending on whether the segments file exists. True means creating a new index library or overwriting an existing one. False means appending to an existing index library; if it is false but there is no segments or CFS file, an error is reported. Therefore, it is important to determine whether this is the first run:
if (!IndexReader.IndexExists(directory)) {
    // Initialization is required the first time
    // MakeInit(directory);
}

That is all for creating an index file. For details, copy the specific class or method name and look it up yourself.

In the future, I will write up all of my experience, and a demo as well. A great plan!
