.NET-based full-text index engine Lucene.net

Source: Internet
Author: User

Recently I have been studying Lucene.net, and I would like to introduce it here. Lucene.net is a free, open-source, high-performance full-text search engine. It suits almost any application that requires full-text search, especially cross-platform applications. It is a port of the Java library Lucene, which has been widely used on the Java platform. Compared with traditional database retrieval, Lucene.net has the following features:

Comparison of the Lucene full-text index engine and a database:

Index
  Lucene: Builds an inverted index over all the data in the data source.
  Database: For LIKE queries, traditional database indexes cannot be used at all. The records have to be scanned one by one for a grep-style fuzzy match, which is more than an order of magnitude slower than an indexed search.

Matching effect
  Lucene: Matches by term. Non-English languages such as Chinese are supported by implementing the language analysis interface.
  Database: LIKE "%net%" also matches "netherlands". Fuzzy matching on several keywords is order-dependent: LIKE "%com%net%" cannot match xxx.net..xxx.com.

Matching degree
  Lucene: A matching degree (relevance) algorithm puts the most similar results at the top.
  Database: No control over matching degree: a record in which "net" appears many times is treated the same as one in which it appears once.

Result output
  Lucene: A special algorithm outputs the 100 results with the highest matching degree, and the result set is read in small buffered batches.
  Database: Returns the whole result set. When there are many matching entries (such as tens of thousands), a large amount of memory is needed to hold the temporary result set.

Customization
  Lucene: By using different language analysis interfaces, you can easily customize indexing rules to meet application requirements (including Chinese support).
  Database: Cannot be customized, because there is no interface or the interface is complex.

Conclusion
  Lucene: Suited to high-load fuzzy query applications, where the fuzzy query rules are demanding and the volume of indexed data is large.
  Database: Suited to low usage, simple fuzzy matching rules, or a small amount of data to be queried.
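The difference between term matching on an inverted index and a LIKE-style scan can be illustrated with a small sketch. This is not Lucene.net code: the function names, the regular-expression tokenizer (a stand-in for a real language analyzer), and the sample records are all assumptions made for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase terms (a stand-in for a real language analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the set of record ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def term_search(index, *terms):
    """Return ids of records that contain every term, in any order."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()

def like_search(docs, *fragments):
    """Emulate LIKE '%a%b%': scan every record for the fragments in that order."""
    pattern = re.compile(
        ".*".join(re.escape(f) for f in fragments),
        re.IGNORECASE | re.DOTALL,
    )
    return {doc_id for doc_id, text in docs.items() if pattern.search(text)}

# Hypothetical sample records.
docs = {
    1: "Lucene.net is a search library for .NET",
    2: "Tulips from the Netherlands",
    3: "xxx.net and then xxx.com",
    4: "xxx.com and then xxx.net",
}
index = build_inverted_index(docs)
```

With these records, term_search(index, "net") leaves out record 2, while like_search(docs, "net") includes it because "Netherlands" contains the substring. Likewise, like_search(docs, "com", "net") finds only record 4, whereas term_search(index, "com", "net") finds records 3 and 4 regardless of word order.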

The biggest difference between full-text search and a database query is that full-text search aims to meet the needs of more than 98% of users with only the 100 most relevant results, instead of returning every match.
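A minimal sketch of the "most relevant results first" idea follows. It assumes a deliberately simple score, the number of times the term occurs in a record; Lucene's real scoring is more elaborate, and the names score and top_hits are hypothetical.

```python
import heapq
import re

def tokenize(text):
    """Split text into lowercase terms (a stand-in for a real language analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(text, term):
    """Assumed relevance score: how many times the term occurs in the record."""
    return tokenize(text).count(term.lower())

def top_hits(docs, term, limit=100):
    """Return up to `limit` (doc_id, score) pairs, best first.

    heapq.nlargest keeps only `limit` candidates at a time, so the full
    set of matches never has to be sorted or held in memory at once.
    """
    scored = ((doc_id, score(text, term)) for doc_id, text in docs.items())
    matches = (pair for pair in scored if pair[1] > 0)
    return heapq.nlargest(limit, matches, key=lambda pair: pair[1])

# Hypothetical sample records.
records = {
    1: "net",
    2: "net net net",
    3: "netherlands",
    4: "net and net",
}
best_two = top_hits(records, "net", limit=2)
```

Here best_two holds records 2 and 4, in that order, because they mention the term most often; record 3 never matches because "netherlands" is a different term.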

Lucene.net is a full-text search development kit rather than a ready-to-use application. It already contains basic components such as intelligent word segmentation, keyword highlighting, and indexing algorithms. With only simple secondary development, it can be applied to a real project. The search engine for this blog is currently based on Lucene.net.
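As an illustration of what a keyword highlighting component does, here is a minimal sketch. It is not the Lucene.net highlighter: the highlight function and the choice of <b> tags are assumptions for this example.

```python
import re

def highlight(text, terms, open_tag="<b>", close_tag="</b>"):
    """Wrap whole-word occurrences of the query terms in tags, ignoring case."""
    if not terms:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, text)

# Hypothetical sample text.
snippet = highlight("Lucene.net runs on .NET, not in the Netherlands", ["net"])
```

Because the pattern matches on word boundaries, "net" and "NET" are wrapped in tags while "Netherlands" is left alone, which mirrors the term-based matching described earlier.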

Related Resources:

Lucene.net search engine library: http://sourceforge.net/projects/lucenedotnet/
DotLucene, a search engine library: http://sourceforge.net/projects/dotlucene/
NLucene, a .NET search engine: http://sourceforge.net/projects/nlucene/
