Recently I have been studying Lucene.net, and I would like to introduce it here. Lucene.net is a high-performance full-text search engine that is free and open source. It suits almost any application that needs full-text search, cross-platform applications in particular. It is a port of the Java Lucene project, which is already widely used on the Java platform. Compared with traditional database retrieval, Lucene.net has the following features:
| | Lucene full-text index engine | Database |
|---|---|---|
| Index | Builds an inverted index over all data in the data source. | For LIKE queries, a traditional database index cannot be used at all. The records have to be scanned one by one in a grep-style fuzzy match, which is more than an order of magnitude slower than an indexed search. |
| Matching effect | Matches by term, and a language analysis interface can be implemented to support non-English languages such as Chinese. | LIKE "%net%" also matches "Netherlands". Fuzzy matching of multiple keywords depends on their order: LIKE "%com%net%" cannot match "xxx.net..xxx.com". |
| Matching degree | A scoring algorithm puts the results with the highest degree of match at the top. | No control over the degree of match: a record in which the keyword "net" appears once is treated the same as one in which it appears many times. |
| Result output | A special algorithm outputs the 100 best-matching results, and the result set is read in small buffered batches. | Returns the whole result set. When there are many matching entries (tens of thousands, say), a large amount of memory is needed to hold the temporary result set. |
| Customization | Different language analysis interfaces make it easy to customize index rules that fit the application (including Chinese support). | Cannot be customized, because there is no interface or the interface is complex. |
| Conclusion | Suited to high-load fuzzy-query applications that need complex fuzzy-matching rules and index a large volume of data. | Suited to low usage, simple fuzzy-matching rules, or a small amount of data to query. |
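To make the index and matching rows of the comparison concrete, here is a minimal sketch in plain Python. It does not use the Lucene.net API; the sample records, the function names, and the simple regex tokenizer are all assumptions made for illustration. It contrasts a record-by-record substring scan (roughly what LIKE does) with a lookup in an inverted index keyed by term.

```python
import re
from collections import defaultdict

# Hypothetical sample records, invented for this illustration.
RECORDS = {
    1: "Lucene.net is a full-text search engine for .net",
    2: "Flights to the Netherlands are cheap",
    3: "xxx.net and xxx.com are both domains",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(records):
    """Map each term to the set of record ids that contain it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for term in tokenize(text):
            index[term].add(record_id)
    return index

def like_scan(records, *parts):
    """Emulate LIKE '%a%b%': scan every record for the parts in order."""
    pattern = re.compile(".*".join(re.escape(p.lower()) for p in parts))
    return {rid for rid, text in records.items() if pattern.search(text.lower())}

def term_search(index, *terms):
    """Return ids of records that contain every term, in any order."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()

index = build_inverted_index(RECORDS)
print(like_scan(RECORDS, "net"))         # substring scan also hits "Netherlands"
print(term_search(index, "net"))         # term lookup only hits the word "net"
print(like_scan(RECORDS, "com", "net"))  # ordered pattern misses record 3
print(term_search(index, "com", "net"))  # term lookup ignores keyword order
```

The scan touches every record on every query, while the term lookup only reads the posting sets for the queried terms, which is where the speed difference in the table comes from.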
The biggest difference between full-text search and a database query is the goal: full-text search aims for the 100 most relevant results to meet the needs of more than 98% of users.
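The idea of returning only the best-matching results can be sketched as follows. This is a toy illustration, not Lucene's scoring formula: the score here is assumed to be nothing more than the number of times the query terms occur in a record, and the sample records and function names are hypothetical. `heapq.nlargest` keeps only the top n candidates in memory instead of materializing and sorting the whole result set.

```python
import heapq
import re
from collections import Counter

# Hypothetical sample records, invented for this illustration.
RECORDS = {
    1: "net framework, net tools, net libraries, net docs, net samples",
    2: "a single mention of net",
    3: "nothing relevant here",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(text, query_terms):
    """Toy relevance score: how often the query terms occur in the text."""
    counts = Counter(tokenize(text))
    return sum(counts[term] for term in query_terms)

def top_n(records, query, n=100):
    """Return up to n (score, record_id) pairs, best first, skipping non-matches."""
    terms = tokenize(query)
    scored = ((score(text, terms), rid) for rid, text in records.items())
    matches = (pair for pair in scored if pair[0] > 0)
    return heapq.nlargest(n, matches)

print(top_n(RECORDS, "net", n=2))  # record 1 (five hits) ranks above record 2 (one hit)
```

A plain LIKE query would return records 1 and 2 with no ordering between them; here the record that mentions the keyword five times comes first.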
Lucene.net is a full-text search development kit rather than a ready-made application. It already contains basic components such as intelligent word segmentation, keyword highlighting, and indexing algorithms, so with only some simple secondary development it can be applied to a real project. The search engine for this blog is now based on Lucene.net.
Related resources:
Lucene.net search engine library: http://sourceforge.net/projects/lucenedotnet/
DotLucene, a search engine library: http://sourceforge.net/projects/dotlucene/
NLucene, a .NET search engine: http://sourceforge.net/projects/nlucene/