DotLucene: Full-Text Search in 37 Lines of Code

DotLucene is a powerful open-source full-text search engine, ported from the Apache Lucene (Java) project to .NET (C#).

DotLucene is highly efficient and offers ranked search results, hit highlighting, search over unstructured data, and localization. It is also compatible with Lucene indexes, so you can move between platforms without losing any index data. This article shows how to implement full-text search with DotLucene in very little code. It is translated from Dan Letecky's CodeProject article "DotLucene: Full-Text Search for Your Intranet or Website using 37 Lines of Code"; copyright belongs to the original author. Translator: Samuel Chen
    • Download the source code for this article (363 KB)
    • DotLucene online demo
    • Download the source code with a pre-built index and HTML documents [LINK]
DotLucene: an excellent full-text search engine
Can you write a full-text search in 37 lines of code? I am going to let DotLucene do the heavy lifting. DotLucene is a port of the Jakarta Lucene search engine, maintained by George Aroush et al. Some of its features:
    • It can be used in ASP.NET, WinForms, or console applications;
    • Very good performance;
    • Ranked search results;
    • Highlighting of query terms in the search results;
    • Searches structured and unstructured data;
    • Metadata search (date queries, searching a specific field ...);
    • The index size is about 30% of the indexed text;
    • It can also store the full indexed documents;
    • Pure managed .NET code in a single assembly (244 KB);
    • Very friendly license (Apache Software License 2.0);
    • Localization (support for Brazilian, Czech, Chinese, Dutch, English, French, Japanese, Korean and Russian);
    • Extensible (source code included).
Warning: don't take the line count too seriously. I will show that the core functionality takes no more than 37 lines of code, but turning it into a real application takes more work.
Demo project
We will build a simple demo project that shows how to:
    • Index the HTML files found in a given directory (including subdirectories)
    • Search the index from an ASP.NET application
    • Highlight the query terms in the search results
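The first step, collecting the HTML files under a directory and its subdirectories, could be sketched as follows. This is a minimal illustration using only the .NET base class library; the class name HtmlFileFinder and the choice to match the .htm and .html extensions are assumptions, not code from the original article.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class HtmlFileFinder
{
    // Returns every .htm/.html file under rootDirectory, including subdirectories.
    // HtmlFileFinder is a hypothetical helper name used only for this sketch.
    public static List<string> Find(string rootDirectory)
    {
        List<string> result = new List<string>();

        // SearchOption.AllDirectories makes GetFiles walk the whole subtree.
        foreach (string file in Directory.GetFiles(rootDirectory, "*", SearchOption.AllDirectories))
        {
            string extension = Path.GetExtension(file);
            if (extension.Equals(".htm", StringComparison.OrdinalIgnoreCase) ||
                extension.Equals(".html", StringComparison.OrdinalIgnoreCase))
            {
                result.Add(file);
            }
        }
        return result;
    }
}
```

Each path returned by HtmlFileFinder.Find could then be handed to the indexing code shown later in the article.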
DotLucene can do much more. In a real application you would probably also want to:
    • Add new documents to an existing index instead of rebuilding the whole index
    • Index other file types. DotLucene can index any file type that can be converted to plain text.
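Adding a document to an existing index, rather than rebuilding it, might look like the sketch below. It assumes the DotLucene 1.x API used in this article (the three-argument IndexWriter constructor, whose last argument says whether to create a new index, plus Field.UnStored and Field.Keyword) and a project reference to the DotLucene assembly. The class and method names are hypothetical, and later Lucene.Net releases changed parts of this API.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;

public static class IncrementalIndexer
{
    // Adds one document to the index stored in indexDirectory.
    // A new index is created only if none exists there yet.
    // IncrementalIndexer and AddToIndex are illustrative names.
    public static void AddToIndex(string indexDirectory, string path, string plainText)
    {
        // create == false opens the existing index for appending.
        bool create = !IndexReader.IndexExists(indexDirectory);
        IndexWriter writer = new IndexWriter(indexDirectory, new StandardAnalyzer(), create);
        try
        {
            Document doc = new Document();
            doc.Add(Field.UnStored("text", plainText)); // indexed, not stored
            doc.Add(Field.Keyword("path", path));       // stored as-is
            writer.AddDocument(doc);
        }
        finally
        {
            // Always close the writer so the index lock is released.
            writer.Close();
        }
    }
}
```

Opening and closing the writer for every document is simple but slow; when adding many documents it is usually better to keep one writer open for the whole batch.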
Why not use Microsoft Index Server?
If you prefer Index Server, that is fine. However, DotLucene has several advantages:
    • DotLucene is 100% managed code in a single assembly, with no dependencies.
    • It can be used on shared hosting. If you ship a pre-built index, you do not need write access to the disk.
    • You can index any type of data (e-mail, XML, HTML files ...) from any source (database, website ...). That is because you supply plain text to the indexer; loading and parsing the data is up to you.
    • You choose which attributes ("fields") go into the index, and you can then search on those fields (for example author, date, or keywords).
    • It is open source.
    • It is easy to extend.
Line 1: Creating the index
The following code creates a new index stored on disk. directory is the path of the directory in which the index is stored.

IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), true);
In this example we always create the index from scratch, but that is not required. You can also open an existing index and add documents to it, or update existing documents by deleting them and adding their new versions.
Lines 2-12: Adding documents
For each HTML document we add two fields to the index:
    • The "text" field contains the text content of the HTML file (with the tags stripped). The text itself is not stored in the index.
    • The "path" field contains the file path. It is indexed and stored in full in the index.
public void AddHtmlDocument(string path)
{
    Document doc = new Document();

    string rawText;
    using (StreamReader sr = new StreamReader(path, System.Text.Encoding.Default))
    {
        rawText = parseHtml(sr.ReadToEnd());
    }

    doc.Add(Field.UnStored("text", rawText));
    doc.Add(Field.Keyword("path", path));
    writer.AddDocument(doc);
}
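The code above calls a parseHtml helper that is not shown here. A rough stand-in could look like the sketch below: it strips script and style blocks, comments, and tags with regular expressions, then decodes HTML entities. This is an assumption about what such a helper might do, not the original implementation, and regular expressions are only an approximation of real HTML parsing. The class name HtmlText is hypothetical.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Hypothetical stand-in for the parseHtml helper:
    // turns an HTML string into plain text suitable for indexing.
    public static string ParseHtml(string html)
    {
        // Drop <script> and <style> blocks together with their contents.
        string text = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Drop comments, then every remaining tag.
        text = Regex.Replace(text, @"<!--.*?-->", " ", RegexOptions.Singleline);
        text = Regex.Replace(text, @"<[^>]+>", " ");

        // Turn entities such as &amp; back into plain characters.
        text = WebUtility.HtmlDecode(text);

        // Collapse runs of whitespace into single spaces.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

For example, HtmlText.ParseHtml("<p>Hello &amp; <b>world</b></p>") yields the plain string "Hello & world". Malformed markup may need a proper HTML parser instead.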
Lines 13-14: Optimizing and saving the index
After adding the documents you need to close the index. Optimizing it first improves search performance.

writer.Optimize();
writer.Close();
Line 15: Opening the index for searching
Before running any search you need to open the index. The directory parameter is the path of the directory in which the index is stored.

IndexSearcher searcher = new IndexSearcher(directory);
Lines 16-27: Searching
Now we can parse the query ("text" is the default field to search) and run it:

Query query = QueryParser.Parse(q, "text", new StandardAnalyzer());
Hits hits = searcher.Search(query);
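The string passed to the parser can use Lucene's query syntax. The sketch below parses a few example queries with the same static QueryParser.Parse call as above and prints how each one was interpreted. It assumes the DotLucene 1.x API and a reference to its assembly. The query strings are made up for illustration, and the author field in the last one is hypothetical (the demo index only has "text" and "path").

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

public static class QuerySyntaxDemo
{
    public static void Main()
    {
        // Illustrative query strings, not taken from the original article.
        string[] examples =
        {
            "lucene",               // single term in the default field
            "\"full text\"",        // exact phrase
            "lucene AND search",    // both terms required
            "lucene NOT java",      // second term excluded
            "author:smith"          // explicit field (hypothetical field name)
        };

        foreach (string q in examples)
        {
            // "text" is the field used when the query names no field.
            Query query = QueryParser.Parse(q, "text", new StandardAnalyzer());

            // ToString("text") omits the default field name from the output.
            Console.WriteLine("{0} => {1}", q, query.ToString("text"));
        }
    }
}
```

User input that is not valid query syntax makes the parser throw, so a real application would wrap the Parse call in a try/catch block.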
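The variable hits is a collection of the documents that matched the query. We will store the results in a DataTable:

DataTable dt = new DataTable();
dt.Columns.Add("path", typeof(string));
dt.Columns.Add("sample", typeof(string));

for (int i = 0; i < hits.Length(); i++)
{
    // Only the path can be read back; the text was not stored in the index
    DataRow row = dt.NewRow();
    row["path"] = hits.Doc(i).Get("path");
    dt.Rows.Add(row);
}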
Lines 28-37: Query highlighting
We first create a highlighter object that marks the query terms in bold (<B>query term</B>).

QueryHighlightExtractor highlighter = new QueryHighlightExtractor(query, new StandardAnalyzer(), "<B>", "</B>");
While looping over the results, we load the most relevant fragments of the original text.

for (int i = 0; i < hits.Length(); i++)
{
    // ... (the loop body is truncated in the source)
}
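To make the idea of highlighting concrete, here is a deliberately simplified stand-in written with plain regular expressions. It is not the DotLucene highlighter: it does not use an analyzer or pick the best fragments, it only wraps whole-word, case-insensitive matches of the given terms in the supplied tags. The class name SimpleHighlighter and its signature are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class SimpleHighlighter
{
    // Wraps every whole-word, case-insensitive occurrence of the given terms
    // in openTag/closeTag. Assumes the terms are plain words.
    public static string Highlight(string plainText, string[] terms, string openTag, string closeTag)
    {
        if (terms.Length == 0)
        {
            return plainText;
        }

        // Escape each term so characters like '.' are matched literally.
        string[] escaped = new string[terms.Length];
        for (int i = 0; i < terms.Length; i++)
        {
            escaped[i] = Regex.Escape(terms[i]);
        }

        // \b limits matches to whole words; '|' tries each term in turn.
        string pattern = @"\b(" + string.Join("|", escaped) + @")\b";

        return Regex.Replace(plainText, pattern,
            m => openTag + m.Value + closeTag,
            RegexOptions.IgnoreCase);
    }
}
```

Calling SimpleHighlighter.Highlight("Lucene makes full-text search easy", new[] { "lucene", "search" }, "<B>", "</B>") returns "<B>Lucene</B> makes full-text <B>search</B> easy". When the result is written into a web page, the plain text should be HTML-encoded before the tags are added.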
Resources
    • DotLucene download
    • Run the DotLucene online demo
    • DotLucene online demo notes
    • DotLucene documentation


Author's blog: http://blog.csdn.net/shanyou/
