Many people already know Lucene.Net. I only recently studied it properly, so please don't laugh at me for treating old news as news. If you don't know Lucene yet, let me say a few words first. Lucene knowledge mainly covers indexing, searching, analyzers, and performance optimization. There is not much to say about indexing and searching: look at a few examples, experiment step by step, and you will soon be familiar with them.
The analyzer is the essence of Lucene. It has two parts: word segmentation (tokenizing) and filtering. Chinese word segmentation is harder still. The example program uses Lucene.Net.Analysis.Cn.dll for Chinese word segmentation. Does anyone have a C# version of ICTCLAS, the word segmentation tool from the Chinese Academy of Sciences?
Performance optimization is also very important, because indexing slows down considerably when the files to be indexed are large. You can tune several IndexWriter parameters to improve indexing performance. You can also call IndexWriter.Optimize(), although that method mainly speeds up queries and slows down indexing. In addition, multiple threads can index different content into separate RAMDirectory instances, and all the in-memory indexes can then be merged into an FSDirectory. You can even have several servers each process part of the content and put their index results in a queue, with one machine reading the queue and merging the results.
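
The multi-threaded approach described above could look roughly like the sketch below. It assumes an early Lucene.Net release (1.4/1.9 era) in which Field.Text, FSDirectory.GetDirectory, and IndexWriter.AddIndexes(Directory[]) are available; later releases changed these APIs. ParallelIndexSketch, IndexBatch, and BuildIndex are hypothetical names, and StandardAnalyzer is used only to keep the sketch short.

```csharp
using System.Threading;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class ParallelIndexSketch
{
    // Index one batch of texts into its own in-memory directory.
    static RAMDirectory IndexBatch(string[] batch)
    {
        RAMDirectory ramDir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(ramDir, new StandardAnalyzer(), true);
        try
        {
            foreach (string text in batch)
            {
                Document doc = new Document();
                doc.Add(Field.Text("contents", text));
                writer.AddDocument(doc);
            }
        }
        finally
        {
            writer.Close(); // flush buffered documents into ramDir
        }
        return ramDir;
    }

    // Index each batch on its own thread, then merge everything on disk.
    public static void BuildIndex(string[][] batches, string indexPath)
    {
        Lucene.Net.Store.Directory[] parts = new Lucene.Net.Store.Directory[batches.Length];
        Thread[] workers = new Thread[batches.Length];
        for (int i = 0; i < batches.Length; i++)
        {
            int slot = i; // per-iteration copy captured by the lambda
            workers[i] = new Thread(() => parts[slot] = IndexBatch(batches[slot]));
            workers[i].Start();
        }
        foreach (Thread worker in workers)
        {
            worker.Join();
        }

        // Merge all in-memory indexes into a single file-system index.
        IndexWriter fsWriter = new IndexWriter(
            FSDirectory.GetDirectory(indexPath, true), new StandardAnalyzer(), true);
        try
        {
            fsWriter.AddIndexes(parts);
            fsWriter.Optimize();
        }
        finally
        {
            fsWriter.Close();
        }
    }
}
```

Each worker owns its own RAMDirectory and IndexWriter, so the threads share no writer state; only the final merge touches the disk index.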
The purpose of this example is to demonstrate what Lucene.Net can do: you can build a full-text index of the .txt, .htm, and .html files in a specified directory and then query it. Because creating the index takes a long time when the files in the directory are large, the example program uses asynchronous programming so that the UI thread is not blocked while the index is being built.
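
As a rough illustration of keeping the UI thread free, here is a minimal sketch. The example program itself uses asynchronous delegates; this sketch swaps in Task.Run, which is the more common choice in current .NET. AsyncIndexSketch and BuildIndex are hypothetical names, and BuildIndex only simulates the slow indexing step.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class AsyncIndexSketch
{
    // Hypothetical stand-in for the slow indexing step.
    // Returns the number of files it "indexed".
    public static int BuildIndex(string[] files)
    {
        int indexed = 0;
        foreach (string file in files)
        {
            Thread.Sleep(50); // simulate the cost of indexing one file
            indexed++;
        }
        return indexed;
    }

    // Run the indexing on a thread-pool thread so the caller is not blocked.
    public static Task<int> BuildIndexAsync(string[] files)
    {
        return Task.Run(() => BuildIndex(files));
    }
}
```

In a button click handler marked async, a call such as `int count = await AsyncIndexSketch.BuildIndexAsync(files);` would leave the window responsive until the task finishes.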
[Content]
1. Let's take a look at a simple example.
// Requires: using Lucene.Net.Analysis.Cn; using Lucene.Net.Documents; using Lucene.Net.Index;
//           using Lucene.Net.QueryParsers; using Lucene.Net.Search; using System.Windows.Forms;
public void Test1()
{
    // Create an in-memory directory
    Lucene.Net.Store.RAMDirectory ramDir = new Lucene.Net.Store.RAMDirectory();
    // Create an index writer
    IndexWriter ramWriter = new IndexWriter(ramDir, new ChineseAnalyzer(), true);
    // The words to be indexed. They stand in for the files to be indexed.
    string[] words = { "People's Republic of China", "People's Republic", "people", "Republic" };
    // Loop over the array: create a document, add a field to it, and add the document to the index writer
    Document doc = null;
    for (int i = 0; i < words.Length; i++)
    {
        doc = new Document();
        doc.Add(Field.Text("contents", words[i]));
        ramWriter.AddDocument(doc);
    }
    // Optimize the index
    ramWriter.Optimize();
    // Close the index writer. This is mandatory. In real code, wrap the indexing in try and close the writer in finally.
    ramWriter.Close();
    // Construct an index searcher
    IndexSearcher searcher = new IndexSearcher(ramDir);
    // Build a query with QueryParser.Parse
    Query query = QueryParser.Parse("Chinese people", "contents", new ChineseAnalyzer());
    // Get the search results
    Hits hits = searcher.Search(query);
    // Check whether there are any results. You could also traverse the result set and output it.
    if (hits.Length() != 0)
        MessageBox.Show("yes");
    else
        MessageBox.Show("no");
}
2. For a fuller example, see the downloadable code.
The download contains a doc folder with four text files. Try indexing that directory and then searching for keywords such as "people" and "China" to see whether you get results. Put simply, the example program traverses a directory, finds all text and web page files, creates a Lucene document for each file, indexes the file's directory and content, adds the document to the index writer, and creates the index in the index subdirectory of the program's execution directory. This part is invoked through an asynchronous delegate. Searching simply looks in the index directory for entries that match a keyword.
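
The directory traversal step might be sketched as follows. This is not the code from the download; FileFinder and FindIndexableFiles are hypothetical names, and the sketch assumes subdirectories should be searched as well.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileFinder
{
    // Extensions the example program indexes, compared case-insensitively.
    static readonly HashSet<string> Extensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".txt", ".htm", ".html" };

    // Return the paths of all text and web page files under root.
    public static List<string> FindIndexableFiles(string root)
    {
        List<string> result = new List<string>();
        foreach (string path in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
        {
            if (Extensions.Contains(Path.GetExtension(path)))
            {
                result.Add(path);
            }
        }
        return result;
    }
}
```

Each returned path would then be turned into one Lucene document and handed to the index writer.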
[Note]
1. After creating an index, you must call the Close method of IndexWriter. Otherwise, if the number of files in the directory being indexed is less than minMergeDocs, no index will be written.
2. The static Field.Text method has two overloads. If the second parameter is a string, the field is both indexed and stored. If it is a TextReader, the field is indexed but not stored. Also take care to use the right encoding when constructing the TextReader. Otherwise some files will be read as garbled text, and the index built from them will be garbled too.
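
As a small sketch of the encoding point in note 2, the reader can be constructed with an explicit encoding instead of relying on the default. EncodedReader and Open are hypothetical names; which encoding to pass depends on how the files were saved.

```csharp
using System.IO;
using System.Text;

public static class EncodedReader
{
    // Open a text file with an explicit encoding so its characters decode correctly.
    // The returned reader is what would be passed to the TextReader overload of Field.Text.
    public static TextReader Open(string path, Encoding encoding)
    {
        return new StreamReader(path, encoding);
    }
}
```

For Chinese files saved as GB2312, the encoding would come from Encoding.GetEncoding("gb2312"). On .NET Core and later, that lookup only works after the code pages encoding provider has been registered.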
[Summary]
In fact, anyone can learn Lucene. The hard part is building a search engine like Google or Baidu. Search is an industry in its own right, so if you are interested, take a good look at the technologies behind it. Maybe you could even start a business in this area.
Does anyone have a Chinese edition of "Lucene in Action", or any other good books about Lucene?
Finally, I would like to discuss a question with everyone: in the long run, which technology is most promising for a programmer to learn? I have been programming for several years, and I want to pick one field, study it in depth, and become an expert in it. That way the work will not be so exhausting. If you try to do a bit of everything, you wear yourself out and rarely achieve much. I have listed several directions below for you to weigh in on. Thank you.
1. Linux + Oracle (database administration)
2. Assembly and low-level C driver development (said to be quite simple, with just a handful of instructions, so a year of study makes you proficient. Unlike .NET, where you have to keep chasing new releases)
3. EC++ and KJava embedded development (including mobile game and router firmware development)
4. Instant messaging (network programming, including online game server programming)
5. Search industry (not familiar with it)
6. OA and workflow (building a workflow platform where an enterprise's electronic business processes can be set up by drag and drop, without programming, such as InfoPath, OSS, FormServer, and WF)
7. .NET website development (a broad area with too much to master. Many people work in it, but very few go deep)
8. Streaming media development (now that the 3G era is here, I don't know whether this is useful)
[Reference]
Idior's Lucene.Net series
Li Gang, Song Wei, and Qiu Zhe, "Ajax + Lucene: Building a Search Engine"