Lucene: An Introduction and a Summary of How It Works

Source: Internet
Author: User
Lucene is a full-text search framework, not an application product. It can easily be embedded into all kinds of applications to give them full-text indexing and retrieval. It is therefore not a ready-to-use product like www.baidu.com or Google Desktop; it only provides the tools you need to build such products.
Lucene is a subproject of the Jakarta project of the Apache Software Foundation. It is an open-source full-text retrieval toolkit. Rather than a complete full-text search engine, it is a full-text search engine architecture: it provides a complete query engine and indexing engine, plus text analysis engines for two Western languages, English and German. Lucene aims to give software developers a simple, easy-to-use toolkit for adding full-text retrieval to a target system, or for building a complete full-text search engine on top of it.
Since its release as open-source code, Lucene has drawn a strong response from the open-source community. Developers not only use it to build specific full-text retrieval applications but also integrate it into various kinds of system software and Web applications, and some commercial software even uses Lucene as the core of its internal full-text retrieval subsystem.
Many Java projects use Lucene as their back-end full-text indexing engine. Well-known examples include Jive, a web forum system; Eyebrowse, an HTML archiving, browsing, and querying system for mailing lists; and Cocoon, an XML-based web publishing framework whose full-text search is built on Lucene. The Apache Software Foundation website uses Lucene as its full-text search engine; IBM's open-source Eclipse (as of version 2.1) uses Lucene as the full-text indexing engine for its help subsystem; and IBM's commercial product WebSphere also uses Lucene.
Thanks to its open source code, excellent index structure, and sound system architecture, Lucene is used in more and more applications. Lucene is a high-performance, scalable information retrieval (IR) library that lets you add indexing and search capabilities to your applications. It is a mature, free, open-source project implemented in Java, a member of the well-known Apache Jakarta family, and released under the Apache Software License. In recent years it has become a very popular Java IR library.
The services Lucene provides come in two directions: in and out. "In" means writing the sources you provide (essentially strings) into the index, or removing them from it. "Out" means providing full-text search to users, letting them locate sources by keyword.
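The in/out split can be pictured with a toy inverted index. The following is a minimal plain-Java sketch with no Lucene dependency; the class InOutIndex and its add, remove and search methods are hypothetical names chosen for illustration. In Lucene itself, writing and deleting are done through IndexWriter and searching through IndexSearcher, with far more machinery (analysis, segments, scoring) than shown here.

```java
import java.util.*;

// Toy model of Lucene's "in" (add/remove) and "out" (search) services.
// InOutIndex and its methods are hypothetical; this is not Lucene's API.
class InOutIndex {
    // term -> ids of the sources containing that term (an inverted index)
    private final Map<String, Set<String>> postings = new HashMap<>();
    // id -> original source string
    private final Map<String, String> sources = new HashMap<>();

    // "In": write a source string into the index under the given id.
    void add(String id, String text) {
        remove(id); // re-adding an id replaces the previous version
        sources.put(id, text);
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
    }

    // "In": remove a source and all of its postings from the index.
    void remove(String id) {
        if (sources.remove(id) == null) {
            return; // nothing indexed under this id
        }
        for (Set<String> ids : postings.values()) {
            ids.remove(id);
        }
        postings.values().removeIf(Set::isEmpty); // drop terms no source uses any more
    }

    // "Out": return the ids of sources containing the keyword, in sorted order.
    List<String> search(String keyword) {
        return new ArrayList<>(postings.getOrDefault(
                keyword.toLowerCase(Locale.ROOT), Collections.emptySet()));
    }

    public static void main(String[] args) {
        InOutIndex index = new InOutIndex();
        index.add("a", "Lucene is a search library");
        index.add("b", "Apache Lucene is written in Java");
        System.out.println(index.search("lucene")); // [a, b]
        index.remove("a");
        System.out.println(index.search("lucene")); // [b]
    }
}
```

Re-adding an existing id first removes the old entry, which loosely mirrors how IndexWriter.updateDocument works (delete, then add).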
Write process: the source string is first processed by an analyzer, which segments it into words (tokens) and optionally removes stop words. The required information from the source is then added to the fields of a Document: fields that should be searchable are indexed, and fields that should be retrievable are stored. Finally the index is written to storage, which can be either memory or disk.
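A minimal sketch of this write path, assuming a deliberately naive analyzer (split on non-word characters, lowercase, drop words from a small hard-coded stop list) and an index kept entirely in memory. WritePathSketch, its Field and Document records, and the stop-word list are hypothetical stand-ins, not Lucene classes. What the sketch is meant to show is that "indexed" and "stored" are independent choices made per field: a field can be searchable without being retrievable, or retrievable without being searchable.

```java
import java.util.*;

// Sketch of the write path: analyze, build a document of fields, index and/or store each field.
// All names are hypothetical stand-ins, not Lucene classes.
class WritePathSketch {
    // Small illustrative stop-word list (an assumption; real analyzers define their own).
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "is", "of", "the");

    // Analyzer step: split into words, lowercase them, optionally drop stop words.
    static List<String> analyze(String text, boolean removeStopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("\\W+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (token.isEmpty() || (removeStopWords && STOP_WORDS.contains(token))) {
                continue;
            }
            tokens.add(token);
        }
        return tokens;
    }

    // "indexed" and "stored" are independent per-field choices.
    record Field(String name, String value, boolean indexed, boolean stored) {}

    record Document(List<Field> fields) {}

    // field name -> term -> document numbers
    final Map<String, Map<String, Set<Integer>>> inverted = new HashMap<>();
    // document number -> stored field values (kept in memory in this sketch)
    final List<Map<String, String>> storedFields = new ArrayList<>();

    int addDocument(Document doc) {
        int docId = storedFields.size();
        Map<String, String> stored = new HashMap<>();
        for (Field f : doc.fields()) {
            if (f.indexed()) {
                for (String term : analyze(f.value(), true)) {
                    inverted.computeIfAbsent(f.name(), k -> new HashMap<>())
                            .computeIfAbsent(term, k -> new TreeSet<>())
                            .add(docId);
                }
            }
            if (f.stored()) {
                stored.put(f.name(), f.value());
            }
        }
        storedFields.add(stored);
        return docId;
    }

    public static void main(String[] args) {
        WritePathSketch writer = new WritePathSketch();
        int id = writer.addDocument(new Document(List.of(
                new Field("path", "/docs/a.txt", false, true),
                new Field("body", "Lucene is a search library", true, false))));
        System.out.println(writer.inverted.get("body").keySet()); // terms indexed for "body"
        System.out.println(writer.storedFields.get(id));          // {path=/docs/a.txt}
    }
}
```

Lucene's own analyzers (for example StandardAnalyzer) do considerably more, and the memory-or-disk choice is made by passing a Directory implementation, such as FSDirectory for disk, to the IndexWriter.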
Read process: the user supplies search keywords, which are processed by the analyzer. The processed keywords are looked up in the index to find the matching documents, and the user then extracts the required fields from those documents as needed.
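The read path can be sketched in the same hypothetical style. The key point is that the keywords go through the same analysis as the indexed text, so "JAVA" and "java" meet at the same term. ReadPathSketch and its index helper are illustrative names; the sketch matches only documents that contain every keyword (AND) and returns them in document order, both simplifying assumptions. By default Lucene's QueryParser combines terms with OR, and Lucene ranks hits by a relevance score.

```java
import java.util.*;

// Sketch of the read path: analyze the keywords, look them up, return a stored field per hit.
// Hypothetical names; AND matching and document-order results are simplifying assumptions.
class ReadPathSketch {
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "is", "of", "the");

    // The same analysis must be applied at write time and at read time.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("\\W+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    final Map<String, Set<Integer>> inverted = new HashMap<>(); // term -> document numbers
    final List<Map<String, String>> stored = new ArrayList<>();  // document number -> fields

    // Minimal write helper so the example is self-contained.
    void index(String path, String body) {
        int docId = stored.size();
        stored.add(Map.of("path", path, "body", body));
        for (String term : analyze(body)) {
            inverted.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Read path: documents must contain every analyzed keyword (AND semantics).
    List<String> search(String keywords, String fieldToReturn) {
        Set<Integer> hits = null;
        for (String term : analyze(keywords)) {
            Set<Integer> docs = inverted.getOrDefault(term, Set.of());
            if (hits == null) {
                hits = new TreeSet<>(docs);
            } else {
                hits.retainAll(docs);
            }
        }
        List<String> results = new ArrayList<>();
        if (hits != null) {
            for (int docId : hits) {
                results.add(stored.get(docId).get(fieldToReturn)); // extract the requested field
            }
        }
        return results;
    }

    public static void main(String[] args) {
        ReadPathSketch reader = new ReadPathSketch();
        reader.index("/a.txt", "Lucene is a Java search library");
        reader.index("/b.txt", "Java is a programming language");
        System.out.println(reader.search("JAVA", "path"));             // [/a.txt, /b.txt]
        System.out.println(reader.search("the java library", "path")); // [/a.txt]
    }
}
```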
Indexing process: read one or more file names from the command line; for each file, store its path (the path field) and its content (the body field), and perform full-text indexing on the content. The unit of indexing is the Document object, and each Document contains multiple Field objects. Depending on each field's attributes and how its data will be output, you can choose different indexing and storage rules for each field.
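The command-line indexing flow might look like the outline below. Everything in it is an assumption made for illustration, not Lucene's bundled demo: the IndexFilesSketch class, the Rule enum, and the decision to index path as one exact term. Each file named on the command line becomes one document, represented here as a list of Field records with a path field and a body field; each field's Rule decides whether its value is tokenized for full-text search and whether it is stored.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.*;

// Sketch of indexing files named on the command line: one document per file,
// with a "path" field and a "body" field. Hypothetical names; not Lucene's demo.
class IndexFilesSketch {
    // Per-field rule: is the value tokenized for full-text search, and is it stored?
    enum Rule {
        KEYWORD_STORED(false, true),      // one exact term, stored (used for "path")
        FULLTEXT_STORED(true, true),      // tokenized and stored (used for "body")
        FULLTEXT_NOT_STORED(true, false); // tokenized only; an alternative for large bodies

        final boolean tokenized;
        final boolean stored;

        Rule(boolean tokenized, boolean stored) {
            this.tokenized = tokenized;
            this.stored = stored;
        }
    }

    record Field(String name, String value, Rule rule) {}

    // field name -> term -> document numbers
    final Map<String, Map<String, Set<Integer>>> index = new HashMap<>();
    // document number -> values of the fields whose rule says "stored"
    final List<Map<String, String>> stored = new ArrayList<>();

    void indexFile(Path file) throws IOException {
        String body = Files.readString(file, StandardCharsets.UTF_8);
        List<Field> document = List.of(
                new Field("path", file.toString(), Rule.KEYWORD_STORED),
                new Field("body", body, Rule.FULLTEXT_STORED));
        int docId = stored.size();
        Map<String, String> storedValues = new HashMap<>();
        for (Field f : document) {
            List<String> terms = f.rule().tokenized
                    ? Arrays.asList(f.value().toLowerCase(Locale.ROOT).split("\\W+"))
                    : List.of(f.value());
            for (String term : terms) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(f.name(), k -> new HashMap<>())
                         .computeIfAbsent(term, k -> new TreeSet<>())
                         .add(docId);
                }
            }
            if (f.rule().stored) {
                storedValues.put(f.name(), f.value());
            }
        }
        stored.add(storedValues);
    }

    public static void main(String[] args) throws IOException {
        IndexFilesSketch indexer = new IndexFilesSketch();
        for (String name : args) {
            indexer.indexFile(Path.of(name)); // each command-line file becomes one document
        }
        System.out.println("Indexed " + indexer.stored.size() + " file(s)");
    }
}
```

Run it with one or more file paths as arguments. Switching the body field from FULLTEXT_STORED to FULLTEXT_NOT_STORED is the kind of per-field choice the paragraph describes: the content stays searchable but is no longer returned with results.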
