1. What Lucene is and what it can do

Source: Internet
Author: User

1. What is Lucene

Lucene is a full-text search framework, not an application product. It does not work out of the box like http://www.baidu.com/ or Google Desktop; it only provides the tools with which you can build such products.

2. What Lucene can do

To answer this question, first understand what Lucene essentially is. Its core function is very simple: you give it a number of strings, and it provides a full-text search service over them, telling you where the keywords you search for appear. Once that is clear, you can apply it to anything that fits this pattern. You can index the news on your site and build a searchable archive; you can index several fields of a database table, so you no longer have to worry about "%like%" queries locking the table; you can also write your own search engine ...

3. How Lucene performs

Here are some test data; if the numbers are acceptable to you, Lucene is a reasonable choice.

Test one: 2.5 million records, about 300 MB of text; the generated index is about 380 MB; average processing time with 800 threads is 300 ms.

Test two: 37,000 records, indexing two varchar fields of a database table; the index file is 2.6 MB; average processing time with 800 threads is 1.5 ms.

4. Why is Lucene so fast?

Inverted index:

The inverted index arises from the practical need to find records by the value of an attribute. Each entry in the index table contains an attribute value and the addresses of all records that have that value. Because the position of a record is determined from the attribute value, rather than the attribute value being determined from the record, it is called an inverted index. A file that carries an inverted index is called an inverted index file, or inverted file for short.

In an inverted file (inverted index), the indexed objects are the words in a document or a collection of documents. The index records where each of these words occurs in a document or set of documents, and it is the most commonly used indexing mechanism for documents and document collections.

The key step in building a search engine is creating the inverted index. An entry in the inverted index is generally represented as a keyword, followed by its frequency (the number of occurrences) and its locations (which article or page it appears in, along with related information such as date and author). This is equivalent to building an index over the hundreds of millions of pages on the Internet, much like the table of contents and tabs of a book. A reader who wants the chapters on a given topic can go straight to the relevant pages through the table of contents, instead of searching page by page from the first page of the book to the last.
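The structure described above can be sketched in a few lines of Python. This is only an illustration of the idea (keyword, frequency, locations), not Lucene's actual data structure; the function names `build_inverted_index` and `lookup` and the sample documents are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(position)
    return index


def lookup(index, word):
    """Return (total frequency, postings); postings maps doc_id -> positions."""
    postings = index.get(word.lower(), {})
    frequency = sum(len(positions) for positions in postings.values())
    return frequency, postings


# Hypothetical sample documents.
docs = {
    1: "Lucene is a search library",
    2: "An inverted index makes search fast",
    3: "Search the index not the pages",
}
index = build_inverted_index(docs)
frequency, postings = lookup(index, "search")
print(frequency, postings)  # 3 {1: [3], 2: [4], 3: [0]}
```

Looking up a word goes straight to its entry and never scans the documents themselves, which is the "table of contents" effect the text describes.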

For more details please see:

Http://www.cnblogs.com/raphael5200/p/5143687.html

http://blog.csdn.net/chichengit/article/details/9235157

Compression algorithm: The LZ4 algorithm, also known as a real-time compression algorithm, shows up in operating systems (Linux/FreeBSD), file systems (OpenZFS), big data (Hadoop), search engines (Lucene/Solr), databases (HBase) and more, so it can fairly be called a general-purpose algorithm. The most prominent feature of LZ4 is its compression and decompression speed.
Related articles:

http://blog.csdn.net/zhangskd/article/details/17009111

http://blog.csdn.net/zhangskd/article/details/17282895

Binary search: The binary search algorithm finds a specific element in a sorted array, much like looking up a value by its key. The search first compares the target with the element in the middle of the array; if they are equal, a pointer to that element is returned, indicating that it was found. If not, the search continues in whichever half could still contain the target, and repeats. If the remaining range has length 0, the search ends without a match.

5. How Lucene works

The services Lucene provides consist of two parts: one in, one out. "In" means writing: the source you provide (essentially a string) is written to the index or deleted from the index. "Out" means reading: providing users with a full-text search service, so that they can locate a source by keyword.

Write process

The source string is first processed by the analyzer, which tokenizes it (splits it into words) and optionally removes stop words. The required information from the source is added to the Fields of a Document; Fields that need to be indexed are indexed, and Fields that need to be stored are stored. The index is then written to storage, which can be either memory or disk.

Read process

The user provides search keywords, which are processed by the analyzer. The processed keywords are looked up in the index to find the matching Documents. The user then extracts the required Fields from the Documents that were found.

Document

A user-supplied source is a record, which can be a text file, a string, a row of a database table, and so on. Once a record has been indexed, it is stored in the index file as a Document. Search results are likewise returned as a list of Documents.

Field

A Document can contain multiple fields of information. For example, an article can contain fields such as "title", "body" and "last modified time", and these are stored in the Document as Fields. A Field has two properties to choose from: stored and indexed. The stored property controls whether the Field is stored, and the indexed property controls whether the Field is indexed. This may sound trivial, but choosing the right combination of these two properties matters.
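The write process, read process and the stored/indexed choice can be tried out with a toy model in Python. This is a sketch under stated assumptions, not Lucene's API: `ToyIndex`, `analyze`, `binary_search`, the stop-word list and the sample documents are all invented for illustration, and keeping the terms in a sorted list searched by binary search simply follows the description in section 4 rather than Lucene's real on-disk format.

```python
from bisect import insort

STOP_WORDS = {"a", "an", "the", "is", "of"}  # illustrative list, not Lucene's


def analyze(text):
    """Toy analyzer: lowercase, split on whitespace, drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def binary_search(sorted_items, target):
    """Return the position of target in sorted_items, or -1 if absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1


class ToyIndex:
    def __init__(self):
        self.terms = []     # sorted list of distinct terms
        self.postings = {}  # term -> set of document ids
        self.stored = {}    # document id -> {field name: original value}

    def add_document(self, doc_id, fields):
        """Write process. fields maps a name to (value, indexed, stored)."""
        self.stored[doc_id] = {}
        for name, (value, indexed, stored) in fields.items():
            if stored:
                self.stored[doc_id][name] = value
            if indexed:
                for term in analyze(value):
                    if term not in self.postings:
                        self.postings[term] = set()
                        insort(self.terms, term)  # keep the term list sorted
                    self.postings[term].add(doc_id)

    def search(self, query):
        """Read process: stored fields of documents containing every term."""
        matches = None
        for term in analyze(query):
            if binary_search(self.terms, term) == -1:
                return []
            ids = self.postings[term]
            matches = ids if matches is None else matches & ids
        return [self.stored[d] for d in sorted(matches or ())]


index = ToyIndex()
index.add_document(1, {
    "title": ("Lucene in practice", True, True),
    "body": ("Lucene is a full-text search library", True, False),
})
index.add_document(2, {
    "title": ("Database tuning", True, True),
    "body": ("Avoid slow like queries on large tables", True, False),
})
print(index.search("search library"))  # [{'title': 'Lucene in practice'}]
```

In this sketch the body is indexed but not stored, so it can be searched but is not returned with the hit, while the title is both searchable and returned. That is the kind of trade-off the two Field properties control.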
