Lucene 4.X Full Course

Source: Internet
Author: User

http://www.cnblogs.com/forfuture1978/category/300665.html

Lucene 4.X inverted index principle and implementation: (3) The term dictionary and term index file (detailed analysis of the FST) (posted @ 2014-08-29 21:14 by Liu Sukei, read 7180, comments 2)
Summary: Now we look at the most complex part, the term dictionary. The term dictionary file has the suffix .tim, and the term index file has the suffix .tip. The term dictionary file begins with a header, followed by a PostingsHeader; the formats of these two ...

Lucene 4.X inverted index principle and implementation: (1) The design of the dictionary (posted @ 2014-08-28 10:23 by Liu Sukei, read 10542, comments 4)
Summary: The design of the dictionary format. The information stored in the dictionary consists mainly of three parts: the term string itself; statistics about the term, such as document frequency; and the position of the term's posting list. How to store the term strings is a big problem: from the basic principles in the previous chapter we know that the terms are written to the file in dictionary order, so how can these ordered ter ...

Lucene application development uncovered (posted @ 2011-09-10 00:01 by Liu Sukei, read 7022, comments 6)
Summary: "Lucene Application Development Uncovered" training videos. Hztraining address: http://www.hztraining.com/bbs/showtopic-1954.aspx; China-Pub address: http://product.china-pub.com/3502099&ref=xiliegoumai. Note: since this is the first time I have recorded videos of this kind, please forgive the imperfections. The course consists of three parts: first, the principles of search engines; second, an in-depth analysis of Lucene and its advanced features; third, the framework and code implementation of real-time and distributed search. The part on search engine principles contains mostly theoretical explanation; some topics have little content but are kept as separate sections for completeness, so those videos may be short, and the videos are sold by section ...

LinkedIn's real-time search engine Zoie (posted @ 2010-11-29 21:19 by Liu Sukei, read 10273)
Summary: 1. Overall architecture. Zoie is a real-time search engine system implemented by LinkedIn on top of Lucene. According to the description in its official wiki (http://snaprojects.jira.com/wiki/display/ZOIE/Overview): "Zoie is a realtime indexing and search system, and as such needs to have relatively close coupling between the logically distinct indexing and searching subsystems: as soon as a document m ..."

Questions about Lucene (8): Building a real-time index with Lucene when documents are updated (posted @ 2010-06-27 14:17 by Liu Sukei, read 11138, comments 7)
Summary: "Questions about Lucene (7)" discussed building a real-time index by combining Lucene's in-memory index with an on-disk index. Some readers, however, asked how to build a real-time index when documents are deleted and updated, which is the issue this article discusses. 1. The ways in which Lucene deletes documents: IndexReader.deleteDocument(int docID) deletes a document by document number through an IndexReader; IndexReader.deleteDocuments(Term) deletes, through an IndexReader, the documents containing the given term; IndexWriter.deleteDocuments(Term) deletes, through an IndexWriter, the documents con ...
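The entry above stops in the middle of the list of deletion methods. Not part of the original article, the following is a minimal sketch of deleting and updating documents through IndexWriter, assuming the Lucene 3.x API that the series is based on; the index path and the "id"/"contents" field names are only illustrative.

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class DeleteAndUpdateExample {
        public static void main(String[] args) throws Exception {
            FSDirectory dir = FSDirectory.open(new File("/tmp/lucene-index")); // illustrative path
            IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30),
                    IndexWriter.MaxFieldLength.UNLIMITED);

            // Delete every document whose "id" field contains the term "doc-42".
            writer.deleteDocuments(new Term("id", "doc-42"));

            // An update is a delete plus an add in one call: documents matching the term
            // are removed and the new document is added to a new segment.
            Document doc = new Document();
            doc.add(new Field("id", "doc-43", Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("contents", "updated text", Field.Store.NO, Field.Index.ANALYZED));
            writer.updateDocument(new Term("id", "doc-43"), doc);

            // Changes become visible only to IndexReaders opened after this commit.
            writer.commit();
            writer.close();
            dir.close();
        }
    }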
Lucene principles and code analysis, full version (posted @ 2010-06-13 01:52 by Liu Sukei, read 40677)
Summary: The "Lucene principles and code analysis" series of articles is now essentially complete, although there may still be updates for new issues. The full PDF can be downloaded from the link below. The table of contents of the full version is as follows. Part One: Principles. Chapter One: The basic principles of full-text retrieval. 1. Overview. 2. What does the index store? 3. How is the index created? Step 1: prepare the original documents to be indexed. Step 2: pass the original documents to the tokenizer. Step 3: pass the resulting tokens to the linguistic processor. Step 4: pass the resulting terms to the indexer: (1) use the terms to create a dictionary; (2) sort the dictionary alphabetically; (3) merge identical t ...

Questions about Lucene (7): Building a real-time index with Lucene (posted @ 2010-06-08 01:59 by Liu Sukei, read 9874, comments 8)
Summary: Because of Lucene's transactionality, described in the previous chapter, Lucene can add segments incrementally. We know that the inverted index has a fixed format and, once written, is very hard to change, so how can the index be built incrementally? Lucene solves this with the concept of segments: the inverted index structure of a created segment never changes again; incrementally added documents go into new segments, and segments are merged from time to time, producing a new inverted index structure. However, also because of this transactionality, the Lucene index is not real-time: to make it real-time, the IndexWriter must commit after new documents are added, and the IndexReader must be reopened before searching; however, when the index is on disk, and especially when the index is very large ...

Questions about Lucene (6): Lucene's transactionality (posted @ 2010-06-07 01:39 by Liu Sukei, read 4875, comments 0)
Summary: Transactionality usually refers to the database properties, the four basic ACID elements: atomicity, consistency, isolation and durability. Here we mainly discuss isolation: Lucene's IndexReader and IndexWriter are isolated from each other. When IndexReader.open opens an index, any subsequent changes are not visible to it; it is a snapshot of the index at that moment. Changes made to the index can only be seen the next time the index is opened with IndexReader.open. As long as the IndexWriter has not called commit, its modifications are not visible, even if ...

Lucene learning summary ten: Lucene's analyzer (Analyzer) (posted @ 2010-06-06 22:14 by Liu Sukei, read 25690, comments 0)
Summary: 1. The abstract class Analyzer mainly contains two interfaces for producing a TokenStream: TokenStream tokenStream(String fieldName, Reader reader) and TokenStream reusableTokenStream(String fieldName, Reader reader). A TokenStream, which we will discuss later, is the stream of tokens produced by analysis, from which the next token can be fetched repeatedly. To improve performance, so that a new TokenStream object does not have to be created every time within the same thread, the old one can be reused, hence reusableTokenStream ...
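Not from the article, here is a minimal sketch of obtaining and consuming a TokenStream, assuming the Lucene 2.9/3.0 attribute API that the series covers; the field name "contents" and the sample text are only illustrative.

    import java.io.StringReader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.TermAttribute;
    import org.apache.lucene.util.Version;

    public class TokenStreamExample {
        public static void main(String[] args) throws Exception {
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
            // Ask the analyzer for a TokenStream over the text of the "contents" field.
            TokenStream ts = analyzer.tokenStream("contents",
                    new StringReader("Lucene is a full-text search library"));
            TermAttribute termAtt = ts.addAttribute(TermAttribute.class);
            ts.reset();                              // a no-op for many 3.x streams, required in later versions
            while (ts.incrementToken()) {            // pull the next token from the stream
                System.out.println(termAtt.term());  // stop words such as "is" and "a" are dropped
            }
            ts.close();
        }
    }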
Lucene learning summary nine: Lucene's query objects (posted @ 2010-05-19 02:35 by Liu Sukei, read 3540, comments 1)
Summary: Lucene learning summary nine: Lucene's query objects (1) http://www.cnblogs.com/forfuture1978/archive/2010/05/19/1738803.html; Lucene's query objects (2) http://www.cnblogs.com/forfuture1978/archive/2010/05/19/1738804.html; Lucene's query objects (3) http://www.cnblogs.com/forfuture1978/archive/2010/05/19/1738805.html

Lucene learning summary nine: Lucene's query objects (1) (posted @ 2010-05-19 02:29 by Liu Sukei, read 7515, comments 3)
Summary: Besides supporting the query syntax, Lucene also lets you construct Query objects directly to search (a short query-construction sketch follows this list of entries). From the previous section on Lucene's query syntax we know that the query objects corresponding to the syntax are: BooleanQuery, FuzzyQuery, MatchAllDocsQuery, MultiTermQuery, MultiPhraseQuery, PhraseQuery, PrefixQuery, TermRangeQuery, TermQuery and WildcardQuery. Lucene also supports some query objects that have no corresponding query syntax but provide relatively advanced functionality; this section mainly discusses these advanced query objects. Some of the more important class hierarchies are as follows, and we will analyze them all. Query B ...

Lucene learning summary nine: Lucene's query objects (2) (posted @ 2010-05-19 02:29 by Liu Sukei, read 2160, comments 0)
Summary: 5. SpanQuery. A SpanQuery is a query object that also takes the position information of terms into account during the query. The most basic SpanQuery is SpanTermQuery, which contains only one term; unlike TermQuery, it provides a function for obtaining position information: public Spans getSpans(final IndexReader reader) throws IOException { return new TermSpans(reader.termPositions(term), term); }. Spans has the following methods: next() moves to the next document number, and different SpanQuery ...

Lucene learning summary nine: Lucene's query objects (3) (posted @ 2010-05-19 02:29 by Liu Sukei, read 4781, comments 0)
Summary: 6. FilteredQuery. A FilteredQuery contains two member variables: Query query, the query object, and Filter filter, which has a function DocIdSet getDocIdSet(IndexReader reader) returning a set of document numbers. The result documents must come from this set; note that the filter holds not the document numbers to be filtered out, but the document numbers that remain after filtering. The result set of a FilteredQuery is the intersection of the two, but when scoring, FilteredQuery only considers the query part and ignores the filter. There are many kinds of Filter, for example: 6.1 TermsFilter, which contains a member variable ...

Questions about Lucene (5): The TooManyClauses exception in Lucene (posted @ 2010-05-16 00:29 by Liu Sukei, read 1189, comments 2)
Summary: Why this exception is thrown: if RangeQuery, PrefixQuery, WildcardQuery or FuzzyQuery is used during retrieval, these four kinds of query may produce a TooManyClauses exception. Why does it occur? Take RangeQuery as an example: if the date range is 19990101 to 20091231 and the index contains date terms such as 19990102, 19990103 and so on, the RangeQuery is expanded into "19990102 OR 19990103 OR ...", one clause per term. If there are many dates within this range in the index, many clauses are generated. PrefixQuery and the others behave in the same way.
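Not from the articles, the following is a minimal sketch of building a query object tree by hand, assuming Lucene 3.x classes (TermRangeQuery is the 3.x replacement for the RangeQuery named above); the field names and values are only illustrative.

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.PrefixQuery;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TermRangeQuery;

    public class QueryObjectExample {
        public static void main(String[] args) {
            // Roughly: +contents:apple contents:cat* date:[19990101 TO 20091231]
            BooleanQuery query = new BooleanQuery();
            query.add(new TermQuery(new Term("contents", "apple")), BooleanClause.Occur.MUST);
            query.add(new PrefixQuery(new Term("contents", "cat")), BooleanClause.Occur.SHOULD);
            // Depending on its rewrite method, a range over a densely populated field may be
            // expanded into many term clauses; BooleanQuery.setMaxClauseCount(...) raises the
            // limit behind the TooManyClauses exception described above.
            query.add(new TermRangeQuery("date", "19990101", "20091231", true, true),
                      BooleanClause.Occur.SHOULD);
            System.out.println(query);   // prints the query tree in query-syntax form
        }
    }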
Lucene learning summary eight: Lucene's query syntax, JavaCC and QueryParser (2) (posted @ 2010-05-08 00:21 by Liu Sukei, read 5179, comments 0)
Summary: 3. Analyzing QueryParser.jj. 3.1 Declaring the QueryParser class: in the QueryParser.jj file, between PARSER_BEGIN(QueryParser) and PARSER_END(QueryParser), the QueryParser class is defined. One of its most important functions is public Query parse(String query), the function we call when parsing the Lucene query syntax. It is a function defined in pure Java code and is copied verbatim into the generated QueryParser.java file. The most important line of code in parse is the call Query res = TopLevelQuery(field), and TopLevelQ ...

Lucene learning summary eight: Lucene's query syntax, JavaCC and QueryParser (1) (posted @ 2010-05-08 00:20 by Liu Sukei, read 9283, comments 0)
Summary: 1. Lucene query syntax. The query syntax supported by Lucene is documented at http://lucene.apache.org/java/3_0_1/queryparsersyntax.html (a short QueryParser sketch follows this list of entries). (1) Syntax keywords: + - && || ! ( ) { } [ ] ^ " ~ * ? : \ ; if the term you want to query contains one of these keywords, it must be escaped with \. (2) Query terms: Lucene supports two kinds of query terms, a single word such as "hello" and a phrase such as "hello world". (3) Query field: in a query statement you can specify which field to search ...

Introduction to Information Retrieval (translation): Chapter One, Boolean retrieval (1) (posted @ 2010-05-01 20:57 by Liu Sukei, read 2550, comments 0)
Summary: The term "information retrieval" has a very broad meaning. Simply taking a credit card out of your wallet and typing in its number is also a kind of information retrieval. From an academic standpoint, however, information retrieval is defined as follows: information retrieval is finding documents that satisfy an information need from within a large collection of unstructured documents. Under this definition, information retrieval used to be an activity that only a handful of librarians, lawyers and professional searchers engaged in. Today, tens of thousands of people use search engines to search web pages and mail every day. Information retrieval is rapidly replacing traditional database-style search and becoming the main way of obtaining information. In addition, information retrieval techniques can also solve other problems involving related data and information. Unstructured data is data without a clear semantic structure that a computer can understand; structured data, by contrast, such as a traditional relational database, is used by many companies to store product inventories and employee records ...
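Not from the articles, the following is a minimal QueryParser sketch, assuming the Lucene 3.0 classic parser (org.apache.lucene.queryParser.QueryParser); the default field "contents" and the query string are only illustrative.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.util.Version;

    public class QueryParserExample {
        public static void main(String[] args) throws Exception {
            // Terms without an explicit field are parsed against the default field "contents".
            QueryParser parser = new QueryParser(Version.LUCENE_30, "contents",
                    new StandardAnalyzer(Version.LUCENE_30));

            // A required phrase, a required field-qualified term, and a prohibited prefix term.
            Query q = parser.parse("+\"hello world\" +title:lucene -apple*");
            System.out.println(q);   // prints the resulting Query object tree in text form
        }
    }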
Lucene learning summary seven: Analysis of Lucene's search process (posted @ 2010-04-04 18:42 by Liu Sukei, read 5271, comments 4)
Summary: This series of articles describes in detail the basic principles and the code of an almost up-to-date version of Lucene. The overall architecture and the index file format are based on Lucene 2.9, and the analysis of the indexing process on Lucene 3.0. Since the index file format has changed little, that part of the text has not been updated. The articles on principles and architecture reference some diagrams drawn by predecessors, which may belong to earlier versions of Lucene, but this does not affect the understanding of the principles and architecture. The series is still being written; there will also be chapters on JavaCC, analyzers, QueryParser, query statements and query objects. Lucene learning summary seven: Analysis of Lucene's search process (1) http://www.cnblogs.com/forfuture1978/archive/2010/04/04/1704242.html ...

Lucene learning summary seven: Analysis of Lucene's search process (6) (posted @ 2010-04-04 18:12 by Liu Sukei, read 3613, comments 0)
Summary: 2.4 Searching with the Query object. 2.4.4 Collecting the result documents and computing the scores. In the function IndexSearcher.search(Weight, Filter, int) there is the following code: TopScoreDocCollector collector = TopScoreDocCollector.create(nDocs, !weight.scoresDocsOutOfOrder()); search(weight, filter, collector); return collector.topDocs(); 2.4.4.1 Creating the result document collector TopScoreDocCollector collector ...

Lucene learning summary seven: Analysis of Lucene's search process (5) (posted @ 2010-04-04 18:05 by Liu Sukei, read 5246, comments 1)
Summary: 2.4 Searching with the Query object. 2.4.3 Merging the posting lists. After the Scorer object tree and the SumScorer object tree have been obtained, the posting lists are merged and the scores are computed. This section analyzes the merging of the posting lists; the next section analyzes how the Scorer object tree computes the scores. The code of BooleanScorer2.score(Collector) is as follows: public void score(Collector collector) throws IOException { collector.setScorer(this); while ((doc = countingSumScorer.nextDoc()) != NO_MORE ...

Lucene learning summary seven: Analysis of Lucene's search process (4) (posted @ 2010-04-04 17:53 by Liu Sukei, read 3547, comments 3)
Summary: 2.4 Searching with the Query object. 2.4.1.2 Creating the Weight object tree. BooleanQuery.createWeight(Searcher) ultimately returns new BooleanWeight(searcher); the concrete implementation of the BooleanWeight constructor is as follows: public BooleanWeight(Searcher searcher) { this.similarity = getSimilarity(searcher); weights = new ArrayList<Weight>(clauses.size()); ... }. This is also a recursive process that walks down the query object tree ...

Lucene learning summary seven: Analysis of Lucene's search process (3) (posted @ 2010-04-04 17:40 by Liu Sukei, read 4670, comments 3)
Summary: 2.3 QueryParser parses the query statement and generates the Query object. Code: QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "contents", new StandardAnalyzer(Version.LUCENE_CURRENT)); Query query = parser.parse("+(+apple* -boy) (cat* dog) -(eat~ foods)"). This process is relatively complex, involving JavaCC, QueryParser, analyzers and the query syntax, and is not discussed in detail in this chapter; it is covered one by one in later chapters.

Lucene learning summary seven: Analysis of Lucene's search process (2) (posted @ 2010-04-04 17:31 by Liu Sukei, read 5258, comments 0)
Summary: 2. The Lucene search process in detail. To analyze the search process over Lucene's index files, the following files were indexed in advance: file01.txt: apple apples cat dog; file02.txt: apple boy cat category; file03.txt: apply dog eat etc; file04.txt: apply cat foods. 2.1 Open an IndexReader pointing to the index folder. Code: IndexReader reader = IndexReader.open(FSDirectory.open(indexDir)); this actually calls DirectoryReader.open ...
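Not from the articles, the following is a minimal end-to-end search sketch tying the steps above together, assuming the Lucene 3.0 API; the index path and field name are only illustrative, and the query string is the one used in the article.

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopScoreDocCollector;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class SearchProcessExample {
        public static void main(String[] args) throws Exception {
            File indexDir = new File("/tmp/lucene-index");             // illustrative path
            IndexReader reader = IndexReader.open(FSDirectory.open(indexDir));
            IndexSearcher searcher = new IndexSearcher(reader);

            QueryParser parser = new QueryParser(Version.LUCENE_30, "contents",
                    new StandardAnalyzer(Version.LUCENE_30));
            Query query = parser.parse("+(+apple* -boy) (cat* dog) -(eat~ foods)");

            // Collect the top 10 documents; internally the searcher builds the Weight and
            // Scorer trees, merges the posting lists and scores each matching document.
            TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
            searcher.search(query, collector);
            for (ScoreDoc sd : collector.topDocs().scoreDocs) {
                System.out.println("doc=" + sd.doc + " score=" + sd.score);
            }

            searcher.close();
            reader.close();
        }
    }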
Lucene learning summary seven: Analysis of Lucene's search process (1) (posted @ 2010-04-04 17:27 by Liu Sukei, read 5896, comments 1)
Summary: 1. The overall Lucene search process. In general, searching is the process of reading the dictionary and posting-list information from the index, merging the posting lists according to the user's query statement, obtaining the result document set, and scoring the documents. It consists roughly of the following steps: IndexReader opens the index files and opens the streams that point to them; the user's query statement is converted into a Query object tree; a Weight object tree is constructed, used to compute the term weight, that is, the part of the scoring formula that depends only on the search statement and not on the document (the part shown in red); a Scorer object tree is constructed to compute the scores (TermScorer.score()); while the Scorer tree is being constructed, its leaf nodes ...

Lucene learning summary six: The mathematical derivation of Lucene's scoring formula (posted @ 2010-03-07 00:18 by Liu Sukei, read 9135, comments 9)
Summary: Before analyzing Lucene's search process, it is necessary to derive Lucene's scoring formula separately and explain the meaning of each of its parts, because an important step of the search process is to compute the parts of the score one after another. Lucene's scoring formula is quite complex (a sketch of its shape follows this list of entries). Before the derivation, the meaning of each part: t is a term, and here a term includes the field information, so title:hello and content:hello are different terms; coord(q,d): a query may contain several query terms and a document may contain several of them, and the more query terms a document contains, the higher its score; queryNorm(q): computed from the sum of the squares of the weight of each query term, and ...

Lucene learning summary: Analysis of Lucene's segment merge process (posted @ 2010-03-06 00:49 by Liu Sukei, read 7807, comments 1)
Summary: The segment merging process in general. The IndexWriter member variables related to segment merging are: HashSet<SegmentInfo> mergingSegments = new HashSet<SegmentInfo>(), which holds the segments currently being merged, so that they are not selected again while a merge is in progress; MergePolicy mergePolicy = new LogByteSizeMergePolicy(this), the merge policy, which decides which segments are selected for merging; MergeScheduler mergeScheduler = new ConcurrentMergeScheduler(), the merge scheduler, behind which a thread is responsible ...
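The scoring formula itself is derived in the article above; for reference while reading, the practical formula documented for the DefaultSimilarity of Lucene 2.9/3.0 has roughly the following shape (a sketch in LaTeX; the exact definitions are in the article and in the Similarity javadoc):

    \mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot
        \sum_{t \in q} \Big( \mathrm{tf}(t,d) \cdot \mathrm{idf}(t)^{2} \cdot
        t.\mathrm{getBoost}() \cdot \mathrm{norm}(t,d) \Big)

    % DefaultSimilarity (assumed):
    %   \mathrm{tf}(t,d)      = \sqrt{\text{frequency of } t \text{ in } d}
    %   \mathrm{idf}(t)       = 1 + \log\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t) + 1}
    %   \mathrm{queryNorm}(q) = 1 / \sqrt{\text{sum of squared weights}}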
Lucene 3.0 principles and code analysis (posted @ 2010-02-22 20:25 by Liu Sukei, read 7205, comments 8)
Summary: This series of articles describes in detail the basic principles and the code of an almost up-to-date version of Lucene. The overall architecture and the index file format are based on Lucene 2.9, and the analysis of the indexing process on Lucene 3.0. Since the index file format has changed little, that part of the text has not been updated. The articles on principles and architecture reference some diagrams drawn by predecessors, which may belong to earlier versions of Lucene, but this does not affect the understanding of the principles and architecture. The series is still being written; there will also be chapters on analyzers, segment merging, QueryParser, queries and query objects, the search process, the derivation of the scoring formula, and more. I am sharing it in advance and welcome criticism and corrections. Lucene learning summary: The basic principles of full-text retrieval http://www.cnblogs.com/forfuture1978/archive/2009/ ...

Questions about Lucene (4): Four ways to influence how Lucene scores documents (posted @ 2010-02-08 23:44 by Liu Sukei, read 3559, comments 2)
Summary: Setting the document boost and field boost during the indexing phase, stored in the (.nrm) file. If you want some documents and some fields to be more important than others, that is, if a document or a field containing the query word should score higher, you can set the boost value of the document and of the field during the indexing phase. These values are written to the index files at index time and stored in the normalization factor (.nrm) file; once written they cannot be changed unless the document is deleted. If they are not set, document boost and field boost default to 1. Document boost and field boost are set as follows: Document doc = new Document(); Field f = n ... (a small boost sketch follows this list of entries)

Questions about Lucene (3): The vector space model and Lucene's scoring mechanism (posted @ 2010-02-06 13:05 by Liu Sukei, read 3597, comments 0)
Summary: Question: your article says that we regard the term weights of all the terms of a document as a vector: Document = {term 1, term 2, ..., term N}, Document Vector = {weight 1, weight 2, ..., weight N}; similarly, we regard the query statement as a simple document and also as a vector: Query = {term 1, term 2, ..., term N}, Query Vector = {weight 1, weight 2, ..., weight N}. So we take the term weights of all the terms of this document ...

Questions about Lucene (2): stemming and lemmatization (posted @ 2010-02-06 13:04 by Liu Sukei, read 3701, comments 1)
Summary: Question: I experimented with the stemming and lemmatization mentioned in the article. Reducing a word to its root form, such as "cars" to "car", is the operation called stemming; turning a word into its dictionary form, such as "drove" to "drive", is the operation called lemmatization. The test did not succeed; the code is as follows: public class TestNorms { public void createIndex() throws IOException { Directory d = new SimpleFSDirectory(new File("d:/falcontest/lucene3/ ...

Questions about Lucene (1): Why can I search "chinese and republic" but not "Chinese Republic"? (posted @ 2010-02-06 13:04 by Liu Sukei, read 2155, comments 1)
Summary: Question: the phrase "People's Republic of China" was indexed with the Chinese Academy of Sciences (ICTCLAS) analyzer and segmented into the words "China", "people" and "republic"; searching with "people's republic" finds it, searching with "Republic of China" does not, and searching with "china and republic" does find it. Why? Answer: I downloaded the ICTCLAS analyzer from http://ictclas.org/Download.html and did a simple analysis: at index time "People's Republic of China" is segmented into "China", "people" and "republic", while at search time the query "Republic of China" is segmented into "Chinese Republic"; however, when the query parser builds the Query object ...
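Not from the article, the following is a minimal sketch of setting document boost and field boost at index time, assuming the Lucene 3.x Document/Field API; the field names and values are only illustrative.

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class BoostExample {
        public static void main(String[] args) {
            Document doc = new Document();
            Field title = new Field("title", "Lucene scoring explained",
                    Field.Store.YES, Field.Index.ANALYZED);
            title.setBoost(2.0f);   // field boost: this field counts more than an unboosted one
            doc.add(title);
            doc.add(new Field("contents", "boosts are folded into the .nrm normalization factors",
                    Field.Store.NO, Field.Index.ANALYZED));
            doc.setBoost(1.5f);     // document boost; both boosts default to 1.0
            // writer.addDocument(doc) would encode these boosts into the norms at index time.
        }
    }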
Lucene learning summary four: Analysis of Lucene's indexing process (4) (posted @ 2010-02-02 02:02 by Liu Sukei, read 4372, comments 5)
Summary: 6. Closing the IndexWriter object. Code: writer.close(); -> IndexWriter.closeInternal(boolean) -> (1) write the index information from memory to disk: flush(waitForMerges, true, true); -> (2) merge segments: mergeScheduler.merge(this). The merging of segments is discussed in a later section; here only the process of writing the index information to disk is discussed. Code: IndexWriter.flush(boolean triggerMerge, boolean flushDocStores, boole ...

Lucene learning summary four: Analysis of Lucene's indexing process (3) (posted @ 2010-02-02 02:01 by Liu Sukei, read 3903, comments 0)
Summary: 5. DocumentsWriter's cache management with CharBlockPool, ByteBlockPool and IntBlockPool. During indexing, DocumentsWriter stores the term text in CharBlockPool and stores the document numbers (doc IDs), term frequencies (freq) and position (prox) information in ByteBlockPool. In ByteBlockPool the cache is allocated in slices; the slices are organized in levels, the higher the level the larger the slice, and slices of the same level have the same size. nextLevelArray indicates which level the slice after the current level belongs to; it can be seen that from level 9 onward the next level is still 9, which means there are at most 9 levels. le ...

Lucene learning summary four: Analysis of Lucene's indexing process (2) (posted @ 2010-02-02 01:59 by Liu Sukei, read 7996, comments 1)
Summary: 3. Adding the document to the IndexWriter (a minimal indexing sketch follows this list of entries). Code: writer.addDocument(doc); -> IndexWriter.addDocument(Document doc, Analyzer analyzer) -> doFlush = docWriter.addDocument(doc, analyzer); -> DocumentsWriter.updateDocument(Document, Analyzer, Term). Note: -> represents a function call one level deeper. IndexWriter then calls DocumentsWriter.addDocument, which calls Docume ...

Lucene learning summary four: Analysis of Lucene's indexing process (1) (posted @ 2010-02-02 01:58 by Liu Sukei, read 14480, comments 3)
Summary: Lucene's indexing process, besides writing the terms into the posting lists and eventually into Lucene's index files, also includes analysis (Analyzer) and segment merging (merge segments); this article does not cover those two parts, which will be analyzed in later articles. Lucene's indexing process has been introduced in many blogs and articles; I recommend searching the internet for an article called "Annotated Lucene", whose Chinese name is apparently "Lucene source analysis", which is very good. To really understand Lucene's indexing process, the best way is to follow the code in a debugger while reading the article; that way you not only grasp the indexing process in the most detailed and accurate way (a description can be biased, but the code will not deceive you), but also learn some of Lucene's good implementation techniques, and can ...
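Not from the articles, the following is a minimal indexing sketch of the writer.addDocument / writer.close path described above, assuming the Lucene 3.0 API; the index path and field name are only illustrative, and the four sample texts are the ones used in the search-process articles.

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class IndexProcessExample {
        public static void main(String[] args) throws Exception {
            FSDirectory dir = FSDirectory.open(new File("/tmp/lucene-index"));  // illustrative path
            IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30),
                    true, IndexWriter.MaxFieldLength.UNLIMITED);   // true = create a new index

            String[] texts = { "apple apples cat dog", "apple boy cat category",
                               "apply dog eat etc", "apply cat foods" };
            for (String text : texts) {
                Document doc = new Document();
                doc.add(new Field("contents", text, Field.Store.YES, Field.Index.ANALYZED));
                writer.addDocument(doc);   // buffered in DocumentsWriter's in-memory pools
            }

            writer.close();   // flushes the buffered postings to disk and may trigger merges
            dir.close();
        }
    }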
Lucene learning summary three: Lucene's index file format (3) (posted @ 2010-02-02 01:43 by Liu Sukei, read 10785, comments 2)
Summary: IV. Specific formats. 4.2 Reverse information. Reverse information is the core of the index files: the reverse (inverted) index. The reverse index consists of two parts, on the left the term dictionary and on the right the posting lists. In Lucene these two parts are stored in separate files: the dictionary is stored in .tii and .tis, and the posting lists themselves consist of two parts, one holding the document numbers and term frequencies, saved in .frq, and the other holding the position information of the terms, saved in .prx. Term dictionary (tii, tis) -> frequencies (.frq) -> positions (.prx). 4.2.1 Dictionary (tis) and dictionary index (tii): in the dictionary, all the terms are kept in dictionary order ... (a small sketch of walking these structures through the public API follows this list of entries)

Lucene learning summary three: Lucene's index file format (2) (posted @ 2009-12-14 12:35 by Liu Sukei, read 16825, comments 3)
Summary: IV. Specific formats. As explained above, Lucene saves forward information, from index to segment to document to field and finally to term, and also saves reverse information, the mapping from term to document, as well as some other Lucene-specific information. These three kinds of information are described below. 4.1 Forward information: Index -> Segments (segments.gen, segments_N) -> Field (fnm, fdx, fdt) -> Term (tvx, tvd, tvf). The hierarchy above is not entirely accurate, because segments.gen and segments_N store the metadata of the segments (met ...

Lucene learning summary three: Lucene's index file format (1) (posted @ 2009-12-14 12:34 by Liu Sukei, read 31365)
Summary: What is stored in a Lucene index and how it is stored, in other words Lucene's index file format, is one of the keys to reading the Lucene source code. Once we really dive into the Lucene source code, we find that Lucene's indexing process is the process of writing the inverted index into this file format by following the basic process of full-text retrieval, and Lucene's search process is the process of reading the index information stored in this file format and then computing the score of each document. This article explains in detail the Apache Lucene - Index File Formats document (http://lucene.apache.org/java/2_9_0/fileformats.html). 1. Basic concepts ...

Lucene learning summary two: Lucene's overall architecture (posted @ 2009-12-14 12:32 by Liu Sukei, read 18978, comments 4)
Summary: In a word, Lucene is an efficient, extensible full-text retrieval library, implemented entirely in Java and requiring no configuration. It only supports indexing and searching plain text; it is not responsible for extracting plain text from files in other formats or for fetching files from the network. From the example of Lucene's framework and flow in Lucene in Action we can see that Lucene covers the two processes of indexing and searching, involving three points: index creation, the index itself, and search. Let us look at the components of Lucene in more detail: the document to be indexed is represented by a Document object; IndexWriter creates the index by adding documents to it through the addDocument function; Lucene's ...
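Not from the articles, the following is a minimal sketch of walking the term dictionary and the posting information through the public API of Lucene 3.x (TermEnum for the dictionary, TermPositions for the doc/freq and position data); the index path is only illustrative.

    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermEnum;
    import org.apache.lucene.index.TermPositions;
    import org.apache.lucene.store.FSDirectory;

    public class DumpIndexExample {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open(FSDirectory.open(new File("/tmp/lucene-index")));
            TermEnum terms = reader.terms();        // walks the term dictionary in term order
            while (terms.next()) {
                Term t = terms.term();
                System.out.println(t.field() + ":" + t.text() + "  docFreq=" + terms.docFreq());
                TermPositions tp = reader.termPositions(t);   // doc numbers, frequencies, positions
                while (tp.next()) {
                    System.out.print("  doc=" + tp.doc() + " freq=" + tp.freq() + " pos=");
                    for (int i = 0; i < tp.freq(); i++) {
                        System.out.print(tp.nextPosition() + " ");
                    }
                    System.out.println();
                }
                tp.close();
            }
            terms.close();
            reader.close();
        }
    }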
Lucene learning summary: The basic principles of full-text retrieval
Summary: Full-text retrieval involves two processes: index creation (indexing) and searching the index (search).
* Index creation: the process of extracting information from all the structured and unstructured data in the real world and creating an index from it.
* Search index: the process of receiving the user's query request, searching the created index, and returning the results.
So there are three important questions in full-text retrieval: 1. What does the index store? (Index) 2. How is an index created? (Indexing) 3. How is the index searched? (Search)
