Lucene in Action: Study Notes (1)

Source: Internet
Author: User
1.1 How to deal with the information explosion
Information retrieval technology.
1.2 What is Lucene?
1.2.1 What is Lucene?
Lucene is a high-performance, scalable information retrieval (IR) library. Information retrieval refers to searching for documents, searching for information within documents, and manipulating document-related metadata.
With Lucene, you will find that it provides a simple yet powerful core API.
1.2.2 What Lucene can do
Lucene is just a software library, or toolkit, and is not a complete search application. It focuses on text indexing and searching, and does both very well.
1.3 Components in Lucene
1.3.1 Indexing components
(1) Acquire content: Lucene, as a core search library, does not provide any functionality for acquiring content. Open source tools that can acquire content include Nutch, Solr, and Grub.
(2) Build document: Convert the downloaded or crawled content into documents.
(3) Analyze document: Split the text into a series of independent atomic elements called tokens.
(4) Index document: Add the document to the index.
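The analysis and indexing steps above can be sketched with a toy inverted index. This is only an illustration in plain Java, not Lucene's implementation: the class ToyInvertedIndex and its methods analyze, addDocument, and lookup are hypothetical names, and the "analyzer" here simply lowercases the text and splits on non-alphanumeric characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of the indexing steps: analyze text into tokens, then record
// which documents contain each token. This is NOT how Lucene stores an index.
class ToyInvertedIndex {
    // token -> sorted set of ids of the documents that contain it
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();

    // "Analyze document": lowercase the text and split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // "Index document": add every token of the document to the postings.
    void addDocument(int docId, String text) {
        for (String token : analyze(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Look a token up; returns the ids of matching documents in ascending order.
    List<Integer> lookup(String token) {
        TreeSet<Integer> ids = postings.get(token.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument(0, "Lucene is a search library");
        index.addDocument(1, "Nutch is a crawler");
        index.addDocument(2, "Lucene in Action");
        System.out.println(index.lookup("lucene"));  // prints [0, 2]
        System.out.println(index.lookup("crawler")); // prints [1]
    }
}
```

Mapping each token to the documents that contain it is what lets a search go straight from a word to its documents instead of scanning every text.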
1.3.2 Search components
Searching is the process of looking up words in the index in order to locate the documents that contain them. Search quality is mainly measured by precision (how well irrelevant documents are filtered out) and recall (how well relevant documents are found). Lucene's benchmark module can be used to measure both.
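As a small worked example of the two measures, the sketch below computes precision and recall from a set of retrieved document ids and a set of ids judged relevant. The class name PrecisionRecall and the sample ids are made up for illustration; this is not the benchmark module's API.

```java
import java.util.HashSet;
import java.util.Set;

// Precision and recall for one query, given which documents were retrieved
// and which documents are actually relevant.
class PrecisionRecall {
    // precision = relevant documents retrieved / all documents retrieved
    static double precision(Set<Integer> retrieved, Set<Integer> relevant) {
        if (retrieved.isEmpty()) {
            return 0.0;
        }
        return (double) intersectionSize(retrieved, relevant) / retrieved.size();
    }

    // recall = relevant documents retrieved / all relevant documents
    static double recall(Set<Integer> retrieved, Set<Integer> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        return (double) intersectionSize(retrieved, relevant) / relevant.size();
    }

    private static int intersectionSize(Set<Integer> a, Set<Integer> b) {
        Set<Integer> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    public static void main(String[] args) {
        Set<Integer> retrieved = Set.of(1, 2, 3, 4);     // what the search returned
        Set<Integer> relevant = Set.of(2, 4, 6, 8, 10);  // what it should have found
        // Two of the four retrieved documents are relevant.
        System.out.println(precision(retrieved, relevant)); // prints 0.5
        // Two of the five relevant documents were retrieved.
        System.out.println(recall(retrieved, relevant));    // prints 0.4
    }
}
```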
(1) Search user interface: Keep it simple and clean, and present the results as a list.
(2) Build query: Lucene provides a powerful package called QueryParser, which turns the text the user entered into a Query object according to a common search syntax. While building the query object you can also apply special processing of your own. An e-commerce site, for example, might boost products that attract a lot of interest.
(3) Search query: The query is run against the index and the documents that match it are returned, sorted as the query request specifies.
(4) Render results: Present the results to the user in an intuitive, easy-to-use way.
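The search step can be illustrated with a toy routine that finds the documents containing a term, ranks them, and keeps only the top n. The ranking here is a plain term-frequency count, chosen as an assumption to keep the sketch short; Lucene's real scoring is more elaborate. ToyTermSearch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Toy search: match documents that contain a term, rank by how often the
// term occurs, and return the positions of the top n matches.
class ToyTermSearch {
    // Count whole-word, case-insensitive occurrences of term in text.
    static int termFrequency(String text, String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        int count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.equals(wanted)) {
                count++;
            }
        }
        return count;
    }

    // Returns positions (in docs) of the top n documents containing term,
    // highest term frequency first; ties keep the original document order.
    static List<Integer> search(List<String> docs, String term, int n) {
        List<Integer> matches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (termFrequency(docs.get(i), term) > 0) {
                matches.add(i);
            }
        }
        // List.sort is stable, so equal scores stay in document order.
        matches.sort(Comparator.comparingInt(
                (Integer i) -> termFrequency(docs.get(i), term)).reversed());
        return new ArrayList<>(matches.subList(0, Math.min(n, matches.size())));
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "Lucene in Action",
                "Lucene, Lucene, Lucene",
                "Nutch crawler",
                "more lucene and lucene");
        // Frequencies are 1, 3, 0, 2, so the top two matches are docs 1 and 3.
        System.out.println(search(docs, "lucene", 2)); // prints [1, 3]
    }
}
```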
1.3.3 Other components
Administration interface, analytics interface, and search scaling.
1.3.4 Integrating Lucene with other programs
Add Lucene to your project.
1.4 Lucene in practice: a sample program
1.4.1 Indexing: Demo1
1.4.2 Searching: Demo2
1.5 Introduction to the core indexing classes
IndexWriter, Directory, Analyzer, Document, Field
1.5.1 IndexWriter
IndexWriter is the central component of the indexing process. It is responsible for creating a new index, or for modifying and deleting content in an existing one. (It provides write access to the index and does not provide read or search operations.)
1.5.2 Directory
The Directory class represents the location where the index is stored. It reads the index files from that storage path, and an instance is passed to IndexWriter through its constructor.
1.5.3 Analyzer
Text must be processed by an analyzer before it is indexed. The Analyzer is specified in the IndexWriter constructor and is responsible for extracting tokens from the text. (If the content is not plain text, convert it to plain text first.)
1.5.4 Document
The Document object represents a collection of fields. It can be understood as a virtual document, such as a web page or a text file, from which data can later be retrieved. (The Document object has a simple structure: it is a container holding multiple Field objects. A Field holds text content that can be indexed.)
1.5.5 Field
Each document in the index contains one or more named fields, and a document may have more than one field with the same name.
1.6 Understanding the core searching classes
IndexSearcher, Term, Query, TermQuery, TopDocs
1.6.1 IndexSearcher
This class exposes several search methods and is the central link to the index. IndexSearcher cannot open the index files directly; it reads them indirectly through a Directory instance.
The simplest search method takes a Query object and the number of results to return, and returns a TopDocs object.
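import java.io.File;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;
import org.apache.lucene.store.*;
Directory dir = FSDirectory.open(new File("/temp/index"));
IndexSearcher searcher = new IndexSearcher(dir);
// TermQuery does no analysis: the term must match the indexed token exactly.
Query q = new TermQuery(new Term("contents", "lucene"));
TopDocs hits = searcher.search(q, 10);
searcher.close();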
1.6.2 Term
The Term object is the basic unit of search.
Query q = new TermQuery(new Term("contents", "lucene"));
TopDocs hits = searcher.search(q, 10);
The code above asks Lucene for the top 10 documents that contain the word lucene in the contents field.
1.6.3 Query
Lucene contains a number of concrete Query subclasses. So far we have only touched on the most basic one: the TermQuery class.
Other Query subclasses include BooleanQuery, PrefixQuery, and so on.
1.6.4 TermQuery
The simplest and most basic query type. It matches documents whose specified field contains the specified value.
1.6.5 TopDocs
The TopDocs class is a simple container of pointers. Each pointer refers to one of the top N ranked search results (the matching documents).
Note:
Lucene is a library of information retrieval tools, not a software product.
