Lucene Learning Notes (1): Learning the Lucene Demo

Source: Internet
Author: User

Practice is the sole criterion for testing truth!

I have studied full-text search before, but my focus was on usage, mainly of Zoie, a Lucene-based tool, and I never had time to look closely at what actually happens underneath. So now I am using my spare time to go through the official website and study Lucene, the foundation of this kind of full-text search. My knowledge is limited, so some parts may be superficial or contain errors; please bear with me and point them out!

This article skips the usual introductions to Lucene (background, references and so on) and starts directly from the Lucene demo.

I am using Lucene 4.10.2 (download link). Because I work on Windows, I downloaded the zip package directly. After extracting it, the directory structure is as follows:

analysis -- analyzers (word breakers)
benchmark -- benchmarking and evaluation
classification -- classification
codecs -- codecs
core -- the Lucene core JAR
demo -- examples (including a WAR package)
docs -- documentation and API docs
expressions -- expressions
facet -- faceted (statistical) search
grouping -- grouping of search results
highlighter -- hit highlighting
join -- joins performed at index time and at query time
memory -- in-memory index
misc -- index tools and other miscellaneous code
queries -- filters and queries
queryparser -- query parsers and the parsing framework
replicator -- index replication
sandbox -- contributions from third parties and new ideas
spatial -- spatial search
suggest -- autosuggest and spell checking
test-framework -- test framework
This article studies the code in the demo folder. Lucene's online documentation is here: (link). Instructions for running the demo are the first link in the Getting Started section of the documentation home page: (link). The steps are as follows:

1. Create a project in Eclipse; any name will do. This differs from Lucene's demo guide, which assumes a command-line (non-IDE) environment.

2. As the guide describes, put the required JAR packages on the classpath:

Setting your CLASSPATH: First, you should download the latest Lucene distribution and then extract it to a working directory. You need four JARs: the Lucene JAR, the queryparser JAR, the common analysis JAR, and the Lucene demo JAR. You should see the Lucene JAR file in the core/ directory you created when you extracted the archive; it should be named something like lucene-core-{version}.jar. You should also see files called lucene-queryparser-{version}.jar, lucene-analyzers-common-{version}.jar and lucene-demo-{version}.jar under queryparser/, analysis/common/ and demo/, respectively. Put all four of these files in your Java CLASSPATH.
In Eclipse, this simply means adding these four JARs to the project's build path.
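If the demo classes cannot be found at run time, the build path is the first thing to check. The helper below is my own sketch and is not part of Lucene. It tries to load one class from each of the four JARs and prints which ones are missing. The class names are the Lucene 4.x ones as far as I know; other versions may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which of the given classes cannot be loaded from the classpath.
final class ClasspathCheck {

    // One representative class per JAR named in the guide (Lucene 4.x package names, assumed).
    static final List<String> LUCENE_DEMO_CLASSES = List.of(
            "org.apache.lucene.index.IndexWriter",                   // lucene-core
            "org.apache.lucene.queryparser.classic.QueryParser",     // lucene-queryparser
            "org.apache.lucene.analysis.standard.StandardAnalyzer",  // lucene-analyzers-common
            "org.apache.lucene.demo.IndexFiles");                    // lucene-demo

    static List<String> findMissing(List<String> classNames) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize = false: only check that the class can be found, skip static initializers
                Class.forName(name, false, ClasspathCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                // a LinkageError usually means a class it depends on (e.g. from core) is missing
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = findMissing(LUCENE_DEMO_CLASSES);
        if (missing.isEmpty()) {
            System.out.println("All four Lucene demo JARs appear to be on the classpath.");
        } else {
            System.out.println("Missing from classpath: " + missing);
        }
    }
}
```

You can run it from the same Eclipse project. An empty list means the build path is set up.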

3. Index the files, that is, create the index. The test class I created is DemoIndexWriter.java:

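    package lucene;

    import org.apache.lucene.demo.IndexFiles;

    public class DemoIndexWriter {
        public static void main(String[] args) {
            // Index every file under this folder; with no -index option the
            // index is written to the "index" directory under the working directory
            String[] arg0 = new String[]{"-docs", "F:/worktestspace/lucenedemo/src"};
            IndexFiles.main(arg0);
        }
    }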
This class calls the main method of IndexFiles in lucene-demo-4.10.2.jar. Note that the value of -docs must be a folder that contains files. It cannot be an empty folder, because then there is nothing to index. After it runs, the console reports each indexed file with an "adding ..." line.
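An empty -docs folder produces nothing to index, so a quick pre-flight check can help. This is a hypothetical helper of my own, and the class name DocsDirCheck is made up. It walks the folder recursively, roughly the way IndexFiles walks it, and counts regular files. It does not check readability or file types.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical check (not part of the Lucene demo): how many files would -docs hand to the indexer?
final class DocsDirCheck {

    static long countRegularFiles(Path docsDir) throws IOException {
        if (Files.isRegularFile(docsDir)) {
            return 1;                      // a single file counts as one document
        }
        if (!Files.isDirectory(docsDir)) {
            return 0;                      // path does not exist
        }
        try (Stream<Path> paths = Files.walk(docsDir)) {   // recursive walk, closed afterwards
            return paths.filter(Files::isRegularFile).count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path docs = Paths.get(args.length > 0 ? args[0] : "F:/worktestspace/lucenedemo/src");
        long n = countRegularFiles(docs);
        if (n == 0) {
            System.out.println("Nothing to index under " + docs);
        } else {
            System.out.println(n + " file(s) found under " + docs);
        }
    }
}
```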


The output shows that my two test class files have been indexed. The index files are placed in the index directory under the project directory; after refreshing the project, the index folder appears.


Because I had already created an index earlier, this run recreated it, and the original _0* files were deleted. What each of these files means will be covered in a later article.
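Until that later article, grouping the index files by name is a quick way to see which ones belong together. The sketch below relies only on the Lucene 4.x file naming convention as I understand it. Per-segment files start with an underscore followed by the segment name (_0.cfs, _0.si, ...), and the commit point is segments_N. Everything else, such as write.lock, goes into an "(other)" bucket. The class name is hypothetical. It only looks at file names and never opens the index.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: groups index files by segment prefix, based on file names only.
final class IndexDirListing {

    static Map<String, TreeSet<String>> groupBySegment(Path indexDir) throws IOException {
        Map<String, TreeSet<String>> groups = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path p : files) {
                String name = p.getFileName().toString();
                String key;
                if (name.startsWith("_")) {
                    // segment name = "_" plus letters/digits: "_0.cfs" -> "_0", "_1_Lucene41_0.tim" -> "_1"
                    int end = 1;
                    while (end < name.length() && Character.isLetterOrDigit(name.charAt(end))) {
                        end++;
                    }
                    key = name.substring(0, end);
                } else {
                    key = "(other)";   // segments_N, write.lock, ...
                }
                groups.computeIfAbsent(key, k -> new TreeSet<>()).add(name);
            }
        }
        return groups;
    }

    public static void main(String[] args) throws IOException {
        Path index = Paths.get(args.length > 0 ? args[0] : "index");
        groupBySegment(index).forEach((segment, names) -> System.out.println(segment + " -> " + names));
    }
}
```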

4. Create a search test class, DemoIndexReader:

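    package lucene;

    import org.apache.lucene.demo.SearchFiles;

    public class DemoIndexReader {
        public static void main(String[] args) throws Exception {
            // No arguments: SearchFiles reads the "index" directory, searches the
            // "contents" field and prompts for queries on the console
            SearchFiles.main(args);
        }
    }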
This calls the main method of SearchFiles in the demo package without passing any arguments. Then I entered a few queries in the console:

Querying a random string of characters returns 0 results. Querying "string" returns 2 results, because both files contain the word string, and the results are shown with paging.


With that, the simple Lucene demo runs end to end, and the search results are easy to see. The overview page of the Lucene demo (link) walks through the two classes in the demo package, IndexFiles.java and SearchFiles.java. Here is a brief look at them as well. IndexFiles.java source:

usage is a variable holding the usage (help) message.

The -index parameter specifies the directory in which the index is created.

The -docs parameter specifies the directory to be indexed.

The -update parameter keeps the existing index instead of recreating it: new documents are added and already-indexed ones are updated.
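To make the three options concrete, here is a simplified re-implementation of the argument loop. The class IndexFilesArgs is hypothetical. The defaults (index directory "index" and create = true) follow my reading of the 4.10.2 IndexFiles source, and the bounds checks are my own addition. The create flag is what later selects OpenMode.CREATE or OpenMode.CREATE_OR_APPEND.

```java
// Hypothetical sketch of how IndexFiles interprets -index, -docs and -update.
final class IndexFilesArgs {
    String indexPath = "index";  // -index: where the index is written
    String docsPath = null;      // -docs: what to index (required by the demo)
    boolean create = true;       // -update flips this to false (append/update mode)

    static IndexFilesArgs parse(String[] args) {
        IndexFilesArgs a = new IndexFilesArgs();
        for (int i = 0; i < args.length; i++) {
            if ("-index".equals(args[i]) && i + 1 < args.length) {
                a.indexPath = args[++i];   // consume the option's value
            } else if ("-docs".equals(args[i]) && i + 1 < args.length) {
                a.docsPath = args[++i];
            } else if ("-update".equals(args[i])) {
                a.create = false;          // flag without a value
            }
        }
        return a;
    }
}
```

With the arguments passed by DemoIndexWriter, indexPath stays "index" and create stays true. That is why rerunning it rebuilds the index from scratch.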

The code that configures the IndexWriter:

    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_4_10_0);
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_10_0, analyzer);

    if (create) {
      // Create a new index in the directory, removing any
      // previously indexed documents:
      iwc.setOpenMode(OpenMode.CREATE);
    } else {
      // Add new documents to an existing index:
      iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
    }

    // Optional: for better indexing performance, if you
    // are indexing many documents, increase the RAM
    // buffer. If you do this, increase the max heap
    // size of the JVM (e.g. add -Xmx512m or -Xmx1g):
    //
    // iwc.setRAMBufferSizeMB(256.0);

    IndexWriter writer = new IndexWriter(dir, iwc);
As the code shows, the simplest indexing setup needs an analyzer (Analyzer) and an index writer (IndexWriter). The IndexWriter is configured with an IndexWriterConfig instance. Besides the analyzer and the open mode, IndexWriterConfig exposes many other properties for different scenarios, which I will analyze later. Once the IndexWriter is created, the actual writing of the index is done in the indexDocs(writer, docDir) method. Its core is:

    // make a new, empty document
    Document doc = new Document();

    // Add the path of the file as a field named "path".  Use a
    // field that is indexed (i.e. searchable), but don't tokenize
    // the field into separate words and don't index term frequency
    // or positional information:
    Field pathField = new StringField("path", file.getPath(), Field.Store.YES);
    doc.add(pathField);

    // Add the last modified date of the file a field named "modified".
    // Use a LongField that is indexed (i.e. efficiently filterable with
    // NumericRangeFilter).  This indexes to milli-second resolution, which
    // is often too fine.  You could instead create a number based on
    // year/month/day/hour/minutes/seconds, down the resolution you require.
    // For example the long value 2011021714 would mean
    // February 17, 2011, 2-3 PM.
    doc.add(new LongField("modified", file.lastModified(), Field.Store.NO));

    // Add the contents of the file to a field named "contents".  Specify a Reader,
    // so that the text of the file is tokenized and indexed, but not stored.
    // Note that FileReader expects the file to be in UTF-8 encoding.
    // If that's not the case searching for special characters will fail.
    doc.add(new TextField("contents", new BufferedReader(new InputStreamReader(fis, StandardCharsets.UTF_8))));

    if (writer.getConfig().getOpenMode() == OpenMode.CREATE) {
      // New index, so we just add the document (no old document can be there):
      System.out.println("adding " + file);
      writer.addDocument(doc);
    } else {
      // Existing index (an old copy of this document may have been indexed) so
      // we use updateDocument instead to replace the old one matching the exact
      // path, if present:
      System.out.println("updating " + file);
      writer.updateDocument(new Term("path", file.getPath()), doc);
    }
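The comment on the "modified" field suggests indexing the modification time at a coarser resolution. Here is a minimal sketch of that encoding. It assumes hour resolution and an explicitly chosen time zone, and the helper name toYyyyMmDdHh is my own.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical helper: encode a timestamp as the long yyyyMMddHH (e.g. 2011021714).
final class HourResolution {
    static long toYyyyMmDdHh(long epochMillis, ZoneId zone) {
        LocalDateTime t = Instant.ofEpochMilli(epochMillis).atZone(zone).toLocalDateTime();
        return t.getYear() * 1_000_000L      // 2011 -> 2011000000
                + t.getMonthValue() * 10_000L // February -> 20000
                + t.getDayOfMonth() * 100L    // 17th -> 1700
                + t.getHour();                // 2-3 PM -> 14
    }
}
```

The result could be passed to the LongField instead of file.lastModified(). Any range query on "modified" would then have to use the same encoding and the same time zone.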

Document represents a document, and Field is a part of a document, with settings for whether it is stored, indexed, tokenized and so on. I previously used version 2.9.1, which had only a single Field class whose properties you had to set yourself. Version 4.10.2 provides many specialized Field classes, so it is more convenient to pick the one that fits. IndexWriter's addDocument method adds a new document. updateDocument updates an indexed document: it deletes the documents containing the given term and then adds the new one, or simply adds it if no document matches.
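The difference between addDocument and updateDocument is easy to model with plain collections. This toy index only illustrates the semantics described above. It is not how Lucene stores or deletes documents, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: updateDocument(field, value, doc) = delete docs whose field equals value, then add doc.
final class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    void addDocument(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));   // always appends, even if an identical doc exists
    }

    void updateDocument(String field, String value, Map<String, String> doc) {
        docs.removeIf(d -> value.equals(d.get(field)));  // delete all matches (possibly none)
        addDocument(doc);                                // then add the new version
    }

    int numDocs() {
        return docs.size();
    }

    List<Map<String, String>> find(String field, String value) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> d : docs) {
            if (value.equals(d.get(field))) {
                hits.add(d);
            }
        }
        return hits;
    }
}
```

This is why IndexFiles uses plain addDocument only in CREATE mode, where the index starts empty. In append mode, adding the same file again would leave two copies of it in the index.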

SearchFiles.java source:

    IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(index)));
    IndexSearcher searcher = new IndexSearcher(reader);
    // :Post-Release-Update-Version.LUCENE_XY:
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_4_10_0);

    BufferedReader in = null;
    if (queries != null) {
      in = new BufferedReader(new InputStreamReader(new FileInputStream(queries), StandardCharsets.UTF_8));
    } else {
      in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
    }
    // :Post-Release-Update-Version.LUCENE_XY:
    QueryParser parser = new QueryParser(Version.LUCENE_4_10_0, field, analyzer);
The search process has four key pieces:
- an IndexReader that opens the index;
- an IndexSearcher built on top of the reader;
- an analyzer, which must be the same one used at indexing time so that query terms match the indexed terms and the results are correct;
- a QueryParser, which parses the user's query string into a Query object.
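A toy inverted index shows why the query must use the same analyzer as indexing. Everything here is a simplified stand-in, and the class and method names are hypothetical. In particular, the lower-casing, split-on-non-letters analyzer is NOT StandardAnalyzer. Documents are indexed as lower-cased terms, so the query "String" only matches after it goes through the same analysis.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SameAnalyzerDemo {
    // Simplified analyzer: lower-case, then split on anything that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                out.add(t);   // split can yield a leading empty string; skip it
            }
        }
        return out;
    }

    // Inverted index: term -> ids of the documents containing it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyze(docs.get(id))) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return postings;
    }

    // Look up the query, optionally analyzing it first; multiple terms are OR'ed together.
    static Set<Integer> search(Map<String, Set<Integer>> postings, String query, boolean analyzeQuery) {
        Set<Integer> hits = new TreeSet<>();
        List<String> terms = analyzeQuery ? analyze(query) : List.of(query);
        for (String term : terms) {
            hits.addAll(postings.getOrDefault(term, Set.of()));
        }
        return hits;
    }
}
```

Indexed this way, two small source files that both contain String give the analyzed query "String" two hits and the raw term none. A random string finds nothing either way.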

    TopDocs results = searcher.search(query, 5 * hitsPerPage);
    ScoreDoc[] hits = results.scoreDocs;

    // later, inside the loop over the hits of the current page:
    Document doc = searcher.doc(hits[i].doc);
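From each retrieved Document you can read its stored fields, for example the "path" field that IndexFiles stored with Field.Store.YES. The "contents" field was indexed but not stored, so its text cannot be read back this way.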
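SearchFiles first collects 5 * hitsPerPage hits and then shows them one page at a time. The arithmetic behind that can be sketched as follows. The method names are my own. As I understand the demo's doPagingSearch method, when the user pages past the hits already collected, it searches again for all matching hits.

```java
// Hypothetical sketch of the paging arithmetic; pages are 0-based, end indexes are exclusive.
final class Paging {
    // Returns {start, end} of the hits shown on the given page.
    static int[] pageRange(int totalHits, int hitsPerPage, int page) {
        int start = page * hitsPerPage;
        if (start >= totalHits) {
            return new int[]{totalHits, totalHits};   // past the last page: empty range
        }
        int end = Math.min(totalHits, start + hitsPerPage);
        return new int[]{start, end};
    }

    // True when the page reaches beyond the hits collected so far but more hits exist.
    static boolean needsRefetch(int collected, int totalHits, int end) {
        return end > collected && collected < totalHits;
    }
}
```

For the two-file example above, pageRange(2, 10, 0) is {0, 2}, so a single page holds both hits.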

That's all for now; I'm quite tired.
To be continued...



