Getting Started with Lucene

Source: Internet
Author: User

I. Lucene Introduction

Lucene is a full-featured full-text search engine library written in Java. It suits almost any application that needs full-text search, especially cross-platform applications. Lucene is a free, open-source project that is simple to use yet powerful. Its features include:

  • Indexing throughput of over 150 GB/hour on modern hardware
  • Small memory requirements: only 1 MB of heap space is needed
  • Incremental indexing as fast as batch indexing
  • Index size is roughly 20%-30% of the size of the indexed text

Lucene website: http://lucene.apache.org/

The sample project in this article is built with Maven and uses Lucene 5.2.1. The pom.xml with the required dependencies is as follows:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.shh</groupId>
    <artifactId>lucene</artifactId>
    <packaging>war</packaging>
    <version>0.0.1-SNAPSHOT</version>
    <name>lucene Maven Webapp</name>
    <url>http://maven.apache.org</url>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <lucene.version>5.2.1</lucene.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>${lucene.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-queryparser</artifactId>
            <version>${lucene.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-common</artifactId>
            <version>${lucene.version}</version>
        </dependency>

        <!-- Chinese word segmentation analyzer -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-smartcn</artifactId>
            <version>${lucene.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-highlighter</artifactId>
            <version>${lucene.version}</version>
        </dependency>
    </dependencies>

    <build>
        <finalName>lucene</finalName>
    </build>
</project>

II. Example

1. Create an index

The related code is as follows:

package com.test.lucene;

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

/**
 * Create an index
 */
public class IndexCreate {

    public static void main(String[] args) {
        // Choose the analyzer (word segmentation). The standard analyzer is used here.
        Analyzer analyzer = new StandardAnalyzer();

        // IndexWriter configuration
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);

        // Open mode: create the index if it does not exist, otherwise open the existing one.
        indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);

        // Create document 1
        Document doc1 = new Document();
        doc1.add(new StringField("id", "abcde", Store.YES));
        doc1.add(new TextField("content", "Guangzhou, China", Store.YES));
        doc1.add(new IntField("num", 1, Store.YES));

        // Create document 2
        Document doc2 = new Document();
        doc2.add(new StringField("id", "asdff", Store.YES));
        doc2.add(new TextField("content", "Shanghai, China", Store.YES));
        doc2.add(new IntField("num", 2, Store.YES));

        // FSDirectory.open takes the storage path of the index on the hard disk;
        // IndexWriter creates and updates the index files.
        // try-with-resources closes the IndexWriter first, then the Directory, and
        // avoids a NullPointerException if the index cannot be opened.
        try (Directory directory = FSDirectory.open(Paths.get("D://index/test"));
             IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig)) {
            // Add the documents to be indexed
            indexWriter.addDocument(doc1);
            indexWriter.addDocument(doc2);

            // Commit the pending changes. Without a commit, the operations above are not saved to the hard disk.
            // Committing is expensive, so an application needs a policy for when to commit.
            indexWriter.commit();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

2. Search

The related code is as follows:

package com.test.lucene;

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

/**
 * Search
 */
public class IndexSearch {

    public static void main(String[] args) {
        // Open the index at its storage path on the hard disk and read it.
        // try-with-resources closes the reader first, then the directory.
        try (Directory directory = FSDirectory.open(Paths.get("D://index/test"));
             DirectoryReader directoryReader = DirectoryReader.open(directory)) {
            // Create the index search object
            IndexSearcher searcher = new IndexSearcher(directoryReader);
            // Analyzer (word segmentation); use the same one as at index time
            Analyzer analyzer = new StandardAnalyzer();
            // Create a query against the "content" field
            QueryParser parser = new QueryParser("content", analyzer);
            Query query = parser.parse("Guangzhou");
            // Search the index and fetch the top 10 matching records
            TopDocs topDocs = searcher.search(query, 10);
            System.out.println("Number of matching records: " + topDocs.totalHits);
            for (int i = 0; i < topDocs.scoreDocs.length; i++) {
                // Load the stored fields of the matching document
                Document doc = searcher.doc(topDocs.scoreDocs[i].doc);
                System.out.println("id=" + doc.get("id"));
                System.out.println("content=" + doc.get("content"));
                System.out.println("num=" + doc.get("num"));
            }
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        }
    }
}
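
The query "Guangzhou" can find the text "Guangzhou, China" because the same kind of analysis is applied to the text at index time and to the query at search time. The following is a minimal sketch of that idea only. ToyAnalyzer and tokenize are hypothetical names invented for this illustration, and the splitting rule (break on anything that is not a letter or digit, then lowercase) is a simplification, not the actual behavior of StandardAnalyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy stand-in for an analyzer (hypothetical; NOT Lucene's StandardAnalyzer).
class ToyAnalyzer {

    // Split on anything that is not a letter or digit, then lowercase each token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> indexed = tokenize("Guangzhou, China"); // terms produced at index time
        List<String> queried = tokenize("Guangzhou");        // terms produced at query time
        System.out.println(indexed);                         // [guangzhou, china]
        System.out.println(queried);                         // [guangzhou]
        // The document matches when the query terms are among the indexed terms.
        System.out.println(indexed.containsAll(queried));    // true
    }
}
```

If the index and the query were analyzed differently (for example, one lowercased and the other not), the terms would not line up and the search could miss the document, which is why the search example builds its QueryParser with a StandardAnalyzer, the same analyzer used when indexing.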

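If IndexCreate has been run exactly once, the search should report one matching record: document 1, with id abcde, content "Guangzhou, China" and num 1.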

III. How Lucene works

Lucene full-text search involves two steps:

Index creation: extract information from the source data (database records, files, and so on) and build the index files.

Index search: search the created index according to the user's search request and return the results to the user.
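
The two steps can be illustrated with a toy inverted index, that is, a map from each term to the documents that contain it. This is a conceptual sketch under stated assumptions, not Lucene's real on-disk format or API: ToyInvertedIndex, addDocument and search are hypothetical names, the index lives in memory, and a search handles a single term only. The sample documents are the two from the indexing example above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index (hypothetical): shows the two steps only, not Lucene's data structures.
class ToyInvertedIndex {

    // term -> ids of the documents that contain the term
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Step 1, index creation: extract terms from the text and record which document they came from.
    void addDocument(String id, String content) {
        for (String term : content.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
    }

    // Step 2, index search: look the term up in the index instead of scanning every document.
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("abcde", "Guangzhou, China");
        index.addDocument("asdff", "Shanghai, China");
        System.out.println(index.search("Guangzhou")); // [abcde]
        System.out.println(index.search("China"));     // [abcde, asdff]
    }
}
```

The point of the split is that the expensive work (reading and tokenizing the data) happens once at index time, so each search is a lookup rather than a scan of all the original text.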
