Lucene getting started

Last Update: 2015-08-03 Source: Internet

Author: User

I. Introduction to Lucene

Lucene is a full-featured full-text search engine library written in Java. It is suitable for almost any application that requires full-text search, especially cross-platform applications. Lucene is a free, open-source project that is simple to use yet powerful. Its main features are:

Indexing speed of over 150 GB/hour on modern hardware
Small memory requirements: only 1 MB of heap space
Incremental indexing is as fast as batch indexing
Index size is roughly 20%-30% of the size of the indexed text

Lucene: http://lucene.apache.org/

The sample project in this article is built with Maven and uses Lucene 5.2.1. The pom.xml is as follows:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.shh</groupId>
  <artifactId>lucene</artifactId>
  <packaging>war</packaging>
  <version>0.0.1-SNAPSHOT</version>
  <name>lucene Maven Webapp</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <lucene.version>5.2.1</lucene.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-core</artifactId>
      <version>${lucene.version}</version>
    </dependency>

    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-queryparser</artifactId>
      <version>${lucene.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-analyzers-common</artifactId>
      <version>${lucene.version}</version>
    </dependency>

    <!-- Chinese word segmentation analyzer -->
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-analyzers-smartcn</artifactId>
      <version>${lucene.version}</version>
    </dependency>

    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-highlighter</artifactId>
      <version>${lucene.version}</version>
    </dependency>
  </dependencies>

  <build>
    <finalName>lucene</finalName>
  </build>
</project>

II. Examples

1. Create an index

The related code is as follows:

package com.test.lucene;

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

/**
 * Create an index
 */
public class IndexCreate {

    public static void main(String[] args) {
        // Specify the analyzer (word segmentation). The standard analyzer is used here.
        Analyzer analyzer = new StandardAnalyzer();

        // IndexWriter configuration
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);

        // Open mode: create the index if it does not exist, otherwise append to it
        indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);

        // Create document 1
        Document doc1 = new Document();
        doc1.add(new StringField("id", "abcde", Store.YES));               // indexed as one untokenized term
        doc1.add(new TextField("content", "Guangzhou, China", Store.YES)); // tokenized by the analyzer
        doc1.add(new IntField("num", 1, Store.YES));

        // Create document 2
        Document doc2 = new Document();
        doc2.add(new StringField("id", "asdff", Store.YES));
        doc2.add(new TextField("content", "Shanghai, China", Store.YES));
        doc2.add(new IntField("num", 2, Store.YES));

        // Storage path of the index on disk. try-with-resources closes the
        // IndexWriter and the Directory even if an exception is thrown.
        try (Directory directory = FSDirectory.open(Paths.get("D://index/test"));
             IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig)) {
            // Add the documents to be indexed
            indexWriter.addDocument(doc1);
            indexWriter.addDocument(doc2);

            // Commit the changes: until a commit, the added documents are not durably saved to disk.
            // Committing is relatively expensive, so real applications should commit according to
            // a policy (for example, in batches) rather than after every document.
            indexWriter.commit();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
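The two documents above use different field types: a StringField value is indexed as one untokenized term, while a TextField value is run through the analyzer and split into terms. The stdlib-only sketch below imitates that difference under a loud assumption: its `textFieldTerms` method is a hypothetical, much simplified stand-in for StandardAnalyzer (it only lowercases and splits on non-alphanumeric characters, and skips stop-word removal and Lucene's Unicode segmentation rules). It is not Lucene code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy illustration of how StringField and TextField values become index terms (not Lucene code). */
final class FieldTermsDemo {

    /** StringField-like: the whole value is kept verbatim as a single term. */
    static List<String> stringFieldTerms(String value) {
        List<String> terms = new ArrayList<>();
        terms.add(value);
        return terms;
    }

    /** TextField-like: hypothetical simplified analyzer that lowercases and splits on non-alphanumerics. */
    static List<String> textFieldTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // split() yields a leading empty string if the text starts with a delimiter
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(stringFieldTerms("abcde"));          // [abcde]
        System.out.println(textFieldTerms("Guangzhou, China")); // [guangzhou, china]
    }
}
```

This is also why the search program parses its query with the same StandardAnalyzer used at index time: the query text has to be reduced to the same terms that were written to the index.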

2. Search

The related code is as follows:

package com.test.lucene;

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

/**
 * Search
 */
public class IndexSearch {

    public static void main(String[] args) {
        // Open the index directory on disk and a reader over it.
        // try-with-resources closes the reader first, then the directory.
        try (Directory directory = FSDirectory.open(Paths.get("D://index/test"));
             DirectoryReader directoryReader = DirectoryReader.open(directory)) {
            // Create the index searcher
            IndexSearcher searcher = new IndexSearcher(directoryReader);
            // Use the same analyzer as at index time
            Analyzer analyzer = new StandardAnalyzer();
            // Parse the query against the "content" field
            QueryParser parser = new QueryParser("content", analyzer);
            Query query = parser.parse("Guangzhou");
            // Search the index and fetch the top 10 matching documents
            TopDocs topDocs = searcher.search(query, 10);
            System.out.println("Number of matching records: " + topDocs.totalHits);
            for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
                Document doc = searcher.doc(scoreDoc.doc);
                System.out.println("id = " + doc.get("id"));
                System.out.println("content = " + doc.get("content"));
                System.out.println("num = " + doc.get("num"));
            }
        } catch (IOException | ParseException e) {
            e.printStackTrace();
        }
    }
}
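In the search code above, `searcher.search(query, 10)` does not return the first 10 matches in document order; it returns at most 10 hits ranked by relevance score, while `topDocs.totalHits` counts every match. The following toy sketch shows that "keep the best n" idea with a bounded min-heap. It is a simplified illustration with assumed inputs (precomputed scores in a hypothetical `Hit` class), not Lucene's collector implementation, although Lucene's top-hits collection is conceptually similar.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Toy sketch of keeping the n best-scoring hits, loosely analogous to search(query, n) (not Lucene code). */
final class TopNDemo {

    /** Hypothetical pair of a matching document id and its precomputed relevance score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Returns the ids of the n highest-scoring hits, best first; equal scores put the lower doc id first. */
    static List<Integer> topN(List<Hit> matches, int n) {
        // Min-heap whose head is the "worst" hit retained so far
        PriorityQueue<Hit> heap = new PriorityQueue<>((a, b) ->
                a.score != b.score ? Float.compare(a.score, b.score) : Integer.compare(b.doc, a.doc));
        for (Hit h : matches) {
            heap.offer(h);
            if (heap.size() > n) {
                heap.poll(); // evict the current worst hit so at most n remain
            }
        }
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().doc); // polled worst first
        }
        Collections.reverse(result);     // reorder best first
        return result;
    }

    public static void main(String[] args) {
        List<Hit> matches = new ArrayList<>();
        matches.add(new Hit(0, 1.0f));
        matches.add(new Hit(1, 3.0f));
        matches.add(new Hit(2, 2.0f));
        matches.add(new Hit(3, 3.0f));
        System.out.println("totalHits = " + matches.size()); // every match is counted
        System.out.println("top 2 = " + topN(matches, 2));   // top 2 = [1, 3]
    }
}
```

The heap never holds more than n entries, so memory stays bounded no matter how many documents match.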

The running result is as follows:

III. How Lucene works

Lucene full-text search involves two steps:

Indexing: extract information from the source data (such as database records and files) and build the index files.

Searching: look up the index according to the user's search request and return the results to the user.
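The two steps above can be illustrated with a toy inverted index: indexing maps each analyzed term to the set of documents (the postings) that contain it, and searching analyzes the query the same way and looks those terms up. This stdlib-only Java sketch is a simplification, not Lucene's implementation: the `analyze` helper is hypothetical, there is no scoring, and query terms are combined with AND, whereas Lucene's classic QueryParser combines them with OR by default.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

/** Toy in-memory inverted index illustrating indexing and searching (not Lucene's implementation). */
final class ToyInvertedIndex {

    private final List<String> docs = new ArrayList<>();                     // stored text, indexed by doc id
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();  // term -> sorted doc ids

    /** Hypothetical simplified analyzer: lowercase, then split on non-alphanumeric characters. */
    private static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Indexing: store the text and record the new doc id under each of its terms. */
    int addDocument(String content) {
        int docId = docs.size();
        docs.add(content);
        for (String term : analyze(content)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    /** Searching: analyze the query and return ids of docs containing every query term. */
    List<Integer> search(String query) {
        TreeSet<Integer> result = null;
        for (String term : analyze(query)) {
            TreeSet<Integer> docsForTerm = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docsForTerm);
            } else {
                result.retainAll(docsForTerm); // AND: intersect the postings
            }
        }
        if (result == null) {
            return new ArrayList<>(); // query produced no terms
        }
        return new ArrayList<>(result);
    }

    /** Returns the stored text of a document, like reading a stored field. */
    String stored(int docId) {
        return docs.get(docId);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("Guangzhou, China");
        index.addDocument("Shanghai, China");
        System.out.println(index.search("Guangzhou")); // [0]
        System.out.println(index.search("china"));     // [0, 1]
    }
}
```

Because lookups go term by term through the postings, a search only touches documents that contain the query terms instead of scanning every document's text.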

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.