Getting started with Lucene
I. Lucene Introduction
Lucene is a full-featured full-text search engine library written in Java. It suits almost any application that needs full-text search, especially cross-platform ones. Lucene is free and open source, and it is simple to use yet powerful. Its main features are:
- Indexing speed of over 150 GB/hour on modern hardware
- Small memory requirements: only 1 MB of heap space
- Incremental indexing that is as fast as batch indexing
- An index size of roughly 20%-30% of the size of the indexed text
Lucene: http://lucene.apache.org/
The sample project in this article is built with Maven and uses Lucene 5.2.1. The pom.xml with its dependencies is as follows:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.shh</groupId>
  <artifactId>lucene</artifactId>
  <packaging>war</packaging>
  <version>0.0.1-SNAPSHOT</version>
  <name>lucene Maven Webapp</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <lucene.version>5.2.1</lucene.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-core</artifactId>
      <version>${lucene.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-queryparser</artifactId>
      <version>${lucene.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-analyzers-common</artifactId>
      <version>${lucene.version}</version>
    </dependency>
    <!-- Word segmenter (analyzer) -->
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-analyzers-smartcn</artifactId>
      <version>${lucene.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-highlighter</artifactId>
      <version>${lucene.version}</version>
    </dependency>
  </dependencies>
  <build>
    <finalName>lucene</finalName>
  </build>
</project>
II. Example
1. Create an index
The related code is as follows:
package com.test.lucene;

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

/**
 * Create an index
 */
public class IndexCreate {
    public static void main(String[] args) {
        // Choose the analyzer (word segmenter). The standard analyzer is used here.
        Analyzer analyzer = new StandardAnalyzer();
        // IndexWriter configuration
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
        // Open mode: create the index if it does not exist, otherwise open it and append.
        indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
        Directory directory = null;
        IndexWriter indexWriter = null;
        try {
            // Storage path of the index on the hard disk
            directory = FSDirectory.open(Paths.get("D://index/test"));
            // IndexWriter creates and updates the index files.
            indexWriter = new IndexWriter(directory, indexWriterConfig);

            // Create document 1
            Document doc1 = new Document();
            doc1.add(new StringField("id", "abcde", Store.YES));
            doc1.add(new TextField("content", "Guangzhou, China", Store.YES));
            doc1.add(new IntField("num", 1, Store.YES));
            // Create document 2
            Document doc2 = new Document();
            doc2.add(new StringField("id", "asdff", Store.YES));
            doc2.add(new TextField("content", "Shanghai, China", Store.YES));
            doc2.add(new IntField("num", 2, Store.YES));

            // Add the documents to be indexed
            indexWriter.addDocument(doc1);
            indexWriter.addDocument(doc2);
            // Commit the IndexWriter changes. Without a commit, the preceding operations are not saved to disk.
            // Committing is expensive, however, so in practice you need a policy for when to commit.
            indexWriter.commit();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Close the resources (either may still be null if opening the index failed)
            try {
                if (indexWriter != null) {
                    indexWriter.close();
                }
                if (directory != null) {
                    directory.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
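The comment on commit() above says that committing is expensive and needs a policy. One common policy is to commit once per batch of documents rather than after every document. The following is a minimal sketch of that idea in plain Java, not Lucene code: CommitTarget, BatchingCommitter and CountingTarget are hypothetical names introduced only for illustration. In a real program, add would presumably delegate to IndexWriter.addDocument and commit to IndexWriter.commit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the two IndexWriter operations the policy needs. */
interface CommitTarget {
    void add(String document); // would map to IndexWriter.addDocument(...)
    void commit();             // would map to IndexWriter.commit()
}

/** Commits once every batchSize additions instead of after every document. */
final class BatchingCommitter {
    private final CommitTarget target;
    private final int batchSize;
    private int pending = 0;

    BatchingCommitter(CommitTarget target, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.target = target;
        this.batchSize = batchSize;
    }

    /** Adds a document and commits when the batch is full. */
    void add(String document) {
        target.add(document);
        pending++;
        if (pending >= batchSize) {
            flush();
        }
    }

    /** Commits any uncommitted additions; call it once more before closing. */
    void flush() {
        if (pending > 0) {
            target.commit();
            pending = 0;
        }
    }
}

/** In-memory target that only counts calls, so the policy can be observed. */
final class CountingTarget implements CommitTarget {
    final List<String> documents = new ArrayList<>();
    int commits = 0;

    @Override
    public void add(String document) {
        documents.add(document);
    }

    @Override
    public void commit() {
        commits++;
    }
}
```

The batch size is a trade-off: a larger batch means fewer expensive commits, but more documents are lost if the process dies before the next commit. The final flush() is needed so that a partially filled last batch is not left uncommitted.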
2. Search
The related code is as follows:
package com.test.lucene;

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

/**
 * Search
 */
public class IndexSearch {
    public static void main(String[] args) {
        // Index storage location and reader, declared here so that they can be closed in finally
        Directory directory = null;
        DirectoryReader directoryReader = null;
        try {
            // Storage path of the index on the hard disk
            directory = FSDirectory.open(Paths.get("D://index/test"));
            // Read the index
            directoryReader = DirectoryReader.open(directory);
            // Create the index search object
            IndexSearcher searcher = new IndexSearcher(directoryReader);
            // Analyzer (word segmenter); the same one that was used for indexing
            Analyzer analyzer = new StandardAnalyzer();
            // Create a query against the "content" field
            QueryParser parser = new QueryParser("content", analyzer);
            Query query = parser.parse("Guangzhou"); // Query text
            // Search the index and get the first 10 matching records
            TopDocs topDocs = searcher.search(query, 10);
            if (topDocs != null) {
                System.out.println("Number of matching records: " + topDocs.totalHits);
                for (int i = 0; i < topDocs.scoreDocs.length; i++) {
                    Document doc = searcher.doc(topDocs.scoreDocs[i].doc);
                    System.out.println("id = " + doc.get("id"));
                    System.out.println("content = " + doc.get("content"));
                    System.out.println("num = " + doc.get("num"));
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        } finally {
            // Close the reader first, then the directory
            try {
                if (directoryReader != null) {
                    directoryReader.close();
                }
                if (directory != null) {
                    directory.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
The running result is as follows:
III. How Lucene works
A Lucene full-text search involves two steps:
- Index creation: extract information from the data (database records, files, and so on) and build the index files.
- Index search: search the created index according to the user's query and return the results to the user.
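To make the two steps concrete, here is a toy inverted index in plain Java. It is only a sketch of the general idea and not how Lucene is implemented: the class TinyIndex and its methods are hypothetical, the tokenizer is a simple split on non-word characters, and everything is kept in memory. Lucene's real index adds analysis, scoring, on-disk segments and much more.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index: maps each lower-cased term to the ids of the documents containing it. */
final class TinyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Step 1, index creation: split the text into terms and record the document id under each term. */
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Step 2, search: look the term up in the index instead of scanning every document. */
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add(1, "Guangzhou, China");
        index.add(2, "Shanghai, China");
        System.out.println(index.search("Guangzhou")); // prints [1]
        System.out.println(index.search("china"));     // prints [1, 2]
    }
}
```

The point of the structure is that the cost of a search depends on the number of matching terms rather than on the total amount of text, which is why the index is built once up front and then queried many times.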