Lucene Starter Program
Preparing the environment
JDK: 1.8.0_162
IDE: Eclipse Neon.3
Database: MySQL 5.7.20
Lucene: 4.10.4 (already very stable; higher versions do not support some tokenizers well)
Preparing data
SET FOREIGN_KEY_CHECKS=0;

-- ----------------------------
-- Table structure for `book`
-- ----------------------------
DROP TABLE IF EXISTS `book`;
CREATE TABLE `book` (
  `id` int(11) DEFAULT NULL,
  `bookname` varchar(500) DEFAULT NULL,
  `price` float DEFAULT NULL,
  `pic` varchar(200) DEFAULT NULL,
  `bookdesc` varchar(2000) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- ----------------------------
-- Records of book
-- ----------------------------
INSERT INTO `book` (`id`, `bookname`, `pic`, `bookdesc`) VALUES ('1', 'Java from beginner to proficient', '1.jpg', '"Java from beginner to proficient" is a book published by the People''s Posts and Telecommunications Press in 2010 and edited by the National 863 Central software incubator. Aimed at readers with no prior experience, it explains Java technology and practical skills. From a beginner''s point of view, it introduces in detail the techniques that should be mastered for program development in the Java language, using easy-to-understand language and a rich set of examples. The book is divided into 28 chapters, including: getting to know Java, getting familiar with the Eclipse development tool, Java language basics, flow control, strings, arrays, classes and objects, wrapper classes, number processing classes, interfaces, inheritance and polymorphism, advanced features of classes, exception handling, Swing program design, collection classes, I/O input and output, reflection, enumerated types and generics, multithreading, network communication, database operations, Swing table components, Swing tree components, other advanced Swing components, advanced layout managers, advanced event handling, AWT drawing and audio playback, printing technology, and an enterprise invoicing management system. All of the material is introduced with concrete examples, and the program code is commented in detail, so that readers can easily grasp the essentials of Java program development and quickly improve their development skills.');
INSERT INTO `book` (`id`, `bookname`, `pic`, `bookdesc`) VALUES ('2', 'Java Web Development', '2.jpg', 'Java Web is the sum of the technologies that use Java to solve problems in the web-related Internet domain. The web consists of two parts: the web server and the web client. On the client side Java offers applets, but they are rarely used; on the server side Java is very rich, with Servlets, JSP, third-party frameworks, and so on. Java technology has given a powerful impetus to the development of the web domain.');
INSERT INTO `book` (`id`, `bookname`, `pic`, `bookdesc`) VALUES ('3', 'Lucene from beginner to proficient', '3.jpg', 'This book summarizes search engine related theories and practical solutions and gives Java implementations, which leverage the popular open source projects Lucene and Solr but also include original implementations. The book mainly consists of a general introduction, a crawler part, a natural language processing part, a full-text retrieval part, and related case studies. The crawler part introduces web page traversal and how to implement incremental crawling, and explains how to extract the main content from various documents such as web pages. The natural language processing part is based on the principles of statistical machine learning and covers the theory and implementation of Chinese word segmentation and part-of-speech tagging, as well as their practical details in search engines; classical problems in natural language processing such as document deduplication, text classification, automatic clustering, syntax parse trees, and spell checking are introduced briefly, with a summary of how to implement them. The full-text search part introduces the principles and progress of search engines with Lucene 3.0, and presents the newest ways of using Lucene through simple examples. The book includes a complete search implementation process, from building the index to implementing the search user interface. It also goes further into implementing near-real-time search, shows how to use Solr 1.4, and describes how to build a distributed search service cluster. Finally, applications in geographic information systems and in outdoor activity search are introduced.');
INSERT INTO `book` (`id`, `bookname`, `pic`, `bookdesc`) VALUES ('4', 'Lucene in action', '4.jpg', 'This book introduces Lucene, an open source full-text search engine development package written in the Java language. Through accessible language, a large number of annotations, rich code examples, and a clear structure, it shows the reader the power of Lucene as an excellent open source project. The book has 10 chapters, divided into two parts. Part 1, Lucene core, focuses on the core Lucene API and is organized in the order in which Lucene is integrated into a program; Part 2, Lucene applications, demonstrates advanced applications of Lucene and its ports to various programming languages by introducing Lucene''s built-in tools.');
INSERT INTO `book` (`id`, `bookname`, `pic`, `bookdesc`) VALUES ('5', 'Lucene Java Essentials Edition', '5.jpg', 'This book summarizes search engine related theories and practical solutions and gives Java implementations, which leverage the popular open source projects Lucene and Solr but also include original implementations. The book mainly consists of a general introduction, a crawler part, a natural language processing part, a full-text retrieval part, and related case studies. The crawler part introduces web page traversal and how to implement incremental crawling, and explains how to extract the main content from various documents such as web pages. The natural language processing part is based on the principles of statistical machine learning and covers the theory and implementation of Chinese word segmentation and part-of-speech tagging, as well as their practical details in search engines; classical problems in natural language processing such as document deduplication, text classification, automatic clustering, syntax parse trees, and spell checking are introduced briefly, with a summary of how to implement them. The full-text search part introduces the principles and progress of search engines with Lucene 3.0, and presents the newest ways of using Lucene through simple examples. The book includes a complete search implementation process, from building the index to implementing the search user interface. It also goes further into implementing near-real-time search, shows how to use Solr 1.4, and describes how to build a distributed search service cluster. Finally, applications in geographic information systems and in outdoor activity search are introduced.');
Create a project
Create a Maven project (select jar as the packaging)
Configure pom.xml and import the dependencies
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.healchow</groupId>
    <artifactId>lucene-first</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>lucene-first</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <!-- MySQL driver version -->
        <mysql.version>5.1.44</mysql.version>
        <!-- Lucene version -->
        <lucene.version>4.10.4</lucene.version>
    </properties>

    <dependencies>
        <!-- MySQL database dependency -->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>${mysql.version}</version>
        </dependency>
        <!-- Lucene dependencies -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>${lucene.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-common</artifactId>
            <version>${lucene.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-queryparser</artifactId>
            <version>${lucene.version}</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>
Collect the raw data
Prepare the Book POJO
public class Book {

    private Integer id;       // int(11) DEFAULT NULL
    private String bookname;  // varchar(500) DEFAULT NULL
    private Float price;      // float DEFAULT NULL
    private String pic;       // varchar(200) DEFAULT NULL
    private String bookdesc;  // varchar(2000) DEFAULT NULL

    // Getters/Setters

    @Override
    public String toString() {
        return "Book [id=" + id + ", bookname=" + bookname + ", price=" + price
                + ", pic=" + pic + ", bookdesc=" + bookdesc + "]";
    }
}
Prepare the book DAO interface
import java.util.List;

public interface BookDao {

    /**
     * Query all books
     */
    List<Book> listAll();
}
Implement the book DAO interface
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class BookDaoImpl implements BookDao {

    /**
     * Query all books
     */
    public List<Book> listAll() {
        // Create the list that holds the resulting books
        List<Book> books = new ArrayList<Book>();

        Connection conn = null;
        PreparedStatement preStatement = null;
        ResultSet resultSet = null;

        try {
            // Load the driver
            Class.forName("com.mysql.jdbc.Driver");
            // Create the database connection object
            conn = DriverManager.getConnection(
                    "jdbc:mysql://127.0.0.1:3306/lucene?useSSL=true", "root", "password");
            // Define the query SQL
            String sql = "SELECT * FROM book";
            // Create the statement object
            preStatement = conn.prepareStatement(sql);
            // Execute the statement and get the result set
            resultSet = preStatement.executeQuery();
            // Process the result set
            while (resultSet.next()) {
                // Create a book object
                Book book = new Book();
                book.setId(resultSet.getInt("id"));
                book.setBookname(resultSet.getString("bookname"));
                book.setPrice(resultSet.getFloat("price"));
                book.setPic(resultSet.getString("pic"));
                book.setBookdesc(resultSet.getString("bookdesc"));
                // Add the queried book to the list
                books.add(book);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Release resources in the reverse order of their creation
            try {
                if (null != resultSet) resultSet.close();
                if (null != preStatement) preStatement.close();
                if (null != conn) conn.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        return books;
    }

    /**
     * Main method for testing the function
     */
    public static void main(String[] args) {
        // Create the book DAO implementation object
        BookDao bookDao = new BookDaoImpl();
        List<Book> books = bookDao.listAll();
        // Iterate over the results and print each book
        for (Book book : books) {
            System.out.println(book);
        }
    }
}
The test results are as follows:
Implementation of the indexing process
- Collect the raw data;
- Create document objects (Document);
- Create an analyzer object (Analyzer) for tokenization;
- Create an index configuration object (IndexWriterConfig) for configuring Lucene;
- Create an index library directory object (Directory) that specifies where the index library is stored;
- Create an index writer object (IndexWriter) that writes the document objects to the index library;
- Use the IndexWriter object to create the index;
- Release resources.
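To make the analyzer step above more concrete, here is a minimal sketch in plain Java of what tokenization does to a piece of text before it is indexed. It is not Lucene's StandardAnalyzer: the class name ToyAnalyzer, the splitting rule, and the tiny stop-word list are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an analyzer; not part of Lucene.
class ToyAnalyzer {

    // A tiny, made-up stop-word list (assumption for this sketch).
    private static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("a", "an", "the", "in", "of", "to"));

    // Lowercase the text, split it on anything that is not a letter or digit,
    // and drop empty strings and stop words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Lucene in Action"));
    }
}
```

The point of the sketch is that the index stores normalized terms rather than the raw text, which is why the same kind of analysis also has to be applied to the query text at search time.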
Sample code
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;

public class IndexManager {

    /**
     * Index creation test
     * @throws IOException
     */
    @Test
    public void createIndex() throws IOException {
        // 1. Collect the data
        BookDao bookDao = new BookDaoImpl();
        List<Book> books = bookDao.listAll();

        // 2. Create the document objects
        List<Document> documents = new ArrayList<Document>();
        for (Book book : books) {
            Document document = new Document();
            // Add fields to the document
            // add(field): adds a field to the Document object
            // TextField: a text field. name: the field name; value: the field value;
            // store: whether to store the field value in the document
            document.add(new TextField("bookId", book.getId() + "", Store.YES));
            document.add(new TextField("bookName", book.getBookname(), Store.YES));
            document.add(new TextField("bookPrice", book.getPrice() + "", Store.YES));
            document.add(new TextField("bookPic", book.getPic(), Store.YES));
            document.add(new TextField("bookDesc", book.getBookdesc(), Store.YES));

            // Add the document object to the list of documents
            documents.add(document);
        }

        // 3. Create the analyzer object (Analyzer) for tokenization
        Analyzer analyzer = new StandardAnalyzer();

        // 4. Create the index configuration object (IndexWriterConfig) for configuring Lucene
        // Parameter 1: the Lucene version in use; parameter 2: the analyzer
        IndexWriterConfig indexConfig = new IndexWriterConfig(Version.LUCENE_4_10_4, analyzer);

        // 5. Create the index library directory object (Directory) that specifies where the index library is stored
        File path = new File("/your_path/index");
        Directory directory = FSDirectory.open(path);

        // 6. Create the index writer object (IndexWriter) that writes the documents to the index
        IndexWriter indexWriter = new IndexWriter(directory, indexConfig);

        // 7. Use the IndexWriter object to create the index
        for (Document doc : documents) {
            // addDocument(doc): writes a document object to the index library
            indexWriter.addDocument(doc);
        }

        // 8. Release resources
        indexWriter.close();
    }
}
Test results
Description: Once you see the following file, the index has been created successfully:
View indexes with the Luke tool
Instructions for use
On Windows, double-click start.bat to run it (the JDK environment variables must already be configured);
On macOS, change to the Luke directory in a terminal and run ./start.sh.
Run Interface One
Run Interface Two
Run Interface Three
The implementation of the retrieval process
- Create an analyzer object (Analyzer) for tokenization;
- Create a query object (Query);
- Create an index library directory object (Directory) that specifies the location of the index library;
- Create an index reader object (IndexReader) for reading the index;
- Create an index searcher object (IndexSearcher) for performing searches;
- Use the IndexSearcher object to perform the search and return the result set (TopDocs);
- Process the result set;
- Release resources.
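The search and result-handling steps above boil down to: score every matching document, then keep the best n. The following plain-Java sketch illustrates that idea only. The class ToySearch, its Hit type, and the scoring rule (how often the term occurs in the text) are assumptions for illustration; Lucene's actual scoring formula is more elaborate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical toy search; not Lucene code.
class ToySearch {

    // A (document ID, score) pair, similar in spirit to an entry of a result set.
    static class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Return at most n hits for a lowercase term, best score first.
    // The document ID is simply the position of the text in the list.
    static List<Hit> search(List<String> docs, String term, int n) {
        List<Hit> hits = new ArrayList<Hit>();
        for (int docId = 0; docId < docs.size(); docId++) {
            int count = 0;
            for (String token : docs.get(docId).toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.equals(term)) {
                    count++;
                }
            }
            if (count > 0) {
                hits.add(new Hit(docId, count));
            }
        }
        // Sort by score, highest first; the sort is stable, so ties keep ID order.
        Collections.sort(hits, new Comparator<Hit>() {
            public int compare(Hit a, Hit b) {
                return Float.compare(b.score, a.score);
            }
        });
        return new ArrayList<Hit>(hits.subList(0, Math.min(n, hits.size())));
    }
}
```

Each returned Hit carries only an ID and a score, so the caller still has to fetch the document itself by ID, which is the same two-stage pattern the result-handling step describes.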
Search using the Luke tool
bookName:lucene means: search for documents whose bookName field contains lucene.
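A query of the form field:term, like the one above, names the field to search and the term to look for; when the field prefix is left out, a default field is used instead. The sketch below shows just that split in plain Java. ToyFieldQuery is a hypothetical helper, not Lucene's QueryParser, which supports a much richer syntax.

```java
// Hypothetical helper for illustration; not Lucene's QueryParser.
class ToyFieldQuery {
    final String field;
    final String term;

    ToyFieldQuery(String field, String term) {
        this.field = field;
        this.term = term;
    }

    // Parse "field:term". Without a field prefix, fall back to defaultField.
    static ToyFieldQuery parse(String defaultField, String queryText) {
        int colon = queryText.indexOf(':');
        if (colon < 0) {
            return new ToyFieldQuery(defaultField, queryText.trim());
        }
        return new ToyFieldQuery(
                queryText.substring(0, colon).trim(),
                queryText.substring(colon + 1).trim());
    }

    public static void main(String[] args) {
        ToyFieldQuery query = parse("bookName", "bookDesc:java");
        System.out.println(query.field + " -> " + query.term);
    }
}
```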
Sample code
// Additional imports needed for this method:
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

/**
 * Index search test
 * @throws Exception
 */
@Test
public void searchIndexTest() throws Exception {
    // 1. Create the analyzer object (Analyzer) for tokenization
    Analyzer analyzer = new StandardAnalyzer();

    // 2. Create the query object (Query)
    // 2.1 Create the query parser object
    // Parameter 1: the default search field; parameter 2: the analyzer to use
    QueryParser queryParser = new QueryParser("bookName", analyzer);
    // 2.2 Use the query parser object to instantiate the query object
    Query query = queryParser.parse("bookName:lucene");

    // 3. Create the index library directory object (Directory) that specifies the location of the index library
    Directory directory = FSDirectory.open(new File("/your_path/index"));

    // 4. Create the index reader object (IndexReader) for reading the index
    IndexReader indexReader = DirectoryReader.open(directory);

    // 5. Create the index searcher object (IndexSearcher) for performing the search
    IndexSearcher searcher = new IndexSearcher(indexReader);

    // 6. Use the IndexSearcher object to perform the search and return the result set (TopDocs)
    // Parameter 1: the query object; parameter 2: return the first n results after sorting
    TopDocs topDocs = searcher.search(query, 10);

    // 7. Process the result set
    // 7.1 Print the number of results actually found
    System.out.println("Number of results actually queried: " + topDocs.totalHits);
    // 7.2 Get the array of search results
    // A ScoreDoc holds the ID of a document and its score
    ScoreDoc[] scoreDocs = topDocs.scoreDocs;
    for (ScoreDoc scoreDoc : scoreDocs) {
        System.out.println("= = = = = = = = = = = = = = = = = = = =");
        // Get the ID and score of the document
        int docId = scoreDoc.doc;
        float score = scoreDoc.score;
        System.out.println("Document ID = " + docId + ", score = " + score);
        // Look up the document data by document ID, much like querying a relational database by primary key
        Document doc = searcher.doc(docId);
        System.out.println("Book ID: " + doc.get("bookId"));
        System.out.println("Book Name: " + doc.get("bookName"));
        System.out.println("Book Price: " + doc.get("bookPrice"));
        System.out.println("Book Picture: " + doc.get("bookPic"));
        System.out.println("Book Description: " + doc.get("bookDesc"));
    }

    // 8. Release resources
    indexReader.close();
}
Test results
Result description
- The index library contains an index area and a document area;
- The index area holds the index data (the inverted index) and is used to find matching documents;
- The document area holds the document data and is used to fetch the data of the documents that were found.
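The two parts of the index library described above can be pictured as two separate structures: one that maps terms to document IDs, and one that maps document IDs back to the stored data. Here is a minimal sketch in plain Java. ToyIndexLibrary and its whitespace-based tokenization are assumptions for illustration, and Lucene's on-disk format is entirely different.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory model of an index library; not Lucene code.
class ToyIndexLibrary {

    // Index part: term -> IDs of the documents containing it (inverted index).
    private final Map<String, List<Integer>> invertedIndex = new HashMap<>();

    // Document part: document ID (list position) -> stored text.
    private final List<String> storedDocuments = new ArrayList<>();

    // Store the text, index each of its terms, and return the new document ID.
    int addDocument(String text) {
        int docId = storedDocuments.size();
        storedDocuments.add(text);
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Integer> postings = invertedIndex.get(term);
            if (postings == null) {
                postings = new ArrayList<>();
                invertedIndex.put(term, postings);
            }
            if (!postings.contains(docId)) {
                postings.add(docId);
            }
        }
        return docId;
    }

    // Search step: consult only the inverted index.
    List<Integer> lookup(String term) {
        List<Integer> postings = invertedIndex.get(term.toLowerCase(Locale.ROOT));
        return postings == null ? new ArrayList<Integer>() : new ArrayList<>(postings);
    }

    // Fetch step: read the stored data by document ID.
    String stored(int docId) {
        return storedDocuments.get(docId);
    }
}
```

A search therefore never scans the stored text: lookup touches only the term map, and stored is called afterwards for just the IDs that matched.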
IndexSearcher methods
Method | Description
indexSearcher.search(query, n) | Searches by query and returns the n highest-scoring records
indexSearcher.search(query, filter, n) | Searches by query, applies a filter policy, and returns the n highest-scoring records
indexSearcher.search(query, n, sort) | Searches by query, applies a sort policy, and returns the first n records in that sort order
indexSearcher.search(booleanQuery, filter, n, sort) | Searches by query, applies a filter policy and a sort policy, and returns the first n records in that sort order
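As a rough analogy for how the filter, n, and sort parameters of these methods interact, the sketch below applies the same three steps to an in-memory list using Java 8 streams. It is not Lucene code: ToyFilteredSearch, its Book type, and the prices are made-up values for illustration only.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Plain-Java analogy for the filter / n / sort parameters; not Lucene code.
class ToyFilteredSearch {

    static class Book {
        final String name;
        final float price; // made-up sample value

        Book(String name, float price) {
            this.name = name;
            this.price = price;
        }
    }

    // Sort policy used in the example: cheapest first.
    static final Comparator<Book> BY_PRICE = (a, b) -> Float.compare(a.price, b.price);

    // Keep the matches accepted by the filter, order them with the sort policy,
    // and return the names of at most n of them.
    static List<String> search(List<Book> matches, Predicate<Book> filter,
                               int n, Comparator<Book> sort) {
        return matches.stream()
                .filter(filter)
                .sorted(sort)
                .limit(n)
                .map(book -> book.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Book> matches = Arrays.asList(
                new Book("Lucene in action", 30f),
                new Book("Lucene Java Essentials Edition", 10f),
                new Book("Lucene from beginner to proficient", 20f));
        // Cheapest match under 25: prints [Lucene Java Essentials Edition]
        System.out.println(search(matches, book -> book.price < 25f, 1, BY_PRICE));
    }
}
```

The order of the steps matters: the filter narrows the matches first, the sort policy then decides the order, and only then is the list cut to n entries.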
Copyright notice
Author: Ma_shoufeng (Ma Ching)
Source: Blog Park Ma Ching's Blog
Your support is a great encouragement to the blogger. Thank you for reading.
The copyright of this article belongs to the blogger. Reprinting is welcome, but unless the blogger agrees otherwise, this statement must be retained and a link to the original article must be given in a prominent position on the article page; otherwise the blogger reserves the right to pursue legal liability.