Introduction to Lucene and Solr

Last Update:2015-09-20 Source: Internet

Author: User

Tags: solr

Many features of the portal community depend on search to improve the user experience. There are currently several ways to implement site search:

Implement intra-site search with your own wrapper around Lucene. This involves a heavy workload and scales poorly, so it was not adopted.
Call the APIs of Google or Baidu to implement intra-site search. Binding to a third-party search engine is too rigid to meet future business expansion needs.
Implement intra-site search based on Compass + Lucene. This suits indexing the data of database-driven applications, and is a worthwhile option especially for replacing traditional like '%expression%' queries on fields such as varchar and clob. However, you still need to build some distributed processing and interface encapsulation yourself.
Implement intra-site search based on Solr. This option is more complete, with better encapsulation and scalability. It is therefore the one used in the portal community, with the Compass approach added later.

Lucene Introduction

Lucene is a Java-based full-text information retrieval toolkit. It is not a complete search application; instead, it provides indexing and search functions for your own application. Lucene is an open-source Apache project (it began in the Apache Jakarta family and later became a top-level project) and is among the most popular Java-based open-source full-text retrieval toolkits.

Many applications base their search functions on Lucene, for example the search function of the Eclipse help system. Lucene indexes text data, so it can index and search your documents as long as you can convert them into text. For example, to index HTML and PDF documents, you first convert them into plain text and then hand the converted content to Lucene for indexing. The resulting index is saved to disk or kept in memory and is then queried with the conditions entered by the user. Because Lucene does not dictate the format of the documents to be indexed, it can be applied to almost any search application.

Figure 1 shows the relationship between the search application and Lucene, and also reflects the process of building a search application using Lucene:

Figure 1. Relationship between the search application and Lucene

Index and search

Indexing is the core of modern search engines. The indexing process turns the source data into an index file that is very convenient to query. Why is indexing so important? Imagine that you want to find the documents containing a keyword in a large collection. Without an index, you would have to read the documents into memory one by one and check whether each contains the keyword, which takes a lot of time. Search engines, by contrast, return results within milliseconds precisely because they build an index. You can think of an index as a data structure that gives you fast random access to the keywords stored in it, and from each keyword to the documents associated with it.

Lucene uses an inverted index. Inverted indexing means that we maintain a table of words/phrases, and for each word/phrase in the table a list records which documents contain it. In this way, search results can be obtained quickly when the user enters query conditions. We will introduce the indexing mechanism of Lucene in the second part of this series. Because Lucene provides a simple and easy-to-use API, you can easily use it to index your documents even if you are not yet familiar with how full-text indexing works.
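
The word/phrase table described above can be sketched in a few lines of plain Java. This is only a toy illustration, not Lucene's actual implementation or API: the class name InvertedIndexSketch, its methods, and the simple tokenization rule (lowercase, split on anything that is not a letter or digit) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a toy inverted index, not how Lucene stores its data.
class InvertedIndexSketch {
    // term -> ascending list of IDs of the documents that contain the term
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    // Splits the text into lowercase terms and records docId under each term.
    // Assumes documents are added in increasing docId order.
    void addDocument(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (term.isEmpty()) {
                continue; // split() yields an empty first token if the text starts with punctuation
            }
            List<Integer> docs = postings.computeIfAbsent(term, t -> new ArrayList<>());
            // Skip the ID if this document was already recorded for the term.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
    }

    // Returns the IDs of the documents containing the term (empty if none).
    List<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), List.of());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument(1, "Lucene is a full-text retrieval toolkit");
        index.addDocument(2, "Solr is a search server based on Lucene");
        index.addDocument(3, "Tomcat is a servlet container");

        System.out.println(index.lookup("lucene")); // [1, 2]
        System.out.println(index.lookup("solr"));   // [2]
        System.out.println(index.lookup("hadoop")); // []
    }
}
```

A lookup is a single map access regardless of how many documents were indexed, which is why the sequential scan described earlier is no longer needed.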

After you have created an index for the documents, you can search it. The search engine first parses the search keywords, then looks them up in the index, and finally returns the documents associated with the keywords entered by the user.
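
The three steps (parse the keywords, look them up in the index, return the matching documents) might look roughly like the following plain-Java sketch. It assumes an index that has already been built as a map from term to document IDs, and it treats a query as "all keywords must match". The class KeywordSearchSketch and the sample index contents are hypothetical; real Lucene queries are analyzed and ranked by relevance, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Illustrative only: keyword lookup against a prebuilt term -> document IDs map.
class KeywordSearchSketch {
    // Returns the IDs of the documents that contain every keyword in the query.
    static List<Integer> search(Map<String, List<Integer>> index, String query) {
        TreeSet<Integer> result = null;
        // Step 1: parse the query into lowercase keywords.
        for (String term : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // blank query
            }
            // Step 2: look up the documents recorded for this keyword.
            List<Integer> docs = index.getOrDefault(term, List.of());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs); // keep only documents matching all keywords so far
            }
        }
        // Step 3: return the matching document IDs in ascending order.
        return result == null ? List.of() : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        // Hypothetical index contents, assumed to come from an earlier indexing step.
        Map<String, List<Integer>> index = Map.of(
                "lucene", List.of(1, 2),
                "solr", List.of(2),
                "search", List.of(2, 3));

        System.out.println(search(index, "Lucene search")); // [2]
        System.out.println(search(index, "solr tomcat"));   // []
    }
}
```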

Solr Introduction

Solr is a Lucene-based Java search engine server. Solr provides faceted search, hit highlighting, and multiple output formats (including XML/XSLT and JSON). It is easy to install and configure, and comes with an HTTP-based management interface. Solr has been used in many large websites and is relatively mature and stable. Solr encapsulates and extends Lucene, so it largely follows Lucene's terminology. More importantly, the index created by Solr is fully compatible with the Lucene search engine library. With appropriate configuration (and, in some cases, some coding), Solr can read and use indexes built by other Lucene applications. In addition, many Lucene tools (such as Nutch and Luke) can also use the index created by Solr.
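
Because Solr is accessed over HTTP, a search is ultimately just a URL. The sketch below builds such a URL with the standard Solr request parameters q, wt, hl, facet, and facet.field. The class SolrQueryUrlSketch, the base URL http://localhost:8983/solr/portal (a hypothetical core named "portal" on Solr's default port), and the field name "category" are assumptions for illustration, and the exact path can differ between Solr versions and deployments. The code only constructs the URL and does not contact a server; it uses the URLEncoder overload available from Java 10 onward.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Illustrative only: builds a Solr select URL; it does not send the request.
class SolrQueryUrlSketch {
    static String buildSelectUrl(String baseUrl, String keywords, String facetField) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", keywords);             // the user's search keywords
        params.put("wt", "json");              // response format
        params.put("hl", "true");              // enable hit highlighting
        params.put("facet", "true");           // enable faceting
        params.put("facet.field", facetField); // field to compute facet counts on

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // URL-encode each value so spaces and special characters are safe.
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "/select?" + query;
    }

    public static void main(String[] args) {
        // Hypothetical core URL and facet field.
        System.out.println(buildSelectUrl("http://localhost:8983/solr/portal", "full text", "category"));
        // http://localhost:8983/solr/portal/select?q=full+text&wt=json&hl=true&facet=true&facet.field=category
    }
}
```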

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.