SOLR Simple Database Data Synchronization (to be continued)

Tags: solr

At a previous company I was responsible for maintaining the document retrieval module (meaning I did not develop it myself), so I have had some contact with and done a little research on document retrieval.

Document retrieval is full-text search: a technique that organizes many documents according to certain rules, builds an easy-to-search index file from them, and then answers searches by quickly locating entries in that index and using the information they carry to pinpoint the matching documents, so that documents can be found quickly. Such an indexed document is generally called an entry.

The company's implementation used Lucene plus Zoie. Lucene is an open source project under Apache, but it is not a finished full-text search application; it is a full-text search engine library, a framework that provides the underlying support for other retrieval services. I have not researched Zoie much, because I do not find it very useful. I will write a separate blog post about the basics and usage of Lucene later; this post mainly records a simple SOLR setup and application.

SOLR is an open source, Lucene-based Java search server that is easy to add to a web application. SOLR provides faceted search (that is, statistics), hit highlighting, and supports multiple output formats (including XML/XSLT and JSON). It is easy to install and configure, and comes with an HTTP-based management interface. You can use SOLR's excellent basic search functionality as-is, or extend it to meet the needs of your business. In other words, SOLR can act as a service on its own. By contrast, the search service package the company developed itself just encapsulates some Lucene operations (mostly writing the index with IndexWriter) and some Zoie operations (mainly read operations with IndexReader), using the simplest analyzer, StandardAnalyzer, so it is not easy to use. SOLR ships as an already packaged war file that allows HTTP access and can be configured for document formats, fields, index creation, search, and so on. You could almost call it basically perfect!
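For example, once SOLR is running (assuming the default Jetty port 8983 and a core named collection1 as in the stock example; these are assumptions about a default setup, not details from the original post), a search is just an HTTP request such as:

http://localhost:8983/solr/collection1/select?q=*:*&wt=json

Changing wt=json to wt=xml switches the output format, which is the multi-format behaviour mentioned above.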

Next, let me briefly go through my research steps:

1. Download SOLR. My notebook is a Windows system, so I downloaded the zip package.

2. Unpack the SOLR zip package. The unpacked directory structure is as follows:

The bin directory contains some scripts.

contrib contains jar packages for extensions that the SOLR service can reference to add advanced features: word-segmentation (analyzer) configuration, importing database data, data view parsing (XML, JSON, etc.).

dist contains SOLR's own jar packages and the dependency packages for the Java client, SolrJ.

docs contains the help documentation, which is very detailed.

example contains the example configuration: the Jetty configuration and the SOLR core configuration.

licenses contains the license information; it can be left alone.


3. This post goes straight into how to synchronize database data into the index.

01. Copy the collection1 folder under example/solr into the same directory, rename it to user, delete core.properties and README.txt under the user folder, and empty the user/data folder.
02. The user folder is our new index library (core). The conf folder under it holds the index configuration files, and the data folder holds the index files created after initialization.
03. The contents of conf are described below:
The clustering folder configures clustering (still to be researched).
The lang folder holds the stopword configuration for the various languages.
The velocity folder holds the VM templates used as the response format when accessing /browse.
The xslt folder holds the XML output format configuration.
The remaining files:
stopwords.txt: stop words to be filtered out
protwords.txt: protected words (not fully understood yet)
synonyms.txt: synonyms
spellings.txt: the spell-check dictionary
elevate.xml: configures query elevation (results boosted to the top)
solrconfig.xml is SOLR's main configuration file: it configures jar packages, path information, index creation, the updateHandler, queries, requestHandlers, some display pages, the data source (DataImportHandler), and the facet display page.
schema.xml is the field configuration file for the SOLR index: it configures field, fieldType, and so on.
04. Modify the solrconfig.xml file:
Introduce the jar packages you need:
<lib dir="./bin/" regex="mysql-connector-java-5.0.8-bin.jar"/>
<!-- analysis libs by Tianzhilong -->
<lib dir="./../../contrib/analysis-extras/lib" regex=".*\.jar"/>


Configure the query settings, the VelocityResponseWriter, and the facet display page for /browse access. The main changes are: in the query settings, modify qf (the fields to search and the weight of each field), df (the default query field) and the other query parameters (see http://sarsgetaway.iteye.com/blog/1560143); in the faceting settings, mainly set field, query and range; plus the highlighting settings and the spell-check settings.
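As a rough sketch of what the relevant part of the /browse handler in solrconfig.xml can end up looking like (the field names name, description and city are hypothetical placeholders for whatever the user core actually contains, not values from the original post):

<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">velocity</str>              <!-- render results through the velocity templates -->
    <str name="v.template">browse</str>
    <str name="v.layout">layout</str>
    <str name="defType">edismax</str>
    <str name="qf">name^10 description^2</str> <!-- fields to search and their weights -->
    <str name="df">name</str>                  <!-- default query field -->
    <str name="facet">on</str>
    <str name="facet.field">city</str>         <!-- facet on a hypothetical city field -->
  </lst>
</requestHandler>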


Add the /dataimport path for data synchronization:
<!-- DataImportHandler to be registered in solrconfig.xml -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
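The handler above points at data-config.xml (placed in the same conf folder), which tells the DataImportHandler how to reach the database and which SQL to turn into documents. A minimal sketch for a hypothetical MySQL user table follows; the connection URL, credentials and column names are placeholder assumptions, not values from the original post:

<dataConfig>
  <!-- JDBC data source; the driver matches the mysql-connector jar referenced in solrconfig.xml -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/test"
              user="root"
              password="root"/>
  <document>
    <!-- every row returned by the query becomes one SOLR document -->
    <entity name="user" query="SELECT id, name, age FROM user">
      <field column="id"   name="id"/>
      <field column="name" name="name"/>
      <field column="age"  name="age"/>
    </entity>
  </document>
</dataConfig>

With this in place, a full import can be triggered over HTTP, for example http://localhost:8983/solr/user/dataimport?command=full-import (delta-import works the same way); the host, port and core name again depend on the local setup.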
05. Modify the schema.xml file:
Configure the fields of the database user table that need to be stored and indexed into schema.xml, and delete the original test fields;
leave dynamicField unchanged;
add your own fields to copyField (indexed fields only; fields that are not indexed are not copied);
add a Chinese word-segmentation type to the fieldType definitions:
<fieldType name="text_smartcn" class="solr.TextField" positionIncrementGap="0">
  <analyzer type="index">
    <tokenizer class="org.apache.lucene.analysis.cn.smart.SmartChineseSentenceTokenizerFactory"/>
    <filter class="org.apache.lucene.analysis.cn.smart.SmartChineseWordTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.apache.lucene.analysis.cn.smart.SmartChineseSentenceTokenizerFactory"/>
    <filter class="org.apache.lucene.analysis.cn.smart.SmartChineseWordTokenFilterFactory"/>
  </analyzer>
</fieldType>
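To actually use this type, the user fields declared in schema.xml reference it. A small sketch, with hypothetical field names and assuming the standard string/int types from the example schema:

<field name="id"   type="string"       indexed="true" stored="true" required="true"/>
<field name="name" type="text_smartcn" indexed="true" stored="true"/>
<field name="age"  type="int"          indexed="true" stored="true"/>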
06. Under the velocity folder, modify the product_doc.vm file to set the fields you want to display on the /browse page, as sketched below.
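A minimal sketch of what that template edit can look like, assuming the same hypothetical name/age fields and the #field macro that ships with the example velocity templates:

<div class="result-document">
  ## show selected fields of each hit on the /browse page
  <b>#field('name')</b>
  <div>Age: #field('age')</div>
</div>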


Time is limited; I will clean up the formatting and finish this later. To be continued ~

