Full-Text Indexing: Updating the Full Index on a Solr Server

Source: Internet
Author: User

After Solr indexing is set up, the index has to be kept up to date as the data in the database changes. There are two ways to update it: full updates and incremental updates. As the names imply, a full update deletes the entire index on the Solr server and then re-imports all the data, while an incremental update only processes the data that has changed. This article describes full index updates.
1 Configuring the data source
1.1 Database

We use a single table, blog, as the test data source. To keep testing simple it has only three fields, id, title, and content, and uses VARCHAR as the primary key type. Its structure is as follows:


1.2 Configuring data-config.xml

The data source is configured as follows:

```xml
<dataConfig>
    <dataSource name="jfinal_demo" type="JdbcDataSource"
                driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://192.168.21.20:3306/jfinal_demo"
                user="root" password="123456" batchSize="-1"/>
    <document name="testdoc">
        <entity name="blog" dataSource="jfinal_demo" pk="id" query="SELECT * FROM blog">
            <field column="id" name="id"/>
            <field column="title" name="title"/>
            <field column="content" name="content"/>
        </entity>
    </document>
</dataConfig>
```

1.3 Configuring schema.xml

The corresponding index fields are defined as follows:

```xml
<field name="id" type="text_general" indexed="true" stored="true"/>
<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="content" type="text_general" indexed="true" stored="true"/>
```

2 Updating the index with the Solr Admin client

2.1 The update operation is as follows:

2.2 Testing

Note: Updating through the Solr Admin client is simple, fast, and intuitive, which makes it well suited to testing with data.

3 Updating the index with HTTP requests

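3.1 Principle

Every Solr operation, including those triggered from the Admin client, is ultimately carried out as an HTTP request to the server, and the DataImportHandler commands used here can be sent as simple GET requests. We can therefore imitate the client and update the index directly with an HTTP request.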
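To make the request concrete, here is a minimal sketch of how such a DataImportHandler URL can be assembled. DihUrlBuilder is a hypothetical helper, not part of Solr; it assumes a single-core setup where the handler is mounted at /dataimport under the Solr base URL, as in the examples in this article, and uses the clean and commit parameters that also appear in the scheduler configuration in section 4.4.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a DataImportHandler request URL (single-core layout assumed)
class DihUrlBuilder {
    static String build(String solrBaseUrl, String command, boolean clean, boolean commit) {
        // LinkedHashMap keeps the parameters in a predictable order
        Map<String, String> params = new LinkedHashMap<>();
        params.put("command", command);              // "full-import" or "delta-import"
        params.put("clean", String.valueOf(clean));  // true: clear the index before importing
        params.put("commit", String.valueOf(commit)); // true: commit when the import finishes
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return solrBaseUrl + "/dataimport?" + query;
    }
}
```

For a full update, `build("http://192.168.22.216:8983/solr", "full-import", true, true)` yields `http://192.168.22.216:8983/solr/dataimport?command=full-import&clean=true&commit=true`: clean=true removes the existing documents before re-importing, which matches the definition of a full update at the start of this article. An incremental update would use delta-import with clean=false instead.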

3.2 Implementation

This article uses HttpURLConnection to send the HTTP request. The code is as follows:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class SolrIndexUpdater {

    /**
     * Accesses the DataImportHandler URL to trigger a full index update.
     *
     * @return true if the Solr server answered with HTTP 200
     */
    public static boolean runHttpGet() {
        boolean flag = false;
        // Request URL: the full-import command of the DataImportHandler
        String strUrl = "http://192.168.22.216:8983/solr/dataimport?command=full-import";
        HttpURLConnection urlConn = null;
        try {
            URL url = URI.create(strUrl).toURL();
            // Open an HttpURLConnection
            urlConn = (HttpURLConnection) url.openConnection();
            // Plain GET without a request body; setDoOutput(true) followed by
            // getOutputStream() would silently turn the request into a POST
            urlConn.setRequestMethod("GET");
            // Do not use cached responses
            urlConn.setUseCaches(false);
            urlConn.setInstanceFollowRedirects(true);
            // Connection and read timeouts in milliseconds (example values)
            urlConn.setConnectTimeout(10_000);
            urlConn.setReadTimeout(60_000);
            // Perform the connection
            urlConn.connect();
            if (urlConn.getResponseCode() == HttpURLConnection.HTTP_OK) {
                flag = true;
                // Read the response body returned by Solr
                StringBuilder result = new StringBuilder();
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(urlConn.getInputStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        result.append(line).append('\n');
                    }
                }
                // System.out.println(result);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (urlConn != null) {
                urlConn.disconnect();
            }
        }
        return flag;
    }
}
```
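One caveat about checking for HTTP 200 in the code above: according to the DataImportHandler documentation, full-import returns immediately and runs the import in a separate thread, so a 200 response only shows that the command was accepted, not that the import has finished. If the caller must wait for completion, it can poll `.../dataimport?command=status&wt=json` until the handler reports that it is idle. The sketch below assumes the JSON status response contains `"status":"idle"` once no import is running; DihStatusWaiter and the fetch function passed to it are hypothetical, and in practice the fetch would be an HTTP GET like the one in section 3.2.

```java
import java.util.function.Supplier;

// Hypothetical helper: polls the DIH status until it reports idle
class DihStatusWaiter {
    /**
     * @param fetchStatus returns the body of a command=status&wt=json request
     * @return true once the handler reports idle, false if maxPolls is exhausted
     */
    static boolean waitUntilIdle(Supplier<String> fetchStatus, int maxPolls, long sleepMillis)
            throws InterruptedException {
        for (int i = 0; i < maxPolls; i++) {
            // Remove spaces so both "status":"idle" and "status" : "idle" match
            String body = fetchStatus.get().replace(" ", "");
            if (body.contains("\"status\":\"idle\"")) {
                return true;
            }
            Thread.sleep(sleepMillis); // still busy: wait before asking again
        }
        return false;
    }
}
```

Bounding the number of polls keeps a stuck import from blocking the caller forever; the caller can treat a false result as a timeout.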
   

Of course, in practice the index is updated on a schedule, so the task can be scheduled with Quartz. That code is not shown here; interested readers can implement it to suit their own situation.
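As a substitute that needs no extra library, the following minimal sketch uses the JDK's ScheduledExecutorService instead of Quartz; it is not the author's code. FullImportScheduler is a hypothetical name, and the task can be any BooleanSupplier, for example a lambda that calls the HTTP method from section 3.2.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical scheduler for periodic full imports (JDK executor used instead of Quartz)
class FullImportScheduler {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    /** Runs importTask after initialDelay, then again `delay` after each run returns. */
    void start(BooleanSupplier importTask, long initialDelay, long delay, TimeUnit unit) {
        executor.scheduleWithFixedDelay(() -> {
            try {
                if (!importTask.getAsBoolean()) {
                    System.err.println("full-import request failed");
                }
            } catch (RuntimeException e) {
                // An exception escaping a periodic task would suppress all later runs
                e.printStackTrace();
            }
        }, initialDelay, delay, unit);
    }

    /** Stops scheduling further runs. */
    void stop() {
        executor.shutdown();
    }
}
```

For example, `start(task, 0, 24, TimeUnit.HOURS)` would request a full import once a day. scheduleWithFixedDelay measures the delay from the end of one run, so runs never overlap in the scheduler itself; keep the interval longer than the import actually takes, because Solr continues the import in the background after answering the request.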

3.3 Testing


Note: This method is simple and its logic is clear, but the implementation is somewhat more involved and it needs an additional program and resources to run, so readers who want to use it should be prepared for that.
4 Updating the index with the official scheduler
4.1 Overview
Solr officially provides a powerful Data Import Request Handler together with a scheduler. The sample scheduler only supports incremental indexing and does not support periodic full indexing, so a user has modified it to add a timer for full index rebuilds. I only give a brief introduction here; the blog address is:
The references are as follows:
4.2 JAR configuration

Put apache-solr-dataimportscheduler-1.0.jar, together with the apache-solr-dataimporthandler-.jar and apache-solr-dataimporthandler-extras-.jar that ship with Solr, into the WEB-INF/lib directory of solr.war.

4.3 Configuring web.xml

Modify web.xml in solr.war and add the following before the servlet node:

```xml
<listener>
    <listener-class>
        org.apache.solr.handler.dataimport.scheduler.ApplicationListener
    </listener-class>
</listener>
```

4.4 Configuring the index update file

Extract dataimport.properties from apache-solr-dataimportscheduler-.jar, adjust it to your environment, and place it in the solr.home/conf directory (not solr.home/core/conf).

dataimport.properties configuration items:

```properties
#################################################
#                                               #
#       dataimport scheduler properties         #
#                                               #
#################################################

#  to sync or not to sync
#  1 - active; anything else - inactive
syncEnabled=1

#  which cores to schedule
#  in a multi-core environment you can decide which cores you want synchronized
#  leave empty or comment it out if using single-core deployment
syncCores=core1,core2

#  solr server name or IP address
#  [defaults to localhost if empty]
server=localhost

#  solr server port
#  [defaults to … if empty]
port=8080

#  application name/context
#  [defaults to current ServletContextListener's context (app) name]
webapp=solr

#  URL params [mandatory]
#  remainder of URL
params=/dataimport?command=delta-import&clean=false&commit=true

#  schedule interval
#  number of minutes between two runs
#  [defaults to … if empty]
interval=1

#  re-index interval in minutes; default 7200 (5 days)
#  empty, 0, or commented out: never re-index
reBuildIndexInterval=7200

#  parameters for the re-index
reBuildIndexParams=/dataimport?command=full-import&clean=true&commit=true

#  start time for the re-index interval; the first actual run is at
#  reBuildIndexBeginTime + reBuildIndexInterval*60*1000 (ms)
#  two formats: 2012-04-11 03:10:00 or 03:10:00; with the latter,
#  the date part is filled in with the date the service was started
reBuildIndexBeginTime=03:10:00
```
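The timing rule in the comments above can be checked with a small calculation. The following is a hypothetical sketch (SchedulerConfigSketch and its methods are illustrative, not the scheduler jar's actual code) that reads the keys from a java.util.Properties object, assembles the URL the scheduler would call, assuming the single-core form http://server:port/webapp followed by the params value, and computes the first rebuild time.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.Properties;

// Hypothetical illustration of the dataimport.properties rules (not the jar's code)
class SchedulerConfigSketch {
    private static final DateTimeFormatter FULL =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Assumed single-core URL layout: http://server:port/webapp + params value. */
    static String buildUrl(Properties p, String paramsKey) {
        String server = p.getProperty("server", "").trim();
        if (server.isEmpty()) {
            server = "localhost"; // "[defaults to localhost if empty]"
        }
        return "http://" + server + ":" + p.getProperty("port").trim() + "/"
                + p.getProperty("webapp").trim() + p.getProperty(paramsKey).trim();
    }

    /** First rebuild = reBuildIndexBeginTime + reBuildIndexInterval minutes. */
    static LocalDateTime firstRebuild(Properties p, LocalDate serviceStartDate) {
        String begin = p.getProperty("reBuildIndexBeginTime").trim();
        LocalDateTime start = begin.length() > 8
                ? LocalDateTime.parse(begin, FULL)                           // "2012-04-11 03:10:00"
                : LocalDateTime.of(serviceStartDate, LocalTime.parse(begin)); // "03:10:00"
        long minutes = Long.parseLong(p.getProperty("reBuildIndexInterval").trim());
        // Adding minutes is the same as adding minutes*60*1000 milliseconds
        return start.plusMinutes(minutes);
    }
}
```

With the values shown above and a service started on 2012-04-11, the first rebuild falls at 2012-04-16 03:10, because 7200 minutes is exactly 5 days.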

4.5 Restarting the Solr server

4.6 This third method is simpler than the previous one and needs no additional program, so it is the recommended approach.

5 Summary

Full indexing is simple and direct, but with large data volumes it consumes a lot of I/O, so the update interval has to be set long. That in turn means the index can be out of sync with the database for a while, which hurts the user experience, so full indexing is not recommended for time-sensitive systems.
