Solr scheduled full and incremental indexing, and dataimporter.properties configuration

Source: Internet
Author: User
SOLR Incremental Index configuration
1. Before you set up an incremental index, you must first understand a few required attributes, some database-schema considerations, and the dataimporter.properties file.

data-config.xml attributes:
transformer: format conversion; for example, HTMLStripTransformer strips HTML tags before indexing
query: selects the matching records from the database table (used by the full import)
deltaQuery: queries the primary-key IDs for the incremental import (note: it returns only the ID field)
deltaImportQuery: fetches the data for the incremental import
deletedPkQuery: queries the primary-key IDs of deleted records for the incremental import (note: it returns only the ID field)


Database configuration considerations:
1. If the business only involves adds and updates, it is enough to add one extra timestamp field to the table, with the current system time as its default value, CURRENT_TIMESTAMP (the author's database is MySQL).
2. If deletes are also involved, the table needs one more field, for example an int column isdelete whose value (0 or 1) marks whether the record has been deleted; of course another flag, such as a true/false column, can be used instead.
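The two considerations above can be sketched as MySQL DDL. The column names follow the field mappings used in the demo below; the types and sizes are illustrative assumptions, not taken from the original article.

```sql
-- Hypothetical table matching the demo entity "myinfo"
CREATE TABLE myinfo (
    id       INT PRIMARY KEY AUTO_INCREMENT,
    name     VARCHAR(100),
    address  VARCHAR(200),
    age      INT,
    -- refreshed automatically on INSERT and UPDATE,
    -- so deltaQuery can find changed rows
    my_date  TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    -- soft-delete flag: 0 = live, 1 = deleted (picked up by deletedPkQuery)
    isdelete INT NOT NULL DEFAULT 0
);
```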

dataimporter.properties

This configuration file is important: it records the time of the last import, and by comparing that time with the records' timestamps Solr can find which records have been added, modified, or deleted since then.
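After an import, the file contains entries like the following (one global timestamp and one per entity; the exact timestamps here are illustrative):

```properties
# Written automatically by the DataImportHandler after each import
#Thu Jul 11 02:17:40 CST 2013
last_index_time=2013-07-11 02\:17\:40
myinfo.last_index_time=2013-07-11 02\:17\:40
```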


The following is a demo the author used while testing; adds, modifies, and deletes are all involved.

    <dataConfig>
        <!-- This section configures a MySQL data source (the data source can also be configured in solrconfig.xml) -->
        <dataSource name="myDB" type="JdbcDataSource"
                    driver="com.mysql.jdbc.Driver"
                    url="jdbc:mysql://localhost/test"
                    user="root" password="ninemax"/>
        <document>
            <!-- The attributes below are described here (if there is an error, please point it out):
                 pk="ID" is required, because the incremental import queries by primary-key ID.
                 dataSource="myDB" refers to the data source defined above.
                 name="myinfo" must be unique when there are multiple entities.
                 query selects all qualifying rows in the table; because the author's test includes
                   deletes, the WHERE clause isdelete=0 keeps only rows that are not soft-deleted
                   (note: query is used only for the first full import, not for delta imports).
                 deltaQuery returns the IDs of all records changed since the last import; a change
                   may be an update, an add, or a delete (delta import only; returns only the ID).
                 deletedPkQuery returns the IDs of soft-deleted rows (those with isdelete=1) so
                   that Solr removes the corresponding documents from the index (delta import
                   only; returns only the ID).
                 deltaImportQuery takes each ID found by the two queries above, fetches the full
                   row, and updates the index with it; the update may be a delete, add, or modify
                   (delta import only; may return multiple fields, normally all columns). -->
            <entity pk="ID"
                    dataSource="myDB"
                    name="myinfo"
                    query="SELECT * FROM myinfo WHERE isdelete=0"
                    deltaQuery="SELECT id FROM myinfo WHERE my_date > '${dataimporter.last_index_time}'"
                    deletedPkQuery="SELECT id FROM myinfo WHERE isdelete=1"
                    deltaImportQuery="SELECT * FROM myinfo WHERE id='${dataimporter.delta.id}'">
                <!-- Note: the case of the id used here must correspond to the statements above -->
                <field column="id" name="id"/>
                <field column="name" name="name"/>
                <field column="address" name="address"/>
                <field column="age" name="age"/>
                <field column="my_date" name="my_date"/>
                <field column="isdelete" name="isdelete"/>
            </entity>
        </document>
    </dataConfig>

dataimport.properties (scheduler) configuration

References:
Official documentation: http://wiki.apache.org/solr/DataImportHandler#Scheduling
Google Code project: https://code.google.com/p/solr-dataimport-scheduler/


1. Copy solr-dataimporthandler-4.2.1.jar and solr-dataimporthandler-extras-4.2.1.jar from the solr-4.2.1\dist directory to the D:\program\tomcat6\webapps\solr\WEB-INF\lib directory.


2. Download apache-solr-dataimportscheduler-1.0-with-source.jar from https://code.google.com/p/solr-dataimport-scheduler/downloads/list into the D:\program\tomcat6\webapps\solr\WEB-INF\lib directory.


3. Extract dataimport.properties from apache-solr-dataimportscheduler-1.0-with-source.jar into the D:\program\tomcat6\solrapp\solr\conf directory (if the conf folder does not exist, create it).


4. Modify D:\program\tomcat6\webapps\solr\WEB-INF\web.xml and add:

<listener>
    <listener-class>org.apache.solr.handler.dataimport.scheduler.ApplicationListener</listener-class>
</listener>

5. Modify the contents of dataimport.properties:

    #################################################
    #                                               #
    #       dataimport scheduler properties         #
    #                                               #
    #################################################

    #  to sync or not to sync
    #  1 - active; anything else - inactive
    syncEnabled=1

    #  which cores to schedule
    #  in a multi-core environment you can decide which cores you want synchronized
    #  leave empty or comment it out if using single-core deployment
    #syncCores=game,resource
    syncCores=collection1

    #  solr server name or IP address
    #  [defaults to localhost if empty]
    server=localhost

    #  solr server port
    #  [defaults to 80 if empty]
    port=8080

    #  application name/context
    #  [defaults to current ServletContextListener's context (app) name]
    webapp=solr

    #  URL params [mandatory]
    #  remainder of URL
    params=/dataimport?command=delta-import&clean=false&commit=true

    #  schedule interval
    #  number of minutes between two runs
    #  [defaults to 30 if empty]
    interval=1

    #  rebuild-index interval, in minutes; default 7200, i.e. 5 days
    #  if empty, 0, or commented out: the index is never rebuilt
    reBuildIndexInterval=2

    #  rebuild-index request
    reBuildIndexParams=/dataimport?command=full-import&clean=true&commit=true

    #  start time for the rebuild-index schedule; the first real execution time is
    #  reBuildIndexBeginTime + reBuildIndexInterval*60*1000 (milliseconds).
    #  Two formats are accepted: 2012-04-11 03:10:00 or 03:10:00; with the latter,
    #  the date part defaults to the date the service was started.
    reBuildIndexBeginTime=03:10:00
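The rebuildIndexBeginTime arithmetic can be sketched in Python. This is an illustration of the formula stated in the comment (the file stores the interval in minutes and the scheduler converts it to milliseconds), not code taken from the scheduler jar itself; the function name and signature are the author's own.

```python
from datetime import datetime, timedelta

def first_rebuild_time(begin: str, interval_minutes: int,
                       service_start: datetime) -> datetime:
    """First real execution time of the index rebuild:
    reBuildIndexBeginTime + reBuildIndexInterval (in minutes)."""
    if len(begin) == len("03:10:00"):
        # time-only form: the date part defaults to the service start date
        begin = service_start.strftime("%Y-%m-%d ") + begin
    start = datetime.strptime(begin, "%Y-%m-%d %H:%M:%S")
    return start + timedelta(minutes=interval_minutes)

# 7200 minutes = 5 days, so a begin time of 2012-04-11 03:10:00
# gives a first rebuild at 2012-04-16 03:10:00
print(first_rebuild_time("2012-04-11 03:10:00", 7200, datetime(2012, 4, 11)))
```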

