Solr Incremental Indexing: Deletions and Scheduled Incremental Updates

Source: Internet
Author: User
Tags: solr

I'm writing about Solr again. Yesterday I covered incremental indexing for inserts and updates; today I'll cover incremental indexing for deletions and scheduled incremental index updates. Enough preamble, on to the content.

1. Incremental indexing of deletions

As I said yesterday, an incremental index covers only the data that has changed in the database since the last index run (incremental or full). Yesterday we saw that adding or modifying a row triggers an incremental index update; now we look at incremental indexing when data is deleted.

The deletion here is actually a soft delete. What does that mean? The data is not physically erased from the database. Instead, you flag the rows you no longer want indexed and tell Solr not to index rows carrying that flag; when Solr builds the index, it ignores them. That is the rough principle. So how is it implemented? See below:

1.1 Adding a new database field

Yesterday's incremental indexing of inserts and updates needed an extra field in the database. Today's deletion needs another one, which indicates whether a row should be indexed. The field is as follows:

  

You can choose the field name yourself. An int type is enough, and the length is arbitrary. In my case, 0 marks data that should be indexed and 1 marks data that should not be indexed, i.e. the soft-deleted data I mentioned.
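As a rough illustration of the flag described above, the sketch below uses an in-memory SQLite table from Python. The table name `actor`, the columns `id` and `name`, and the sample rows are assumptions made for the example; only the `isdelete` convention (0 = index, 1 = soft-deleted) comes from the text.

```python
import sqlite3

# In-memory database; everything except the isdelete convention is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE actor ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " isdelete INTEGER NOT NULL DEFAULT 0"  # 0 = index this row, 1 = soft-deleted
    ")"
)
conn.executemany(
    "INSERT INTO actor (id, name) VALUES (?, ?)",
    [(1, "Chow Yun-fat"), (2, "Tony Leung"), (3, "Stephen Chow")],
)


def soft_delete(connection, row_id):
    """Mark a row as deleted without removing it from the table."""
    connection.execute("UPDATE actor SET isdelete = 1 WHERE id = ?", (row_id,))


soft_delete(conn, 2)

# The row is still stored, but it no longer counts as indexable.
total_rows = conn.execute("SELECT COUNT(*) FROM actor").fetchone()[0]
indexable = [
    row[0]
    for row in conn.execute("SELECT name FROM actor WHERE isdelete = 0 ORDER BY id")
]
print(total_rows, indexable)
```

The point of the design is that nothing is lost: setting the flag back to 0 makes the row indexable again on the next import.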

1.2 Modifying the configuration files

As before, this field has to be configured in both data-config.xml and schema.xml, as follows:

data-config.xml

  

Note three changes. First, add the isdelete field as a field tag. Second, add a where condition to the query statement so that it selects only the rows that should be indexed. Third, add a deletedPkQuery statement. Like deltaQuery and deltaImportQuery, it is only used during incremental indexing. deletedPkQuery selects the IDs of all soft-deleted rows, and the documents with those IDs are then removed from the index that has already been built.
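To make the roles of these statements concrete, here is a hedged Python sketch that runs the same kinds of SQL against an in-memory SQLite table. It only imitates what the data import handler does with each statement. The table `actor`, the `update_time` column, and the timestamps are assumptions for illustration, not the author's actual schema or configuration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT,"
    " update_time TEXT, isdelete INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO actor VALUES (?, ?, ?, ?)",
    [
        (1, "A", "2015-01-01 10:00:00", 0),  # unchanged since the last index
        (2, "B", "2015-01-03 09:00:00", 0),  # modified after the last index
        (3, "C", "2015-01-03 09:30:00", 1),  # soft-deleted after the last index
    ],
)
last_index_time = "2015-01-02 00:00:00"  # stands in for the last import time

# Full import (the "query" statement): only rows that should be indexed.
full = [r[0] for r in conn.execute(
    "SELECT id FROM actor WHERE isdelete = 0 ORDER BY id")]

# Delta import, step 1 (deltaQuery): ids of live rows changed since the last index.
changed = [r[0] for r in conn.execute(
    "SELECT id FROM actor WHERE update_time > ? AND isdelete = 0 ORDER BY id",
    (last_index_time,))]

# Delta import, step 2 (deltaImportQuery): the full row for each changed id.
reindexed = [
    conn.execute("SELECT id, name FROM actor WHERE id = ?", (i,)).fetchone()
    for i in changed
]

# Delta import (deletedPkQuery): ids whose documents should leave the index.
deleted = [r[0] for r in conn.execute(
    "SELECT id FROM actor WHERE isdelete = 1 ORDER BY id")]

print(full, changed, reindexed, deleted)
```

In this toy data set, row 2 would be re-indexed and row 3 would be dropped from the index, while row 3 itself stays in the table.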

schema.xml

The schema.xml file changes very little; just add the isdelete field:

  

1.3 Checking the result

With the configuration above in place, let's look at the result. First, the database:

It still holds yesterday's 17 rows.

The index Solr has already built:

The index also holds 17 documents, matching the database. Next I set the isdelete field of two rows to 1 and run an incremental index, the same way as yesterday, so I won't repeat the steps. First, the database change:

I change isdelete to 1 for Chow Yun-fat and Tony Leung. After running the incremental index, the result is as follows:

The index now holds 2 fewer documents. Are the missing ones the two people I set to 1? If so, searching for them should find nothing:

Searching for Chow Yun-fat returns only Stephen Chow. This is caused by the tokenizer problem I mentioned yesterday. Next, search for Tony Leung:

The search result is empty.

The two queries above show that incremental indexing of soft deletions works.
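The same check can be scripted rather than done by eye in the admin page. The sketch below builds a select URL and reads `response.numFound` from the JSON body that Solr's select handler returns. The field name `name` is an assumption about the schema, and the sample body is hand-written for the example rather than captured from the author's server.

```python
import json
from urllib.parse import urlencode


def select_url(base, core, field, value):
    """Build a phrase query URL for the select handler, asking for JSON."""
    query = urlencode({"q": f'{field}:"{value}"', "wt": "json"})
    return f"{base}/{core}/select?{query}"


def num_found(body):
    """Return the hit count from a Solr JSON select response."""
    return json.loads(body)["response"]["numFound"]


# Hypothetical field name; host, port and core follow the article's setup.
url = select_url("http://localhost:8080/solr", "collection1", "name", "Tony Leung")

# Hand-written sample in the shape of a select response with no hits.
sample_empty = (
    '{"responseHeader": {"status": 0},'
    ' "response": {"numFound": 0, "start": 0, "docs": []}}'
)

print(url)
print(num_found(sample_empty))
```

A hit count of 0 for a soft-deleted row is the scripted equivalent of the empty search result above.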

2. Scheduled incremental indexing

If we had to go to http://localhost:8080/solr and run an incremental index every time a couple of rows change in the database, wouldn't that be tedious? So Solr supports scheduled tasks. You can of course also wire this up yourself, for example with Spring scheduled tasks or Quartz, calling the incremental index URL on a timer; that achieves the same result, but it is not today's topic. The details follow.
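For readers who prefer the do-it-yourself route mentioned above, here is a minimal Python sketch of calling the incremental index URL on a timer. It is not the approach used in the rest of this article. The URL assumes the collection1 core and the /dataimport handler path; `run_periodically` and `http_get` are names invented for this example.

```python
import time
import urllib.request

# Assumed endpoint: core name and handler path must match your deployment.
DELTA_URL = (
    "http://localhost:8080/solr/collection1/dataimport"
    "?command=delta-import&clean=false&commit=true"
)


def http_get(url):
    """Issue a plain GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status


def run_periodically(task, interval_seconds, runs, sleep=time.sleep):
    """Call task() `runs` times, pausing interval_seconds between calls."""
    results = []
    for i in range(runs):
        results.append(task())
        if i < runs - 1:
            sleep(interval_seconds)
    return results


# Dry run with a stand-in task and a no-op sleep, so nothing touches the network.
calls = []
dry = run_periodically(
    lambda: calls.append(DELTA_URL) or 200, 60, 3, sleep=lambda seconds: None
)
print(dry)
```

Against a running server, the call would look like `run_periodically(lambda: http_get(DELTA_URL), 60, runs=10)`; a real deployment would more likely use cron or a proper scheduler than a sleeping loop.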

First, we need to add a jar. I included it in the demo at the end of the first article; you will see it after unpacking. Note that the jar I provided is built from modified source. Most places point you to apache-solr-dataimportscheduler-1.0.jar, downloadable from http://code.google.com/p/solr-dataimport-scheduler/downloads/list, but after you put that jar into the lib directory of the Solr web application under Tomcat, things go wrong. Strictly speaking no error is thrown, but the following problem occurs:

The HTTP request keeps returning 415, Unsupported Media Type. This problem cost me a whole Saturday and nearly drove me mad; I finally solved it after finding someone else's article online, whose address I give below. What is the cause? A class in the jar sends its HTTP request with POST, whereas the request we need to send here is a GET, so the response is always 415. You therefore have to go into the jar and modify the source code for it to work properly. The article I read is at http://blog.csdn.net/zwx19921215/article/details/43152307; it is very detailed and also covers another problem, so take a look. Once that is done, put the jar into Solr's lib directory and move on to the next step.
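The difference between the two request methods is easy to see in a few lines of Python. This is only an illustration of GET versus POST for the same URL, not the jar's Java source; the URL again assumes the collection1 core and the /dataimport handler path. No request is actually sent.

```python
import urllib.request

url = (
    "http://localhost:8080/solr/collection1/dataimport"
    "?command=delta-import&clean=false&commit=true"
)

# With no body, urllib prepares a GET request.
get_request = urllib.request.Request(url)

# Supplying a body, even an empty one, turns the same URL into a POST.
post_request = urllib.request.Request(url, data=b"")

print(get_request.get_method(), post_request.get_method())
```

If a scheduled call keeps failing with 415, checking which of these two methods the client actually uses is a reasonable first step.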

The second step is to add the following to Solr's web.xml file:

  

The third step: unzip apache-solr-dataimportscheduler-1.0.jar, find the dataimport.properties file in the extracted folder, and copy it into the conf folder of your solr_home directory.

Note that this is not the conf folder under solr_home\collection1 but solr_home\conf, which does not exist by default and has to be created by you.

The fourth step: open dataimport.properties and edit it. The modified file looks like this:

  

#################################################
#                                               #
#       dataimport scheduler properties         #
#                                               #
#################################################

#  to sync or not to sync
#  1 - active; anything else - inactive
syncEnabled=1

#  which cores to schedule
#  in a multi-core environment you can decide which cores you want synchronized
#  leave empty or comment it out if using single-core deployment
syncCores=collection1

#  solr server name or IP address
#  [defaults to localhost if empty]
server=localhost

#  solr server port
#  [defaults to 80 if empty]
port=8080

#  application name/context
#  [defaults to current ServletContextListener's context (app) name]
webapp=solr

#  URL params [mandatory]
#  remainder of URL
params=/dataimport?command=delta-import&clean=false&commit=true&wt=json&optimize=false

#  schedule interval
#  number of minutes between two runs
#  [defaults to 30 if empty]
interval=1

#  interval between full re-index runs, in minutes; default 7200, i.e. 5 days
#  empty, 0, or commented out: never re-index
reBuildIndexInterval=7200

#  parameters for the full re-index
reBuildIndexParams=/dataimport?command=full-import&clean=true&commit=true

#  start time for the re-index timer; first actual run = reBuildIndexBeginTime + reBuildIndexInterval*60*1000
#  two formats: 2012-04-11 03:10:00 or 03:10:00; the latter uses the date the service was started
reBuildIndexBeginTime=03:10:00

Notes:

1. syncCores=collection1 means the index is built for the collection1 core. If it is not set, collection1 is used by default; for multiple cores, separate the names with commas.

2. server=localhost and port=8080: change these to your own container's address and port.

3. interval=1 is the interval between scheduled incremental index runs, in minutes.

4. Configure the rest according to the comments in the file; nothing there is hard to understand.
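To see how these settings plausibly fit together, the Python sketch below parses a few of the properties and assembles the delta-import URL for each core. The way the URL is composed (server, port, webapp, core, then params) is an assumption about the scheduler's behaviour, and both function names are invented for this example. When no core is listed, the sketch leaves the core segment out and relies on Solr's default core.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        props[key.strip()] = value.strip()
    return props


def delta_import_urls(props):
    """Build one URL per configured core from scheduler-style settings."""
    server = props.get("server") or "localhost"
    port = props.get("port") or "80"
    webapp = props.get("webapp") or "solr"
    cores = [c.strip() for c in props.get("syncCores", "").split(",") if c.strip()]
    base = f"http://{server}:{port}/{webapp}"
    if not cores:
        return [base + props["params"]]
    return [f"{base}/{core}{props['params']}" for core in cores]


sample = """
syncEnabled=1
syncCores=collection1
server=localhost
port=8080
webapp=solr
params=/dataimport?command=delta-import&clean=false&commit=true&wt=json&optimize=false
interval=1
"""

urls = delta_import_urls(parse_properties(sample))
print(urls)
```

Pasting the resulting URL into a browser is a quick way to confirm the settings before waiting for the scheduler to fire.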

OK. With the configuration above in place, start Tomcat. After 1 minute you should see the following output, which means the scheduled incremental index is working:

  

That's all for today. Writing documentation really is time-consuming: I started at 8:30 and have been writing for over an hour. I'll stop here for now. Tomorrow I'll cover the tokenizer, and after that I may add multicore, but no promises.

  
