Solr Data Backup


A Solr data backup includes the following files:

    1. The Solr configuration files:
      1. solr.xml, located under solr_home.
      2. schema.xml, solrconfig.xml, stopwords.txt, synonyms.txt, and whichever other configuration files your deployment uses; they are usually located under solr_home/conf.
    2. The Lucene index files, usually located under solr_home/data. This directory contains the actual data, and backing it up is the main subject of this article.
Backup method

Backing up the Lucene index files can be done in two ways: cold backup and hot backup.

A cold backup means shutting Solr down and then copying everything under solr_home/data to a safe location. This method is simple and reliable, and the index files cannot become inconsistent during the copy. But the drawback is obvious: the system has to be shut down, and if the data volume is large, the whole copy will take a long time.
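For illustration, here is a minimal sketch of a cold backup as a recursive directory copy. The class name and paths are placeholder assumptions, and it presumes Solr has already been stopped:

import java.io.IOException;
import java.nio.file.*;
import java.util.stream.Stream;

public class ColdBackup {
    public static void main(String[] args) throws IOException {
        // Placeholder paths: adjust to your solr_home and backup location.
        Path source = Paths.get("/opt/solr_home/data");
        Path target = Paths.get("/backup/solr-data-" + System.currentTimeMillis());

        // Walk the data directory and copy every file, preserving the layout.
        // Solr must be stopped first, otherwise the copy may be inconsistent.
        try (Stream<Path> paths = Files.walk(source)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path dest = target.resolve(source.relativize(p));
                if (Files.isDirectory(p)) {
                    Files.createDirectories(dest);
                } else {
                    Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}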

Fortunately, Solr offers a very simple hot backup method: the Solr ReplicationHandler. The primary role of the ReplicationHandler is to replicate index data to the slave servers in a load-balanced Solr deployment. However, even without any slave servers, the ReplicationHandler can be used to create a copy of the index on the master server.

Using this feature is quite simple: send a request via a browser or a tool like curl to the URL 'http://master_server:port/solr/replication?command=backup', and the backup is created. By default, a folder such as "snapshot.20120812201917" is created, containing a backup of the Lucene index. You can also choose where the snapshot is written by adding a location parameter to the URL, for example: &location=/foo/bar.
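The same request can also be sent programmatically. Below is a minimal sketch using the JDK's built-in HTTP client (Java 11+); the class name, host, port, and location are placeholder assumptions:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TriggerBackup {
    public static void main(String[] args) throws Exception {
        // Placeholder host/port; the optional location parameter chooses the snapshot directory.
        String url = "http://localhost:8983/solr/replication?command=backup&location=/foo/bar";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();

        // The handler responds right away; the snapshot itself is written in the background.
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Status: " + response.statusCode());
        System.out.println(response.body());
    }
}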

In the case of the Solr inside Alfresco, to back up the Lucene index you need to make the following request: 'https://localhost:8443/solr/alfresco/replication?command=backup' (this assumes localhost is the Solr master server). The 'alfresco' in the URL is the name of a Solr core; the Alfresco system uses two cores: 'alfresco', which holds the index of all live documents in the current Alfresco, and 'archive', which holds the index of all deleted documents. You can then find the backed-up data in 'alf_data/solr/workspace/SpacesStore/snapshot.20120902041725' (20120902041725 is a timestamp and will differ for each backup).

If the replication handler URL returns "HTTP Status 404" on your system, don't worry: it just means you have not configured the ReplicationHandler yet. You only need to add the following configuration to solrconfig.xml:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="confFiles">schema.xml,stopwords.txt</str>
    <str name="backupAfter">startup</str>
    <str name="backupAfter">optimize</str>
  </lst>
  <!--
  <lst name="slave">
    <str name="masterUrl">http://localhost:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
  -->
</requestHandler>

For the specific meaning of each parameter, see http://wiki.apache.org/solr/SolrReplication; it will not be repeated here.

As a small digression, let's look at the principle behind the snapshot in the Solr ReplicationHandler. Why can't we copy the Lucene index files directly, and must go through the ReplicationHandler instead? The reason is that if you copy directly, Lucene may well perform a commit while the copy is in progress, and a commit can involve merging segment files, which easily leaves the copied data inconsistent. So how does the ReplicationHandler deal with this? The answer is SnapshotDeletionPolicy. SnapshotDeletionPolicy uses the last commit as a reference point: it takes a snapshot of all the index files belonging to that commit and keeps them alive until all of the backup data has been created. The following code illustrates the principle:

import java.util.Collection;
import org.apache.lucene.index.*;

// Lucene 3.x API: wrap the default deletion policy so a commit can be pinned.
IndexDeletionPolicy policy = new KeepOnlyLastCommitDeletionPolicy();
SnapshotDeletionPolicy snapshotter = new SnapshotDeletionPolicy(policy);
IndexWriter writer = new IndexWriter(dir, analyzer, snapshotter, IndexWriter.MaxFieldLength.UNLIMITED);
try {
    // snapshot() pins the files of the last commit so merges cannot delete them.
    IndexCommit commit = snapshotter.snapshot();
    Collection<String> fileNames = commit.getFileNames();
    /* <iterate over & copy files from fileNames> */
} finally {
    // Allow the pinned files to be deleted again.
    snapshotter.release();
}
    • new SnapshotDeletionPolicy(policy) wraps the standard deletion policy so that a commit can be pinned as a snapshot.
    • The IndexWriter can still be used to update the index afterwards, but updates do not affect the snapshot once it has been taken.
    • Inside the try block, snapshot() pins the files of the last commit, and copying the files listed in fileNames creates the backup data.

If you are interested, take a look at Solr's source code: the method private void doSnapShoot(SolrParams params, SolrQueryResponse rsp, SolrQueryRequest req) in org.apache.solr.handler.ReplicationHandler works on the same principle as the code above.

Recovery

The recovery operation is simple: copy the index data back from a previous backup. For example, copy the backup data into solr_home/data/index and then restart Solr, and you're done.
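A minimal sketch of that copy, with a placeholder class name and paths (Solr should be stopped while the files are copied back):

import java.io.IOException;
import java.nio.file.*;
import java.util.stream.Stream;

public class RestoreIndex {
    public static void main(String[] args) throws IOException {
        // Placeholder paths: the snapshot created earlier and the live index directory.
        Path snapshot = Paths.get("/backup/snapshot.20120812201917");
        Path index = Paths.get("/opt/solr_home/data/index");

        Files.createDirectories(index);
        // Snapshot directories are flat, so a plain file-by-file copy is enough.
        try (Stream<Path> files = Files.list(snapshot)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                Files.copy(p, index.resolve(p.getFileName()), StandardCopyOption.REPLACE_EXISTING);
            }
        }
        // Restart Solr afterwards.
    }
}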

For the Solr in Alfresco, copy the index backup back to alf_data/solr/workspace/SpacesStore/index.

Rebuilding indexes

If you don't have any backups and the index data gets corrupted, your only option is to rebuild the index. Fortunately, this job is very simple: all you need to do is stop Solr, delete the index folder and the other folders under solr_home/data, and then restart Solr. Depending on your application, Solr will either rebuild the index automatically (as in Alfresco), or you trigger the rebuild manually with a request (for example, through DIH).
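As a minimal sketch, clearing the data directory could look like this (the class name and path are placeholders; stop Solr before running it):

import java.io.IOException;
import java.nio.file.*;
import java.util.Comparator;
import java.util.stream.Stream;

public class ClearIndex {
    public static void main(String[] args) throws IOException {
        // Placeholder path: the Solr data directory to clear while Solr is stopped.
        Path data = Paths.get("/opt/solr_home/data");

        // Delete children before parents by visiting paths in reverse order,
        // but keep the data directory itself so Solr can repopulate it.
        try (Stream<Path> paths = Files.walk(data)) {
            for (Path p : (Iterable<Path>) paths.sorted(Comparator.reverseOrder())::iterator) {
                if (!p.equals(data)) {
                    Files.delete(p);
                }
            }
        }
    }
}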
