Lucene concurrency, thread safety, and locking

Source: Internet
Author: User

This section covers three closely related topics: concurrent access to an index, the thread safety of IndexReader and IndexWriter, and the locking mechanism Lucene uses to prevent index corruption. Newcomers to Lucene often misunderstand these topics. Understanding them accurately matters, because it removes surprises when your indexing application has to serve many users at the same time, or when you need to parallelize some operations to cope with a sudden burst of requests.

2.9.1 Concurrent access rules

Lucene provides several operations that modify an index, such as adding new documents, updating documents, and deleting documents. To avoid corrupting the index, specific rules must be followed when these operations are used. The problem typically surfaces in web applications, because they serve multiple requests at the same time. Lucene's concurrency rules are simple, but they must be followed strictly:

-Any number of read-only operations can be performed simultaneously. For example, multiple threads or processes can search the same index in parallel.

-Any number of read-only operations can be performed while the index is being modified. For example, you can still search an index while it is being optimized or while documents are being added, updated, or deleted.

-Only one index-modifying operation is allowed at any given time. In other words, an index may be opened for modification by only one IndexWriter or IndexReader object at a time.

Based on these concurrency rules, we can construct a more comprehensive set of examples, shown in Table 2.2. The table shows whether various concurrent operations on an index are allowed.

Table 2.2 Examples of allowed and disallowed concurrent operations on a Lucene index

| Operation | Allowed or not |
| --- | --- |
| Running multiple parallel searches against the same index | Allowed |
| Running multiple parallel searches against an index that is being built, optimized, or merged with another index, or whose documents are being deleted or updated | Allowed |
| Adding or updating documents in the same index using multiple IndexWriter objects | Not allowed |
| Opening an IndexWriter to add new documents to an index while an IndexReader that deleted documents from it has not yet been closed | Not allowed |
| Opening an IndexReader to delete documents from an index while an IndexWriter that added new documents to it has not yet been closed | Not allowed |

Note: Remember that only one modification operation can be performed on an index at any given time.

2.9.2 Thread safety

Although Lucene does not allow multiple IndexWriter or IndexReader instances to modify the same index at the same time, as shown in Table 2.2, it is important to understand that both classes are thread-safe. A single instance of either class can therefore be shared among multiple threads. Lucene properly synchronizes all calls to the index-modifying methods made from those threads, so that the modifications are executed one after another. Figure 2.7 depicts such a scenario.

Figure 2.7 A single IndexWriter or IndexReader object can be shared by multiple threads

The application does not need any additional synchronization. Even though both IndexReader and IndexWriter are thread-safe, an application using Lucene must ensure that index-modifying operations performed through these two classes do not overlap. That is, before adding new documents to an index with an IndexWriter, you must close every IndexReader instance that has deleted documents from the same index. Likewise, before deleting or updating documents with an IndexReader, you must close the IndexWriter instance that previously opened the index.
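The idea of one instance shared by several threads can be imitated with the JDK alone. The sketch below is not Lucene code: ToyWriter is a hypothetical stand-in whose synchronized methods play the role that Lucene's internal synchronization is described as playing above, and SharedWriterSketch and indexInParallel are names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class SharedWriterSketch {

    /** Hypothetical stand-in for a thread-safe writer; NOT a Lucene class. */
    static final class ToyWriter {
        private final List<String> docs = new ArrayList<>();

        // synchronized: calls arriving from different threads are applied one at a time
        synchronized void addDocument(String doc) {
            docs.add(doc);
        }

        synchronized int docCount() {
            return docs.size();
        }
    }

    /** Starts several threads that all add documents through ONE shared writer. */
    static int indexInParallel(int threads, int docsPerThread) {
        ToyWriter writer = new ToyWriter(); // the single shared instance
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < docsPerThread; i++) {
                    writer.addDocument("doc-" + id + "-" + i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait until every thread has finished adding
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return writer.docCount();
    }

    public static void main(String[] args) {
        System.out.println("documents added: " + indexInParallel(4, 1000));
    }
}
```

The calling threads take no locks of their own; all coordination lives inside the shared object, which is the property the text attributes to IndexWriter and IndexReader.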

Table 2.3 is a matrix of concurrent operations that shows whether specific operations can be executed concurrently. It assumes that the application uses a single IndexWriter instance or a single IndexReader instance. Index updates are not listed as a separate operation, because an update can be treated as a delete followed by an add.

Table 2.3 Concurrent operations matrix when the same IndexWriter or IndexReader instance is used. An × at an intersection indicates that the two operations cannot be executed simultaneously.

|  | Search | Read document | Add | Delete | Optimize | Merge |
| --- | --- | --- | --- | --- | --- | --- |
| Search |  |  |  |  |  |  |
| Read document |  |  |  |  |  |  |
| Add |  |  |  | × |  |  |
| Delete |  |  | × |  | × | × |
| Optimize |  |  |  | × |  |  |
| Merge |  |  |  | × |  |  |

The matrix can be summarized as follows:

-While an IndexReader object is deleting documents from the index, an IndexWriter object cannot add documents to it.

-While an IndexWriter object is optimizing the index, an IndexReader object cannot delete documents from it.

-While an IndexWriter object is merging indexes, an IndexReader object cannot delete documents from them.

From the matrix and its summary we can derive a usage pattern: while an IndexWriter object is modifying the index, an IndexReader object must not modify it. The rule is symmetric: while an IndexReader object is modifying the index, an IndexWriter object must not modify it.
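One way to keep Table 2.3 at hand in application code is to encode it as a small lookup. The sketch below is illustrative only: the ConcurrencyMatrix class, the Op enum, and canRunConcurrently are hypothetical names, not part of Lucene, and the grouping of Add, Optimize, and Merge as IndexWriter modifications is an assumption taken from the summary above.

```java
import java.util.EnumSet;
import java.util.Set;

class ConcurrencyMatrix {

    /** The six operations listed in Table 2.3. */
    enum Op { SEARCH, READ_DOCUMENT, ADD, DELETE, OPTIMIZE, MERGE }

    // Modifications performed through an IndexWriter (assumption based on the summary).
    private static final Set<Op> WRITER_MODIFICATIONS =
            EnumSet.of(Op.ADD, Op.OPTIMIZE, Op.MERGE);

    /**
     * Returns false exactly for the pairs marked with × in Table 2.3:
     * a delete (done through an IndexReader) combined with an add,
     * optimize, or merge (done through an IndexWriter).
     */
    static boolean canRunConcurrently(Op a, Op b) {
        if (a == Op.DELETE && WRITER_MODIFICATIONS.contains(b)) {
            return false;
        }
        if (b == Op.DELETE && WRITER_MODIFICATIONS.contains(a)) {
            return false;
        }
        return true; // every other cell of the matrix is empty
    }

    public static void main(String[] args) {
        System.out.println("add + delete: " + canRunConcurrently(Op.ADD, Op.DELETE));
        System.out.println("search + delete: " + canRunConcurrently(Op.SEARCH, Op.DELETE));
    }
}
```

Because the check is written once for each argument order, the lookup is symmetric, matching the symmetry of the rule stated above.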

You can think of Lucene's concurrency rules as similar to good manners and sensible laws in society. Nothing forces you to follow them, but breaking them has consequences. In real life, breaking the law may land you in jail; in Lucene, breaking these rules may corrupt your index. Lucene's creators anticipated that users would misunderstand or misuse concurrency, so Lucene uses a locking mechanism to do its best to prevent accidental index corruption by applications. Section 2.9.3 describes this index locking mechanism in more detail.

2.9.3 Index locking mechanism

In Lucene, locking is a topic closely related to concurrency. In every section of code that only a single process may execute at a time, Lucene creates a file-based lock to prevent index corruption caused by misuse of the Lucene API. Each index has its own set of lock files. By default, all lock files are created in the computer's temporary directory, which is specified by the java.io.tmpdir system property.

If you look at the temporary directory while documents are being indexed, you will see Lucene's write.lock file. While segments are being merged, you will also see the commit.lock file. You can change the directory in which lock files are stored by setting the org.apache.lucene.lockDir system property to the desired location. The property can be set programmatically through the Java API, or on the command line, for example with -Dorg.apache.lucene.lockDir=/path/to/lock/dir. If several computers need to access the same index stored on a shared disk, set the lock directory explicitly in the program so that applications running on different computers can see each other's lock files. Because of known problems with lock files on the Network File System (NFS), the lock directory should be placed on a file system volume that does not depend on the network. The two lock files mentioned above serve different purposes.
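Before looking at each lock file, the lock directory settings described above can be sketched in Java. The property names org.apache.lucene.lockDir and java.io.tmpdir come from the text; the LockDirSketch class and its effectiveLockDir helper are hypothetical and only mimic the fallback rule described here, they are not Lucene code. The sketch also assumes that the property has to be set before Lucene reads it, so an application would set it early during startup.

```java
class LockDirSketch {

    static final String LOCK_DIR_PROPERTY = "org.apache.lucene.lockDir";

    /**
     * Mimics the rule from the text: use the explicitly configured lock
     * directory if there is one, otherwise fall back to java.io.tmpdir.
     */
    static String effectiveLockDir() {
        return System.getProperty(LOCK_DIR_PROPERTY, System.getProperty("java.io.tmpdir"));
    }

    public static void main(String[] args) {
        // Without an override, lock files would go to the temporary directory.
        System.out.println("default lock dir:    " + effectiveLockDir());

        // Programmatic equivalent of -Dorg.apache.lucene.lockDir=/path/to/lock/dir
        System.setProperty(LOCK_DIR_PROPERTY, "/path/to/lock/dir");
        System.out.println("overridden lock dir: " + effectiveLockDir());
    }
}
```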

 

The write.lock file is used to keep processes from trying to modify an index concurrently. More precisely, an IndexWriter obtains write.lock when it is instantiated and does not release it until it is closed. An IndexReader also needs this lock when it deletes documents, undeletes documents, or sets field norms. As a result, write.lock tends to lock the index for long periods while the index is being written.

The commit.lock file is used whenever segments are being read or merged. An IndexReader obtains commit.lock before it reads the segments file, which names all index segments, and Lucene releases the lock only after the IndexReader has opened and read all the segments. An IndexWriter also obtains commit.lock before it creates a new segments file, and holds it until it has removed the index files made obsolete by operations such as segment merges. commit.lock may therefore be created more often than write.lock, but it should never lock the index for long, because during its lifetime index files are only opened or deleted and only a small segments file is written to disk. Table 2.4 summarizes the places in the Lucene API that lock an index.

Table 2.4 Operations that acquire and release locks in Lucene

| Lock file | Class | Acquired in | Released in | Description |
| --- | --- | --- | --- | --- |
| write.lock | IndexWriter | Constructor | close() | Lock released when the IndexWriter is closed |
| write.lock | IndexReader | delete(int) | close() | Lock released when the IndexReader is closed |
| write.lock | IndexReader | undeleteAll() | close() | Lock released when the IndexReader is closed |
| write.lock | IndexReader | setNorm(int, String, byte) | close() | Lock released when the IndexReader is closed |
| commit.lock | IndexWriter | Constructor | Constructor | Lock released as soon as the segment information is read or written |
| commit.lock | IndexWriter | addIndexes(IndexReader[]) | addIndexes(IndexReader[]) | Lock held while the new segment is written |
| commit.lock | IndexWriter | addIndexes(Directory[]) | addIndexes(Directory[]) | Lock held while the new segment is written |
| commit.lock | IndexWriter | mergeSegments(int) | mergeSegments(int) | Lock held while the new segment is written |
| commit.lock | IndexReader | open(Directory) | open(Directory) | Lock held until all segments are read |
| commit.lock | SegmentReader | doClose() | doClose() | Lock held while the segment's files are written or rewritten |
| commit.lock | SegmentReader | undeleteAll() | undeleteAll() | Lock held while the .del file is removed |

You should also be aware of two lock-related methods:

-IndexReader's isLocked(Directory): tells you whether the index specified in its argument is locked. It is handy when an application needs to check whether an index is locked before trying to modify it.

-IndexReader's unlock(Directory): does exactly what its name implies. Although this method lets you unlock any Lucene index at any time, using it is dangerous. Lucene creates locks for good reason, and unlocking an index while it is being modified can corrupt it and render it unusable.

Although you now know which lock files Lucene uses, when and why it uses them, and where they are stored in the file system, you should not manipulate them directly in the file system. Use the Lucene API to work with them instead. Otherwise, if Lucene adopts a different locking mechanism in the future, or changes the name or location of its lock files, your application may break.

Locking example

To demonstrate how locks are used, Listing 2.7 shows how Lucene uses locks to prevent simultaneous modifications of the same index. In the testWriteLock() method, Lucene locks an index that has been opened by one IndexWriter object and prevents a second IndexWriter object from modifying it.
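The behavior described above can be imitated with the JDK alone. The following is a minimal sketch, not Lucene's implementation: it assumes that a lock is represented simply by the existence of a file named write.lock in a lock directory, and ToyWriter, WriteLockSketch, and secondWriterIsRefused are hypothetical names introduced for this illustration. As with IndexWriter in Table 2.4, the toy writer obtains the lock in its constructor and releases it in close().

```java
import java.io.File;
import java.io.IOException;

class WriteLockSketch {

    /** Hypothetical stand-in for a writer that holds write.lock while it is open. */
    static final class ToyWriter implements AutoCloseable {
        private final File lockFile;

        ToyWriter(File lockDir) throws IOException {
            lockFile = new File(lockDir, "write.lock");
            // createNewFile() creates the file atomically and returns false
            // if it already exists, i.e. if another writer holds the lock.
            if (!lockFile.createNewFile()) {
                throw new IOException("write.lock is already held: " + lockFile);
            }
        }

        @Override
        public void close() {
            lockFile.delete(); // releasing the lock means removing the lock file
        }
    }

    /** Opens a second writer while the first is still open; returns true if it is refused. */
    static boolean secondWriterIsRefused(File lockDir) {
        try (ToyWriter first = new ToyWriter(lockDir)) {
            try (ToyWriter second = new ToyWriter(lockDir)) {
                return false; // both writers open at once: the lock did not work
            } catch (IOException expected) {
                return true;  // the second writer could not obtain write.lock
            }
        } catch (IOException e) {
            throw new IllegalStateException("could not obtain the first lock", e);
        }
    }

    public static void main(String[] args) {
        File lockDir = new File(System.getProperty("java.io.tmpdir"),
                "lock-sketch-" + System.nanoTime());
        lockDir.mkdirs();
        System.out.println("second writer refused: " + secondWriterIsRefused(lockDir));
    }
}
```

Because the first writer is closed when the method returns, the lock file is removed again and a later writer can open the same directory.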

As mentioned earlier, newcomers to Lucene sometimes do not have a good understanding of the concurrency issues described in this section and consequently run into locking problems, such as the exception raised when a second IndexWriter is opened on an index that is already locked. If similar exceptions occur in your application and index consistency matters to your users, do not ignore them. Lock-related exceptions are usually a sign that the Lucene API is being misused; when they occur in an application, they should be investigated and handled properly.

2.9.4 Disabling index locking

We strongly recommend that you do not tamper with Lucene's locking mechanism and that you do not ignore lock-related exceptions. In some situations, however, you may want to disable locking, and you can do so without corrupting the index. For example, an application may need to access a Lucene index stored on a CD-ROM. Because a CD is a read-only medium, the application works with the index in read-only mode: it uses Lucene only to search the index and never modifies it. Although Lucene already stores its lock files in the system temporary directory (a directory that is usually writable by every user of the system), you can still disable both write.lock and commit.lock by setting the system property disableLuceneLocks to "true".
