New Features in RavenDB 3.0: Indexing Backend

RavenDB indexing is far more than simple key/value storage; it is much more powerful. Like the other features of version 3.0, it is the product of a great deal of hard work and careful thought. This article covers the backend changes to indexing that make it faster and more stable. The new user-visible features will be covered in the next article.

In-memory indexes. Experience has shown that real optimization has to start with the hard disk. To speed up reads and writes when creating new indexes, version 2.5 introduced the ability to create new indexes in memory only. In 3.0 this was taken further: instead of being read from and written to disk frequently, index data is buffered in memory and written to disk only under special circumstances (such as insufficient memory).

This greatly reduces the time spent reading and writing index data, as well as the time spent maintaining and optimizing the on-disk data. Without these constraints, indexing keeps performing well even under high load, and occasional load spikes in day-to-day use will not cause disk problems.
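To make the idea concrete, here is a minimal Python sketch of an index buffer that stays in memory and spills to "disk" only when a memory budget is exceeded (standing in for the "insufficient memory" case). The class `InMemoryIndexBuffer`, its rough byte-counting heuristic, and the list that plays the role of disk are hypothetical illustrations; RavenDB's actual implementation is not shown here and differs in detail.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryIndexBuffer:
    """Hypothetical sketch: keep index entries in RAM and flush them to
    "disk" only when a memory budget is exceeded."""
    memory_budget_bytes: int
    entries: dict = field(default_factory=dict)  # term -> list of doc ids
    used_bytes: int = 0
    disk: list = field(default_factory=list)     # stand-in for on-disk segments
    flush_count: int = 0

    def add(self, term: str, doc_id: str) -> None:
        self.entries.setdefault(term, []).append(doc_id)
        # Rough size estimate; a real engine would track actual allocations.
        self.used_bytes += len(term.encode()) + len(doc_id.encode())
        if self.used_bytes > self.memory_budget_bytes:
            self.flush()

    def flush(self) -> None:
        if not self.entries:
            return
        # Write the whole buffer out as one segment, then start a fresh buffer.
        self.disk.append(dict(self.entries))
        self.entries.clear()
        self.used_bytes = 0
        self.flush_count += 1


# Each add costs 13 bytes ("ravendb" + "docs/N"), so a 64-byte budget
# forces a flush on every fifth add.
buf = InMemoryIndexBuffer(memory_budget_bytes=64)
for i in range(10):
    buf.add("ravendb", f"docs/{i}")
print(buf.flush_count, len(buf.disk))
```

With a generous budget, nothing touches the disk at all, which is the point of the design: disk writes become the exception rather than the rule.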

Asynchronous index deletion. An index in RavenDB consists of two parts: the actual data and the metadata. Usually the metadata is smaller than the actual data. For map/reduce indexes, however, the opposite is true, because their metadata holds a lot of data about the intermediate steps. In addition, if an index uses LoadDocument in a large database, RavenDB also has to track the document references, which takes a lot of storage space. As a result, deleting an index in RavenDB 2.5 could be extremely slow.

RavenDB 3.0 introduces asynchronous index deletion, so an index can be deleted quickly. The deletion itself only removes the index name, so the index disappears immediately; the rest of the cleanup is done asynchronously in the background. If the database is restarted partway through, the unfinished cleanup resumes in the background once the database starts again. This makes it easy to delete indexes that contain large amounts of data.
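The deletion scheme described above can be sketched as follows: the fast path removes the name and records a durable "pending cleanup" entry, and a background worker removes the bulky data in bounded chunks. Because the pending list lives in storage, the work survives a restart. The `IndexStore` class, the dict standing in for durable storage, and the chunk size are all hypothetical; this is an illustration of the technique, not RavenDB's code.

```python
import threading


class IndexStore:
    """Hypothetical sketch of asynchronous index deletion."""

    def __init__(self, disk: dict):
        # `disk` stands in for durable storage that survives a restart.
        self.disk = disk
        self.disk.setdefault("names", set())
        self.disk.setdefault("data", {})
        self.disk.setdefault("pending_cleanup", [])
        self._lock = threading.Lock()

    def create(self, name, entries):
        with self._lock:
            self.disk["names"].add(name)
            self.disk["data"][name] = list(entries)

    def delete(self, name):
        # Fast path: drop the name so the index disappears immediately,
        # and record the heavy cleanup as durable pending work.
        with self._lock:
            self.disk["names"].discard(name)
            self.disk["pending_cleanup"].append(name)

    def cleanup_step(self, max_items=1000):
        """Remove one bounded chunk of data; return True if work remains."""
        with self._lock:
            pending = self.disk["pending_cleanup"]
            if not pending:
                return False
            name = pending[0]
            data = self.disk["data"].get(name, [])
            del data[:max_items]
            if not data:
                self.disk["data"].pop(name, None)
                pending.pop(0)
            return bool(pending)

    def run_background_cleanup(self):
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()
        return worker

    def _drain(self):
        while self.cleanup_step():
            pass


disk = {}
store = IndexStore(disk)
store.create("Orders/ByCustomer", range(2500))
store.delete("Orders/ByCustomer")   # returns right away
store.cleanup_step()                # partial cleanup, then a simulated restart
restarted = IndexStore(disk)        # the pending work is still recorded
restarted.run_background_cleanup().join()
```

A real implementation would also have to handle an index being recreated under the same name before the old data has been cleaned up.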

Interleaving indexing and tasks. In RavenDB, the term "task" mostly refers to cleaning up index data, for example removing the index records of deleted documents or re-indexing documents whose referenced documents have changed. In version 2.5, these tasks were queued in the tasks table to wait for execution, so many of them were not executed in time. For example, a large number of index deletion tasks might pile up in the queue every day, and executing them took a long time. In 3.0 we changed this so that indexing and task execution alternate: no matter how full the task queue is, it has little impact on indexing.
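Here is a minimal sketch of the alternating scheme, assuming a simple one-to-one interleaving: one index batch, then one queued task, and so on, so a long task queue cannot hold indexing back. The function name, the batch and task labels, and the strict 1:1 ratio are hypothetical; the source only says that indexing and tasks alternate.

```python
from collections import deque


def run_alternating(index_batches, tasks):
    """Hypothetical scheduler: alternate one index batch with one queued
    task so that a long task queue cannot starve indexing."""
    index_q, task_q = deque(index_batches), deque(tasks)
    log = []
    while index_q or task_q:
        if index_q:
            log.append(("index", index_q.popleft()))
        if task_q:
            log.append(("task", task_q.popleft()))
    return log


log = run_alternating(["batch-1", "batch-2"],
                      ["cleanup-a", "cleanup-b", "cleanup-c"])
for kind, item in log:
    print(kind, item)
```

Under a strictly queued model, all three cleanup tasks could run before the first index batch; here the first index batch runs immediately. A production scheduler might weight the ratio rather than alternating strictly.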

Indexing large documents. RavenDB places no limit on document size, which is good for users. When RavenDB has to index such documents, however, it comes under enormous pressure. When there are many documents to index, we increase the number of documents processed in each batch. As the system grows and documents get larger, problems begin to emerge: many documents become much larger as they are updated. For example, if each batch processes 128K documents averaging roughly 250 KB each, a single batch has to index about 31 GB of documents.
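The per-document size behind the 31 GB figure can be checked with a quick back-of-the-envelope calculation. This assumes "128K" and "31 GB" are binary units (128 × 1024 documents, 31 × 1024³ bytes); the variable names are illustrative only.

```python
# Back-of-the-envelope check of the batch figures in the text,
# assuming binary units for "128K" and "31 GB".
docs_per_batch = 128 * 1024    # documents per batch
batch_bytes = 31 * 1024**3     # bytes per batch

avg_doc_kib = batch_bytes / docs_per_batch / 1024
print(f"average document size ~ {avg_doc_kib:.0f} KiB")
```

That works out to about 248 KiB per document, consistent with documents of roughly 250 KB.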

Reading that much data from disk takes time, and that does not even include the time spent processing it in memory. Users also often compress large documents, which makes the problem worse: RavenDB only looked at the size of the file on disk, that is, the compressed size, so you can imagine the result once the documents were loaded into memory. In 3.0 we took measures against this problem. First, RavenDB now calculates each document's size in memory; second, it limits the amount of memory each batch may use more effectively.
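The fix described above can be sketched like this: batches are cut by a memory budget computed from each document's in-memory (decompressed) size, not its compressed on-disk size. The function `build_batches`, the use of `zlib` for compression, and the budget values are hypothetical stand-ins chosen for illustration; the source does not say how RavenDB measures sizes internally.

```python
import zlib


def build_batches(docs_on_disk, max_batch_bytes, max_batch_docs):
    """Hypothetical sketch: split compressed documents into batches,
    budgeting by in-memory (decompressed) size, not on-disk size."""
    batches, current, current_bytes = [], [], 0
    for doc_id, compressed in docs_on_disk:
        in_memory = zlib.decompress(compressed)
        size = len(in_memory)  # what the indexer will actually hold in RAM
        # Close the current batch if this document would exceed either limit.
        if current and (current_bytes + size > max_batch_bytes
                        or len(current) >= max_batch_docs):
            batches.append(current)
            current, current_bytes = [], 0
        current.append((doc_id, in_memory))
        current_bytes += size
    if current:
        batches.append(current)
    return batches


# Highly compressible documents: tiny on disk, 100,000 bytes each in memory.
docs = [(f"docs/{i}", zlib.compress(b"x" * 100_000)) for i in range(10)]
on_disk_total = sum(len(c) for _, c in docs)
batches = build_batches(docs, max_batch_bytes=250_000, max_batch_docs=128)
print(on_disk_total, [len(b) for b in batches])
```

Budgeting by on-disk size, all ten documents (about a kilobyte in total) would fit in one batch, even though they occupy a megabyte in memory. Decompressing just to measure is wasteful; a real system would more likely store the uncompressed size alongside each document.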

I/O-bound batch indexing. A core scenario for RavenDB is running in the cloud, but in practice our customers use a wide variety of servers, from an i2.8xlarge EC2 instance (32 cores, 244 GB of RAM, 8 × 800 GB SSD) to an Azure A0 instance (shared CPU, 768 MB of RAM, and a disk that will bring you to tears). One customer kept complaining that RavenDB was using only about a quarter of the resources available on the server and wanted to know why the rest went unused. The issue was that their way of calculating available resources differed from RavenDB's. They had no complaints about performance as such; their complaint was that RavenDB was not using the resources effectively.

This sounds funny, but it is not. Low-end cloud instances are slow, and their I/O throughput in particular is poor. If you create an index for a database that is in use on such a server, you will find that most of the time goes to I/O, and the problem gets worse over time. RavenDB starts by reading a small amount of data from disk for each indexing batch (for example, whatever it can read in half a second), then processes the next batch, and the next. When RavenDB finds that there is a lot of data to process, it increases the batch size. As a result, it waits longer and longer for data to be read from disk. From the administrator's perspective, RavenDB appears to be stuck, doing nothing.

In RavenDB 3.0, we no longer fight the I/O speed. RavenDB first reads part of the data from disk. If it cannot read enough data within a reasonable amount of time, it indexes the data it has already read, while the reading continues in the background. Once that indexing is done, it can index the data that was read in the background in the meantime. This greatly improves performance. (Customers can also see that indexing and disk reads are happening concurrently, so they will not blame our software for sitting idle.)
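A minimal sketch of this read-while-indexing approach, assuming a background reader thread that feeds a queue and an indexer that takes whatever arrived within a time budget. The slow disk is simulated with `time.sleep`, "indexing" is reduced to recording the batch, and the function names and budget values are hypothetical; only the overall pattern comes from the text.

```python
import queue
import threading
import time


def slow_reader(doc_ids, out_q, delay_s):
    """Simulated slow disk: produces one document every `delay_s` seconds."""
    for doc_id in doc_ids:
        time.sleep(delay_s)
        out_q.put(doc_id)
    out_q.put(None)  # sentinel: no more documents


def index_with_prefetch(doc_ids, read_budget_s=0.05, max_batch=64, delay_s=0.01):
    """Hypothetical sketch: index whatever was read within a time budget
    while a background thread keeps reading the rest."""
    q = queue.Queue()
    reader = threading.Thread(target=slow_reader,
                              args=(doc_ids, q, delay_s), daemon=True)
    reader.start()
    indexed_batches, done = [], False
    while not done:
        batch = []
        deadline = time.monotonic() + read_budget_s
        while len(batch) < max_batch:
            remaining = deadline - time.monotonic()
            if remaining <= 0 and batch:
                break  # budget spent: index what we have so far
            try:
                # Wait at least briefly so an empty batch does not busy-loop.
                item = q.get(timeout=max(remaining, 0.01))
            except queue.Empty:
                if batch:
                    break
                continue
            if item is None:
                done = True
                break
            batch.append(item)
        if batch:
            # Stand-in for indexing; the reader keeps reading meanwhile.
            indexed_batches.append(batch)
    reader.join()
    return indexed_batches


docs = [f"docs/{i}" for i in range(20)]
batches = index_with_prefetch(docs)
print([len(b) for b in batches])
```

Batch sizes vary with timing, but the indexer never waits for one huge read to finish: it works on small batches while the reader keeps the disk busy.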

Conclusion: these new features mostly run in the background, so users will not see any visible changes, but together they deliver a better user experience.
