In-Memory database indexes

Source: Internet
Author: User
Tags mutex

Remember the first job earlier, very proud to write a very large storage process, chat about how to reference the index, how to write SQL fast; Think about how good, easy to meet, easy happy, laugh!

Or at the end of the year, after a big project! Just a little time to study fastdb, because it was used before the commercial database was useless. Later, the problem of concurrency capability was discovered and switched to the commercial database.

If the data read and write ability, fastdb undoubtedly has a very strong competitiveness; I remember a few years ago the test FASTDB query ability is much faster than TimesTen, but the key is that fastdb in the process concurrency is slow. Because Fastdb is not a row-level lock, but a table lock + library lock, the multi-process time when the conflict increases performance can not keep up.

FASTDB has two indexes, hash index and T-tree index.

Hash index using the way of the bucket, the data and the number of barrels to a certain proportion of the new allocation of barrels to ensure that the hash result can be a more accurate matching value, so as to ensure the efficiency of the hash index. His implementation is basically the same as the hash implementation in STL, I think its shortcomings and advantages are obvious:

A, if the data does not guarantee that the key is unique, then a large number of duplicate key will lead to a straight down hash performance

b, the above shortcomings are also advantages, if the key is guaranteed to be unique. Then hash efficiency is undoubtedly a better choice of algorithms

C, hash when the new allocation of barrels, performance significantly slower. And it occupies a larger space.

I'm mainly going to record my understanding of the T-tree index today. In 11, just beginning to contact the memory database, began to search the relevant information on the Internet, it is surprising to find a large number of articles in the argument why the memory database is used t-tree and the traditional physical database is b+tree. According to the main is the following two points

A, a lot of text in the description of T-tree efficiency in theory is higher than the b-tree, although the time complexity is log2n, but that is based on the time complex worst-case, but the database index T-tree query only half of the probability of the query needs to the worst cases, And half of the time you can query the results in advance.

B, physical database selection B+tree An important reason is the swap-out of the disk, and the memory database does not exist in this requirement.

Do not know is not based on this theory, at that time I contacted a few databases are used T-tree index, TimesTen, Altibase, Fastdb, and so on several without exception.

When the year before last to try to transform the FASTDB, the database changed to row-level lock, an important part of which is the T-tree index support. Based on the above theory, I began to coding+test, for a long time did not get the desired results. One of the hardest problems I found was that the T-tree index after inserting a data, the node split has no control over the level of its parent node's influence. In this case, I can only infinite recursion to lock the parent node of the upper node, eventually locked you will find is the node, so I think this is the reason why Fastdb can only do table lock the root cause.

I am not sure how the timesten of these companies can achieve concurrency in this case unless they have a large number of loosely coupled logic to assist the index tree to complete the mutex function.

Fortunately, the company came to a Korean database team (said to be the big guy from Altibase dug out of the core team) to exchange products, the more fortunate is the leader let me to receive them, accompany them to test and exchange technology, and we used in the TimesTen carried out the benchmarking. The advantages they summarize are:

1, domestic, although the stick do. But boss is the Chinese big guy (then just set off to the IoE trend)

2, high performance, the market than the mainstream database product performance is high. And a large number of performance comparison test reports (quite professional)

3, replication agent performance is high, based on their high-speed network framework to achieve real-time data synchronization. (People who have access to products that are too high are aware that this is critical.) But I'm not sure if they're blowing.

In addition, they do not deny that one point is that the market is less commercially available, saying that there is a point of financial products in the country, mainly in Korea.

Based on what they say, we've done some testing. Based on these test data, there is some understanding of their products. The first is the concurrency performance is good, the contrast timesten is not a level at all, within 3 processes, the two are flat, but the more process number is more obvious. Then I and they came to the Korean experts (of course, there is translation, completely can't put up B) communication, they use B-tree, I asked him how not to T-tree. Bonzi experts said enough for half an hour, but the translation brothers gave me the result is: They used to be t-tree, and later changed to B-tree, they have a historical test report. As for the reason, the brother said he could not turn over. Later also DAO a lot of other, after translation brother's mouth, index mutex, replication agent what. Not much to feel (language is too important!!) )

After this exchange, began to re-examine the differences between the two trees, found that in fact, T-tree in the index update when the problem with B-tree can be completely solved. Because B-tree is split, there is no concept of rotation, and he can be sure that the parent node of the split node does not need to be changed, so that the B-tree mutually exclusive index can be implemented.

So I have another discovery, timesten11 version has changed the T-tree Index to B-tree Index, and the new memory database like SQLite is also used B-tree. But it's hard to find relevant proof articles online.

However, at that time had decided to change from T-tree to B-tree, and began a cycle of coding+test. Constant modification and testing, this little b-tree has been done for me. The problem of row-level index lock test millions of data, several process concurrency cases, 3w/s Insert, query 20w+/s. The problem still exists and the performance is poor. But can help me to verify the other database management code written, the index is really a problem, I hope we can find other inspiration later

  

In-Memory database indexes

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.