Four myths about in-memory databases


Today we live in a fast-moving era in which an excellent application is expected to respond within 0.1 seconds. If an acceptable 50 milliseconds of that budget is spent on network communication, the developer must process the data and respond within the remaining 50 milliseconds. Achieving this undoubtedly requires millisecond-level database response times, especially in scenarios serving tens of thousands of simultaneous requests, yet only a few databases are flexible and complete enough to meet this requirement.

In big data processing scenarios, insights must be gathered and decisions made quickly. Without complex optimization or compromise, an in-memory database can finish in a few seconds work that used to take hours or minutes. Even so, many myths persist in the in-memory database field: many people still consider in-memory databases unreliable, inconsistent, and expensive, and, most importantly, some believe that simply putting a database in memory yields the desired performance.

Myth 1: All in-memory databases are fast

The answer is clearly no. Even though most in-memory databases today are written in highly efficient languages such as C and C++, many still cannot meet the required response times, mainly for the following reasons:

1. The complexity of command processing differs between databases. In a high-performance database, commands execute with minimal complexity; otherwise, the most direct impact is that query times must be re-optimized as the dataset grows.

2. Query efficiency also differs. Some databases treat every value loaded into memory as a single opaque BLOB (similar to the memcached caching model), which is clearly inefficient. A database should be able to store and query values in a structured way, saving network and memory overhead and thus significantly reducing application processing time.

3. The trade-off between single-threaded and multi-threaded architectures.

Multithreading makes full use of the available computing power without any effort from the database user. However, it also requires a large amount of internal management and synchronization, which consumes computing resources; in multi-threaded mode, lock overhead can greatly reduce database performance.

A single thread uses a very simple execution model. There is no locking, and coordination consumes only a small amount of computing power, but the management of computing resources is undeniably handed from the database to the user. The ideal solution lets users manage as few resources as possible, since resource management is a job the database itself should shoulder.

4. Shared-nothing vs. shared-something vs. shared-everything. What is shared affects system scalability. As the database keeps growing, performance must continue to meet the needs of every instance. The shared-nothing model keeps all entities as independent units, avoiding communication overhead when handling load surges and achieving linear scalability.

5. Built-in acceleration components, such as a zero-latency distributed proxy, can significantly improve database performance by avoiding network round trips and reducing TCP overhead. In some cases, the proxy communicates with the database as just another local client process on the host while serving remote clients.
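Points 1 and 2 above can be sketched in a few lines of Python. The class names and record layout below are hypothetical, not any particular database's API; the sketch only contrasts a blob-style store, which must deserialize a whole value to touch one field, with a structured store that addresses fields directly:

```python
import json

class BlobStore:
    """memcached-style: each value is one opaque serialized blob."""
    def __init__(self):
        self.data = {}
    def set(self, key, obj):
        self.data[key] = json.dumps(obj)          # whole object serialized
    def get_field(self, key, field):
        return json.loads(self.data[key])[field]  # must deserialize everything
    def set_field(self, key, field, value):
        obj = json.loads(self.data[key])          # read-modify-write full blob
        obj[field] = value
        self.data[key] = json.dumps(obj)

class StructuredStore:
    """Hash-style: the store understands fields inside a value."""
    def __init__(self):
        self.data = {}
    def set(self, key, obj):
        self.data[key] = dict(obj)
    def get_field(self, key, field):
        return self.data[key][field]              # O(1), no serialization
    def set_field(self, key, field, value):
        self.data[key][field] = value             # touches one field only

user = {"name": "alice", "visits": 0, "plan": "free"}
blob, structured = BlobStore(), StructuredStore()
blob.set("user:1", user)
structured.set("user:1", user)
blob.set_field("user:1", "visits", 1)
structured.set_field("user:1", "visits", 1)
print(blob.get_field("user:1", "visits"),
      structured.get_field("user:1", "visits"))  # 1 1
```

Both stores return the same answer, but the blob store paid for a full serialize/deserialize cycle to change one counter, and that cost grows with the size of the record.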

If throughput and latency are the main objectives, an organization clearly needs a database that achieves millisecond-level latency while minimizing the number of servers required.
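As a concrete illustration of the shared-nothing model in point 4, here is a minimal Python sketch (all names are hypothetical) in which deterministic key hashing routes every single-key command to exactly one independent shard, so shards never need to coordinate and capacity grows roughly linearly with their number:

```python
import zlib

class Shard:
    """An independent unit: owns its keys, shares nothing with peers."""
    def __init__(self):
        self.data = {}

class ShardedStore:
    def __init__(self, num_shards=4):
        self.shards = [Shard() for _ in range(num_shards)]
    def _shard_for(self, key):
        # Deterministic hashing sends each single-key command to exactly
        # one shard, so no cross-shard communication is needed.
        return self.shards[zlib.crc32(key.encode()) % len(self.shards)]
    def set(self, key, value):
        self._shard_for(key).data[key] = value
    def get(self, key):
        return self._shard_for(key).data.get(key)

store = ShardedStore()
for i in range(100):
    store.set(f"key:{i}", i)
print(store.get("key:42"))  # 42; served by one shard, others untouched
```

Because no shard ever waits on another for a single-key operation, adding shards adds capacity without adding coordination overhead.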

Myth 2: In-memory computing is unreliable and inconsistent

Most NoSQL databases (not just in-memory ones) return an acknowledgement (ack) to the client before the data has been committed to disk or replicated. Data inconsistency can therefore occur.

The CAP theorem states that no distributed system can provide consistency, availability, and partition tolerance at the same time, and different databases make different choices. Choosing a CP model means developers do not need to worry about consistency, but writes are rejected during a network partition. Choosing an AP model means the database remains available for reads and writes, but developers must account for consistency in application code rather than expecting the database to handle it. The right model therefore depends on the scenario.
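The CP/AP trade-off can be simulated in a few lines. The following Python sketch is a toy two-replica store, not a real database: during a partition, the CP variant rejects the write and stays consistent, while the AP variant accepts it and lets the replicas diverge.

```python
class Replica:
    def __init__(self):
        self.data = {}

class ReplicatedStore:
    """Two replicas; `mode` picks the CAP trade-off during a partition."""
    def __init__(self, mode):
        self.mode = mode                        # "CP" or "AP"
        self.a, self.b = Replica(), Replica()
        self.partitioned = False
    def write(self, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency over availability: refuse the write.
                raise RuntimeError("partition: write rejected")
            # Availability over consistency: accept on the reachable side only.
            self.a.data[key] = value
        else:
            self.a.data[key] = self.b.data[key] = value
    def consistent(self):
        return self.a.data == self.b.data

cp, ap = ReplicatedStore("CP"), ReplicatedStore("AP")
for s in (cp, ap):
    s.write("x", 1)
    s.partitioned = True
try:
    cp.write("x", 2)
except RuntimeError as e:
    print("CP:", e)                   # write rejected, replicas stay equal
ap.write("x", 2)                      # write accepted, replicas now diverge
print("AP consistent:", ap.consistent())  # AP consistent: False
```

In the AP case, reconciling the divergent replicas after the partition heals is exactly the consistency work that falls on the application developer.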

Myth 3: In-memory computing is difficult to scale

There are two ways to scale: scaling up the servers hosting the database, for example by adding CPUs and memory, and scaling out by adding more hosts to the in-memory cluster. In many databases, multiple shards of the same dataset can run on a single node, using its computing resources more efficiently and postponing the need to scale out. Similarly, the memory of multiple servers can be pooled into a shared memory space, breaking the single-host memory limit. Many in-memory databases now support both approaches at once, dynamically adding the cores and memory nodes allocated to the database to maximize application responsiveness.
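Scale-out is often implemented with hash slots: keys map to a fixed number of slots, and slots, not keys, are assigned to nodes, so adding a node only migrates the slots it takes over. The sketch below uses 1024 slots and round-robin ownership purely for illustration (Redis Cluster, for example, uses 16384 slots with a different assignment scheme):

```python
import zlib

NUM_SLOTS = 1024  # illustrative; real systems fix this at cluster creation

def slot(key):
    # Every key deterministically maps to one slot.
    return zlib.crc32(key.encode()) % NUM_SLOTS

class Cluster:
    def __init__(self, nodes):
        self.nodes = list(nodes)
        # Spread slot ownership round-robin over the initial nodes.
        self.slot_owner = {s: self.nodes[s % len(self.nodes)]
                           for s in range(NUM_SLOTS)}
    def node_for(self, key):
        return self.slot_owner[slot(key)]
    def add_node(self, node):
        # Scale out: the new node takes an even share of slots; only keys
        # in the moved slots migrate, everything else stays where it is.
        self.nodes.append(node)
        moved = 0
        for s in range(NUM_SLOTS):
            if s % len(self.nodes) == len(self.nodes) - 1:
                self.slot_owner[s] = node
                moved += 1
        return moved

c = Cluster(["node-a", "node-b"])
moved = c.add_node("node-c")
print(f"slots moved to node-c: {moved} of {NUM_SLOTS}")  # 341 of 1024
```

Because ownership is tracked per slot, resharding touches roughly 1/N of the data when the cluster grows from N-1 to N nodes, which is what makes online scale-out practical.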

Myth 4: In-memory computing is expensive

Any application that needs to raise throughput quickly faces the same question: "How much does a given level of throughput cost?" For example, at 15 million OPS, an in-memory database running on a single Amazon EC2 instance can be cheaper than a non-in-memory database; if hundreds of servers were required to achieve the same effect, the result might be the opposite.

If the dataset is at the terabyte scale, memory cost obviously becomes an issue. However, technologies that extend RAM with flash memory already exist and reduce this cost. Note, though, that extending memory with flash inevitably affects system performance, so the ideal approach is to control the RAM-to-flash ratio to reach the desired cost-effectiveness.
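The RAM-to-flash trade-off can be sketched as a two-tier store: a size-limited hot tier in RAM, with the coldest values demoted to a slower tier and promoted back on access. This toy Python version (not any vendor's implementation) uses a plain dict to stand in for flash and LRU order to pick eviction victims:

```python
from collections import OrderedDict

class TieredStore:
    """Hot values live in a size-limited RAM tier; cold values spill to a
    slower 'flash' tier (a plain dict stands in for flash here)."""
    def __init__(self, ram_capacity):
        self.ram = OrderedDict()   # insertion order doubles as LRU order
        self.flash = {}
        self.ram_capacity = ram_capacity
    def set(self, key, value):
        self.ram[key] = value
        self.ram.move_to_end(key)                # mark as most recently used
        while len(self.ram) > self.ram_capacity:
            cold_key, cold_val = self.ram.popitem(last=False)
            self.flash[cold_key] = cold_val      # demote the coldest value
    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key]
        if key in self.flash:                    # slower path
            self.set(key, self.flash.pop(key))   # promote back to RAM
            return self.ram[key]
        return None

store = TieredStore(ram_capacity=2)
for k in ("a", "b", "c"):
    store.set(k, k.upper())
print(sorted(store.ram), sorted(store.flash))   # ['b', 'c'] ['a']
print(store.get("a"))                           # 'A' (promoted; 'b' demoted)
```

Raising `ram_capacity` buys latency at memory prices; lowering it buys capacity at flash prices, which is exactly the ratio the text says must be tuned.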

To sum up, choosing database technology to match the actual scenario greatly improves resource efficiency. At the same time, new databases keep emerging, so setting aside unwarranted prejudices can make the work twice as effective with half the effort.

The author, Yiftach Shoolman, is co-founder and CTO of Redis Labs and has extensive hands-on experience. He was previously president, founder, and CTO of Crescendo Networks (later acquired by F5), and before that vice president of technology at Native Networks.
