Can distributed caching be used as a NoSQL database?


(Author: Srini Penchikala; translator: Dingxuefeng)

For non-relational data such as documents, object graphs, and key-value pairs, NoSQL databases provide an alternative way to store data. Can distributed caching be used as a NoSQL database? Greg Luck, the author of Ehcache, describes the similarities between distributed caching and NoSQL databases. InfoQ interviewed him and discussed the pros and cons of the approach.

InfoQ: Can you compare a distributed caching solution with a NoSQL database?

Greg Luck: Distributed caching typically keeps data in memory to reduce latency. A NoSQL database is a DBMS without the R (that is, a database management system without relations), and it generally lacks support for transactions and other advanced features. For a system that does not support relations, table joins are the most troublesome part of SQL, which is where the name NoSQL comes from.

One kind of NoSQL database is the key-value store. Typical examples include Dynamo, Oracle NoSQL Database, and Redis. A cache is also a key-value store, so the two are related. Many cache implementations can be configured to be persistent, but often are not, because a cache exists to improve performance rather than to provide persistence. A NoSQL database, in contrast, is meant for persistence.

A persistent cache can also be used as a key-value NoSQL database. NoSQL is also associated with big data, which typically means more data than fits on a single RDBMS node, usually from a few terabytes to a few petabytes.
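As a rough illustration of a persistent cache acting as a key-value store, the sketch below keeps entries in an in-memory dictionary and writes them through to disk with Python's standard `shelve` module. The class name `PersistentCache` and its methods are hypothetical names chosen for this example; this is not Ehcache's API.

```python
import os
import shelve
import tempfile


class PersistentCache:
    """Hypothetical key-value cache: memory first, disk as the durable copy."""

    def __init__(self, path):
        self._path = path
        self._memory = {}  # fast in-memory tier

    def put(self, key, value):
        self._memory[key] = value
        # Write through so the entry survives a restart.
        with shelve.open(self._path) as db:
            db[key] = value

    def get(self, key, default=None):
        if key in self._memory:
            return self._memory[key]
        # Miss in memory: fall back to the persistent copy.
        with shelve.open(self._path) as db:
            if key in db:
                self._memory[key] = db[key]
                return self._memory[key]
        return default


def demo():
    path = os.path.join(tempfile.mkdtemp(), "cache")
    PersistentCache(path).put("user:42", {"name": "Ada"})
    # A new instance simulates a restarted process with empty memory.
    return PersistentCache(path).get("user:42")
```

Calling `demo()` stores one entry, discards the first instance, and reads the entry back through a fresh one, which is the behavior that lets a cache double as a simple key-value database.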

Distributed caching is often used to reduce latency for transactional data, which does not start out large but is gradually moving toward big-data scale. Because a cache keeps data in memory, storage costs more and data size is restricted. A cache that relies on heap storage may hold only a paltry 2 GB per server node. For distributed caching, Ehcache also provides off-heap storage, so each server can hold hundreds of GB of data and the cluster can serve as a terabyte-scale cache.
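To put the per-node figures above in perspective, the short calculation below estimates how many servers a 1 TB cache would need. The 2 GB figure comes from the text; 200 GB is an assumed stand-in for "hundreds of GB", and the function name is made up for this example.

```python
import math


def nodes_needed(total_gb, per_node_gb):
    """Smallest number of nodes whose combined capacity covers total_gb."""
    return math.ceil(total_gb / per_node_gb)


TOTAL_GB = 1024        # a 1 TB cache
SMALL_NODE_GB = 2      # per-node figure quoted in the text
LARGE_NODE_GB = 200    # assumption: one value for "hundreds of GB"

small_nodes = nodes_needed(TOTAL_GB, SMALL_NODE_GB)  # 512 nodes
large_nodes = nodes_needed(TOTAL_GB, LARGE_NODE_GB)  # 6 nodes
```

Under these assumptions the same terabyte needs 512 small nodes but only 6 large ones, which is why per-node capacity matters for terabyte-scale caches.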

A persistent distributed cache can cover some NoSQL use cases. A NoSQL database can also cover some caching use cases, only with somewhat higher latency.

InfoQ: What are the similarities between a distributed cache and a NoSQL database from an architecture standpoint?

Greg: Both aim to offer higher TPS and better scalability than an RDBMS. To do so, both simplify their functionality and leave out troublesome features such as table joins, stored procedures, and ACID transactions.

Both tend to use proprietary interfaces rather than standardized ones, although in the Java caching space there is JSR 107, which provides a standard caching API for Spring and Java EE programmers.

Both scale out by partitioning data in a way that is transparent to the client. Non-Java products do a good job of scaling up. With Terracotta BigMemory, we also do a particularly good job of scaling up on the Java platform. Finally, both can be deployed on commodity hardware and operating systems, which makes them well suited to running in the cloud.
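The idea of partitioning that is transparent to the client can be sketched as follows. The client exposes plain `get` and `put` calls and picks a node by hashing the key. `PartitionedCache` is a hypothetical class for illustration, each dictionary stands in for a server node, and the simple modulo scheme is an assumption; real products may use other schemes such as consistent hashing.

```python
import hashlib


class PartitionedCache:
    """Hypothetical client that hides partitioning behind get/put."""

    def __init__(self, node_count):
        # Each dict stands in for one server node.
        self._nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Stable hash, so every client maps a given key to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._nodes)
        return self._nodes[index]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)

    def node_sizes(self):
        """Number of entries held by each node."""
        return [len(node) for node in self._nodes]


cache = PartitionedCache(node_count=4)
for i in range(1000):
    cache.put(f"k{i}", i)
```

The caller never names a node; `cache.node_sizes()` shows how the 1000 entries were spread across the four stand-in nodes.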

InfoQ: How do the two technologies differ architecturally?

Greg: NoSQL and RDBMS typically use disk. A disk is a mechanical device with high latency: seek time is the time it takes the head to move to the correct track, and read and write times depend on the disk's RPM. NoSQL tries to optimize disk use, for example by only appending to a log at the head's current position and occasionally flushing to disk. Caches, in contrast, mostly keep data in memory.
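The append-and-occasionally-flush approach described above can be sketched as a tiny log-structured store. This is a minimal illustration under stated assumptions, not how any particular NoSQL product is implemented: `AppendOnlyStore` and the `flush_every` parameter are hypothetical, and the index lives only in memory, so recovery after a restart is left out.

```python
import os
import tempfile


class AppendOnlyStore:
    """Hypothetical log-structured store: writes only ever append."""

    def __init__(self, path, flush_every=100):
        self._file = open(path, "a+b")
        self._index = {}  # key -> (offset, length) of the latest value
        self._flush_every = flush_every
        self._pending = 0

    def put(self, key, value):
        data = value.encode("utf-8")
        self._file.seek(0, os.SEEK_END)
        offset = self._file.tell()
        self._file.write(data)  # sequential append, old records stay in place
        self._index[key] = (offset, len(data))
        self._pending += 1
        if self._pending >= self._flush_every:
            self.flush()

    def flush(self):
        # Occasionally force buffered appends down to the disk.
        self._file.flush()
        os.fsync(self._file.fileno())
        self._pending = 0

    def get(self, key):
        offset, length = self._index[key]
        self._file.seek(offset)
        return self._file.read(length).decode("utf-8")

    def close(self):
        self.flush()
        self._file.close()
```

An update to an existing key appends a new record and repoints the index, so the log keeps growing until some compaction step (not shown) reclaims the stale records.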

NoSQL and RDBMS have very thin clients (think Thrift or JDBC) that only transmit data across the network, while a cache such as Ehcache uses both in-process and remote storage, so common requests can be served locally. In a distributed cache, hot data is cached in the in-process store of each application server, so adding servers does not increase the load on the network or the back end.
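The effect of an in-process tier can be shown with a small simulation. `RemoteStore` and `NearCache` are hypothetical stand-ins, not Ehcache classes, and the sketch omits eviction and invalidation, which a real cache would need.

```python
class RemoteStore:
    """Stand-in for a cache server reached over the network."""

    def __init__(self, data):
        self._data = dict(data)
        self.requests = 0  # counts simulated network round trips

    def get(self, key):
        self.requests += 1
        return self._data.get(key)


class NearCache:
    """Hypothetical in-process tier in front of a remote store."""

    def __init__(self, remote):
        self._remote = remote
        self._local = {}

    def get(self, key):
        if key in self._local:
            return self._local[key]  # served in-process, no network hop
        value = self._remote.get(key)
        if value is not None:
            self._local[key] = value
        return value


remote = RemoteStore({"hot": "value"})
app_servers = [NearCache(remote) for _ in range(3)]
for server in app_servers:
    for _ in range(100):
        server.get("hot")
```

After 300 reads across three simulated application servers, `remote.requests` is 3: each server goes over the network once for the hot key and serves every later read locally.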

The RDBMS focuses on being a general-purpose SOR (system of record). NoSQL wants to be the SOR for particular types of data, such as key-value pairs, documents, sparse tables (wide tables), or graphs. A cache focuses on performance and is typically used alongside an RDBMS or NoSQL database that is the SOR for that type of data. Often, what is stored in the cache is the result of a web service call, or a computed business object that may have required hundreds of SOR calls.
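The case of a business object that takes hundreds of SOR calls to compute can be sketched with a cache placed beside the system of record. Everything here is hypothetical: `SystemOfRecord`, the order data, and the figure of 200 calls are invented for the example.

```python
class SystemOfRecord:
    """Stand-in for an RDBMS or NoSQL database; counts the calls it serves."""

    def __init__(self):
        self.calls = 0

    def fetch_line_item(self, order_id, line):
        self.calls += 1
        return {"order": order_id, "line": line, "price": line * 10}


def build_order_summary(sor, order_id, line_count=200):
    """Expensive computation: one SOR call per line item."""
    items = [sor.fetch_line_item(order_id, n) for n in range(line_count)]
    return {"order": order_id, "total": sum(item["price"] for item in items)}


def cached_order_summary(cache, sor, order_id):
    """Return the cached summary, computing it only on a miss."""
    if order_id not in cache:
        cache[order_id] = build_order_summary(sor, order_id)
    return cache[order_id]


sor = SystemOfRecord()
cache = {}
first = cached_order_summary(cache, sor, "order-1")
second = cached_order_summary(cache, sor, "order-1")
```

The first request costs 200 SOR calls; the second is answered from the cache and costs none, while the SOR remains the authoritative copy of the underlying data.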

Some caches, such as Ehcache, run partly inside the application's operating system process and partly in their own processes on other machines across the network. Not every distributed cache works this way: Memcache is an example where all of the data is stored across the network.

InfoQ: Which type of application is best suited to this approach?

Greg: This follows on from the previous question. Adding distributed caching to an existing application usually takes only a small amount of work, whereas NoSQL takes a lot of work and a major architectural change.

So the first type of application suited to distributed caching is an existing system, particularly one that needs to:

  • scale out because of a surge in usage or load
  • lower latency in order to meet SLAs
  • minimize the cost of using expensive infrastructure such as mainframes
  • minimize the cost of web service calls
  • cope with extreme load spikes (like Black Friday promotions)

InfoQ: What are the limitations of this approach?

Greg: Caches live in memory, so they are constrained in size; their technical limit is however much memory is available to them (this is expanded on below).

A cache, even one that provides persistence, may not qualify as an SOR. Caches deliberately avoid complex functionality for backing up to disk and restoring from it, although simple versions exist. The RDBMS has built up a wealth of backup, restore, migration, reporting, and ETL features over the past 30 years. NoSQL sits somewhere in between.

A cache provides a programming API for changing and accessing data. NoSQL and RDBMS provide tools that execute scripting languages such as SQL, UnSQL, and Thrift.

The key point to remember is that a cache does not try to be your SOR. It coexists easily with your RDBMS, so it does not need the RDBMS's complex functionality.

InfoQ: What do you think about the future of distributed caching solutions, NoSQL databases, and traditional RDBMS working together?

Greg: Distributed caching is faster than an RDBMS; relative to NoSQL, depending on the deployment topology and the data access patterns, it can fall anywhere in between. People who need lower latency can use caching as a complement to NoSQL, just as they do today with an RDBMS.

A slightly different point: when you want to scale an RDBMS out to multiple nodes, it is often hard to scale, or it affects the programming contract, or it is subject to

(Responsible editor: The good of the Legacy)
