In-Memory Computing Technology: Collected Material

Source: Internet
Author: User
Tags: memcached, gemfire, voltdb

1 In-Memory Computing and Cloud Computing

If cloud computing is a new bottle holding the old wine of virtualization, SOA/grid computing, and SaaS (software as a service), then in-memory computing is about releasing the energy of the computation itself.


However, in-memory computing is often misunderstood. Common misconceptions include:

Ø Large-capacity memory is expensive.

Ø In-memory computing is not persistent: In fact, virtually all in-memory computing middleware offers several strategies, including in-memory backup, backup to persistent storage, and overflow to disk-based swap space.

Ø In-memory computing is meant to replace the data warehouse: The purpose of in-memory computing is to speed up computation over an operational dataset that needs a mix of OLTP and OLAP, not over a historical dataset. In short, in-memory computing is not about putting all of an enterprise's data into memory.

Ø Flash is fast enough: In-memory computing does not aim for a marginal 2-3x improvement but for a 10-100x boost, which makes previously unworkable businesses and services feasible.

Ø In-memory computing equals the in-memory database: First, in-memory computing is a technology rather than a product. Second, the in-memory database is only the result that is within reach today; the long-term direction of in-memory computing lies in stream processing. In addition, in-memory computing differs from traditional in-memory databases in that it is designed for distributed, elastic environments and for in-memory processing.


2 In-Memory Computing Product Categories

In rough order of technological development, in-memory computing products fall into three categories:

Ø Distributed caching (Memcached/Redis): The main use case is keeping frequently accessed data in memory to avoid loading it from disk. Most products are distributed in-memory key/value stores that provide simple put and get methods. As the products matured, features such as back-end read-through/write-through, ACID transactions, replication and partitioning, and eviction policies were gradually added; these features became the foundation of the later IMDG/IMCG products.

Ø In-memory data/compute grid (IMDG/IMCG; GemFire/Hazelcast/GridGain): The salient feature of the data grid is co-located computation, which sends the computation to the node where the data resides and executes it locally. This is the key innovation of data/compute grids, because as data volumes grow it becomes unrealistic to fetch the data to the place where the computation runs. This innovation not only moved in-memory computing beyond simple cache products, it also spurred the later emergence of the IMDB.

Ø Distributed in-memory database (IMDB; VoltDB/Impala): The salient feature of the distributed in-memory database is the addition of MPP (massively parallel processing) capabilities based on standard SQL or MapReduce. If the data grid addresses the problem of computing over ever-growing data volumes, the distributed in-memory database addresses ever-growing computational complexity. It provides tools such as distributed SQL, complex (distributed, shared) indexing, and MapReduce processing.
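
The difference between fetching data to the computation and sending the computation to the data can be sketched in a few lines of Python. This is a minimal single-process illustration under stated assumptions, not the API of GemFire, Hazelcast, or GridGain: the Partition and Grid classes, the execute_everywhere method, and the order data are hypothetical names chosen for the example, and each Partition object stands in for a separate node.

```python
import zlib


class Partition:
    """One node's in-memory slice of the data set (hypothetical)."""

    def __init__(self):
        self.data = {}

    def execute(self, fn):
        # Run fn next to the data; only its (small) result leaves the node.
        return fn(self.data)


class Grid:
    """A toy data grid: keys are hash-partitioned across the partitions."""

    def __init__(self, n_partitions):
        self.partitions = [Partition() for _ in range(n_partitions)]

    def _owner(self, key):
        # crc32 gives a hash that is stable across processes.
        h = zlib.crc32(str(key).encode("utf-8"))
        return self.partitions[h % len(self.partitions)]

    def put(self, key, value):
        self._owner(key).data[key] = value

    def get(self, key):
        return self._owner(key).data.get(key)

    def execute_everywhere(self, map_fn, reduce_fn):
        # Co-located computation: ship map_fn to every partition,
        # then combine the partial results on the caller's side.
        partials = [p.execute(map_fn) for p in self.partitions]
        return reduce_fn(partials)


grid = Grid(n_partitions=4)
for order_id in range(1000):
    grid.put(order_id, {"amount": order_id % 10})

# Each partition sums its own rows; only four numbers cross the "network".
total = grid.execute_everywhere(
    lambda rows: sum(row["amount"] for row in rows.values()),
    sum,
)
print(total)
```

In a real grid the function would be serialized and sent over the network, and the partial results would be the only data returned. The same scatter-and-combine shape is what an MPP engine applies to SQL or MapReduce work.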


It is worth noting that, as the technology develops, some of these boundaries are no longer so clear. Many IMDG products today, for example, have IMDB features and provide complex distributed SQL and MapReduce computing capabilities. In any case, the core technologies and algorithms inside these products stay the same, so there is no need to obsess over which category a product belongs to.


3 Application Scenarios

The in-memory computing products mentioned above can be applied to the various stages of big data processing, as shown in the following illustration, where the technologies that involve in-memory computing are marked in red.

1) Transaction processing: divided mainly into three parts: caches (Memcached, Redis, GemFire), RDBMS, and NewSQL (led by VoltDB). Caches and NewSQL databases are the focus of attention.

2) Stream processing: Storm itself is only a computation framework, whereas Spark Streaming implements stream processing with in-memory computation.

3) Analysis phase:

Ø General processing: MapReduce, Spark

Ø Query: Hive, Pig, Spark (Shark)

Ø Data mining: Mahout, Spark MLlib, Spark GraphX

As the above shows, the Spark ecosystem and Impala both deserve particular attention.


4 Core Technologies

Because in-memory computing is mainly about unleashing computing power in the cloud, its technology stack covers two major areas, parallel/distributed computing and in-memory data management:

Ø Parallel/distributed computing: network topology, RPC communication, system synchronization, persistence, logging.

Ø In-memory data management: dictionary encoding, data compression, in-memory data formats, data operations, in-memory indexing, in-memory concurrency control and transactions.
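
Of the data management techniques listed above, dictionary encoding is the easiest to show in a few lines. The sketch below is only an illustration of the general idea (replace repeated values with small integer codes); the function names and the sample column are assumptions made for this example and are not taken from any of the products discussed.

```python
def dictionary_encode(values):
    """Replace each value with an integer code; return (dictionary, codes)."""
    dictionary = []       # code -> value
    code_of = {}          # value -> code
    codes = []
    for v in values:
        code = code_of.get(v)
        if code is None:
            # First time this value is seen: assign the next free code.
            code = len(dictionary)
            code_of[v] = code
            dictionary.append(v)
        codes.append(code)
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[c] for c in codes]


# A column with many repeated strings compresses to a few small integers.
cities = ["Hangzhou", "Beijing", "Hangzhou", "Shanghai", "Beijing", "Hangzhou"]
dictionary, codes = dictionary_encode(cities)
print(dictionary)  # ['Hangzhou', 'Beijing', 'Shanghai']
print(codes)       # [0, 1, 0, 2, 1, 0]
```

Because equal values share one code, an equality filter can compare integers instead of strings, which is one reason the technique is popular in columnar in-memory stores.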

The following briefly lists the technical points involved in the mainstream in-memory computing products:

Ø Memcached/Redis: consistent hashing.

Ø GridGain: DHT, refresh-ahead, off-heap storage, continuous query.

Ø Infinispan: LIRS eviction.

Ø Spark (LMAX): immutable model (RDD).

Ø VoltDB: single-threaded execution.

Ø Impala (Dremel): LLVM optimization, nested records, MPP.
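
As an illustration of the first item in the list, consistent hashing can be sketched as a sorted ring of hash points. This is a generic textbook version, not the code of any Memcached or Redis client: the ConsistentHashRing class, the choice of MD5, the 100 virtual nodes per server, and the server names are all assumptions made for the example.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """A hash ring with virtual nodes (hypothetical helper class)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash point, node name)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # First 8 bytes of MD5 as an integer; any stable hash would do.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node):
        # Each server is placed on the ring at many points to even out load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key):
        if not self._ring:
            raise LookupError("the ring has no nodes")
        # Find the first point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]


ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}

ring.add_node("cache-d")
after = {k: ring.get_node(k) for k in keys}

# Only keys that now belong to the new server change owner.
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
```

With simple modulo hashing, adding a fourth server would remap most keys; on the ring, one would expect only about a quarter of them to move, and all of those move to the new server.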

Theory

To find the papers below, search for "<paper title> pdf" directly on Google or bing.com.

"Implementation Techniques for Main Memory Database Systems"

"Main Memory Database Systems: An Overview"

"The Revolution in Database Architecture", Jim Gray

"A Study of Index Structures for Main Memory Database Management Systems"

"High-Performance Concurrency Control Mechanisms for Main-Memory Databases"

"A Bridging Model for Parallel Computation" (BSP model)

"SEDA: An Architecture for Scalable, Well-Conditioned Internet Services"

Real-time processing and stream processing

Parallel computation of relational algebra

Concurrent computing models: BSP and SEDA

From NSM to Parquet: the evolution of storage structures

GemFire, VoltDB:

Introduction to the architecture of the distributed cache GemFire

A brief introduction to the features of the NewSQL database VoltDB

Spark:

Research on Spark distributed computing and the RDD model

Current state and frontiers of Spark development

The distributed in-memory file system Tachyon

Impala (Dremel):

The Google Dremel data model explained (part 1)

The Google Dremel data model explained (part 2)

Code generation in Impala
