Let's take a look at a panorama of the technology landscape, in which the technologies that involve in-memory computing are marked in red.
1) Transaction processing: divided into three parts: caches (Memcached, Redis, GemFire), RDBMS, and NewSQL (VoltDB). Caches and NewSQL databases are the focus of attention.
2) Stream processing: Storm itself is only a computation framework, while Spark Streaming implements stream processing on top of in-memory computation.
3) Analysis phase, compared by category:
- General processing: MapReduce, Spark
- Query: Hive, Pig, Shark (Spark)
- Data mining: Mahout, Spark MLlib, Spark GraphX
As you can see, the sub-projects of the Spark ecosystem, as well as Impala, are a noteworthy focus.
Theory
Search for "<paper name> PDF" directly on Google or bing.com.
"Implementation Techniques for Main Memory Database Systems"
"Main Memory Database Systems: An Overview"
"The Revolution in Database Architecture" (Jim Gray)
"A Study of Index Structures for Main Memory Database Management Systems"
"High-Performance Concurrency Control Mechanisms for Main-Memory Databases"
"A Bridging Model for Parallel Computation" (BSP model)
"SEDA: An Architecture for Well-Conditioned, Scalable Internet Services"
Real-time processing and stream processing
Parallel computation of relational algebra
Concurrent computing models: BSP and SEDA
From NSM to Parquet: the evolution of storage structures
Products
GemFire, VoltDB:
Introduction to the architecture of the distributed cache GemFire
Introduction to the features of the NewSQL database VoltDB
Spark:
Research on Spark distributed computing and the RDD model
Spark development status and frontiers
The distributed in-memory file system Tachyon
Impala (Dremel):
The Google Dremel data model in detail (part 1)
The Google Dremel data model in detail (part 2)
Code generation technology in Impala
Collected materials on in-memory technology