# Seven Databases in Seven Weeks
Book-sharing session, round 1, 2017.02.12
*Seven Databases in Seven Weeks*, Eric Redmond
## Prerequisites
ACID: Atomicity, Consistency, Isolation, Durability
CAP theorem: Consistency, Availability, Partition tolerance. In a distributed environment, at most two of the three can be satisfied at once.
"Xiaoming, where's your database homework?" "I can hand in half of it today, or all of it tomorrow, but I can't hand in all of it today." "... Xiaoming, get out of here!"
Quick notes:
* Atomicity: atomicity means intermediate states are never visible. Imagine transferring 500 from account a to account b: reading {a:1000, b:1000} or {a:500, b:1500} is correct, but a read should never return {a:500, b:1000} or {a:1000, b:1500}.
* Consistency: because of the CAP theorem, a distributed environment sometimes offers only eventual consistency.
* Isolation: databases define four isolation levels, which avoid phantom reads, non-repeatable reads, and dirty reads to different degrees.
* Transactions: the database guarantees atomicity through transactions, usually implemented by writing a log first (write-ahead logging).
* Locks: divided into read locks and write locks; the finest lock granularity available differs across databases and storage engines.
* MVCC: multi-version concurrency control, which keeps data correct by recording extra fields on each row version, such as an update time and a deletion marker.
* Other topics, such as B+ trees, LSM trees, hash tables, and hash rings.
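The account-transfer example in the atomicity note can be tried out with a transaction. This is only a minimal sketch, assuming SQLite (via Python's standard `sqlite3` module) as a stand-in database; the `accounts` table and the `transfer` and `balances` helpers are hypothetical names made up for the illustration.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

def balances(conn):
    """Return {account name: balance} for all accounts."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 1000), ("b", 1000)])
conn.commit()

ok = transfer(conn, "a", "b", 500)        # succeeds
after_ok = balances(conn)
failed = transfer(conn, "a", "b", 5000)   # debit is rolled back
after_fail = balances(conn)
print(ok, after_ok)
print(failed, after_fail)
```

The failed transfer has already debited account a when the error is raised, and the rollback is what keeps the half-finished state from ever being stored.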
## N*SQL History
No SQL age: hierarchical databases
Know SQL age: RDBMS
NoSQL age: the CAP theorem spawned a series of NoSQL databases that make different trade-offs and provide different features
NewSQL age: ACID and data relationships both turned out to be essential to data, so the role of NoSQL databases shifted from replacing the RDBMS to assisting it. This period also saw time-series DBs built for big data (such as Riak TS) and new databases with reactive characteristics (such as RethinkDB).
## The 7 Databases
* RDBMS: Postgres
* KV: Riak KV, Redis
* Column database: HBase
* Document database: Mongo, CouchDB
* Graph database: Neo4j

Quick notes:
* Column-based vs. row-based: imagine we have the records {id:1, a:1, b:1} and {id:2, a:2, b:1}. A column database stores them as [{id:1, data: {a:1, b:1}}, {id:2, data: {a:2, b:1}}], while row-based storage is {columns: [id, a, b], data: [[1, 1, 1], [2, 2, 1]]}. The design of a column database (or schemaless design in general) improves flexibility, but it also sacrifices performance (think about why).
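The two layouts described in this note can be compared directly. The following is a rough sketch in plain Python, not how any real engine stores data; `to_schemaless` and `to_fixed_schema` are hypothetical helper names, and JSON length is used only as a crude proxy for storage cost.

```python
import json

records = [{"id": 1, "a": 1, "b": 1}, {"id": 2, "a": 2, "b": 1}]

def to_schemaless(rows):
    """Each record carries its own field names, so records may differ in shape."""
    return [
        {"id": r["id"], "data": {k: v for k, v in r.items() if k != "id"}}
        for r in rows
    ]

def to_fixed_schema(rows, columns):
    """Field names are stored once; every record must supply every column."""
    return {"columns": list(columns), "data": [[r[c] for c in columns] for r in rows]}

schemaless = to_schemaless(records)
fixed = to_fixed_schema(records, ["id", "a", "b"])

# A record with a new field fits the schemaless layout without any change...
odd_record = {"id": 3, "a": 3, "c": 9}
flexible = to_schemaless(records + [odd_record])

# ...but the fixed layout rejects it until the schema itself is altered.
try:
    to_fixed_schema(records + [odd_record], ["id", "a", "b"])
    rejected = False
except KeyError:
    rejected = True

# Repeating the field names in every record costs space (and parsing time).
many = [{"id": i, "a": i, "b": i} for i in range(1000)]
schemaless_bytes = len(json.dumps(to_schemaless(many)))
fixed_bytes = len(json.dumps(to_fixed_schema(many, ["id", "a", "b"])))
print(rejected, schemaless_bytes > fixed_bytes)
```

This hints at one answer to the "think about why" question: the flexible layout pays for its freedom by storing and interpreting field names per record.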
## Postgres
### Day 1: Relations, CRUD, and joins
### Day 2: Advanced queries, transactions, stored procedures, views, triggers, rules, pivot tables
Advanced queries include aggregate queries, grouping, window functions (PARTITION BY), and so on.
A rule describes how to rewrite the parsed query tree (AST, abstract syntax tree).
### Day 3: Full-text search and multidimensional queries
## Addendum:
Postgres vs. MySQL: Postgres has a well-structured codebase that makes it easy to study the source and to develop plug-ins, so the community offers a large number of them, such as GIS, full-text search, and multidimensional distance. It also has rich stored procedures, supports tuple-level streaming replication, and so on.
## Riak KV
### Day 1: CRUD, links, and MIME
Riak KV locates a piece of data by bucket name + key.
Riak KV can establish links between pieces of data.
Riak KV supports data of different MIME types, so it can be used for a self-hosted CDN and the like.
### Day 2: MapReduce and server clusters
Riak KV supports either passing the MapReduce functions in with the request, or storing them on the server and invoking them directly from the request.
Riak KV cluster: the Riak Ring, which is based on consistent hashing.
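To make the consistent-hashing idea concrete, here is a generic sketch of a hash ring. It is not Riak's actual ring implementation; the `HashRing` class, the `points_per_node` parameter, the node names, and the `bucket/key` hashing scheme are all assumptions made for the illustration.

```python
import bisect
import hashlib

def ring_position(text):
    """Map a string to a point on the ring (first 8 bytes of its SHA-1 digest)."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes, points_per_node=64):
        self.points_per_node = points_per_node
        self._ring = []  # sorted list of (position, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Several points per node spread its share of the ring more evenly.
        for i in range(self.points_per_node):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def owner(self, bucket, key):
        # The owner is the first point at or after the key, wrapping around.
        pos = ring_position(f"{bucket}/{key}")
        idx = bisect.bisect_left(self._ring, (pos,)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["node1", "node2", "node3"])
keys = [("users", f"user{i}") for i in range(1000)]
before = {k: ring.owner(*k) for k in keys}

ring.add_node("node4")
after = {k: ring.owner(*k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

The point of the design is visible in the last lines: adding a node relocates only the keys that the new node takes over, instead of reshuffling everything as `hash(key) % node_count` would.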
### Day 3: Resolving conflicts and extending Riak KV
Riak KV handles conflicts with vector clocks and by keeping multiple versions.
Riak KV provides pre-commit and post-commit hooks.
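The way vector clocks tell "newer" apart from "conflicting" can be sketched in a few lines. This is an illustration of the general technique, not Riak's internal format; the function names and the actor names (`alice`, `bob`, `carol`) are hypothetical.

```python
def increment(clock, actor):
    """Return a copy of the clock with the actor's counter bumped by one."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new

def descends(a, b):
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())

def compare(a, b):
    if a == b:
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "conflict"  # concurrent writes: keep both versions

def merge(a, b):
    """Clock for a value that resolves both versions."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}

v1 = increment({}, "alice")      # first write
v2_bob = increment(v1, "bob")    # bob updates v1
v2_carol = increment(v1, "carol")  # carol updates v1 at the same time

print(compare(v2_bob, v1))
print(compare(v2_bob, v2_carol))
print(merge(v2_bob, v2_carol))
```

When `compare` reports a conflict, neither version can be discarded safely, which is why both are kept until something (the client, typically) resolves them and writes back a value under the merged clock.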
## Addendum:
CAP: Riak KV takes the view that the CAP trade-off can be adjusted on a per-bucket or per-request basis.
## HBase
HBase grew out of Hadoop.
### Day 1: CRUD and table administration
HBase targets big data and offers a standalone mode, a single-node pseudo-distributed mode, and a fully distributed mode.
HBase provides the concept of a column family.
### Day 2: Working with big data
Bloom filter: a probabilistic filter implemented by setting bits in a bit array. Its storage footprint is far smaller than a hash table's, but it has some probability of false positives. HBase supports using Bloom filters to determine whether a column exists in a row.
HBase regions store different row ranges (horizontal partitioning) on different nodes.
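The Bloom filter mentioned above is small enough to sketch in full. This is a generic illustration, not HBase's implementation; the `BloomFilter` class, its default sizes, and the `row:column` key format are assumptions chosen for the example.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

bf = BloomFilter()
stored = [f"row{i}:cf:col{i}" for i in range(50)]
for cell in stored:
    bf.add(cell)

print(all(bf.might_contain(cell) for cell in stored))
print(len(bf.bits), "bytes of filter state")
```

A negative answer lets the store skip a disk lookup entirely, while a positive answer still has to be confirmed against the real data, since unrelated items can happen to set the same bits.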
### Day 3: Into the cloud
Thrift
Whirr: manages virtual clusters; it supports EC2, Rackspace servers, and so on, and can set up clusters of Hadoop, HBase, Cassandra, ZooKeeper, and the like.
## Addendum:
HBase is CP in CAP terms.
In big data, many teams have long since moved from Hadoop MapReduce to Spark, and many projects now use a TS DB as the data store in their architecture.
## Mongo
The name Mongo comes from humongous (huge), not mango.
### Day 1: CRUD and nesting
Mongo stores data in BSON format.
### Day 2: Indexing, grouping, MapReduce
### Day 3: Replica sets, sharding, GIS, and GridFS
Sharding is Mongo's horizontal partitioning scheme, and each shard is set up with a master-slave (M-S) structure.
## Addendum:
Mongo now supports left joins (the `$lookup` aggregation stage). This also confirms the NewSQL trend described earlier. And then there are the several recent security incidents ...
## CouchDB
### Day 1: CRUD, Futon
Futon is the web administration interface provided by CouchDB.
### Day 2: Views
### Day 3: Advanced views, Changes API, and replicating data
CouchDB provides the Changes API to help applications monitor data changes and update immediately (reactive).
BigCouch is based on CouchDB and implements sharding and replication strategies.
## Neo4j
Neo4j is a "whiteboard-friendly" database.
### Day 1: Graphs, Groovy, and CRUD
Neo4j data is divided into nodes and edges, and labels are added to distinguish different nodes and edges.
Neo4j is written in Java/Scala. It offers a RESTful server mode (strongly discouraged), Cypher (a standard graph DB query language), and an embedded mode for Java and Gremlin (using Groovy).
### Day 2: REST, indexes, and algorithms
### Day 3: Distributed high availability
A Neo4j cluster can be managed with ZooKeeper (ZK).
## Addendum:
I personally like graph DBs, and Neo4j is an ACID transactional database that is also relatively flexible. But graph databases still have problems: the abstraction you establish may get in the way of later changes in business requirements, and the core of a graph DB is keeping the link relationships in memory. As a result, many storage layers are implemented on non-graph storage engines (such as Postgres), or use multiple queries rather than joins, to mitigate to some extent the nonlinear growth in query time for n-degree relationship queries.
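The "multiple queries rather than joins" approach to n-degree relationship queries can be sketched as a breadth-first expansion, one query per hop. This is only an illustration: SQLite stands in for the relational engine, and the `follows` table and `within_n_hops` helper are hypothetical.

```python
import sqlite3

def within_n_hops(conn, start, n):
    """Return everyone reachable from `start` in at most n hops, one query per hop."""
    seen = {start}
    frontier = {start}
    for _ in range(n):
        if not frontier:
            break
        marks = ",".join("?" * len(frontier))
        rows = conn.execute(
            f"SELECT dst FROM follows WHERE src IN ({marks})", sorted(frontier)
        )
        # Only unseen nodes are expanded on the next hop.
        frontier = {dst for (dst,) in rows} - seen
        seen |= frontier
    return seen - {start}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follows (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO follows VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e"), ("c", "a")],
)
conn.commit()

one_hop = within_n_hops(conn, "a", 1)
two_hops = within_n_hops(conn, "a", 2)
three_hops = within_n_hops(conn, "a", 3)
print(sorted(one_hop), sorted(two_hops), sorted(three_hops))
```

Each hop touches only the current frontier and skips nodes already visited, which is what keeps the cost from multiplying the way an n-way self-join would.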
## Redis
### Day 1: CRUD and data types
Data types include hashes, lists, sets, sorted sets, ranges, and so on.
An expiration time can be set, which is especially useful for caching.
### Day 2: Advanced usage, distribution
Redis also supports publish-subscribe, so it is also used to build task queues, message queues, and so on.
Redis persistence modes include snapshots and AOF (append-only file).
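The difference between the two persistence modes can be shown with a toy key-value store. This is a sketch of the idea only, not Redis's file formats or commands; the `TinyStore` class, the JSON-lines log, and the file names are assumptions made for the example.

```python
import json
import os
import tempfile

class TinyStore:
    def __init__(self, aof_path):
        self.aof_path = aof_path
        self.data = {}
        if os.path.exists(aof_path):
            self._replay()  # rebuild state by re-running every logged command

    def _replay(self):
        with open(self.aof_path) as f:
            for line in f:
                self._apply(json.loads(line))

    def _apply(self, cmd):
        if cmd[0] == "SET":
            self.data[cmd[1]] = cmd[2]
        elif cmd[0] == "DEL":
            self.data.pop(cmd[1], None)

    def execute(self, *cmd):
        # AOF: append the command to the log, then apply it in memory.
        with open(self.aof_path, "a") as f:
            f.write(json.dumps(cmd) + "\n")
        self._apply(cmd)

    def snapshot(self, path):
        # Snapshot: dump the whole current state in one go.
        with open(path, "w") as f:
            json.dump(self.data, f)

with tempfile.TemporaryDirectory() as tmp:
    aof = os.path.join(tmp, "appendonly.aof")
    store = TinyStore(aof)
    store.execute("SET", "x", 1)
    store.execute("SET", "y", 2)
    store.execute("DEL", "x")

    restored = TinyStore(aof).data  # simulate a restart
    snap = os.path.join(tmp, "dump.json")
    store.snapshot(snap)
    with open(snap) as f:
        from_snapshot = json.load(f)

print(restored, from_snapshot)
```

The log records every change and so loses little on a crash but grows without bound, while the snapshot is compact but only as fresh as the last time it was taken.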
### Day 3: Working with other databases
## Addendum:
Redis's philosophy is really NewSQL. In a word: thumbs up.
## Supplement: Time Series DB
A TS DB is an immutable database: once written, historical data cannot be modified. For example, a person's attractiveness score on 2017.02.12 is a fixed value. Whether you look a day, a week, or ten years later, that person was exactly that ridiculously handsome on 2017.02.12. That is the concept of immutability. Because of this, the storage implementation can read and write the disk sequentially, which greatly improves performance.
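The append-only behavior can be sketched with an in-memory store. This is an illustration under simplifying assumptions: the `TimeSeries` class is hypothetical, timestamps are zero-padded date strings so that string order matches time order, and a Python list stands in for sequential disk writes.

```python
import bisect

class TimeSeries:
    def __init__(self):
        self._times = []   # strictly increasing, so new data only goes at the end
        self._values = []

    def append(self, ts, value):
        if self._times and ts <= self._times[-1]:
            raise ValueError("history is immutable: timestamps must increase")
        self._times.append(ts)
        self._values.append(value)

    def range(self, start, end):
        """Return all (timestamp, value) pairs with start <= timestamp <= end."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_right(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))

series = TimeSeries()
series.append("2017.02.11", 98)
series.append("2017.02.12", 100)
series.append("2017.02.13", 99)

window = series.range("2017.02.12", "2017.02.13")

try:
    series.append("2017.02.12", 0)  # attempt to rewrite history
    rewrite_rejected = False
except ValueError:
    rewrite_rejected = True

print(window, rewrite_rejected)
```

Since nothing is ever updated in place, writes always land at the end and a time-range read is a single contiguous slice, which is the access pattern that sequential disk I/O rewards.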
Blockchain shares the same immutability concept. The difference is that a blockchain can be pictured as a cluster with thousands of master nodes. Such an implementation provides decentralization, but it greatly reduces the speed at which transactions are validated, so there are many off-chain transaction mechanisms. If you are interested, concepts such as PoW and PoS, the Merkle Patricia tree, the EVM (Ethereum virtual machine), and so on are worth a look.
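The immutability of a blockchain comes from each block committing to the hash of the previous one. Below is a minimal sketch of such a hash chain, leaving out consensus, PoW/PoS, and Merkle trees entirely; the block layout and the function names are assumptions made for the example.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, prev_hash, payload):
    body = json.dumps([index, prev_hash, payload], sort_keys=True).encode()
    return hashlib.sha256(body).hexdigest()

def append_block(chain, payload):
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    })

def is_valid(chain):
    """Recompute every hash and check that each block points at its predecessor."""
    prev_hash = GENESIS_PREV
    for i, block in enumerate(chain):
        expected = block_hash(i, prev_hash, block["payload"])
        if block["index"] != i or block["prev_hash"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True

chain = []
append_block(chain, {"from": "a", "to": "b", "amount": 500})
append_block(chain, {"from": "b", "to": "c", "amount": 200})
append_block(chain, {"from": "c", "to": "a", "amount": 50})
valid_before = is_valid(chain)

chain[1]["payload"]["amount"] = 2000  # tamper with history
valid_after = is_valid(chain)
print(valid_before, valid_after)
```

Changing any old block invalidates its hash and therefore every block after it, so rewriting history means redoing the whole rest of the chain.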
## Last but not least
"I spent a lot of time building a highly available DB cluster!" "Good job, Xiaoming. How many users does our product have now?" "Counting you and me, two." "... Xiaoming, get out of here!"