"Seven Databases in Seven Weeks" reading notes

Source: Internet
Author: User
Tags: psql, riak, neo4j, couchdb

# Seven Databases in Seven Weeks

Book-sharing meeting, session 1, 2017.02.12

"Seven Databases in Seven Weeks", Eric Redmond

## Prerequisite knowledge

ACID: Atomicity, Consistency, Isolation, Durability

CAP theorem: Consistency, Availability, Partition tolerance. In a distributed environment, at most two of the three can be satisfied at the same time.

"Xiaoming, where's your database homework?" "I can hand in half of it today, or all of it tomorrow, but I can't hand in all of it today." "... Xiaoming, get out of here!"

Quick notes:

* Atomicity: intermediate states are never visible. Imagine transferring 500 from account A to account B: reading {a:1000, b:1000} or {a:500, b:1500} is correct, but a reader should never see {a:500, b:1000} or {a:1000, b:1500}.
* Consistency: per the CAP theorem, a distributed environment sometimes provides only eventual consistency.
* Isolation: databases define four isolation levels, which avoid phantom reads, non-repeatable reads, and dirty reads to different degrees.
* Databases guarantee atomicity through transactions, usually implemented by writing a log first (write-ahead logging).
* Locks: divided into read locks and write locks. The finest lock granularity on offer differs between databases and storage engines.
* MVCC: multi-version concurrency control, which stores extra fields such as an update timestamp and a deletion marker to keep reads correct.
* Other topics: B+ tree, LSM tree, hash table and hash tier, etc.
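The transfer example from the atomicity note can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the `accounts` table, the `transfer` helper, and the "insufficient funds" rule are made up for the example and do not come from the book. The point is that the debit and the credit are committed together or not at all.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # The debit has already run, but the rollback undoes it.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 1000), ("b", 1000)])
conn.commit()

transfer(conn, "a", "b", 500)
print(balances(conn))  # {'a': 500, 'b': 1500}

transfer(conn, "a", "b", 9999)  # fails half-way and is rolled back
print(balances(conn))  # {'a': 500, 'b': 1500}
```

The second call fails after the debit has been applied inside the transaction, yet the half-finished state {a:-9499, b:1500} is never committed.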

## N*SQL history

No SQL era: hierarchical databases

Know SQL era: RDBMS

NoSQL era: the CAP theorem spawned a series of NoSQL databases that make different trade-offs and provide different features

New SQL era: whether it is ACID or data relationships, both are part of the essence of data. The role of NoSQL databases shifts from replacing the RDBMS to assisting the RDBMS. This era also sees the creation of time-series databases for big data (such as Riak TS) and new databases with reactive characteristics (such as RethinkDB).

## The 7 databases

RDBMS: Postgres

KV: Riak KV, Redis

Column database: HBase

Document database: Mongo, Couch

Graph database: Neo4j

Quick notes:

* Column-based vs. row-based: imagine we have the records {id:1, a:1, b:1} and {id:2, a:2, b:1}. A column database stores them roughly as [{id:1, data:{a:1, b:1}}, {id:2, data:{a:2, b:1}}], while row-based storage looks like {columns:[id, a, b], data:[[1, 1, 1], [2, 2, 1]]}. The design of a column (or schemaless) database improves flexibility, but it also sacrifices performance (think about why).
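The two layouts in the note above can be written out as plain Python data. This is a sketch of the idea only, not of any engine's real on-disk format; the helper names `to_schemaless` and `to_fixed_schema` are invented for the illustration.

```python
records = [{"id": 1, "a": 1, "b": 1}, {"id": 2, "a": 2, "b": 1}]


def to_schemaless(records):
    """One entry per record: a key plus a free-form map of column -> value."""
    return [
        {"id": r["id"], "data": {k: v for k, v in r.items() if k != "id"}}
        for r in records
    ]


def to_fixed_schema(records, columns):
    """One shared column list, then positional rows; missing values become None."""
    return {"columns": columns, "data": [[r.get(c) for c in columns] for r in records]}


print(to_schemaless(records))
print(to_fixed_schema(records, ["id", "a", "b"]))

# Adding a field "c" to a single record:
records.append({"id": 3, "a": 3, "c": 9})
print(to_schemaless(records)[-1])  # only this record mentions "c"
print(to_fixed_schema(records, ["id", "a", "b", "c"]))  # every row gets a slot for "c"
```

One possible hint for the "think about why": the schemaless layout repeats the column names inside every record, while the fixed layout states them once and can address a value by position.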

## Postgres

### Day 1: Relations, CRUD, and joins

### Day 2: Advanced queries, transactions, stored procedures, views, triggers, rules, pivot tables

Advanced queries include aggregate queries, grouping, window functions (PARTITION BY), etc.

A rule is a description of how to modify a parsed query tree (AST, abstract syntax tree)

### Day 3: Full-text search and multidimensional queries

### Additional notes

PostgreSQL vs. MySQL: PostgreSQL has a clean code structure that makes it easy to study the source and to develop plug-ins, so the community offers a large number of extensions, such as GIS, full-text search, and multidimensional distance. It also has rich stored procedures, support for tuple-level streaming replication, and so on.

## Riak KV

### Day 1: CRUD, links, and MIME

Riak KV locates a piece of data by bucket name + key.

Riak KV can establish links between pieces of data.

Riak KV supports data of different MIME types, so it can be used for a self-built CDN and similar purposes.

### Day 2: MapReduce and server clusters

Riak KV supports MapReduce either by passing the functions in with the request, or by storing them on the server side and calling them directly from the request.

Riak KV cluster: the Riak ring, which is based on consistent hashing.
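Consistent hashing itself can be sketched in a few lines. The code below is a generic hash ring with virtual nodes, not Riak's actual ring implementation; the class name `HashRing`, the node names, and the choice of 64 virtual nodes per server are assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        # First 8 bytes of SHA-1 as an integer position on the ring.
        return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")

    def add(self, node):
        # Each physical node owns several points on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def lookup(self, key):
        """Return the node owning the first ring point after the key's hash."""
        if not self._ring:
            raise LookupError("ring is empty")
        # Rebuilding the position list keeps the sketch short; a real
        # implementation would cache it.
        positions = [pos for pos, _ in self._ring]
        i = bisect.bisect_right(positions, self._hash(key)) % len(self._ring)
        return self._ring[i][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.remove("node-b")
after = {k: ring.lookup(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys changed owner")
```

Only the keys that lived on the removed node change owner; every other key stays where it was, which is the property that makes the ring attractive for clusters whose membership changes.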

### Day 3: Resolving conflicts and extending Riak KV

Riak KV handles conflicts with vector clocks and by saving multiple versions.
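A minimal sketch of how vector clocks tell "newer" apart from "conflicting" follows. It models a clock as a plain dict of actor -> counter; the function names and the actors `alice`, `bob`, and `carol` are hypothetical, and Riak's own encoding of clocks is not shown here.

```python
def increment(clock, actor):
    """Return a copy of the clock with the actor's counter bumped by one."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a, b):
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify the relationship between two clocks."""
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "a is newer"
    if b_ge_a:
        return "b is newer"
    return "conflict"  # concurrent writes: keep both versions as siblings


def merge(a, b):
    """Component-wise maximum, used after the application resolves a conflict."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


v1 = increment({}, "alice")    # first write
v2a = increment(v1, "bob")     # bob updates v1
v2b = increment(v1, "carol")   # carol updates v1 at the same time

print(compare(v2a, v1))   # a is newer
print(compare(v2a, v2b))  # conflict

resolved = increment(merge(v2a, v2b), "alice")
print(descends(resolved, v2a) and descends(resolved, v2b))  # True
```

When neither clock descends from the other, the store cannot pick a winner by itself, so it keeps both versions and lets the reader resolve them.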

Riak KV provides pre-commit and post-commit hooks.

### Additional notes

CAP: Riak KV takes the view that the CAP trade-off can be changed on a per-bucket or per-request basis.

## HBase

HBase came out of Hadoop.

### Day 1: CRUD and table administration

HBase is aimed at big data and provides a standalone mode, a single-node pseudo-distributed mode, and a fully distributed mode.

HBase provides the concept of a column family.

### Day 2: Working with big data

Bloom filter: a filter implemented by flipping bits in a bit array. Its storage space is (far) smaller than that of a hash table, but there is a probability of false positives. HBase supports using a Bloom filter to determine whether a column exists on a row.
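The bit-flipping idea can be shown with a small Bloom filter. This is a generic sketch, not HBase's implementation; the class name, the default sizes, and the `row/column` key strings are assumptions for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with several hash positions per item."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter()
bf.add("row1/cf:a")
bf.add("row2/cf:b")

print("row1/cf:a" in bf)  # True: an added item is never reported missing
print("row9/cf:z" in bf)  # usually False, but a false positive is possible
print(len(bf.bits), "bytes of state, regardless of how many items were added")
```

A negative answer lets the store skip a disk read entirely, and a false positive only costs one unnecessary lookup, which is why the trade-off works well for "does this row have this column" checks.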

HBase regions store different segments of the data (horizontal partitioning) on different nodes.

### Day 3: Taking it to the cloud

Thrift

Whirr: manages virtual clusters. It supports providers such as EC2 and Rackspace, and clusters of Hadoop, HBase, Cassandra, ZooKeeper, and so on.

### Additional notes

HBase is CP in CAP terms.

In big data, Google is said to have long since given up Hadoop in favor of Spark, and many projects have now started using a TS DB as the data store in their architecture.

## Mongo

The name Mongo comes from "humongous" (huge), not from "mango".

### Day 1: CRUD and nesting

Mongo's storage format is BSON.

### Day 2: Indexing, grouping, MapReduce

### Day 3: Replica sets, sharding, GIS, and GridFS

Sharding is Mongo's horizontal partitioning scheme, and each shard is set up as a master-slave (M-S) structure.

### Additional notes

Mongo now supports left joins. This also confirms the New SQL historical background mentioned earlier. And then there are the several recent security incidents...

## CouchDB

### Day 1: CRUD, Futon

Futon is the web administration interface provided by CouchDB.

### Day 2: Views

### Day 3: Advanced views, Changes API, and replicating data

CouchDB provides the Changes API to help applications monitor data changes and update immediately (reactive).

BigCouch is based on CouchDB and implements sharding and replication strategies.

## Neo4j

Neo4j is a "whiteboard-friendly" database

### Day 1: Graphs, Groovy, and CRUD

Neo4j data is divided into nodes and edges, and labels are added to distinguish different nodes and edges.

Neo4j is written in Java/Scala. It offers a RESTful server mode (strongly discouraged), Cypher (a standard graph DB query language), and an embedded mode for Java and Gremlin (using Groovy).

### Day 2: REST, indexes, and algorithms

### Day 3: Distributed high availability

A Neo4j cluster can be managed with ZooKeeper (ZK).

### Additional notes

I personally like graph DBs, and Neo4j is an ACID transactional database that is also relatively flexible. But graph databases still have a problem: the abstraction you establish may constrain later changes in business requirements, and the core of a graph DB is to store the link relationships in memory. As a result, many storage tiers are implemented on non-graph storage engines (such as psql), or use multiple queries rather than joins to mitigate, to some extent, the nonlinear growth in query time for n-degree relationship queries.

## Redis

### Day 1: CRUD and data types

Data types include hashes, lists, sets, sorted sets, ranges, and so on.

An expiry time can be set on keys, which is especially useful for caching.

### Day 2: Advanced usage, distribution

Redis also supports publish-subscribe, so it is also used to build task queues, message queues, and so on.

Redis persistence modes include snapshots and AOF (append-only file).
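The difference between the two persistence modes can be sketched with a toy in-memory store. Nothing here reflects Redis's real file formats: `TinyStore`, its JSON snapshot, and the list standing in for the append-only file are all assumptions made for the illustration.

```python
import json


class TinyStore:
    """Toy key-value store showing snapshot vs. append-only-log recovery."""

    def __init__(self):
        self.data = {}
        self.aof = []  # stands in for the append-only file: one JSON line per write

    def set(self, key, value):
        self.aof.append(json.dumps(["SET", key, value]))
        self.data[key] = value

    def delete(self, key):
        self.aof.append(json.dumps(["DEL", key]))
        self.data.pop(key, None)

    def snapshot(self):
        """Serialize the whole dataset as of this moment."""
        return json.dumps(self.data)

    @classmethod
    def from_snapshot(cls, blob):
        store = cls()
        store.data = json.loads(blob)
        return store

    @classmethod
    def from_aof(cls, lines):
        """Rebuild state by replaying every logged write in order."""
        store = cls()
        for line in lines:
            op, key, *rest = json.loads(line)
            if op == "SET":
                store.data[key] = rest[0]
            elif op == "DEL":
                store.data.pop(key, None)
        return store


store = TinyStore()
store.set("a", 1)
snap = store.snapshot()  # point-in-time copy
store.set("b", 2)
store.delete("a")

print(TinyStore.from_snapshot(snap).data)  # {'a': 1}: writes after the snapshot are lost
print(TinyStore.from_aof(store.aof).data)  # {'b': 2}: replay recovers every write
```

The snapshot is compact and fast to load but loses anything written after it was taken, while the log recovers every write at the cost of growing with each operation.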

### Day 3: Working with other databases

### Additional notes

The idea behind Redis is really New SQL. In a word: it deserves praise.

## Supplement: time-series DB

A TS DB is an immutable database in which historical data cannot be modified once written. For example, the author's attractiveness score on 2017.02.12 is a fixed value: whether you look a day, a week, or 10 years later, the author on 2017.02.12 was exactly that handsome. That is the concept of immutability. Based on this, the storage implementation can read and write the disk sequentially, which greatly improves performance.

Blockchain shares the immutability concept. The difference is that a blockchain can be imagined as a cluster with thousands of master nodes. Such an implementation provides decentralization, but the speed of validating transactions is greatly reduced, so there are many off-chain transaction mechanisms. Concepts such as PoW and PoS, the Merkle Patricia tree, and the EVM (Ethereum Virtual Machine) are interesting to look into.

## One last thing

"I spent a lot of time building a highly available DB cluster!" "Good, Xiaoming. How many users does our product have now?" "Counting you and me, two." "... Xiaoming, get out of here!"
