Importance of data storage:
    Data is an enterprise's most important asset;
    Data reliability is the enterprise's lifeline and must be guaranteed.

Single-machine storage principles:
    Storage engine: the core of a storage system; it determines the system's functionality and performance;
    Engine types: hash storage engine, B-tree storage engine, LSM-tree storage engine
- Hash storage engine: based on a hash table (array + linked lists); supports create, update, delete, and random reads.
- B-tree storage engine: based on a B-tree; supports CRUD on single records as well as sequential scans. Widely used in RDBMSs.
- LSM-tree storage engine: modifications are kept incrementally in memory and flushed to disk in batches once a threshold is reached. The advantage is batched writes; the disadvantage is that reads must merge data from disk and memory.
- Avoiding loss of in-memory data: every modification is first written to the commit log.
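The LSM write path described above (commit log first, then memory, then a batched flush, with reads merging memory and disk) can be sketched roughly as follows. This is a toy illustration, not how any particular engine is implemented: the class name `TinyLSM`, the JSON file layout, and the `flush_threshold` flush condition are all assumptions made for the example, and compaction of on-disk runs is left out.

```python
import json
import os
import tempfile


class TinyLSM:
    """Toy LSM-style store: commit log + in-memory table + sorted on-disk runs."""

    def __init__(self, directory, flush_threshold=2):
        self.directory = directory
        self.flush_threshold = flush_threshold  # hypothetical flush condition
        self.memtable = {}
        self.runs = []  # paths of flushed runs, oldest first
        self.log_path = os.path.join(directory, "commit.log")
        # Rediscover runs flushed before a restart.
        while os.path.exists(self._run_path(len(self.runs))):
            self.runs.append(self._run_path(len(self.runs)))
        # Replay the commit log to rebuild the memtable.
        if os.path.exists(self.log_path):
            with open(self.log_path) as log:
                for line in log:
                    key, value = json.loads(line)
                    self.memtable[key] = value

    def _run_path(self, index):
        return os.path.join(self.directory, f"run-{index}.json")

    def put(self, key, value):
        # Write to the commit log first, so the change survives a crash.
        with open(self.log_path, "a") as log:
            log.write(json.dumps([key, value]) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self._flush()

    def _flush(self):
        # Batch-write the memtable as one sorted run, then reset the log.
        path = self._run_path(len(self.runs))
        with open(path, "w") as run:
            json.dump(sorted(self.memtable.items()), run)
        self.runs.append(path)
        self.memtable = {}
        open(self.log_path, "w").close()

    def get(self, key):
        # Reads merge memory and disk: memtable first, then newest run to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for path in reversed(self.runs):
            with open(path) as run:
                for k, v in json.load(run):
                    if k == key:
                        return v
        return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        db = TinyLSM(d)
        db.put("a", 1)
        db.put("b", 2)   # reaches the threshold and triggers a flush
        db.put("a", 3)   # newer value lives only in memory and the log
        print(db.get("a"), db.get("b"))
```

Creating a second `TinyLSM` on the same directory simulates a restart: the unflushed write is recovered from the commit log, which is the point of the last bullet above.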
Data models:
- File: organized as a directory tree, as in the Linux, macOS, and Windows file systems;
- Relational: each relation is a table made up of multiple rows, and each row has multiple columns;
- Key-value: Memcached, Tokyo Cabinet, Redis;
- Column store: Cassandra, HBase;
- Graph database: Neo4j, InfoGrid, InfiniteGraph
- Document: MongoDB, CouchDB
Transactions and concurrency control:
    Four basic properties of a transaction (ACID): atomicity, consistency, isolation, durability
    Concurrency control:
        Lock granularity: process -> DB -> table -> row
        Improving read concurrency by not locking reads: copy-on-write, MVCC
    Data recovery: via the operation log

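To make the "reads are not locked" idea concrete, here is a minimal MVCC sketch: each write appends a new version, and a reader sees the newest version committed at or before its snapshot. The class `MVCCStore` and its method names are invented for illustration, and the sketch is single-threaded; a real engine would also need writer synchronization and garbage collection of old versions.

```python
import itertools


class MVCCStore:
    """Toy multi-version store: writers append versions, readers take no locks."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), ascending
        self._clock = itertools.count(1)
        self._last_commit = 0

    def write(self, key, value):
        # Each write commits a new version instead of overwriting in place.
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, value))
        self._last_commit = ts
        return ts

    def snapshot(self):
        # A reader remembers the latest commit timestamp it is allowed to see.
        return self._last_commit

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


if __name__ == "__main__":
    store = MVCCStore()
    store.write("x", "v1")
    snap = store.snapshot()
    store.write("x", "v2")               # does not disturb the older snapshot
    print(store.read("x", snap))         # the reader still sees its own version
    print(store.read("x", store.snapshot()))
```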
Multi-machine storage principles:
    Single-machine storage principles still apply in multi-machine storage; multi-machine storage is built on top of single-machine storage;
    Data distribution:
        Data is spread across multiple nodes, with load balanced between nodes;
        Distribution methods:
            Static: modulo, e.g. uid % 32;
            Dynamic: consistent hashing, which has a data drift problem (node A fails before an update, the update is migrated to node B, and then node A recovers);
        Replication:
            Multiple replicas are stored on different nodes to guarantee high reliability and high availability; based on the commit log.
        Failure detection:
            Heartbeat mechanism, data migration, failure recovery;

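The two distribution methods above can be compared with a short sketch. `static_node` is the modulo scheme; `ConsistentHashRing` is one common way to build consistent hashing, using virtual nodes on a sorted ring. The names, the use of SHA-256, and the `vnodes=100` default are assumptions for the example, not something the notes prescribe.

```python
import bisect
import hashlib


def _hash(value):
    # Stable hash (Python's built-in hash() is randomized per process for str).
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)


def static_node(uid, node_count=32):
    # Static distribution: modulo, as in uid % 32.
    return uid % node_count


class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes  # virtual nodes per physical node (assumed value)
        self._ring = []       # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key):
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
    keys = [f"user{i}" for i in range(1000)]
    before = {k: ring.lookup(k) for k in keys}
    ring.remove("node-b")
    moved = sum(1 for k in keys if ring.lookup(k) != before[k])
    print(f"{moved} of {len(keys)} keys moved after removing node-b")
```

With the ring, removing a node only remaps the keys that node owned; with `uid % N`, changing N remaps most keys.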
FLP theorem and design:
    FLP impossibility:
        In an asynchronous message-passing system, if even a single process may fail, no deterministic algorithm can guarantee that the non-failed processes reach consensus.
CAP theorem and design:
    CAP: consistency, availability, partition tolerance.
    Consistency and availability must be traded off against each other;
    Distributed storage systems must tolerate faults automatically, so partition tolerance has to be guaranteed.
2PC (two-phase commit) protocol and design:
    Used for distributed transactions;
    Two types of nodes:
        Coordinator (one);
        Transaction participants (multiple);
    Two phases:
        Request phase: the coordinator asks the participants to prepare to commit or abort the transaction, and every participant votes to agree or disagree.
        Commit phase:
            After receiving all participants' votes, the coordinator makes the decision (commit or abort);
            It commits only if all participants agreed, otherwise it aborts, and it notifies the participants of the decision;
            Each participant performs the action after receiving the coordinator's notification.
    2PC is a blocking protocol:
        A transaction participant may fail
            -- set a timeout;
        The coordinator may fail
            -- log records, standby coordinator
    Applications: trade orders, etc.;

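The two phases above can be sketched in a few lines. This is an in-process illustration only: `Participant`, its `can_commit` switch, and `two_phase_commit` are hypothetical names, there is no networking, and the timeout, logging, and standby-coordinator measures are not modeled beyond treating an exception during voting as a "no" vote.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # hypothetical switch to simulate a "no" vote
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (request): collect a vote from every participant.
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare())
        except Exception:
            votes.append(False)  # a failure or timeout counts as a "no" vote
    # Phase 2 (commit): commit only if every vote was yes, otherwise abort.
    decision = "commit" if all(votes) else "abort"
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision


if __name__ == "__main__":
    print(two_phase_commit([Participant("orders"), Participant("stock")]))
    print(two_phase_commit([Participant("orders"),
                            Participant("stock", can_commit=False)]))
```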
Paxos protocol and design:
    Purpose:
        Solves the problem of consistency between nodes;
        If the primary node goes down, a new primary is elected;
        The primary node typically synchronizes the other nodes by shipping operation logs.
    Two roles: proposer and acceptor;
    Steps:
- Accept: the proposer sends an accept message to the acceptors, asking them to accept a proposed value;
- Acknowledge: once a majority of acceptors accept, the proposed value has taken effect, and the proposer sends an acknowledge message to notify all acceptors that the proposal is in effect.
    Comparison with 2PC:
        The 2PC protocol guarantees the atomicity of operations across multiple data shards;
        The Paxos protocol guarantees consistency among the multiple replicas of one data shard;
    Uses of Paxos:
        Implementing a global lock service or naming and configuration services;
            --- Apache ZooKeeper (via ZAB, a Paxos-like protocol)
        Replicating user data across multiple data centers;
            --- Google Megastore

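As a rough illustration of how a value gets chosen, the sketch below implements single-decree Paxos in one process. Note that it includes the prepare (promise) phase that precedes the accept step in the standard protocol, and it reduces the acknowledge step to the function's return value. `Acceptor` and `propose` are names made up for the example; message loss, retries, and multiple concurrent proposers are not modeled.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0          # highest proposal number promised so far
        self.accepted = (0, None)  # (number, value) of the last accepted proposal

    def prepare(self, n):
        # Promise to ignore proposals numbered below n; report any accepted value.
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, self.accepted

    def accept(self, n, value):
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors, n, value):
    """Run one proposal round; return the chosen value, or None if rejected."""
    majority = len(acceptors) // 2 + 1
    # Prepare: gather promises from a majority.
    replies = [a.prepare(n) for a in acceptors]
    promises = [acc for ok, acc in replies if ok]
    if len(promises) < majority:
        return None
    # If some acceptor already accepted a value, the proposer must adopt the
    # one with the highest proposal number instead of its own.
    prev_n, prev_value = max(promises, key=lambda acc: acc[0])
    if prev_n > 0:
        value = prev_value
    # Accept: the value is chosen once a majority accepts it.
    accepted = sum(a.accept(n, value) for a in acceptors)
    return value if accepted >= majority else None


if __name__ == "__main__":
    acceptors = [Acceptor() for _ in range(3)]
    print(propose(acceptors, 1, "x"))
    print(propose(acceptors, 2, "y"))  # must adopt the already chosen value
```

The second call shows the safety property: once a value is chosen, a later proposal with a different value ends up re-proposing the chosen one.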
Data storage layer redundancy:
    Multiple replicas provide highly available access.
    How to achieve it:
        Data replication:
            Log-based;
            Master-slave: MySQL, MongoDB
            Replica set: MongoDB
        Dual write:
            The storage layer uses a multi-master peer structure, which is more flexible but raises the cost of the data access layer;
    Data backup:
        Cold backup:
            Data is periodically copied to a storage medium; this is the traditional means of data protection;
            Advantages: simple, inexpensive, technically easy;
            Disadvantages: because backups are only periodic, the backup can be inconsistent with live data; recovery takes a long time;
        Hot backup:
            Online backup, providing better high availability;
            Asynchronous hot backup:
                A write returns to the application as soon as the primary store has written it; the storage system then writes to the other replicas asynchronously;
            Synchronous hot backup:
                Writes to the multiple replicas complete synchronously, with no master/slave distinction;
                To improve performance, the application writes to the replicas concurrently;
                The response latency is that of the slowest server;

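The difference between the two hot-backup modes above can be sketched with simulated replicas. `Replica`, `sync_write`, and `async_write` are hypothetical names, and the `delay` values are made-up latencies standing in for real storage servers.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


class Replica:
    def __init__(self, delay):
        self.delay = delay  # simulated write latency in seconds (made up)
        self.data = {}

    def write(self, key, value):
        time.sleep(self.delay)
        self.data[key] = value


def sync_write(replicas, key, value):
    # Synchronous hot backup: write all replicas concurrently and return only
    # when every one has finished, so latency tracks the slowest replica.
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        wait([pool.submit(r.write, key, value) for r in replicas])


def async_write(primary, others, key, value, pool):
    # Asynchronous hot backup: return once the primary has the write; the
    # other replicas are updated in the background.
    primary.write(key, value)
    return [pool.submit(r.write, key, value) for r in others]


if __name__ == "__main__":
    replicas = [Replica(0.01), Replica(0.1)]
    start = time.monotonic()
    sync_write(replicas, "k", "v")
    print(f"sync write took {time.monotonic() - start:.2f}s")

    with ThreadPoolExecutor(max_workers=1) as pool:
        start = time.monotonic()
        pending = async_write(Replica(0.01), [Replica(0.1)], "k", "v", pool)
        print(f"async write returned after {time.monotonic() - start:.2f}s")
        wait(pending)
```

The asynchronous variant returns sooner, but a primary failure before the background writes finish can lose the most recent data on the other replicas.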
Data storage layer failover mechanism:
    Failure confirmation: determine whether a node is down, via heartbeats;
    Access transfer: route access to machines that are still up, which hold exactly the same data;
    Data recovery: master-slave replication, logs;
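The first two failover steps above (failure confirmation by heartbeat, then access transfer) can be sketched as a small router. `FailoverRouter` and the 3-second `timeout` are assumptions for illustration; timestamps are passed in explicitly so the behavior is easy to follow, whereas a real system would read a clock and receive heartbeats over the network.

```python
class FailoverRouter:
    def __init__(self, nodes, timeout=3.0):
        self.nodes = list(nodes)  # replicas in priority order, master first
        self.timeout = timeout    # hypothetical heartbeat timeout in seconds
        self.last_seen = {}

    def heartbeat(self, node, now):
        # Record the time of the latest heartbeat received from a node.
        self.last_seen[node] = now

    def is_alive(self, node, now):
        # Failure confirmation: a node is down if its heartbeat is too old.
        seen = self.last_seen.get(node)
        return seen is not None and now - seen <= self.timeout

    def route(self, now):
        # Access transfer: send traffic to the first replica that is still up.
        for node in self.nodes:
            if self.is_alive(node, now):
                return node
        raise RuntimeError("no live replica")


if __name__ == "__main__":
    router = FailoverRouter(["master", "slave"])
    router.heartbeat("master", 0.0)
    router.heartbeat("slave", 0.0)
    print(router.route(1.0))         # master is healthy
    router.heartbeat("slave", 4.0)   # master has stopped sending heartbeats
    print(router.route(5.0))         # traffic moves to the slave
```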
Back-end architecture design: data storage layer