Back-end architecture design: the data storage layer

Source: Internet
Author: User

Importance of data storage:

    Data is an enterprise's most important asset;

    Data reliability is the enterprise's lifeline and must be guaranteed.

Single-machine storage principles:

    Storage engine: the core of a storage system; it determines the system's functionality and performance.

    Engine types: hash storage engine, B-tree storage engine, LSM-tree storage engine.

    1. Hash storage engine: built on a hash table (an array of buckets plus linked lists); supports create, update, delete, and random reads.
    2. B-tree storage engine: built on a B-tree; supports CRUD on single records as well as sequential scans. Widely used in RDBMSs.
    3. LSM-tree storage engine: modifications are first kept incrementally in memory and, once a threshold is reached, written to disk in batches. The advantage is batched writes; the disadvantage is that a read has to merge data from disk and memory.
      1. To avoid losing the in-memory data: every modification is also written to a commit log.
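
The LSM write path described above can be sketched as follows. This is a minimal illustration, not a production engine: the class name TinyLSM, the JSON file layout, and the flush threshold of two entries are all assumptions made for the example, and keys are assumed to be strings.

```python
import glob
import json
import os
import tempfile


class TinyLSM:
    """Toy LSM-style store: commit log + in-memory table + on-disk runs."""

    def __init__(self, directory, memtable_limit=2):
        self.directory = directory
        self.memtable_limit = memtable_limit  # flush threshold (assumed value)
        self.memtable = {}
        # Runs already on disk, oldest first (zero-padded names sort correctly).
        self.runs = sorted(glob.glob(os.path.join(directory, "run-*.json")))
        self.log_path = os.path.join(directory, "commit.log")
        self._replay_log()

    def _replay_log(self):
        # After a crash, rebuild the in-memory table from the commit log.
        if os.path.exists(self.log_path):
            with open(self.log_path) as log:
                for line in log:
                    key, value = json.loads(line)
                    self.memtable[key] = value

    def put(self, key, value):
        # 1. Append to the commit log first so the change survives a crash.
        with open(self.log_path, "a") as log:
            log.write(json.dumps([key, value]) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Apply the change in memory.
        self.memtable[key] = value
        # 3. Once the threshold is reached, write the whole batch to disk.
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        path = os.path.join(self.directory, f"run-{len(self.runs):06d}.json")
        with open(path, "w") as run:
            json.dump(dict(sorted(self.memtable.items())), run)
        self.runs.append(path)
        self.memtable = {}
        os.remove(self.log_path)  # flushed data no longer needs the log

    def get(self, key):
        # A read merges memory and disk: the newest copy of a key wins.
        if key in self.memtable:
            return self.memtable[key]
        for path in reversed(self.runs):
            with open(path) as run:
                data = json.load(run)
            if key in data:
                return data[key]
        return None


with tempfile.TemporaryDirectory() as directory:
    db = TinyLSM(directory)
    db.put("a", 1)
    db.put("b", 2)  # threshold reached: both entries are flushed to disk
    db.put("a", 3)  # newer value lives in memory and in the commit log
    print(db.get("a"), db.get("b"))  # 3 2
```

Real engines also compact the on-disk runs in the background so that reads do not have to scan an ever-growing list of files; that step is left out here.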

Data models:

    1. File: organized as a directory tree, as in the file systems of Linux, macOS, and Windows;
    2. Relational: each relation is a table made up of rows, and each row is made up of columns;
    3. Key-value: Memcached, Tokyo Cabinet, Redis;
    4. Column-oriented: Cassandra, HBase;
    5. Graph database: Neo4j, InfoGrid, InfiniteGraph;
    6. Document: MongoDB, CouchDB.

Transactions and concurrency control:

    The four basic properties of a transaction (ACID): atomicity, consistency, isolation, durability.

    Concurrency control:

        Lock granularity, from coarse to fine: process -> DB -> table -> row;

        To improve read concurrency, reads take no locks: copy-on-write, MVCC.

    Data recovery: by replaying the operation log.
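
To make the idea of lock-free reads under MVCC concrete, here is a minimal single-threaded sketch. The class MVCCStore and its methods are hypothetical names chosen for illustration; a real engine would also coordinate concurrent writers and garbage-collect old versions.

```python
import itertools


class MVCCStore:
    """Toy multi-version store: writes append versions, reads take no locks."""

    def __init__(self):
        self._versions = {}               # key -> list of (commit_ts, value)
        self._clock = itertools.count(1)  # monotonically increasing timestamps
        self._last_commit = 0

    def write(self, key, value):
        # A write adds a new version instead of overwriting the old one.
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, value))
        self._last_commit = ts
        return ts

    def snapshot(self):
        # A reader remembers the newest commit it is allowed to see.
        return self._last_commit

    def read(self, key, snapshot_ts):
        # Return the newest version that is not newer than the snapshot.
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


store = MVCCStore()
store.write("balance", 100)
snap = store.snapshot()
store.write("balance", 50)                      # later write, invisible to snap
print(store.read("balance", snap))              # 100
print(store.read("balance", store.snapshot()))  # 50
```

Because a reader only looks at versions no newer than its snapshot, a later write never changes what that reader sees, which is why the read path needs no lock.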

Multi-machine storage principles:

    The single-machine storage principles still apply; multi-machine storage is built on top of single-machine storage.

    Data distribution:

        Data is spread across multiple nodes, with the load balanced between them;

        Distribution modes:

            Static: modulo, e.g. uid % 32;

            Dynamic: consistent hashing, which has a data drift problem (node A fails before an update, the update is redirected to node B, and node A then recovers with stale data).

        Replication:

            Multiple replicas are stored across nodes to guarantee high reliability and high availability; replicas are kept in sync through the commit log.

        Fault detection:

            Heartbeat mechanism, data migration, failure recovery.
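
The difference between the static and dynamic distribution modes shows up when a node is added. The sketch below compares the two; the HashRing class, the node names, the use of md5, and the 100 virtual nodes per server are assumptions for illustration only.

```python
import bisect
import hashlib


def stable_hash(text):
    # md5 is used only because it is stable across runs, not for security.
    return int(hashlib.md5(text.encode()).hexdigest(), 16)


def static_node(uid, node_count=32):
    # Static distribution: plain modulo, as in uid % 32.
    return uid % node_count


class HashRing:
    """Consistent hashing with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # Walk clockwise to the first point after the key's hash, wrapping around.
        index = bisect.bisect(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[index][1]


keys = [f"user-{i}" for i in range(1000)]
before = HashRing(["node-a", "node-b", "node-c", "node-d"])
after = HashRing(["node-a", "node-b", "node-c", "node-d", "node-e"])

moved_ring = sum(before.node_for(k) != after.node_for(k) for k in keys)
moved_static = sum(static_node(uid, 4) != static_node(uid, 5) for uid in range(1000))

print("keys moved with modulo:", moved_static)
print("keys moved with consistent hashing:", moved_ring)
```

With modulo, growing from 4 to 5 nodes remaps most keys. With the ring, a key either stays where it was or moves to the new node, so only a fraction of the data has to migrate.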

FLP theorem and design:

    FLP impossibility:

        In an asynchronous message-passing system, even if only one process can fail, no deterministic algorithm can guarantee that the non-failed processes reach consensus.

CAP theorem and design:

    CAP: consistency, availability, partition tolerance (tolerance of network partitions).

    Consistency and availability have to be traded off against each other.

    A distributed storage system must tolerate faults automatically, so partition tolerance has to be guaranteed.

2PC (two-phase commit) protocol and design:

    Used for distributed transactions.

    Two types of nodes:

        Coordinator (one);

        Transaction participants (multiple).

    Two phases:

        Request phase: the coordinator asks the participants to prepare to commit or abort the transaction, and every participant votes to agree or disagree.

        Commit phase:

            After receiving all participants' votes, the coordinator makes the decision: commit only if every participant agreed, otherwise abort;

            The coordinator notifies the participants of the decision;

            Each participant performs the action once it receives the coordinator's notification.

    2PC is a blocking protocol:

        A transaction participant may fail

            -- set a timeout;

        The coordinator may fail

            -- write logs, keep a standby coordinator.

    Applications: trade orders, etc.
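
The two phases can be sketched in a few lines. This is a simplified in-process illustration: the Participant class and the two_phase_commit function are hypothetical names, and the timeouts, logging, and standby coordinator mentioned above are deliberately left out.

```python
class Participant:
    def __init__(self, name, will_agree=True):
        self.name = name
        self.will_agree = will_agree  # simulated vote (assumption for the demo)
        self.state = "init"

    def prepare(self):
        # Request phase: vote on whether the transaction can be committed.
        self.state = "prepared" if self.will_agree else "refused"
        return self.will_agree

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (request): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit): commit only if all agreed, otherwise abort everywhere.
    decision = "commit" if all(votes) else "abort"
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision


orders = [Participant("orders"), Participant("inventory")]
print(two_phase_commit(orders))  # commit

mixed = [Participant("orders"), Participant("inventory", will_agree=False)]
print(two_phase_commit(mixed))   # abort
```

The blocking nature of the protocol is visible in the first line of two_phase_commit: the coordinator cannot decide until every vote has arrived, which is why a real implementation needs a timeout on that step.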

Paxos protocol and design:

    Purpose:

        Solves the problem of consistency among multiple nodes;

        If the primary node goes down, a new primary is elected;

        The primary node usually synchronizes the other nodes by shipping its operation log.

    Two roles: proposer and acceptor.

    Steps:

    1. Accept: the proposer sends an accept message to the acceptors, asking them to accept a proposal;
    2. Acknowledge: if more than half of the acceptors accept, the proposed value takes effect, and the proposer sends an acknowledge message to notify all acceptors that the proposal has taken effect.

    Comparison with 2PC:

        The 2PC protocol guarantees the atomicity of an operation that spans multiple data shards;

        The Paxos protocol guarantees data consistency among the multiple replicas of one data shard.

    Uses of the Paxos protocol:

        Implementing a global lock service or a naming and configuration service;

            --- Apache ZooKeeper

        Replicating user data to multiple data centers;

            --- Google Megastore
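
The following is a minimal single-value Paxos sketch, run in one process with no networking. Note one addition relative to the steps listed above: standard Paxos runs a prepare phase before the accept step so that competing proposers cannot overwrite a value that has already been chosen. The names Acceptor and propose are hypothetical, and the final acknowledge broadcast is represented only by the return value.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0     # highest proposal number promised so far
        self.accepted = None  # (number, value) of the last accepted proposal

    def prepare(self, number):
        # Promise to ignore any proposal numbered below `number`.
        if number > self.promised:
            self.promised = number
            return True, self.accepted
        return False, None

    def accept(self, number, value):
        # Accept unless a higher-numbered proposal has been promised.
        if number >= self.promised:
            self.promised = number
            self.accepted = (number, value)
            return True
        return False


def propose(acceptors, number, value):
    majority = len(acceptors) // 2 + 1

    # Prepare phase: ask for promises from the acceptors.
    replies = [a.prepare(number) for a in acceptors]
    granted = [accepted for ok, accepted in replies if ok]
    if len(granted) < majority:
        return None

    # If a value was already accepted, the proposer must adopt the one
    # with the highest proposal number instead of its own.
    previous = [accepted for accepted in granted if accepted is not None]
    if previous:
        value = max(previous, key=lambda pair: pair[0])[1]

    # Accept phase: the value is chosen once a majority accepts it.
    accepted_count = sum(a.accept(number, value) for a in acceptors)
    return value if accepted_count >= majority else None


acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, 1, "A"))  # A
print(propose(acceptors, 2, "B"))  # A (the value chosen earlier is kept)
print(propose(acceptors, 1, "C"))  # None (stale proposal number is rejected)
```

The second call shows the consistency guarantee: once a majority has accepted "A", a later proposer with a different value ends up re-proposing "A".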

Data storage layer redundancy:

    Multiple replicas provide highly available access.

    How to achieve it:

        Data replication:

            Log-based;

            Master-slave: MySQL, MongoDB;

            Replica set: MongoDB.

        Dual write:

            The storage layer uses a multi-master, peer-to-peer structure, which is more flexible, but the cost to the data access layer is high.

    Data backup:

        Cold backup:

            Periodically copying data to a storage medium; the traditional means of data protection;

            Pros: simple, inexpensive, low technical difficulty;

            Cons: because copies are only taken periodically, the backup can be inconsistent with the live data, and restoring takes a long time.

        Hot backup:

            Online backup, which provides better availability;

            Asynchronous hot backup:

                A write returns to the application as soon as the primary store has it; the storage system then writes the other replicas asynchronously.

            Synchronous hot backup:

                The write to every replica completes synchronously; there is no master or slave;

                To improve performance, the application writes to the replicas concurrently;

                The response latency is that of the slowest server.
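
The trade-off between the two hot backup modes can be illustrated with simulated replicas. In this sketch the Replica class, the function names, and the artificial delays are assumptions; real replicas would be remote servers rather than in-process objects.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


class Replica:
    def __init__(self, delay):
        self.delay = delay  # simulated write latency in seconds (assumed)
        self.data = {}

    def write(self, key, value):
        time.sleep(self.delay)
        self.data[key] = value


def sync_write(replicas, pool, key, value):
    # Synchronous hot backup: write all replicas concurrently and return
    # only when every write is done, so the slowest replica sets the latency.
    futures = [pool.submit(r.write, key, value) for r in replicas]
    wait(futures)
    for future in futures:
        future.result()  # surface any write error


def async_write(primary, others, pool, key, value):
    # Asynchronous hot backup: return once the primary has the write;
    # the other replicas catch up in the background.
    primary.write(key, value)
    return [pool.submit(r.write, key, value) for r in others]


replicas = [Replica(0.01), Replica(0.02), Replica(0.05)]
with ThreadPoolExecutor(max_workers=4) as pool:
    start = time.monotonic()
    sync_write(replicas, pool, "k", "sync")
    print(f"sync write took about {time.monotonic() - start:.2f}s")

    start = time.monotonic()
    pending = async_write(replicas[0], replicas[1:], pool, "k", "async")
    print(f"async write returned after about {time.monotonic() - start:.2f}s")
    wait(pending)  # in a real system the application would not wait here
```

The asynchronous path returns sooner, at the price of a window during which the other replicas still hold the old value.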

Data storage layer failover mechanism:

    Failure confirmation: determine whether a machine is down, using heartbeats;

    Access transfer: route access to machines that are not down; this requires the stored data to be identical;

    Data recovery: master-slave resynchronization, logs.
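
Failure confirmation and access transfer can be sketched together. The HeartbeatMonitor class, the route function, and the three-second timeout below are assumptions for illustration; timestamps are passed in explicitly so that the example is deterministic.

```python
class HeartbeatMonitor:
    def __init__(self, timeout=3.0):
        # Seconds without a heartbeat before a node counts as down (assumed).
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, node, now):
        # Record the time of the latest heartbeat from this node.
        self.last_seen[node] = now

    def alive(self, node, now):
        seen = self.last_seen.get(node)
        return seen is not None and now - seen <= self.timeout


def route(monitor, replicas, now):
    # Access transfer: use the first replica in preference order that is alive.
    for node in replicas:
        if monitor.alive(node, now):
            return node
    raise RuntimeError("no replica available")


monitor = HeartbeatMonitor(timeout=3.0)
monitor.heartbeat("master", now=0.0)
monitor.heartbeat("slave", now=0.0)
print(route(monitor, ["master", "slave"], now=2.0))  # master

monitor.heartbeat("slave", now=4.0)  # the master has stopped sending heartbeats
print(route(monitor, ["master", "slave"], now=5.0))  # slave
```

Routing to the slave is only safe if it holds the same data as the master, which is why the data recovery step brings a returning machine back in sync before it serves traffic again.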
