Cloud Computing Reading Notes (2)

Source: Internet
Author: User

Principles and Applications of Google Cloud Computing

Google's cloud computing technologies include: the Google File System (GFS), the distributed programming model MapReduce, the distributed lock service Chubby, the distributed structured data table Bigtable, the distributed storage system Megastore, and the distributed monitoring system Dapper.

GFS provides massive data storage and access capabilities.

GFS

System Architecture:

There are three types of roles: client, master, and chunk server.

1. A central-server model is used, which makes it easy to add chunk servers.

2. Data is not cached. This decision is based on both necessity and feasibility.

Necessity: Most client access consists of streaming reads and writes, with few repeated reads or writes, so caching would bring little benefit.

Feasibility: Maintaining consistency between a cache and the actual data is extremely complicated, and uncertainties such as the network make the consistency problem harder still. In addition, the data volume is very large and cannot be cached with current memory capacities.

The metadata stored on the GFS master node, by contrast, is cached.

3. GFS is implemented in user mode. A conventional file system is an important part of the operating system, and running in kernel mode lets it integrate more closely with the operating system itself.

However, GFS is implemented in user mode, based on the following considerations:

1) In user mode, data can be accessed directly through the POSIX programming interface provided by the operating system, without needing to know the operating system's internal implementation or interfaces.

2) The POSIX interface provides richer functionality, and development is not subject to the restrictions of kernel programming.

3) More debugging tools are available in user mode.

4) In user mode, both the master and the chunk servers run as processes, so a failure in a single process does not affect the entire operating system.

5) In user mode, GFS and the operating system run in separate spaces, which reduces the coupling between the two and makes GFS easier to extend and upgrade.

4. Only a proprietary interface is provided.

Fault Tolerance Mechanism:

1. Master Fault Tolerance

The master stores three kinds of metadata:

1) The namespace, that is, the directory structure of the entire file system.

2) The mapping table between chunks and file names.

3) The location information of the chunk replicas. Each chunk has three replicas by default.

2. Chunk server fault tolerance

GFS uses replication to provide fault tolerance for chunk servers. By default, each chunk is stored as three replicas.

GFS divides files into chunks, and the default size of each chunk is 64 MB.
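As a small illustration of the 64 MB chunk size, the sketch below shows how a byte offset in a file could be translated into a chunk index plus an offset inside that chunk, and how many chunks a file of a given size occupies. The function names `locate` and `chunks_needed` are hypothetical and are not part of any GFS interface.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # default chunk size from the notes: 64 MB


def locate(byte_offset):
    """Return (chunk index, offset inside that chunk) for a file byte offset."""
    if byte_offset < 0:
        raise ValueError("byte_offset must be non-negative")
    return divmod(byte_offset, CHUNK_SIZE)


def chunks_needed(file_size):
    """Return how many chunks are needed to hold file_size bytes."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    return -(-file_size // CHUNK_SIZE)  # ceiling division


if __name__ == "__main__":
    # Byte 200 MB of a file falls in the fourth chunk (index 3), 8 MB in.
    print(locate(200 * 1024 * 1024))
    print(chunks_needed(200 * 1024 * 1024))
```

With three replicas per chunk, the raw storage consumed is roughly three times the number returned by `chunks_needed` multiplied by the chunk size, ignoring partially filled chunks.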

System Management Technology:

1) Large-scale cluster installation

2) Fault detection

3) Dynamically adding nodes

4) Energy saving

Distributed data processing MapReduce

The core concepts of MapReduce are "Map" (mapping) and "Reduce" (reduction).

For example, to count how many times each word appears in a large text, the map function produces a batch of intermediate results of the form <word, number of occurrences>, and the reduce function then processes these intermediate results, adding up the counts for the same word to obtain the total number of occurrences of each word.
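The word-count example can be sketched in a few lines of single-process Python. This only illustrates the map, group, and reduce steps; it is not Google's MapReduce library, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are chosen here for illustration.

```python
from collections import defaultdict


def map_phase(text):
    """Map: emit one <word, 1> pair for every word occurrence."""
    return [(word, 1) for word in text.lower().split()]


def shuffle(pairs):
    """Group intermediate values by word (a real framework does this step)."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(word, counts):
    """Reduce: add up all counts emitted for the same word."""
    return word, sum(counts)


def word_count(documents):
    """Run map over every document, then reduce each group of counts."""
    intermediate = []
    for doc in documents:
        intermediate.extend(map_phase(doc))
    return dict(reduce_phase(w, c) for w, c in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["the cat the dog", "The bird"]))
```

In a real deployment the map calls run in parallel on different pieces of the input and the reduce calls run in parallel on different keys; the sequential loops here stand in for that.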

Distributed lock service Chubby

Chubby is a file-system-like service designed by Google to provide coarse-grained locking. It is intended for loosely coupled distributed systems.

By using the Chubby lock service, you can ensure consistency during data operations.

1. Paxos algorithm

Paxos is a message-passing-based consensus algorithm used to solve consistency problems in distributed systems.

How can the consistency problem be solved in a distributed architecture? The simplest approach is to designate a single node and route all operations through it.

The drawback is obvious: if that node fails, the system falls into confusion. Multiple such nodes are therefore needed, and they must agree with one another.

Paxos defines three roles: proposers, acceptors, and learners. Proposers propose a resolution, acceptors approve the resolution, and learners obtain and use the approved resolution.
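The interaction between the roles can be sketched as a minimal, single-value Paxos round in one process. This is a simplified illustration under several assumptions: there is no network, no message loss, and no crash recovery; the learner is reduced to the return value of `propose`; and the class and function names are invented for this sketch. It is not how Chubby implements Paxos.

```python
class Acceptor:
    """Approves proposals, remembering what it promised and accepted."""

    def __init__(self):
        self.promised = -1          # highest proposal number promised so far
        self.accepted_n = -1        # number of the accepted proposal, if any
        self.accepted_value = None  # value of the accepted proposal, if any

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_value
        return False, None, None

    def accept(self, n, value):
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors, n, value):
    """Proposer: run one round; return the chosen value, or None on failure."""
    majority = len(acceptors) // 2 + 1

    replies = [a.prepare(n) for a in acceptors]
    granted = [(an, av) for ok, an, av in replies if ok]
    if len(granted) < majority:
        return None

    # If some acceptor already accepted a value, the proposer must adopt
    # the value of the highest-numbered accepted proposal.
    prev_n, prev_value = max(granted, key=lambda reply: reply[0])
    if prev_n >= 0:
        value = prev_value

    acks = sum(1 for a in acceptors if a.accept(n, value))
    return value if acks >= majority else None


if __name__ == "__main__":
    group = [Acceptor() for _ in range(3)]
    print(propose(group, 1, "A"))  # first proposal is chosen
    print(propose(group, 2, "B"))  # later proposer must adopt the chosen value
```

The key property shown is that once a value has been chosen by a majority, a later proposer with a higher number learns that value in the prepare phase and re-proposes it instead of its own.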

2. Chubby System Design

The main objectives of Chubby are as follows:

1) High availability and high reliability

2) High scalability

3) Support for coarse-grained advisory locks

4) Direct storage of service information

5) Support for a notification mechanism

6) Support for caching

Distributed structured data table Bigtable

Bigtable is a distributed storage system that Google built on top of GFS and Chubby.

Bigtable is similar to a database in many ways.

Data Model:

Bigtable is a distributed multi-dimensional mapping table. Data in a table is indexed by a row key, a column key, and a timestamp. Bigtable does not parse the data stored in it and treats it as uninterpreted strings.

1) Rows

A row key can be any string, but its size cannot exceed 64 KB. Rows are sorted by row key in lexicographic order.

2) Columns

Columns are grouped into column families. A column key has the form family:qualifier; the family name must be meaningful, while the qualifier can be chosen arbitrarily. Data in the same family is compressed and stored together.

The column family is also the basic unit of access control in Bigtable.

3) Timestamp

A timestamp is a 64-bit integer.

Currently, two settings are provided for managing versions. One retains only the last n versions, and the other retains all versions written within a specified period of time.
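The data model described above can be pictured as a map from (row key, column key, timestamp) to a string value. The toy class below is a minimal in-memory sketch of that idea, including the "retain the last n versions" setting. The class name `Table`, its methods, and the sample keys are assumptions made for illustration; this is not the Bigtable client API.

```python
class Table:
    """Toy model of a Bigtable-style map: (row, column, timestamp) -> value."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # (row key, column key) -> list of (timestamp, value), newest first
        self.cells = {}

    def put(self, row, column, timestamp, value):
        versions = self.cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda v: v[0], reverse=True)
        del versions[self.max_versions:]  # retain only the last n versions

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before timestamp (latest if None)."""
        for ts, value in self.cells.get((row, column), []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def row_keys(self):
        """Row keys in lexicographic order, the order rows are kept in."""
        return sorted({row for row, _ in self.cells})


if __name__ == "__main__":
    t = Table(max_versions=2)
    for ts, page in [(1, "v1"), (2, "v2"), (3, "v3")]:
        t.put("com.example.www", "contents:html", ts, page)
    print(t.get("com.example.www", "contents:html"))      # newest version
    print(t.get("com.example.www", "contents:html", 1))   # oldest was dropped
```

Because rows are kept in lexicographic order, keys that share a prefix (for example reversed domain names) end up next to each other, which is what makes range scans over related rows cheap.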

System Architecture:

Bigtable consists of three parts: the client library, the master server, and the sub-table servers (tablet servers).

When a client accesses the Bigtable service, it first uses the client library to open a lock. Once the lock is open, the client can communicate with the sub-table servers.

Role of the master server:

1) Allocating new sub-tables

2) Monitoring the status of sub-table servers

3) Load balancing between sub-table servers

Sub-table server:

1) Data in an SSTable is divided into blocks. The block size is configurable and is generally 64 KB. An index is stored at the end of the SSTable; when the SSTable is opened, this index is loaded into memory, so lookups are very fast.

Each sub-table is composed of multiple SSTables and logs.

2) Sub-table addressing in Bigtable uses a three-level lookup hierarchy similar to a B+ tree.

The root sub-table is checked first, then the metadata sub-table, and finally the corresponding user sub-table is found.

3) Data storage and read/write operations for sub-tables
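The SSTable block index mentioned in item 1 can be sketched as follows. This toy version keeps the first key of every block in an in-memory index and uses binary search to pick the single block that may hold a key. It is only an illustration: blocks here are split by entry count rather than by bytes (the notes give 64 KB as the typical block size), everything lives in memory, and the class name and the `blocks_read` counter are invented for the sketch.

```python
import bisect


class SSTable:
    """Toy immutable sorted table: blocks of entries plus a block index."""

    def __init__(self, items, block_entries=4):
        ordered = sorted(items.items())
        # Blocks are consecutive runs of sorted (key, value) pairs.
        self.blocks = [ordered[i:i + block_entries]
                       for i in range(0, len(ordered), block_entries)]
        # The index holds the first key of each block and stays in memory.
        self.index = [block[0][0] for block in self.blocks]
        self.blocks_read = 0  # counts how many blocks lookups had to touch

    def get(self, key):
        # Find the last block whose first key is <= key.
        pos = bisect.bisect_right(self.index, key) - 1
        if pos < 0:
            return None  # key sorts before every block; nothing to read
        self.blocks_read += 1
        for k, v in self.blocks[pos]:
            if k == key:
                return v
        return None


if __name__ == "__main__":
    table = SSTable({"k%02d" % i: i for i in range(10)}, block_entries=4)
    print(table.index)
    print(table.get("k05"), table.blocks_read)
```

Each lookup touches at most one block, which is why loading the index into memory when the SSTable is opened makes searches fast.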

Performance Optimization

1) Locality groups

2) Compression

3) Bloom filter
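A Bloom filter answers "is this key possibly present?" with no false negatives and a small rate of false positives, which lets a reader skip SSTables that cannot contain the requested data. The sketch below is a generic Bloom filter, not Bigtable's implementation; the sizes, the use of SHA-256 for hashing, and the names are assumptions chosen for clarity rather than speed.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            data = ("%d:%s" % (i, key)).encode("utf-8")
            digest = hashlib.sha256(data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(key))


if __name__ == "__main__":
    bf = BloomFilter()
    bf.add("row1/contents:html")
    print(bf.might_contain("row1/contents:html"))  # always True once added
```

A lookup that returns False can skip the disk read entirely; a lookup that returns True still has to read the data to confirm.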

Distributed storage system Megastore

Megastore: a combination of relational databases and NoSQL

Design goals:

1) Availability: the Paxos algorithm is introduced.

2) Scalability: data is partitioned, and each partition is stored in a NoSQL datastore.

Megastore Data Model

Data can be queried in a way similar to SQL, and Megastore has a corresponding query language of its own.

Megastore's Core Technology

1) Replicated logs

2) Data reading: (1) query local, (2) find position, (3) catch up, (4) validate, (5) query data

3) Data writing: (1) accept leader, (2) prepare, (3) accept, (4) invalidate, (5) apply

Dapper: monitoring infrastructure for large-scale distributed systems

Distributed Monitoring System

Basic design objectives:

1) Low overhead

2) Transparency to the application layer

3) Scalability
