Azure SQL Database warehouse Data Warehouse (2) schema

Source: Internet
Author: User

Windows Azure Platform Family of articles Catalog

  

In the previous article, I introduced the basic content of the MPP architecture

In this chapter, I introduce you to the architecture of Azure SQL Data Warehouse (SQL DW).

 

  The 1.SQL DW is divided into head node and work node, denoted by control node and compute node

  

SQL DW uses multiple work node scale-out methods to support a large number of PB-level relational data.

The application sends the T-SQL command to head Node. Head node uses the MPP engine, which optimizes queries for parallel processing and then sends queries to work node for parallel queries

Work node saves data that needs to be processed in azure storage and queries in parallel

Data Mobility Service (DMS) is an internal service for SQL DW that can move data across nodes as needed to run queries in parallel and return accurate results

2.SQL DW is compute and storage separation

The user's data is saved in Azure Storage and is not saved on the local disk of work node.

SQL DW implements the logical dependency of work node and user data, and the data does not move

That is, when the user's work node is scaled from 6 to 10, the SQL DW re-sets the logical relationship of 10 work node to azure storage

3.Azure Storage

SQL DW is used at the bottom of Azure Premium Storage, which is SSD storage.

Users can also set the distribution mode of database tables on SSD storage. The distribution patterns of data tables supported by SQL DW are:

(1) Polling Round Robin

(2) Hashing hash

(3) Copy Replication

  4.Head Node

Head node is the brain of SQL DW. Head node is at the forefront of processing application interactions and links.

The MPP engine on head node optimizes and coordinates parallel queries.

When a user submits a T-SQL query, Head node transforms the query into parallel queries on all work node

  5.Work Node

The work node is the one that really makes the calculation. The number range for the work node node is 1-60.

Each work node has a node ID that is visible in the system view. We can view the compute node ID by looking for the node_id column in the system view that begins with the sys.pdw_nodes name

  

  6. Data partitioning mode

Partitioning is the basic storage and processing unit for parallel queries that run on distributed data.

When SQL DW queries, the task is divided into 60 sub-queries that are executed in parallel. Each subquery runs on a 1 data distribution.

By default, the user's data is divided into 60 parts (60 partitions).

  When the number of work node nodes is 1, 1 jobs node processes 60 data.

When the number of work node nodes is 2, 2 jobs node processes 60 data, each work node processes 30 data

When the number of work node nodes is 3, 3 jobs node processes 60 data, each work node processes 20 data

......

When the number of working node nodes is 60,60 work node processes 60 data, and each work node processes 1 data, which is the highest degree of parallelism

  For example, suppose we have only 1 databases, this database has only 1 tables, and this table has 60 million rows of data

When the number of working node nodes is 1, 1 work node processes 60 million rows of data, which is the lowest degree of parallelism

When the working node number is 2, 2 work node processes 60 million rows of data, and each work node processes 30 million rows of data

When the working node number is 3, 3 work node processes 60 million rows of data, and each work node processes 20 million rows of data

......

When the number of working node nodes is 60, 60 work node processes 60 million rows of data, and each work node processes 1 million rows of data, which is the highest degree of parallelism

Observation can be found that setting the SQL DW partition mode is very important

As we have described above, there are three types of partitioning patterns supported by SQL DW: Polling Round Robin, hashing hash, copying Replication

  

 7. Rotation Round Robin

Polling tables are the simplest distribution pattern, typically used for temporary tables

The data in the polling table is evenly distributed and does not perform any optimizations. When using the rotation table, SQL DW randomly selects a distribution key and then randomly writes the data to the polling table

Unlike hash distribution tables, rows with equal values are not necessarily assigned to the same distributions.

 8. Hashing Hash

Hash tables provide table join (join) and aggregate queries (aggregation) for large database tables, providing the highest performance

When using the hash table, SQL DW uses the hash function to assign each row to the same partition.

In a database table, define one of the columns in the column, and use the hash function to save the data to the same partition

Description of the hash Table

  

  9. Copy Replica

Replication tables provide the fastest query performance for tables with small amounts of data

The Copy table caches a complete copy of the table on each work node. Therefore, when using a replicated table for table join (join) and aggregate queries (aggregation), no data is generated to move on work node

Copying tables is best used for tables with small amounts of data, which is better when the table size is less than 2GB

Describes the replication replica table

  

  

Azure SQL Database warehouse Data Warehouse (2) schema

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.