Analysis of a Big Data Solution Based on Microsoft SQL Server Parallel Data Warehouse

Source: Internet
Author: User

Overview

As more and more organizations see their data grow from the gigabyte and terabyte level to the petabyte level, society as a whole is entering a new era of informatization: the big data era. The ability to process and analyze massive data is increasingly a key factor in an organization's future. Applications built on big data are also quietly reaching into every aspect of society and affecting everyone's daily life: the television people watch, the web pages they browse, and the ads they receive are increasingly targeted content driven by big data analysis.

Microsoft's strategic focus in the big data field is to help customers "consume" big data so that all users can gain business insights from virtually any type of data of any size. Based on this strategy, Microsoft has released a new generation of parallel data warehouse appliance, SQL Server Parallel Data Warehouse (PDW), a data warehouse platform that provides massively parallel processing and flexible linear scaling. Its main features fall into the following three areas:

Built for big data: PolyBase, a breakthrough data processing technology, provides unified querying of structured, semi-structured, and unstructured data, so users can join Hadoop tables with relational database tables using the standard SQL they already know. Most commonly used business intelligence tools cannot query Hadoop directly. Because PolyBase integrates Hadoop through the database platform, users can keep using their existing business intelligence tools to analyze and present big data flexibly. For example, users can use Microsoft Excel to analyze structured and unstructured data in the same table.

Next-generation performance and scale: Scalable xVelocity clustered columnstore technology delivers performance improvements of up to 50 times. The massively parallel processing engine provides linear scale-out from a few terabytes to petabyte-level data.

Optimized software and hardware value: SQL Server Parallel Data Warehouse ships as an appliance with hardware and software pre-installed. It integrates Microsoft's latest software innovations, such as xVelocity columnstore, PolyBase, Windows Server Hyper-V virtualization, and Storage Spaces storage technology, which enable a streamlined, efficient hardware architecture and a cost-effective solution.

This article takes a closer look at the PolyBase technology in SQL Server Parallel Data Warehouse and uses a specific business scenario to show how PolyBase gives business users an easy-to-use big data solution.

PolyBase Technology

Generally speaking, PolyBase provides the following features:

Use an external table to define the structure of data stored in Hadoop.

Query Hadoop data by running SQL statements.

Integrate Hadoop and PDW data by querying, through PDW, both tables in Hadoop and relational tables in PDW.

Import Hadoop data into PDW easily by running a SQL command that queries Hadoop and saves the result set to a PDW table.

Use Hadoop as an online data archive for PDW: simple SQL commands export data from PDW to Hadoop, and the archived data can still be queried online from PDW.
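
The features listed above can be sketched in T-SQL roughly as follows. This is a minimal sketch, not the article's own code: it uses the PolyBase DDL form documented for later SQL Server and Analytics Platform System releases (external data source, external file format, external table), and the PDW release described here may use a different form. Every object name, the HDFS address, the file paths, and all columns are hypothetical, and the tables dbo.Customer and dbo.Orders are assumed to exist already in PDW.

```sql
-- All names, addresses, paths and columns below are hypothetical.

-- Describe where the Hadoop cluster is and how its files are laid out.
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (TYPE = HADOOP, LOCATION = 'hdfs://namenode.example.com:8020');

CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));

-- 1. Define the structure of the Hadoop data with an external table.
CREATE EXTERNAL TABLE dbo.WebClicks_Ext (
    url         VARCHAR(200),
    event_date  DATE,
    customer_id INT
)
WITH (LOCATION = '/data/webclicks/',
      DATA_SOURCE = HadoopCluster,
      FILE_FORMAT = PipeDelimitedText);

-- 2 and 3. Query Hadoop data with ordinary SQL and join it
-- to a relational PDW table (dbo.Customer is assumed to exist).
SELECT c.customer_name, COUNT(*) AS clicks
FROM dbo.WebClicks_Ext AS w
JOIN dbo.Customer AS c
    ON c.customer_id = w.customer_id
GROUP BY c.customer_name;

-- 4. Import: save a result set read from Hadoop into a PDW table.
CREATE TABLE dbo.WebClicks_2012
WITH (DISTRIBUTION = HASH(customer_id))
AS
SELECT * FROM dbo.WebClicks_Ext
WHERE event_date >= '2012-01-01';

-- 5. Archive: export PDW rows to Hadoop. The new external table
-- keeps the archived rows queryable from PDW.
CREATE EXTERNAL TABLE dbo.Orders_Archive_Ext
WITH (LOCATION = '/archive/orders/',
      DATA_SOURCE = HadoopCluster,
      FILE_FORMAT = PipeDelimitedText)
AS
SELECT * FROM dbo.Orders
WHERE order_date < '2010-01-01';
```

Once the external table exists, it appears in queries like any other table, which is why existing SQL-based tools can work with the Hadoop data without changes.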

The following example illustrates these PolyBase scenarios and their usage. In it, we analyze data related to Hurricane Sandy to help allocate resources among U.S. states.

First, create a table [dbo].[Nws_ffg7] in the PDW relational database to hold data from the U.S. National Oceanic and Atmospheric Administration (NOAA). As with SQL Server 2012, we can connect to PDW through the standard SQL Server Data Tools, as shown in the following figure. On top of the [dbo].[Nws_ffg7] table, we can create a view named flashflood as SELECT * FROM [dbo].[Nws_ffg7]. Querying the flashflood view returns the name of each U.S. state, geographic attributes such as longitude and latitude, and rainfall forecasts for each state over several upcoming periods, such as the next 1 hour (HR1 column), 3 hours (HR3 column), 6 hours (HR6 column), and so on.
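
As a rough sketch of this step, the table, view, and query might look like the following. Only the table name, the view name, and the HR1, HR3, and HR6 columns come from the article. The remaining column names, all data types, and the replicated distribution are assumptions, since the actual schema is not given.

```sql
-- Assumed schema: only HR1, HR3 and HR6 are named in the article.
CREATE TABLE [dbo].[Nws_ffg7] (
    state_name VARCHAR(50),  -- hypothetical column name
    latitude   FLOAT,        -- hypothetical column name
    longitude  FLOAT,        -- hypothetical column name
    HR1        FLOAT,        -- rainfall forecast, next 1 hour
    HR3        FLOAT,        -- rainfall forecast, next 3 hours
    HR6        FLOAT         -- rainfall forecast, next 6 hours
)
WITH (DISTRIBUTION = REPLICATE);  -- assumed; suits a small lookup-style table
GO

-- The view simply exposes every column of the base table.
CREATE VIEW flashflood AS
SELECT * FROM [dbo].[Nws_ffg7];
GO

-- Return each state's location and its near-term rainfall forecasts.
SELECT state_name, latitude, longitude, HR1, HR3, HR6
FROM flashflood;
```

GO is a batch separator understood by client tools such as SQL Server Data Tools rather than a T-SQL statement. It is used here so that CREATE VIEW starts its own batch.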
