Introduction to SQL Server Parallel Data Warehouse (PDW)


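Big data has been a very hot topic lately. Every vendor sees it as an important direction for the future of IT, and every vendor wants to make its mark in this area. A few days ago I attended an IBM big data seminar, where IBM presented its big data solutions: three appliances under the PureSystem name. IBM has also released DB2 v10, and PureScale, which was designed specifically to compete with Oracle RAC, is now formally part of the main DB2 release.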

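On the MPP front, Microsoft used to be criticized for lacking a product that could meet the challenges of big data. After some searching online, I found that, starting with SQL Server 2008 R2, Microsoft has also released an MPP data warehouse architecture and will launch its own appliance this year.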

One article explains the SQL Server PDW architecture in great detail, and it is worth a read if you are interested (source: http://www.jamesserra.com/archive/2011/08/microsoft-sql-server-parallel-data-warehouse-pdw-explained/):

Microsoft SQL Server Parallel Data Warehouse (PDW), formerly known by its code name “Project Madison”, is an edition of Microsoft’s SQL Server 2008 R2 that was released in December 2010.  PDW is Microsoft’s reworking of the DatAllegro Inc. massively parallel processing (MPP) product that Microsoft acquired in July 2008.  It only works with certain hardware.  Two appliances are available so far: the HP Enterprise Data Warehouse Appliance was the first and Dell offers the other, with IBM and Bull to follow in the near future.  This edition of SQL Server can’t be bought as standalone software; it has to be bought along with the hardware.

So what is MPP?  Until now, SQL Server has been a symmetric multiprocessing (SMP) solution, which essentially means it runs on one server.  MPP provides scalability and query performance by running independent servers in parallel.  That is the quick definition.  For more details, read “What MPP means to SQL Server Parallel Data Warehouse”.
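To make the quick definition concrete, here is a minimal Python sketch of the scatter-gather idea behind MPP. It is an illustration only, not PDW code: the function names, the round-robin split, and the use of threads in place of separate servers are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def node_partial_sum(rows):
    """Work done by one hypothetical node: aggregate only its own slice."""
    return sum(amount for _, amount in rows)


def mpp_total(rows, node_count):
    """Scatter rows over node_count workers, aggregate each slice, then gather."""
    # Round-robin split is a stand-in for whatever placement a real system uses.
    slices = [rows[i::node_count] for i in range(node_count)]
    # Threads stand in for independent servers; they only model the data flow.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(node_partial_sum, slices))
    # The coordinator combines the per-node partial results into one answer.
    return sum(partials), partials


# Hypothetical sales rows: (order_id, amount)
sales = [(order_id, order_id % 7 + 1) for order_id in range(1000)]
total, partials = mpp_total(sales, node_count=4)
```

Each worker sees only its own slice, so adding workers shrinks the slice each one has to scan, which is the property an SMP setup on a single server cannot offer.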

MPP is also available from other companies, such as EMC Greenplum, Teradata, Oracle Exadata, HP Vertica, and IBM Netezza, but those use proprietary systems, whereas PDW can be used with commodity hardware, providing a much lower cost per terabyte.  It is still not inexpensive: the hardware and installation will cost around $2 million (not including software licenses).  In return, the press release claims queries 200 times faster and 10 times the scalability of traditional Microsoft SQL Server deployments.  PDW also comes with its own support model.

Microsoft has had clustering capabilities in SQL Server for a while, but the scalability part was lacking.  This is where PDW comes in.  Scalability in PDW means handling tens of terabytes of data and then moving to hundreds of terabytes (up to 600 TB).  At about 50 to 60 terabytes of data, clustering is needed; beyond that, clustering starts to approach its limits, and that is when you need to move to PDW.  Clustering brings concurrency to the system and reduces load, but it can’t reduce the time that a single query takes even when it is not waiting on any resource.  Breaking this barrier requires parallelism, so that pieces of the same request execute simultaneously, and this is exactly what PDW brings to the table.  PDW partitions large tables across multiple physical nodes, each with its own dedicated CPU, memory, and storage, and each running its own instance of SQL Server in a parallel shared-nothing design.  Tables can either be replicated, where a copy is kept on each node (usually for dimension tables), or distributed, where portions of a table are spread uniformly across all nodes (usually for fact tables).
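The difference between replicated and distributed tables can be sketched in a few lines of Python. This is a toy model under stated assumptions, not PDW's implementation: the node count, the table contents, the helper names, and the choice of CRC32 as the hash on the distribution key are all hypothetical.

```python
import zlib
from collections import defaultdict

NODE_COUNT = 4  # hypothetical number of compute nodes


def node_for(key):
    """Pick a node by hashing the distribution key (CRC32 is a stand-in hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % NODE_COUNT


def distribute(rows, key_index):
    """Distributed table: each row is stored on exactly one node."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key_index])].append(row)
    return nodes


def replicate(rows):
    """Replicated table: every node holds a full copy."""
    return {node: list(rows) for node in range(NODE_COUNT)}


# Small dimension table: (product_id, name)
products = [(1, "widget"), (2, "gadget"), (3, "gizmo")]
# Larger fact table: (sale_id, product_id, amount)
sales = [(i, i % 3 + 1, 10 * (i % 5 + 1)) for i in range(100)]

fact_nodes = distribute(sales, key_index=0)
dim_nodes = replicate(products)


def local_revenue_by_product(node):
    """Join and aggregate on one node using only data stored on that node."""
    names = dict(dim_nodes[node])
    totals = defaultdict(int)
    for _, product_id, amount in fact_nodes.get(node, []):
        totals[names[product_id]] += amount
    return totals


# The coordinator merges the per-node results into the final answer.
revenue = defaultdict(int)
for node in range(NODE_COUNT):
    for name, amount in local_revenue_by_product(node).items():
        revenue[name] += amount
```

Because every node holds the whole dimension table, each node can join its slice of the fact table locally without fetching rows from other nodes. That is the usual reason for replicating small tables and distributing large ones. A hash only approximates a uniform spread, so the quality of the distribution key matters.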

One drawback to PDW is that it does not use SQL Server Management Studio (SSMS).  Instead it uses a third-party tool called Nexus Chameleon, which is needed because SSMS hasn’t been reworked to connect directly to the control node of the Parallel Data Warehouse.  PDW also uses its own query engine, and not all features of SQL Server are supported, so you might not be able to use all your DBA tricks.  And you wouldn’t want to build a solution against SQL Server and then just hope to upsize it to Parallel Data Warehouse Edition.

PDW uses multiple servers within the appliance, virtualized as if they were one unified data warehousing resource.  It can use up to 480 cores.  PDW works by controlling several physical servers, each running its own instance of SQL Server 2008 R2.  The database and its tables are spread across these physical servers but appear as one database and set of tables to the end user.  The data warehouse appliance, or brain of the PDW, manages query execution and the metadata that records what is stored and processed on which portion of the PDW.

When Microsoft IT migrated Information Security Consolidated Event Management (ICE) to PDW, they saw queries run 15 to 20 times faster on average, SSIS data load throughput of up to 285 GB/hour (with minimal query performance impact), and support for up to 12 TB/day of throughput in SSIS.  See the video.

Part of the technology incorporated into PDW includes a parallel database copy that enables rapid data movement and consistency between PDW and data marts used by SSAS.

In short, PDW is ideal for large data warehouses and BI, but not for OLTP systems.  Write one check, and you get a complete soup-to-nuts data warehouse storage engine that includes everything from the servers and SAN to configuration and training.

HP calls PDW by a different name: Enterprise Data Warehouse (EDW).  Here is the layout of the HP Enterprise Data Warehouse Appliance (see the full specs, the review, and the architecture overview and performance guide).  The architecture is hub-and-spoke and supports up to 47 servers, made up of a control rack and a data rack.  A one-rack system has 17 servers, 22 processors/132 cores, and 125 TB, and can be scaled out to a four-rack system with 47 servers, 82 processors/492 cores, and 500 TB.
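The quoted one-rack and four-rack figures can be checked with a little arithmetic. The sketch below uses only the numbers from the paragraph above. The per-rack increment assumes the appliance scales linearly between the two quoted configurations, which the source does not state, and the helper names are made up for the example.

```python
# Figures quoted above for the HP appliance.
ONE_RACK = {"servers": 17, "processors": 22, "cores": 132, "storage_tb": 125}
FOUR_RACK = {"servers": 47, "processors": 82, "cores": 492, "storage_tb": 500}


def cores_per_processor(cfg):
    """Cores per processor implied by a quoted configuration."""
    return cfg["cores"] / cfg["processors"]


def per_added_rack(small, large, racks_added=3):
    """Average increment per additional rack, ASSUMING linear scaling."""
    return {key: (large[key] - small[key]) / racks_added for key in small}


step = per_added_rack(ONE_RACK, FOUR_RACK)

print(cores_per_processor(ONE_RACK))   # 6.0
print(cores_per_processor(FOUR_RACK))  # 6.0
print(step)  # {'servers': 10.0, 'processors': 20.0, 'cores': 120.0, 'storage_tb': 125.0}
```

Both configurations work out to six cores per processor, so the two sets of figures are internally consistent. Under the linear-scaling assumption, each added rack contributes 10 servers, 20 processors, 120 cores, and 125 TB.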

The future road map for PDW includes column store, petabyte scalability, real-time data warehousing, MDM, and data quality.

 
