BI Notes: Processing a Cube Incrementally

Last updated: 2013-12-28. Source: Internet


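This article simulates a data warehouse system containing user data, product data, and order data. It builds a Cube on top of these structures and processes it with incremental updates.

The emphasis on incremental processing comes from the growth of the fact table: if it eventually reaches billions of rows, full processing becomes impractical, so the solution here focuses on demonstrating how to process the Cube incrementally.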

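The key to incremental Cube processing is to split the fact data into two parts: an incremental fact table and a history fact table. The Cube's first run processes the data in the history fact table; every periodic run after that processes only the data in the incremental table.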
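Why the split works can be shown with a toy model. The Python sketch below is not how Analysis Services stores or processes a cube; it only assumes the measure is an additive count per (user, product) pair. The function name and the sample rows are invented for illustration.

```python
from collections import Counter

def aggregate(orders):
    """Count purchases per (user, product) pair."""
    return Counter(orders)

# Illustrative rows only; the names are invented.
history = [("alice", "book"), ("bob", "pen"), ("alice", "book")]
increment = [("bob", "pen"), ("carol", "book")]

# First run: aggregate everything in the history table.
cube = aggregate(history)

# Later runs: aggregate only the increment and fold it into the totals.
cube.update(aggregate(increment))

# A full rebuild over all rows, for comparison.
full_rebuild = aggregate(history + increment)
```

Because counts are additive, folding in the increment gives the same totals as the full rebuild without rescanning the history rows.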

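The SQL Server and Visual Studio referred to in this article are the 2008 versions; everything applies equally to the 2005 versions.

Data assumptions: one user table, one product table, and one order table, where each order records who bought what. The Cube's reporting requirement is to summarize, from the orders, who has bought what.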

First, create the data warehouse: create a database named BIDemo in the database engine.

[Figure: clip_image002, http://www.bkjia.com/uploads/allimg/131228/230AL3G-0.jpg]

Next, create the user table, with the following structure:

[Figure: clip_image004, http://www.bkjia.com/uploads/allimg/131228/230AKD1-1.jpg]

There is also a product table:

[Figure: clip_image006, http://www.bkjia.com/uploads/allimg/131228/230AI3M-2.jpg]

Then come the history order table and the incremental order table, which share the same structure:

[Figure: clip_image008, http://www.bkjia.com/uploads/allimg/131228/230AI425-3.jpg]
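Since the table structures only appear in screenshots, here is a minimal sketch of what such a schema might look like. Every table and column name below is an assumption, and SQLite (through Python's standard `sqlite3` module) stands in for SQL Server so that the sketch runs anywhere.

```python
import sqlite3

# Hypothetical schema: all table and column names are assumptions,
# not taken from the original screenshots.
SCHEMA = """
CREATE TABLE Users    (UserID    INTEGER PRIMARY KEY, UserName    TEXT NOT NULL);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT NOT NULL);
CREATE TABLE OrdersHistory (
    OrderID   INTEGER PRIMARY KEY,
    UserID    INTEGER NOT NULL REFERENCES Users(UserID),
    ProductID INTEGER NOT NULL REFERENCES Products(ProductID)
);
-- Same structure as OrdersHistory; holds only rows not yet processed.
CREATE TABLE OrdersIncrement (
    OrderID   INTEGER PRIMARY KEY,
    UserID    INTEGER NOT NULL REFERENCES Users(UserID),
    ProductID INTEGER NOT NULL REFERENCES Products(ProductID)
);
"""

def create_demo_db():
    """Create an in-memory database with the four demo tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def column_names(conn, table):
    """Return the column names of a table, in declaration order."""
    # PRAGMA table_info yields one row per column; index 1 is the name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

conn = create_demo_db()
```

Keeping the two order tables identical is what allows rows to be copied from one to the other later without any transformation.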

To make testing easier, add some test data to the user table:

[Figure: clip_image010, http://www.bkjia.com/uploads/allimg/131228/230AJU0-4.jpg]

Then add some test data to the product table:

[Figure: clip_image012, http://www.bkjia.com/uploads/allimg/131228/230AJ2A-5.jpg]

For the fact table, entering test data by hand is impractical, so a small program was written to fill it with random test data:

[Figure: clip_image014, http://www.bkjia.com/uploads/allimg/131228/230AK351-6.jpg]

The code for this program is available with this article. The generated data looks roughly like this:

[Figure: clip_image016, http://www.bkjia.com/uploads/allimg/131228/230AGW5-7.jpg]
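The original generator is not reproduced in the text. As a rough stand-in, the sketch below fills an incremental order table with random user and product pairs. The table and column names, the ID lists, and the `fill_increment` function are all assumptions, and SQLite replaces SQL Server for the sake of a runnable example.

```python
import random
import sqlite3

def fill_increment(conn, row_count, user_ids, product_ids, seed=None):
    """Insert row_count random (user, product) orders; return the number inserted."""
    rng = random.Random(seed)  # a fixed seed makes a test run repeatable
    rows = [(rng.choice(user_ids), rng.choice(product_ids))
            for _ in range(row_count)]
    conn.executemany(
        "INSERT INTO OrdersIncrement (UserID, ProductID) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)

# Hypothetical table; the real structure is only shown in a screenshot.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OrdersIncrement ("
    "OrderID INTEGER PRIMARY KEY, UserID INTEGER NOT NULL, ProductID INTEGER NOT NULL)"
)
inserted = fill_increment(conn, 1000, user_ids=[1, 2, 3, 4, 5],
                          product_ids=[10, 20, 30], seed=42)
```

Drawing the IDs from the existing user and product keys keeps every generated order joinable to both dimension tables.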

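At this point the test data structures and data are ready, which amounts to a small data warehouse.

Next, create a BI solution in Visual Studio containing one SSIS project and one SSAS project.

In the SSAS project, create a data source and a data source view. Note that the fact table to use here is the history table, not the incremental table, even though the history table does not contain any data yet.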

First create the data source, connecting to the database just created, and define the relationships in the data source view, as shown:

[Figure: clip_image018, http://www.bkjia.com/uploads/allimg/131228/230AL932-8.jpg]

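Then create the Cube from this data source view. Note that the history table is selected for the measures, and the user and product tables are selected as dimensions.

Finally, deploy the Cube. Deployment alone is enough here; do not process it, because processing will be handled later by the SSIS package.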

Now for the SSIS project. Create four tasks in the SSIS package, of the following types:

[Figure: clip_image020, http://www.bkjia.com/uploads/allimg/131228/230AL952-9.jpg]

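The first two are Cube processing tasks, the data flow task moves the data from the incremental fact table into the history fact table, and the final Execute SQL task deletes the data from the incremental table.

Of the two Cube processing tasks, the first processes only the dimensions and the second processes the cube. Dimension processing is split out and placed first because, in the author's experience, even though the Cube is processed in full, newly added dimension data is not aggregated into it, so the dimensions have to be processed separately beforehand.

(An aside: the author has never fully understood this. If it is a full process, one would expect the dimensions to be processed too, yet they still have to be handled separately.)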

The dimension processing task is configured as follows; simply select the dimensions in the dialog.

[Figure: clip_image022, http://www.bkjia.com/uploads/allimg/131228/230AM111-10.jpg]

Then comes the cube processing task, as shown.

[Figure: clip_image024, http://www.bkjia.com/uploads/allimg/131228/230AG644-11.jpg]

Then specify incremental processing and configure the table that supplies the incremental data; here that is the incremental table.

[Figure: clip_image026, http://www.bkjia.com/uploads/allimg/131228/230AI135-12.jpg]

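Once the Cube has been processed, the data in the incremental table can be moved into the history table, which ensures that everything added the next day is incremental data.

Note that in a real deployment the business system must not produce any new data while BI processing is running; otherwise rows will be missed and the totals will not reconcile. This is why BI processing is usually scheduled for the early morning.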

Next is the third step, the data flow task, whose main job is to transfer the data from the incremental table to the history table.

[Figure: clip_image028, http://www.bkjia.com/uploads/allimg/131228/230AKQ1-13.jpg]

The final SQL task is a DELETE or TRUNCATE TABLE statement that empties the incremental table.
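The article performs the copy in an SSIS data flow task and the cleanup in a separate SQL task. Purely as an illustration of the same two steps, the sketch below does both in plain SQL inside one transaction, so a failure cannot leave rows copied but not cleared. The table and column names are assumptions, and SQLite stands in for SQL Server.

```python
import sqlite3

def archive_increment(conn):
    """Copy all increment rows into history, then empty the increment table."""
    with conn:  # commits if both statements succeed, rolls back otherwise
        moved = conn.execute(
            "INSERT INTO OrdersHistory (UserID, ProductID) "
            "SELECT UserID, ProductID FROM OrdersIncrement"
        ).rowcount
        conn.execute("DELETE FROM OrdersIncrement")
    return moved

# Hypothetical tables with one old row and two new rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrdersHistory   (UserID INTEGER, ProductID INTEGER);
CREATE TABLE OrdersIncrement (UserID INTEGER, ProductID INTEGER);
INSERT INTO OrdersHistory   VALUES (1, 10);
INSERT INTO OrdersIncrement VALUES (2, 20), (3, 10);
""")
moved = archive_increment(conn)
```

This step must run only after the Cube has been processed; otherwise the rows would reach the history table without ever being aggregated.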

The final task flow looks like this:

[Figure: clip_image030, http://www.bkjia.com/uploads/allimg/131228/230AG591-14.jpg]

Run the package. When every task succeeds, it should look like this:

[Figure: clip_image032, http://www.bkjia.com/uploads/allimg/131228/230AM108-15.jpg]

After a successful run, open the history table: the data is now there, and the incremental table no longer holds any data.

Query the Cube, and the new data has been aggregated into it.

[Figure: clip_image034, http://www.bkjia.com/uploads/allimg/131228/230AM145-16.jpg]

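The pivot table above clearly shows who bought which products.

Run Rubbish (the data generator) again to load a few more rows into the incremental table, then rerun the SSIS package. The new rows are aggregated into the Cube; note that the processing is incremental.

The data structures in this article are deliberately simple. The focus is on the flow and method of Cube processing, on the incremental part of the solution, and on the issues to watch out for. Readers who know a better approach are welcome to share and discuss.

Download: the database, project files, and program mentioned in this article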

FAQ:

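1. Where does the incremental data come from?

In the author's opinion this has to be done in cooperation with the business system, for example by adding triggers, or by using timestamps so that new rows can be extracted from the business system.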
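The timestamp option can be sketched as a "watermark" query: remember the latest timestamp already extracted and pull only rows newer than it. The `Orders` table, its `CreatedAt` column, and the `extract_new_orders` function are hypothetical, and SQLite stands in for the business database. Timestamps are stored as ISO 8601 text so that string comparison matches chronological order.

```python
import sqlite3

def extract_new_orders(source, last_seen):
    """Return rows created after last_seen, plus the new watermark."""
    rows = source.execute(
        "SELECT OrderID, UserID, ProductID, CreatedAt FROM Orders "
        "WHERE CreatedAt > ? ORDER BY CreatedAt",
        (last_seen,),
    ).fetchall()
    # If nothing is new, keep the old watermark unchanged.
    new_watermark = rows[-1][3] if rows else last_seen
    return rows, new_watermark

# Hypothetical business table with illustrative rows.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE Orders (OrderID INTEGER, UserID INTEGER, "
    "ProductID INTEGER, CreatedAt TEXT)"
)
source.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", [
    (1, 1, 10, "2013-01-01T09:00:00"),
    (2, 2, 20, "2013-01-01T18:30:00"),
    (3, 1, 30, "2013-01-02T08:15:00"),
])
rows, watermark = extract_new_orders(source, "2013-01-01T12:00:00")
```

One caveat with this approach: a row written later with a timestamp equal to the current watermark would be skipped, which is another reason to extract while the business system is idle.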

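2. What about updates and deletes?

The usual approach is to build on the solution in this article by adding a flag column that serves as a measure: a value of 1 means the row was added. A delete is recorded by inserting an identical row flagged -1. An update inserts two rows: one flagged -1, much like a delete, and one flagged 1 that represents the modified record. The three otherwise similar rows are told apart by timestamp, which marks which one is the modified record. The flag is then used as the measure when aggregating.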
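A small sketch of the flag idea, under assumed names: the `OrderFacts` table, its columns, and the sample events are invented, and SQLite stands in for the warehouse. Summing the flag per (user, product) cancels out deleted and superseded rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OrderFacts (OrderID INTEGER, UserID INTEGER, "
    "ProductID INTEGER, Flag INTEGER, ChangedAt TEXT)"
)
events = [
    (1, 1, 10, +1, "2013-01-01"),  # order 1 created
    (2, 2, 20, +1, "2013-01-01"),  # order 2 created
    (2, 2, 20, -1, "2013-01-02"),  # order 2 deleted: same row, flag -1
    (1, 1, 10, -1, "2013-01-03"),  # order 1 updated: cancel the old version
    (1, 1, 30, +1, "2013-01-03"),  # and add the modified version
]
conn.executemany("INSERT INTO OrderFacts VALUES (?, ?, ?, ?, ?)", events)

def net_purchases(conn):
    """Sum the flag per (user, product), dropping pairs that net to zero."""
    return conn.execute(
        "SELECT UserID, ProductID, SUM(Flag) FROM OrderFacts "
        "GROUP BY UserID, ProductID HAVING SUM(Flag) <> 0 "
        "ORDER BY UserID, ProductID"
    ).fetchall()

result = net_purchases(conn)
```

Every change is an insert, so the fact table stays append-only and the same incremental processing flow keeps working.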

This article comes from the blog "aspnetx的部落格"; please keep this source link: http://aspnetx.blog.51cto.com/5490032/1157574
