DB, ETL, DW, OLAP, DM, BI relationship structure diagram
Here are a few words about some of these concepts: (1) DB/Database: this is the OLTP (online transaction processing) database used to support production, such as a supermarket's trading system. A DB retains only the latest state of the data, and only that one state. For example, when you get up every morning and look in the mirror, what you see is your current state; as for the previous day's state, it will not appear in
ETL is the abbreviation of "Extract", "Transform", and "Load", that is, extraction, transformation, and loading; however, it is often simply called data extraction.
ETL is the core and soul of BI/DW (business intelligence/data warehouse). It integrates data and improves its value according to unified rules, and it is responsible for the process of converting data from the data source to the target data w
There are four data ETL modes based on the model design and source data:
Complete refresh, image increment, event increment, and image comparison
Complete refresh: the data warehouse table contains only the latest data. On each load, the existing data is deleted and the latest source data is loaded in full. In this mode,
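The complete refresh mode described above can be sketched roughly as follows. This is a minimal illustration using Python's built-in sqlite3 module; the table name dw_product, its columns, and the function full_refresh are hypothetical and not taken from the original article.

```python
import sqlite3


def full_refresh(conn, source_rows):
    """Delete all existing rows in the target table, then load the latest source data."""
    with conn:  # the delete and the reload commit together or roll back together
        conn.execute("DELETE FROM dw_product")
        conn.executemany(
            "INSERT INTO dw_product (product_id, name) VALUES (?, ?)",
            source_rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_product (product_id INTEGER PRIMARY KEY, name TEXT)")

# First load, then a second load with a different source extract.
full_refresh(conn, [(1, "pen"), (2, "ink")])
full_refresh(conn, [(1, "pen"), (3, "paper")])

rows = conn.execute(
    "SELECT product_id, name FROM dw_product ORDER BY product_id"
).fetchall()
print(rows)
```

After the second load, the table holds only what the latest source extract contained; the row loaded earlier for product 2 is gone, which is the defining property of this mode.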
ETL 4: SQL Server Integration Services
SSIS is Microsoft's upgrade of DTS, introduced with SQL Server 2005. It has to be said that Microsoft has put a lot of effort into BI, providing multiple tools such as SSIS, SSAS, and SSRS that cover everything from creating a data warehouse and extracting data from metadata to creating dimensions, mining structures, and mining models, training, report model design, report design, and publishing. It is very convenient and powerful. Start
pushes the data from the data source. If the data source is protected and pulling from it is forbidden, the only option is for the data source to push the data.
The following table summarizes the source data tables used by the dimension and fact tables in this example, along with their extraction modes.
                                           Timestamp mode   Snapshot mode   Trigger mode   Log mode
Ability to differentiate inserts/updates   No               Yes             Yes            Yes
Multiple updates detected during
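Snapshot mode is listed above as able to tell inserts from updates. One way to see why is to compare two snapshots keyed by primary key. The sketch below is only an illustration: the function compare_snapshots and the sample rows are hypothetical, and a real implementation would compare tables in the database rather than in-memory dictionaries.

```python
def compare_snapshots(previous, current):
    """Compare two snapshots (dicts keyed by primary key) and classify the changes."""
    # Keys that exist only in the current snapshot are inserts.
    inserted = {k: v for k, v in current.items() if k not in previous}
    # Keys present in both snapshots with different values are updates.
    updated = {
        k: v for k, v in current.items() if k in previous and previous[k] != v
    }
    # Keys that exist only in the previous snapshot were deleted at the source.
    deleted = {k: v for k, v in previous.items() if k not in current}
    return inserted, updated, deleted


yesterday = {1: ("alice", "Paris"), 2: ("bob", "Rome"), 3: ("carol", "Oslo")}
today = {1: ("alice", "Paris"), 2: ("bob", "Milan"), 4: ("dave", "Bonn")}

inserted, updated, deleted = compare_snapshots(yesterday, today)
print(inserted, updated, deleted)
```

Note that a comparison like this only sees the net difference between two points in time, so several updates to the same row between snapshots appear as a single change.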
Sqoop, which requires the Sqoop shared metadata store to be turned on as follows:
sqoop metastore > /tmp/sqoop_metastore.log 2>&1
For questions about Oozie not running Sqoop jobs, refer to the following link: http://www.lamborryan.com/oozie-sqoop-fail/
(4) Connecting to the metastore and rebuilding the Sqoop job
The Sqoop job created earlier, whose metadata is not stored in the shared metastore, needs to be rebuilt using the following command.
sqoop job --show myjob_incremental_import | grep incremental.last.value
sq
One of the goals of the data warehouse is to provide timely, consistent, and reliable data for enhanced business functions.
To achieve these objectives, ETL must be continuously improved according to the following three standards:
Reliability
Availability
Ease of management
Subsystem 22--Job Scheduler
Subsystem 23--Backup System
Subsystem 24--Recovery and Restart System
Subsystem 25--Version Control System
Subsyste
1 Log Table
1.1 Ideas
A log table records the primary keys of the rows that have changed in table Yw_tablea in the business database. Before the data enters the BI database target table Bi_tablea, the matching rows are deleted based on the primary keys recorded in the log table.
1.2 Design
1.2.1 Log table structure
CREATE TABLE LOG (
  varchar),         -- primary key 1
  VARCHAR(20),      -- primary key 2
  VARCHAR,          -- source table
  updatedate Date,  -- update date
  loaddate          -- load date
);
1.2.2
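The delete-then-reload idea behind the log table might look roughly like the sketch below, written with Python's built-in sqlite3 module. Everything beyond the table names yw_tablea and bi_tablea is an assumption made for illustration: the single key column id, the amount column, the log table name change_log, and the function apply_change_log do not come from the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE yw_tablea (id TEXT PRIMARY KEY, amount INTEGER);  -- business table
    CREATE TABLE bi_tablea (id TEXT PRIMARY KEY, amount INTEGER);  -- BI target table
    -- Hypothetical log table: one row per changed primary key.
    CREATE TABLE change_log (id TEXT, source_table TEXT, updatedate TEXT);

    INSERT INTO bi_tablea VALUES ('a', 1), ('b', 2);
    INSERT INTO yw_tablea VALUES ('a', 1), ('b', 20), ('c', 3);
    INSERT INTO change_log VALUES
        ('b', 'yw_tablea', '2024-01-02'),
        ('c', 'yw_tablea', '2024-01-02');
    """
)


def apply_change_log(conn):
    """Delete logged keys from the target, then reload those rows from the source."""
    logged_keys = "SELECT id FROM change_log WHERE source_table = 'yw_tablea'"
    with conn:  # both statements run in a single transaction
        conn.execute(f"DELETE FROM bi_tablea WHERE id IN ({logged_keys})")
        conn.execute(
            "INSERT INTO bi_tablea (id, amount) "
            f"SELECT id, amount FROM yw_tablea WHERE id IN ({logged_keys})"
        )


apply_change_log(conn)
result = conn.execute("SELECT id, amount FROM bi_tablea ORDER BY id").fetchall()
print(result)
```

Deleting by logged key before inserting means updated rows are replaced rather than duplicated, and new rows are simply added.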
Here are a few words about some of these concepts: (1) DB/Database: this is the OLTP (online transaction processing) database used to support production, such as a supermarket's trading system. A DB retains only the latest state of the data, and only that one state. For example, when you get up every morning and look in the mirror, what you see is your current state; as for the previous day's state, it will not appear in front of your eyes. That is a DB. (2) dw/d
1. Preface
The company uses Hadoop to build its data warehouse, which inevitably involves HiveSQL, and speed has become an unavoidable problem in the ETL process. I have had a job that joins a few tables run for an hour. That may not sound like much, but ETL often runs for many hours, which is a great waste
Introduction to ETL: ETL is the abbreviation of extract-transform-load, that is, the process of data extraction, transformation, and loading. Database to database: the following explains how to implement this with the Kettle tool. Case purpose: import the EMP table of user Scott into user testuser. Preparation: first create a new table with the same structure a
Seeing that you share a lot of Hadoop-related content, I would like to introduce an ETL tool: Kettle. Kettle is an open-source ETL tool from Pentaho. Like Hadoop, it is implemented in Java, and its purpose is to handle data extraction (Extract), transformation (Transform), and loading (Load) during data integration. Kettle has two kinds of script files, transformations and jobs; a transformation completes the fundamental transf
Component As ScriptComponent)
        ParentComponent = Component
    End Sub
End Class
Public Class Variables
    Dim ParentComponent As ScriptComponent
    Public Sub New(ByVal Component As ScriptComponent)
        ParentComponent = Component
    End Sub
End Class
10) Open the "target" Data Stream
Create a mapping
Different map service platforms have different requirements for map file formats, and files used by ArcGIS are difficult to use on other platforms. A format conversion service is therefore needed to remove the obstacle of working across platforms. The following uses the conversion from TIFF format to GeoTIFF format as an example.
First, you need to prepare several items:
1. Make sure that ArcGIS Data Interoperability for Desktop is installed.
2. Check data interoperability in the extended mod
This section describes how the ETL (data extraction, transformation, and loading) of my game transaction data analysis project is implemented.
Let's talk about the source system first. Because the server of our main transaction site is not hosted in the company, we cannot extract data directly from the source system. As a matter of fact, we already have a simple data analysis system, so we don't have to worry about this. We did not use the SQL Server 2005 BI p
Incremental extraction
Incremental extraction extracts only the data in the source table that has been added or modified since the last extraction. In ETL practice, incremental extraction is more widely used than full extraction. How to capture changed data is the key to incremental extraction. There are generally two requirements for the capture method: accuracy, which can
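One common way to capture changed data is to rely on a last-modified timestamp column and remember the highest value seen so far. The sketch below assumes this timestamp approach, which is only one of several capture methods; the orders table, its updated_at column, and the function extract_increment are hypothetical names chosen for illustration.

```python
import sqlite3


def extract_increment(conn, last_watermark):
    """Return rows changed after last_watermark, plus the new watermark to store."""
    rows = conn.execute(
        "SELECT order_id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Advance the watermark only if something was extracted.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount INTEGER, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10, "2024-01-01 09:00:00"),
        (2, 25, "2024-01-02 10:30:00"),
        (3, 40, "2024-01-03 08:15:00"),
    ],
)

# Timestamps are stored as ISO-style strings, so string comparison matches time order.
first_batch, watermark = extract_increment(conn, "2024-01-01 12:00:00")
second_batch, watermark_after = extract_increment(conn, watermark)
print(first_batch, watermark, second_batch)
```

The second call returns nothing because no row has changed since the stored watermark. This approach depends on the source system reliably updating the timestamp on every change, and it does not see rows that were deleted.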
Reprinted from: http://www.cnblogs.com/ycdx2001/p/4538750.html
After the leader told the story of diapers and beer, I found the original text here. (1) DB/Database: this refers to the OLTP (online transaction processing) database used to support production, such as a supermarket's trading system. A DB retains only the latest state of the data, and only that one state. For example, when you get up every morning and look in the mirror, what you see is your current state; as for the previous day of the
In recent years, almost all of my work has involved ETL, and I have been exposed to a variety of ETL tools. I have now organized these tools to share with you.
I. ETL tools
Foreign
1. DataStage
Review: the most professional ETL tool; expensive; of average difficulty to use
Download Address: Ftp://ftp.seu.edu.cn/Pub/Develop ... tastag
operating system.
There are many versions of Linux, and I chose to develop my personal BI system on this stable version: Red Hat Enterprise Linux Server release 6.4 (Santiago)
3. BI system host information
After selecting the operating system, the next step is to install the server. I chose a VMware virtual machine to install the Linux server. There are many articles online about installing VMware virtual machines, so I will not repeat them here. Interested partners can
caused by the misuse of abbreviations and idioms, data input errors, duplicate records, missing values, and spelling variations. Even a well-designed and well-planned database system is meaningless if it contains a large amount of noisy data, because under "garbage in, garbage out" the system cannot provide any support for the decision analysis system. To remove noisy data, the data in the database system must be cleaned. At present, there is a lot of research on data cleansing and
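A few of the noise sources mentioned above (spelling and formatting variations, missing values, duplicate records) can be handled with simple normalization rules. The sketch below is only a toy illustration in plain Python; the record fields name and city and the function clean_records are hypothetical, and real cleansing rules depend heavily on the data.

```python
def clean_records(records):
    """Normalize text fields, mark missing values as None, and drop duplicate records."""
    seen = set()
    cleaned = []
    for rec in records:
        # Collapse repeated whitespace and normalize capitalization.
        name = " ".join((rec.get("name") or "").split()).title()
        city = (rec.get("city") or "").strip().title() or None  # empty string -> None
        key = (name, city)
        if key in seen:  # the same record was already kept once
            continue
        seen.add(key)
        cleaned.append({"name": name, "city": city})
    return cleaned


raw = [
    {"name": "  alice   smith ", "city": "Paris"},
    {"name": "Alice Smith", "city": "paris "},
    {"name": "Bob", "city": ""},
]
cleaned = clean_records(raw)
print(cleaned)
```

The first two input records differ only in spacing and capitalization, so after normalization they are treated as one record.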