This article describes how to synchronize data from MySQL to Oracle via ODI.
1. Define the physical architecture
1.1 Create a new MySQL data server
In Topology -> Physical Architecture -> MySQL, right-click, select New Data Server, and enter the relevant
Kettle
Main content:
I. ETL introduction
II. Kettle introduction
III. Invoking the Kettle API from Java
I. Introduction to ETL
1. What is ETL?
1) ETL is an acronym for "Extract", "Transform", "Load", i.e. the process of extracting, transforming, and loading data.
Will ETL tools be used? What is ETL?
From Baidu
Function: ETL extracts data from distributed and heterogeneous data sources, such as relational databases and flat data files, into a temporary staging layer, where it is cleaned, transformed, and integrated; finally, it is loaded into a data warehouse or data mart, where it becomes the basis for online analytical processing (OLAP) and data mining.
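The extract, transform, load pipeline described above can be sketched in a few lines of Python. This is a toy illustration with made-up field names, not any particular ETL tool's API:

```python
# Toy ETL pipeline: extract rows from a "source", clean/transform them,
# and load the survivors into a "warehouse" (here, plain Python lists).

def extract(source):
    """Pull raw records from a source (here: a list of dicts)."""
    return list(source)

def transform(rows):
    """Clean and integrate: normalize names, drop records missing a key field."""
    cleaned = []
    for row in rows:
        if row.get("amount") is None:      # cleansing: reject incomplete rows
            continue
        cleaned.append({
            "customer": row["customer"].strip().title(),  # conversion
            "amount": float(row["amount"]),
        })
    return cleaned

def load(warehouse, rows):
    """Append transformed rows into the target store."""
    warehouse.extend(rows)
    return warehouse

source = [{"customer": " alice ", "amount": "10.5"},
          {"customer": "bob", "amount": None}]
warehouse = load([], transform(extract(source)))
```

The same three-stage shape carries over directly to real tools such as Kettle, which wire these stages together graphically instead of in code.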
Reprinted: please indicate the source. Thank you. Http://blog.csdn.net/lyy289065406/article/details/6746954
General question:
First of all, the "general question" described below is not the original intent of the problem; it is impossible to get AC based on the original intent, because the test data differ greatly from it. By the way, if you are new to POJ, it is recommended that you do not attempt this problem, because if you do
In the previous section, we crawled nearly 70,000 second-hand house records using crawler tools. This section pre-processes that data: the so-called ETL (extract-transform-load).
I. Necessity of ETL tools
Data cleansing is a prerequisite for data analysis: no matter how sophisticated the algorithm, the moment it meets one bad record it throws an exception and the whole run is dead. However
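The point about a single bad record killing an analysis job can be made concrete with a small defensive-cleansing sketch (hypothetical field names for house listings): bad rows are quarantined for inspection instead of being allowed to raise and abort the run.

```python
def clean(rows):
    """Split raw rows into (good, bad): rows that fail to parse are
    quarantined, not allowed to raise and kill the whole pipeline."""
    good, bad = [], []
    for row in rows:
        try:
            good.append({"price": float(row["price"]),
                         "area": float(row["area"])})
        except (KeyError, TypeError, ValueError):
            bad.append(row)      # keep for later inspection instead of crashing
    return good, bad

rows = [{"price": "300", "area": "88.5"},
        {"price": "unknown", "area": "70"},   # unparseable value
        {"area": "60"}]                       # missing field
good, bad = clean(rows)
```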
If you know the values of all the attributes of the candidate key, you can retrieve any value of any attribute of any tuple. 3NF: on the basis of satisfying second normal form, all non-key attributes must depend non-transitively on the candidate key. Simply put, all non-key attributes must be independent of each other; one non-key attribute cannot depend on another non-key attribute. Next is a brief introduction to the life cycle of data. Common business systems typically contain only one stage, online transaction processing, but as the
the attribute of the indicator. For example, in "2015-01-12 PV is 1000", the date (2015-01-12) is the dimension, PV is the indicator, and 1000 is the value.
Dimension table: a table that stores dimensions, or dimension relationships.
Fact table: a table that stores the data to be queried against the dimensions, for example daily PV and UV.
ETL is an acronym for extract-transform-load, which describes
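A minimal illustration of the dimension/fact split described above, using toy in-memory tables with assumed column names: the fact table stores the measures (PV) keyed to a date, the dimension table stores descriptive attributes of that date, and a query joins the two to aggregate a measure by a dimension attribute.

```python
# Dimension table: one row per date, with descriptive attributes.
dim_date = {
    "2015-01-12": {"weekday": "Monday"},
    "2015-01-13": {"weekday": "Tuesday"},
}

# Fact table: measures (PV) keyed to the date dimension.
fact_pv = [
    {"date": "2015-01-12", "pv": 1000},
    {"date": "2015-01-13", "pv": 800},
]

# "PV by weekday": join facts to the dimension and aggregate the measure.
pv_by_weekday = {}
for fact in fact_pv:
    weekday = dim_date[fact["date"]]["weekday"]
    pv_by_weekday[weekday] = pv_by_weekday.get(weekday, 0) + fact["pv"]
```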
of application programs and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions. A DW is a collection of data designed to support management decision making. According to Bill Inmon, a DW is "a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision making." DWs tend to have these distinguishing features:
Use a subject-oriented dimensional data model;
Contain publishable
For a small ETL scheduler, a colleague needed the execution status of a stored procedure to be returned in order to control whether the subsequent dependent steps run. In the shell script that calls and executes the stored procedure, I only returned the stored procedure's output parameter, and did not write out the full control flow for everyone. If you continue development in this way, that is a small ETL sche
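The control flow hinted at above, using each stored procedure's returned status to decide whether dependent steps run, can be sketched abstractly as follows. This is a pure-Python stand-in where each step is a callable returning 0 on success; a real version would obtain the status from the database call itself.

```python
def run_job(steps):
    """Run steps in dependency order; each step returns 0 on success
    (like a stored procedure's output status parameter). A non-zero
    status halts the chain so downstream steps never run."""
    for name, step in steps:
        status = step()
        if status != 0:
            return f"stopped at {name} (status {status})"
    return "all steps succeeded"

# Hypothetical steps standing in for stored-procedure calls.
steps = [
    ("extract", lambda: 0),
    ("transform", lambda: 1),   # simulated failure
    ("load", lambda: 0),        # never reached
]
result = run_job(steps)
```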
Acronyms
Acronyms are words made up of the first letter of each word in a term or phrase. For example, HTML is an acronym for Hypertext Markup Language. Acronyms should be used in identifiers only if they are widely recognized and understood by the public. Acronyms differ from abbreviations, because an abbreviation shortens a single word. For example, ID is an abbreviation for identifier. Typically, a library name should not use a
Case 3:
The error is as follows: ORA-39213 "Metadata processing is not available". Data Pump cannot use the Metadata API because the XSL stylesheets are not set up correctly; run dbms_metadata_util.load_stylesheets as SYSDBA.

[oracle@DB-Server admin]$ oerr ora 39213
39213, 00000, "Metadata processing is not available"
// *Cause:  The Data Pump could not use the Metadata API. Typically,
//          this is caused by the XSL stylesheets not being set up properly.
// *Action: Connect AS SYSDBA and execute dbms_metadata_util.load_stylesheets
//          to reload the stylesheets.

SQL> exec dbms_metadata_util.load_stylesheets
The data loading strategies discussed in this paper take an OLTP system as the source system and cover the general strategies used when ETL loads data into an OLAP system. Depending on the specific situation, ETL data loading generally takes one of the following four approaches:
1. Timestamp mode
You need to uniformly add time fields, serving as timestamps, to the business tables in the OLTP system
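Timestamp-based incremental loading can be sketched as follows, using toy in-memory tables with assumed column names: each run extracts only the rows whose timestamp is later than the high-water mark recorded by the previous run, then advances the mark.

```python
from datetime import datetime

def incremental_extract(table, last_loaded):
    """Return rows modified after the high-water mark, plus the new mark."""
    delta = [r for r in table if r["updated_at"] > last_loaded]
    new_mark = max((r["updated_at"] for r in delta), default=last_loaded)
    return delta, new_mark

orders = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 3)},
]
# First run: everything after the initial mark is extracted.
delta, mark = incremental_extract(orders, datetime(2023, 12, 31))
# A later run picks up only rows stamped after the saved mark.
orders.append({"id": 3, "updated_at": datetime(2024, 1, 5)})
delta2, mark2 = incremental_extract(orders, mark)
```

The main caveat of this approach is that it relies on every write path in the OLTP system faithfully maintaining the timestamp column; deletes are also invisible to it.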
Introduction to four open-source BI tools: SpagoBI, OpenI, JasperSoft, and Pentaho. 1. Brief introduction to BI systems. From a technical point of view, BI includes ETL, DW, OLAP, DM, and other links. Simply put, data already produced by the transaction systems is extracted by ETL tools into a subject-oriented data warehouse; OLAP then generates cubes or reports, which are presented to users through a portal; users u
load (load): the ETL process merges the data into an enterprise-level data warehouse, thereby obtaining a global view of enterprise data. On this basis, appropriate query and analysis tools, data mining tools, and OLAP tools process the data (at which point information becomes knowledge that aids decisions); finally the knowledge is presented to managers, providing data support for their decision-making process. Business intellig
Open-source BI Systems
Directory
Open-source BI system classification
BI application tools
ETL tools
Reporting tools
Eclipse BIRT
OLAP tools
Open-source databases
Open-source BI suites
Bizgres
OpenI
Pentaho
SpagoBI
Open-source BI system classification
BI application tools
ZooKeeper, which runs on top of a computer cluster and manages Hadoop operations. 6. Hive (data warehouse). Open-sourced by Facebook, originally used to solve the problem of computing statistics over massive structured log data. Hive defines a SQL-like query language (HQL) that translates SQL into MapReduce tasks executed on Hadoop; it is typically used for offline analysis. HQL is used to run query statements over data stored on Hadoop, and Hive lets developers unfamiliar with MapReduce write data query statements, which are then tra
Summary
As the saying goes, "it takes ten years to grind a sword." After five years of careful work, Microsoft launched SQL Server 2005 in 2005, the next major release after SQL Server 2000. This enterprise-class database solution consists of the following components: Database Engine Services, Data Mining, Analysis Services, Integration Services, and Reporting Services. Among them, Integration Services (that is, SSIS) is the intermediary between them, the link between the various sources of data, through