Introduction to ETL technology: introduction to ETL and the data warehouse
ETL is the abbreviation of Extract-Transform-Load. It describes the process of extracting data from a source, transforming it, and loading it into a target. ETL is commonly used in data warehouses, but its objects are not l
ETL
ETL, short for extraction-transformation-loading. The Chinese name means data extraction, conversion, and loading. ETL tools include: OWB (Oracle Warehouse Builder), ODI (Oracle Data Integrator), Informatica PowerCenter, AICloudETL, DataStage, Repository Explorer, Beeload, Kettle, DataSpider
ETL extracts data from distributed and heterogeneous data sources, suc
ETL is a key link in a data warehouse system. In the broad sense, ETL is a data integration solution; in the narrow sense, it is a tool for moving data from one place to another. There have been plenty of data migration and transformation tasks over the past few years, but most of them were one-time jobs or involved only a small amount of data, so Access, DTS, or a small program of your own was enough. However, in the data
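The three ETL stages described above can be sketched in a few lines. This is only an illustration under stated assumptions: the CSV text, the `sales` table, and the function names are hypothetical, and an in-memory SQLite database stands in for the target system, which the source does not name.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real job would read files or source systems.
SOURCE_CSV = """id,name,amount
1, alice ,10.5
2,BOB,3
3,carol,not-a-number
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, convert amounts, drop rows that cannot be parsed."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not numeric
        cleaned.append((int(row["id"]), row["name"].strip().title(), amount))
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(conn, text=SOURCE_CSV):
    """Run extract, transform, and load in order; return the number of rows loaded."""
    records = transform(extract(text))
    load(conn, records)
    return len(records)
```

Calling `run_etl(sqlite3.connect(":memory:"))` loads the two valid rows and skips the one with a non-numeric amount. Keeping the stages as separate functions makes each one easy to test on its own.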
ETL scheduling development (1) -- writing instructions
Preface:
During database operation and maintenance, files are often transferred between systems to perform operations such as data extraction, conversion, and integration. In addition, statistical scheduling is performed after data integration. Here, I will describe an ETL scheduling developed
ETL scheduling development (5) -- subroutines that connect to the database and execute database commands
In ETL scheduling, you need to connect to the database to read and write data. The following subprograms use the input database connection string and database commands (or SQL) to perform the required operations:
#!/usr/bin/bash
#created by lubinsu
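The author's bash subroutine is cut off here, so the following is not a reconstruction of it. It is a minimal Python sketch of the same idea, a helper that takes a connection string and a SQL command and returns any rows. SQLite is an assumption standing in for the unnamed database, and `run_sql`, `demo`, and the `job_log` table are hypothetical names.

```python
import os
import sqlite3
import tempfile


def run_sql(conn_str, sql, params=()):
    """Open a connection, execute one statement, return any rows, then close."""
    conn = sqlite3.connect(conn_str)
    try:
        cur = conn.execute(sql, params)
        rows = cur.fetchall()  # empty list for statements that return no rows
        conn.commit()
        return rows
    finally:
        conn.close()  # always release the connection, even on error


def demo():
    """Create a throwaway database file and run three commands against it."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "etl_demo.db")
        run_sql(db_path, "CREATE TABLE job_log (id INTEGER, status TEXT)")
        run_sql(db_path, "INSERT INTO job_log VALUES (?, ?)", (1, "ok"))
        return run_sql(db_path, "SELECT id, status FROM job_log")
```

Opening and closing the connection inside the helper keeps each scheduled step independent, at the cost of reconnecting for every command.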
A SAS program can have three types of errors:
Programming logic errors (how to identify and resolve them);
Syntax errors (how to recognize and correct them);
Data errors (how to examine and resolve them).
How to write an efficient SAS program:
Write code that is as easy to read as possible (one statement per line, indented layout, and plenty of comments)
Test each part of the program
Test the program with a small dataset (for
Disk media is the most critical device in a storage system, because all data and information is stored on it. The read speed of the data is determined by the connection interface of the disk media. In the past, data storage relied on SCSI or SATA interfaces and hard drives. In recent years, however, a new technology has increasingly been favored by small and medium-sized enterprises and even large enterprises: SAS technology and its correspondi
Opening remarks:
I personally think the SAS protocol family is large and contains many concepts that are hard to understand. The only way I have found to learn it is to study the actual code at work while reading the protocol in detail, which gives a much better understanding of the SAS protocol.
Working from the code is the best way to understand the protocol. Think about any protocol for data transmission. The communicat
ETL is the abbreviation of extract-transform-load, that is, the process of data extraction, transformation, and loading. In enterprise or industry applications we often encounter all kinds of data processing, conversion, and migration work, so understanding and mastering an ETL tool is essential. Here I introduce Kettle, an ETL tool I have used at work for 3 years, in the spirit of good t
The table of contents of this series of articles is as follows:
I. ETL Tool Kettle Application Analysis Series I [Kettle Introduction]
II. ETL Tool Kettle Practical Application Analysis Series II [application scenarios and demo downloads]
III. ETL Tool Kettle Practical Application Analysis Series III [ETL background process
Path to mathematics - SAS memo (14)
SAS date formats
data _null_;
x = '7jan2012'd;
put x yymm7.;
put x yymmc7.;
put x yymmd7.;
put x yymmn6.;
put x yymmp7.;
put x yymms7.;
put x yymon7.;
put x mmddyy10.;
put x yymmdd10.;
run;
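For readers without a SAS session at hand, the layouts these formats are documented to produce can be imitated in Python. This is a sketch based on my reading of the SAS format documentation (the letter after `yymm` selects the separator: colon, dash, none, period, slash). It has not been checked against real SAS log output, and `sas_like_formats` is a hypothetical helper name.

```python
from datetime import date

# Upper-case month abbreviations, hard-coded to avoid locale differences.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def sas_like_formats(d):
    """Return the text each format in the memo is expected to print for date d."""
    ym = (d.year, d.month)
    return {
        "yymm7.": "%04dM%02d" % ym,     # year, letter M, month
        "yymmc7.": "%04d:%02d" % ym,    # colon separator
        "yymmd7.": "%04d-%02d" % ym,    # dash separator
        "yymmn6.": "%04d%02d" % ym,     # no separator
        "yymmp7.": "%04d.%02d" % ym,    # period separator
        "yymms7.": "%04d/%02d" % ym,    # slash separator
        "yymon7.": "%04d%s" % (d.year, MONTHS[d.month - 1]),
        "mmddyy10.": d.strftime("%m/%d/%Y"),
        "yymmdd10.": d.strftime("%Y-%m-%d"),
    }


if __name__ == "__main__":
    for name, text in sas_like_formats(date(2012, 1, 7)).items():
        print(name, text)
```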
All content of this blog is original, if reproduced please indicate the source http://blog.csdn.net/myhaspl
Path to mathematics - SAS memo (17)
SAS date and time formats
data _null_;
input mydate yymmdd10.;
put mydate yymmddb10.;
put mydate yymmddc10.;
put mydate yymmddd10.;
put mydate yymmddn8.;
put mydate yymmddp10.;
put mydate yymmdds10.;
cards;
2014-05-
;
run;
2014 05 182014-05-
Recently I have been redeploying a database: the company's core database saturates its I/O every day, and I am studying whether to use 16 SAS disks in RAID 10, 6 SSDs in RAID 10, or FIO.
Principle No. 1: no RAID 5 for databases; it is the root of all evil.
Principle No. 2: most of the time a database does not need much CPU; 2 sockets are basically enough now, and MySQL cannot make use of more.
Principle No. 3: large memory and high I/O are necessary conditions for a modern web-based database.
Now the big companies
In practical applications, we often convert wide data (one observation per patient) into long data (multiple observations per patient), or long data back into wide data. In R we can use the reshape2 package. SAS offers two ways to do this: arrays and transpose. This blog post first explains the use of arrays to reconstruct SAS data, and the next
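To make the wide and long layouts concrete, here is a small Python sketch of both conversions. It illustrates the data shapes only, not the SAS array technique the post goes on to describe. The column names (`patient`, `v1`, `v2`) and both function names are hypothetical.

```python
def wide_to_long(rows, id_key, value_keys, var_name="visit", value_name="value"):
    """Turn one row per subject into one row per subject and measurement."""
    long_rows = []
    for row in rows:
        for key in value_keys:
            long_rows.append(
                {id_key: row[id_key], var_name: key, value_name: row[key]}
            )
    return long_rows


def long_to_wide(rows, id_key, var_name="visit", value_name="value"):
    """Collect all measurements of a subject back into a single row."""
    wide = {}
    for row in rows:
        # Start a new wide record the first time a subject is seen.
        record = wide.setdefault(row[id_key], {id_key: row[id_key]})
        record[row[var_name]] = row[value_name]
    return list(wide.values())


if __name__ == "__main__":
    wide = [
        {"patient": 1, "v1": 120, "v2": 115},
        {"patient": 2, "v1": 130, "v2": 128},
    ]
    long_rows = wide_to_long(wide, "patient", ["v1", "v2"])
    for row in long_rows:
        print(row)
    print(long_to_wide(long_rows, "patient"))
```

The two functions are inverses of each other for complete data, which is a convenient check when reshaping by hand.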
Path to mathematics - SAS memo (13)
libname sastemp 'e:/sastemp/';
options user=sastemp; * sets sastemp as the default library for one-level data set names;
page; * the log starts on a new page;
data sales;
input id $ price;
skip 6; * six blank lines are written to the log;
cards;
1 23.3
2 99.23
3 91.01
;
proc print;
run;
x 'dir.'; * executes an operating system command;
Path to mathematics - SAS memo (9)
View
libname saslib "k:\sas";
data saslib.testview4 / view=saslib.testview4;
set saslib.test4;
run;
proc print data=saslib.test4 noobs label;
run;
proc sql;
select testview4.id as student_id, testview4.score as score from saslib.testview4;
quit;
ETL tool: implementing loops in Kettle
Kettle is an open-source ETL tool written in Java. It runs on Windows, Linux, and Unix, needs no installation (it is a portable, "green" application), and its data extraction is efficient and stable.
Business Model: there is a large data storage table in the relational database, which is designed as a parity datab
Path to mathematics - SAS memo (6)
Connect external data
Connect to database type options
Connect to Access:
proc sql;
connect to access as db (path="e:\xx.mdb");
Connect to Excel:
proc sql;
connect to excel (path="k:\student_excel.xlsx" getnames=yes