Introduction to extract, transform, and load (II): rounding up the requirements

Source: Internet
Author: User

This part is not very hands-on, but requirements analysis deserves careful thought. In ETL, rounding up the requirements is critical: it means collecting and organizing all the known requirements, realities, and constraints that affect the ETL system.

The design and development of an ETL system has to account for the following kinds of requirements.

1. Business needs

The business needs here are straightforward: they are the information needs of the DW/BI system's users. The ETL work should target the data that the underlying business processes require.

2. Compliance

Compliance means that the data delivered for reporting must be correct, complete, and untampered.

Compliance requirements that a data warehouse should pay particular attention to include:

    • Save archived copies of the data sources and of subsequent data staging;
    • Provide proof of the complete processing flow that produced any data result;
    • Keep a complete record of the algorithms used for allocations, adjustments, and derivations;
    • Provide proof that the data copies have remained confidential over time, both online and offline.
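
As a minimal sketch of the first and last points above, a staged copy can be stored together with a checksum so that later tampering is detectable. The function names, the manifest file layout, and the choice of SHA-256 are assumptions made for illustration; the original text does not prescribe any of them.

```python
import hashlib
import json
import shutil
from pathlib import Path


def stage_with_checksum(source: Path, staging_dir: Path) -> dict:
    """Copy a source extract into staging and record its SHA-256 digest."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    staged = staging_dir / source.name
    shutil.copy2(source, staged)
    digest = hashlib.sha256(staged.read_bytes()).hexdigest()
    # Hypothetical manifest format: one small JSON file next to the copy.
    manifest = {"file": staged.name, "sha256": digest}
    manifest_path = staging_dir / (staged.name + ".manifest.json")
    manifest_path.write_text(json.dumps(manifest), encoding="utf-8")
    return manifest


def verify_staged_copy(staging_dir: Path, manifest: dict) -> bool:
    """Return True if the staged file still matches the recorded digest."""
    staged = staging_dir / manifest["file"]
    current = hashlib.sha256(staged.read_bytes()).hexdigest()
    return current == manifest["sha256"]
```

A checksum only shows that a copy has not changed since it was staged; real compliance work would also need access controls and a protected place to keep the manifests.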

3. Data quality

The importance of data quality cannot be overemphasized.

    • Good data quality is key to data mining: good data makes for good business;
    • Most data sources are distributed, so data from different sources must be integrated effectively;
    • Compliance requirements make it impossible to handle data carelessly.
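
One simple way to avoid handling data carelessly is to run every incoming row through a set of quality checks and set aside the rows that fail. The sketch below assumes rows are plain dictionaries; the field names (customer_id, amount) and the two checks are hypothetical examples, not rules from the original text.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Check = Callable[[Row], bool]


def has_customer_id(row: Row) -> bool:
    """The row must carry a non-empty customer identifier."""
    return bool(row.get("customer_id"))


def amount_is_non_negative(row: Row) -> bool:
    """The amount must be a number and must not be negative."""
    amount = row.get("amount")
    return isinstance(amount, (int, float)) and amount >= 0


def apply_checks(
    rows: List[Row], checks: Dict[str, Check]
) -> Tuple[List[Row], List[Tuple[Row, List[str]]]]:
    """Split rows into clean rows and (row, failed check names) pairs."""
    clean, rejected = [], []
    for row in rows:
        failed = [name for name, check in checks.items() if not check(row)]
        if failed:
            rejected.append((row, failed))
        else:
            clean.append(row)
    return clean, rejected
```

Keeping the rejected rows together with the names of the failed checks, rather than silently dropping them, makes it possible to report quality problems back to the source system.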

4. Security

The data warehouse world has a conflicted attitude toward data security: the warehouse itself aims to publish data widely to decision makers, whereas security requires restricting data to those users who need to know.

5. Data integration

The ultimate goal of data integration is to connect all of your systems so that they work together seamlessly. In the data warehouse, data integration takes the form of conformed dimensions and conformed facts.

Conformed dimensions establish common dimension attributes across business processes.

Conformed facts mean agreeing on common business metrics across the individual databases.
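
The idea of dimensions shared across business processes (often called conformed dimensions) can be sketched as follows: two source systems identify the same product with different local codes, and the ETL step replaces each local code with one shared key. All table contents, codes, and function names here are made up for illustration.

```python
# Hypothetical shared product dimension used by every business process.
PRODUCT_DIM = {
    1: {"product_name": "Widget", "category": "Hardware"},
    2: {"product_name": "Gadget", "category": "Hardware"},
}

# Hypothetical maps from each source system's local code to the shared key.
SALES_CODE_TO_KEY = {"W-100": 1, "G-200": 2}
INVENTORY_CODE_TO_KEY = {"widget": 1, "gadget": 2}


def conform(rows, code_field, code_to_key):
    """Replace a source-specific product code with the shared product_key."""
    conformed = []
    for row in rows:
        out = {k: v for k, v in row.items() if k != code_field}
        out["product_key"] = code_to_key[row[code_field]]
        conformed.append(out)
    return conformed


def total_by_category(rows, measure):
    """Sum a measure by the category attribute of the shared dimension."""
    totals = {}
    for row in rows:
        category = PRODUCT_DIM[row["product_key"]]["category"]
        totals[category] = totals.get(category, 0) + row[measure]
    return totals
```

Because sales and inventory rows end up with the same product_key, both can be grouped by the same category attribute and compared side by side.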

6. Data latency

Data latency describes how quickly source system data must be delivered to business users through the DW/BI system.

Data latency has a significant impact on the ETL architecture.

Clever processing algorithms, parallelization, and powerful hardware can speed up traditional batch-oriented data flows.

If the latency requirement is sufficiently urgent, the ETL system architecture must shift from batch to stream processing.
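
The difference between batch and stream processing can be illustrated in miniature with the same transformation applied in two ways. This is only a sketch under simplifying assumptions: the transform function and its field names are hypothetical, and a Python generator stands in for what would be a message queue or streaming platform in a real system.

```python
from typing import Dict, Iterable, Iterator, List


def transform(record: Dict) -> Dict:
    """Hypothetical transformation: add the amount expressed in cents."""
    return {**record, "amount_cents": round(record["amount"] * 100)}


def run_batch(records: List[Dict]) -> List[Dict]:
    """Batch style: wait for the full set, then process it in one pass."""
    return [transform(r) for r in records]


def run_stream(records: Iterable[Dict]) -> Iterator[Dict]:
    """Stream style: emit each record as soon as it has been processed."""
    for record in records:
        yield transform(record)
```

In the batch version no result is available until every record has arrived; in the stream version the first result is available as soon as the first record arrives.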

7. Archive and Lineage

The recommendation is to write data to disk after each major activity of the ETL pipeline (extract, clean, conform, and deliver).
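
A minimal sketch of this recommendation, assuming the four major activities are extract, clean, conform, and deliver: each step writes its output to a file before the next step runs. The file names, the JSON Lines format, and the cleaning and conforming rules are illustrative assumptions only.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List


def snapshot(rows: List[Dict], archive_dir: Path, stage: str) -> Path:
    """Write the rows produced by one stage to disk as JSON Lines."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    path = archive_dir / f"{stage}.jsonl"
    with path.open("w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row) + "\n")
    return path


def run_pipeline(source_rows: Iterable[Dict], archive_dir: Path) -> List[Dict]:
    """Run four toy stages, archiving the output of each one."""
    extracted = list(source_rows)
    snapshot(extracted, archive_dir, "1_extract")

    # Hypothetical cleaning rule: drop rows without an amount.
    cleaned = [r for r in extracted if r.get("amount") is not None]
    snapshot(cleaned, archive_dir, "2_clean")

    # Hypothetical conforming rule: upper-case the currency code.
    conformed = [
        {**r, "currency": r.get("currency", "USD").upper()} for r in cleaned
    ]
    snapshot(conformed, archive_dir, "3_conform")

    delivered = conformed
    snapshot(delivered, archive_dir, "4_deliver")
    return delivered
```

With a snapshot per stage, any delivered row can be traced back through the intermediate files, and a failed stage can be rerun from the previous snapshot instead of from the source.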

8. BI delivery interface

The final step of ETL is handing the data off to the BI applications, and the ETL side must take responsibility for the content and structure of that data.

Otherwise, the BI applications become much more complex, queries and report creation slow down, and users find the data overly complicated.

9. Available skills

Some ETL decisions must be made based on the human resources available to build and manage the system.

For example, if the team has no C++ programming skills, or cannot reach the required level, it should not build a processing module that relies heavily on C++.

If the team already has skills with a mainstream vendor's ETL tool, it can plan with much more confidence.

The other decision is whether to hand-code the ETL system or to use a vendor's package.

10. Legacy Licenses

What does "license" mean here? A software license?
