This is not a hands-on project article, but requirements analysis genuinely deserves careful thought. In ETL work, the surrounding requirements are critical: they involve collecting and organizing all known requirements, realities, and constraints that affect the ETL system.
ETL system design and development involves the following categories of requirements.
1. Business requirements
Business requirements are the most direct kind: the information needs of the DW/BI system's users. Those needs determine which data is required, and the ETL system should be built to serve them.
2. Compliance
Compliance means that the data delivered in reports and presentations must be correct and complete, with no possibility of tampering.
Compliance requirements that typically deserve special attention in a data warehouse are:
- Keeping copies of the source data and of each subsequent data-staging step;
- Being able to prove the complete processing path behind any data result;
- Keeping a complete record of the algorithms used for allocation, adjustment, and derivation;
- Being able to prove the security of data copies over time, both online and offline.
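One common way to make archived copies tamper-evident is to record a cryptographic digest alongside each snapshot. The sketch below is a minimal illustration of that idea; the data, field names, and `snapshot_digest` helper are invented for the example:

```python
import hashlib
import json

def snapshot_digest(rows):
    """Compute a tamper-evidence digest for a staged data set.

    Rows are serialized deterministically (sorted keys) so the same
    data always yields the same SHA-256 digest.
    """
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# Archive the staged data together with its digest at load time.
staged = [{"order_id": 1, "amount": 99.5}, {"order_id": 2, "amount": 12.0}]
digest_at_load = snapshot_digest(staged)

# Later, an auditor recomputes the digest over the archived copy:
assert snapshot_digest(staged) == digest_at_load  # unchanged data passes

# Any modification to the archived copy changes the digest.
tampered = [{"order_id": 1, "amount": 9999.5}, {"order_id": 2, "amount": 12.0}]
assert snapshot_digest(tampered) != digest_at_load
```

In a real system the digests themselves would be stored somewhere the ETL process cannot overwrite, otherwise the proof is only as trustworthy as the staging area.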
3. Data quality
The importance of data quality cannot be overemphasized.
- Good data quality plays a key role in data mining: good data leads to good business decisions;
- Most data sources are distributed, so data from different systems must be integrated effectively;
- Compliance requirements make it impossible to handle data carelessly.
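A common pattern for enforcing data quality in an ETL flow is a screening step that separates clean rows from rejects and records why each reject failed. The sketch below is a minimal, hypothetical example; the column names and rules are invented for illustration:

```python
def quality_screen(rows):
    """Run simple data-quality rules over extracted rows, splitting
    them into clean rows and rejects annotated with the failed rules."""
    clean, rejects = [], []
    for row in rows:
        errors = []
        if row.get("customer_id") is None:
            errors.append("missing customer_id")
        amount = row.get("amount")
        if not isinstance(amount, (int, float)) or amount < 0:
            errors.append("invalid amount")
        if errors:
            rejects.append({"row": row, "errors": errors})
        else:
            clean.append(row)
    return clean, rejects

rows = [
    {"customer_id": 7, "amount": 25.0},
    {"customer_id": None, "amount": 10.0},
    {"customer_id": 8, "amount": -3.0},
]
clean, rejects = quality_screen(rows)
# clean holds 1 row; rejects holds 2 rows, each with its reasons attached
```

Keeping the rejects (rather than silently dropping them) is what lets data quality be measured and reported over time.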
4. Security
The data warehouse industry holds a contradictory attitude toward data security: the warehouse itself aims to publish data to decision makers as widely as possible,
while security demands restrictions, so that only users with a need to know can access the data.
5. Data integration
The ultimate goal of data integration is to seamlessly connect and coordinate all of your systems. In the data warehouse, data integration takes the form of conformed dimensions and conformed facts.
Conformed dimensions are common dimension attributes established across all business processes.
Conformed facts mean that the shared business measures of the individual databases are defined consistently and agreed upon.
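The conformed-dimension idea can be sketched as a mapping from each source system's local keys onto one shared surrogate key, so that facts from different systems line up. The systems, keys, and `conform` helper below are all invented for illustration:

```python
# Two source systems record measures against their own local customer
# keys; a conformed customer dimension maps both onto a single
# surrogate key so the facts can be compared side by side.
conformed_customers = {
    # (source_system, local_key) -> conformed surrogate key
    ("orders_db", "C-17"): 1001,
    ("support_db", "cust17"): 1001,
}

orders_fact = [{"source": "orders_db", "customer": "C-17", "revenue": 250.0}]
support_fact = [{"source": "support_db", "customer": "cust17", "tickets": 3}]

def conform(fact_rows):
    """Replace local customer keys with conformed surrogate keys."""
    out = []
    for row in fact_rows:
        key = conformed_customers[(row["source"], row["customer"])]
        out.append(dict(row, customer_key=key))
    return out

# After conforming, both fact rows carry customer_key 1001, so revenue
# and support tickets can be drilled across in a single report.
```

Building and maintaining that key mapping is the real work of integration; the join itself is trivial once the mapping exists.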
6. Data latency
Data latency describes how quickly source-system data must be delivered to business users through the DW/BI system.
Latency requirements have a significant impact on the ETL architecture.
Clever processing algorithms, parallelization, and strong hardware support can accelerate traditional batch-oriented data flow.
If the latency requirement is very aggressive, the ETL system architecture must shift from batch to stream processing.
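A middle ground between nightly batch and true streaming is micro-batching, where latency is bounded by a small batch size instead of a daily window. The sketch below is a simplified, size-only illustration (a real loader would also flush on a time limit):

```python
def micro_batches(events, batch_size):
    """Group a continuous event stream into small batches so that
    delivery latency is bounded by the batch size rather than by a
    nightly load window."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

# A nightly batch job would load all 7 events at once; micro-batching
# delivers them in chunks of 3 as they arrive.
loads = list(micro_batches(range(7), batch_size=3))
# loads == [[0, 1, 2], [3, 4, 5], [6]]
```

Shrinking the batch size trades throughput for latency, which is exactly the dial the latency requirement forces the architecture to expose.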
7. Archiving and Lineage
The recommendation is to write data to disk after each major step of the ETL pipeline (extraction, cleaning, conforming, and delivery).
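That recommendation can be sketched as a pipeline that checkpoints a snapshot after every major step, so each intermediate result can be archived and traced later. The toy steps, file layout, and `checkpoint` helper below are invented for the example:

```python
import json
import tempfile
from pathlib import Path

def run_pipeline(rows, staging_dir):
    """Run a toy extract -> clean -> conform -> deliver pipeline,
    writing a snapshot to disk after each major step."""
    staging = Path(staging_dir)

    def checkpoint(step, data):
        # One archived snapshot per pipeline step supports lineage:
        # any delivered number can be traced back through these files.
        (staging / f"{step}.json").write_text(json.dumps(data, sort_keys=True))
        return data

    extracted = checkpoint("extract", rows)
    cleaned = checkpoint("clean", [r for r in extracted if r["amount"] is not None])
    conformed = checkpoint(
        "conform", [dict(r, amount=round(float(r["amount"]), 2)) for r in cleaned]
    )
    return checkpoint("deliver", conformed)

with tempfile.TemporaryDirectory() as d:
    out = run_pipeline(
        [{"id": 1, "amount": "10.456"}, {"id": 2, "amount": None}], d
    )
    # One snapshot file exists per pipeline step.
    steps = sorted(p.stem for p in Path(d).glob("*.json"))
    # steps == ['clean', 'conform', 'deliver', 'extract']
```

The cost is extra I/O, which is why the checkpoints belong after the *major* steps only, not after every transformation.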
8. User Delivery Interfaces
The final step of ETL is handing data over to the BI applications, and the ETL team must take responsibility for the content and structure of that data.
Otherwise, the complexity of the BI applications increases greatly, querying and report creation slow down, and users find the data confusing.
9. Available skills
Some ETL decisions must be made based on the human resources available to build and manage the system.
For example, if the team lacks C++ programming skills, or cannot reach the required level of proficiency, it should not build processing modules that rely heavily on C++.
Conversely, if the team already has skills with a mainstream vendor's ETL tool, you can proceed with much more confidence.
The underlying choice is between hand-coding the ETL system and building it on a vendor's ETL tool and development kit.
10. Legacy Licenses
Legacy licenses are existing software licenses: management often insists that, wherever possible, the ETL system be built on licenses the organization has already purchased, which constrains the architecture choices available.
Introduction to Extraction, Transformation, and Loading (II): Surrounding Requirements