Year-End Summary

Source: Internet
Author: User

I have been working for three years, dealing with data almost every day. This is a short summary, kept as a souvenir and, I hope, a starting point for discussion with you.

Over these three years I have designed many models, yet I have never been entirely satisfied with any of my own work. Still, every step brought growth, and each later piece of work was better than the one before. Thanks to Brother Hugh, who first showed me the ropes, and thanks also to the colleagues I later worked alongside on project sites. My work has mainly involved analytical databases, including DAS systems, business analysis systems, performance systems, and data warehouses; I won't presume to comment on production systems. What follows is offered only for discussion: where we differ, let's talk it through and improve together. In my view, analytical systems share the following characteristics:

1. They need to integrate with many external systems. An analytical system is not itself part of any production system; it only processes the data that production systems produce, so the external production systems are indispensable to it.

2. Because the business keeps changing, such a system cannot be built overnight. Slow, iterative construction and planned, predictable redundancy are therefore especially important.

3. A production system handles individual records and must preserve the business integrity of each one, whereas an analytical system processes data sets and must preserve the integrity of those sets.

When such a system is built, the business people (Party A) generally have only a few vague, conceptual requirements: for example, they want to see the daily growth of a business (product dimension), an employee's performance (person and organization dimensions), or annual performance (time dimension). During construction, business experts are needed to summarize, guide, and classify these requirements. Summarizing clears up the customer's ambiguity and gives the design a predictable blueprint; guiding steers unreasonable requirements (unreasonable business logic, unreasonable project cost, and so on) toward reasonable ones; and classifying mainly provides the blueprint for model design. While the business experts are at work, the data expert (the model designer) should also get involved and take part in the preliminary requirements research.

This research includes understanding the client's IT landscape: how many external systems there are, how accurate their data is, whether functions of an old platform need to be ported to the new system, and whether the industry's common KPIs can be supported. It also includes how the client's decision makers position the system, and whether that positioning exceeds what the system can realistically carry. The client must also be made to understand that building such a system is a long process and that later maintenance is crucial; whether maintenance will be handled by technical staff in the business department, by Party B, or by Party A's IT department is an important factor to consider. At the same time, we should obtain as much of the data dictionary of the client's production systems as possible, both to understand what data those systems can output and to support detailed design.

After this phase of work, the data experts should begin interface design, following these principles:

1. Direction: whether the data is pushed by the source system or pulled by us.

2. Transfer mode: depending on the data volume of each interface, it can be transmitted incrementally, in full, or as increments plus periodic full loads.

3. Frequency: daily, weekly, monthly, quarterly, or yearly.

4. Naming convention: preferably the interface table name plus the data date. A name should generally reflect the source system, the data layer, and the cycle; for example, ods_core_****_daily shows that the data comes from the core system and is transmitted daily, with ods marking the interface layer.
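The four principles can be captured as a small interface specification from which names are derived. This is a minimal sketch under stated assumptions: the class and field names, the `loan` table, and the `.txt` file extension are hypothetical illustrations, not part of any particular ETL tool.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Direction(Enum):
    PUSH = "push"  # the source system sends files to us
    PULL = "pull"  # we fetch data from the source system


class TransferMode(Enum):
    INCREMENTAL = "incremental"
    FULL = "full"
    INCREMENTAL_PLUS_FULL = "incremental+full"


class Frequency(Enum):
    DAILY = "daily"
    WEEKLY = "weekly"
    MONTHLY = "monthly"
    QUARTERLY = "quarterly"
    YEARLY = "yearly"


@dataclass(frozen=True)
class InterfaceSpec:
    layer: str          # data layer, e.g. "ods" for the interface layer
    source_system: str  # originating system, e.g. "core"
    table: str          # hypothetical subject/table name
    direction: Direction
    mode: TransferMode
    frequency: Frequency

    def table_name(self) -> str:
        """<layer>_<source system>_<table>_<cycle>, e.g. ods_core_loan_daily."""
        return f"{self.layer}_{self.source_system}_{self.table}_{self.frequency.value}".lower()

    def file_name(self, data_date: date) -> str:
        """Interface file name: table name plus the data date (extension is an assumption)."""
        return f"{self.table_name()}_{data_date:%Y%m%d}.txt"
```

For example, `InterfaceSpec("ods", "core", "loan", Direction.PUSH, TransferMode.INCREMENTAL, Frequency.DAILY).file_name(date(2015, 12, 31))` yields `ods_core_loan_daily_20151231.txt`. Deriving the name from the spec keeps the convention consistent across dozens of interfaces.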

Data in this layer generally undergoes no business-meaning conversion; it is all loaded from text files into the target database (our project database). Whether loading is done with scripts or with a dedicated ETL tool also needs to be decided, but since that choice has little impact on model design, I won't describe it further.
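A script-based load into the interface layer can be as simple as copying every field verbatim. The sketch below assumes a pipe-delimited extract with a header line and uses SQLite only as a stand-in for the target database; the function name and sample columns are hypothetical. Every column is stored as TEXT, so nothing is converted at this stage.

```python
import csv
import io
import sqlite3


def load_text_to_ods(conn: sqlite3.Connection, table: str, text: str, delimiter: str = "|") -> int:
    """Load a delimited text extract into an ODS table exactly as received.

    All columns are TEXT: no type conversion and no business-meaning
    translation happens in this layer. The table name is trusted input here.
    Returns the number of rows loaded.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    columns = ", ".join(f'"{c}" TEXT' for c in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)
```

Note that mixed date formats such as `20151231` and `2015-12-31` are deliberately left as they arrived; unifying them is a job for the cleaning step that follows.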

Once the data has been brought in, we must confirm with Party A the specific meaning of each data item, the exact definition (caliber) of each indicator, its calculation formula, and how each dimension table is initialized. Dimension tables fall into two cases:

1. Append-only dimension tables, such as the product table, whose rows are added but never modified. For these it is enough to normalize the codes and merge the data.

2. Slowly changing dimension tables, whose rows change as well as grow, such as the personnel table. When an employee's role changes, the indicators they are assessed on change with it, so the table needs effective-date and expiration-date fields, and it must be updated promptly whenever a role changes.
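The two cases can be sketched in a few lines. This is an illustration only, assuming in-memory structures rather than database tables; the names `merge_products`, `change_role`, `role_on`, and the `9999-12-31` "open" expiration date are hypothetical conventions. The personnel case follows the common "type 2" slowly changing dimension pattern, with the expiration date treated as exclusive.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

OPEN_END = date(9999, 12, 31)  # assumed marker for "still valid"


@dataclass
class PersonRole:
    person_id: str
    role: str
    valid_from: date
    valid_to: date = OPEN_END  # exclusive end of validity


def merge_products(dim: dict[str, str], incoming: dict[str, str]) -> dict[str, str]:
    """Append-only dimension: add codes not yet present, never overwrite existing rows."""
    merged = dict(dim)
    for code, name in incoming.items():
        merged.setdefault(code, name)
    return merged


def change_role(history: list[PersonRole], person_id: str, new_role: str, effective: date) -> None:
    """Slowly changing dimension: close the current row, then open a new one."""
    for row in history:
        if row.person_id == person_id and row.valid_to == OPEN_END:
            if row.role == new_role:
                return  # no real change, keep the current row
            row.valid_to = effective
    history.append(PersonRole(person_id, new_role, effective))


def role_on(history: list[PersonRole], person_id: str, day: date) -> Optional[str]:
    """Role in force on a given day, e.g. to choose the KPIs an employee was assessed on."""
    for row in history:
        if row.person_id == person_id and row.valid_from <= day < row.valid_to:
            return row.role
    return None
```

Because old rows are closed rather than overwritten, a performance report for any past period can still look up the role, and therefore the indicators, that applied at the time.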

Beyond these two cases, code normalization, that is, mapping the different codes that multiple systems use for the same thing onto one standard code, is also part of this stage. Generally speaking, the mapping should be maintained through a maintenance screen on the customer's front end: when the organization department adds a new organization, the technology department should have each system generate the unified code, and the mapping is then maintained in the system used by the business unit.
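At its core the mapping is a lookup from (source system, local code) to a standard code. The class below is a hypothetical sketch, not a specific product's API, and the system names and codes in the example are made up. It refuses conflicting registrations and raises on unmapped codes instead of guessing, so gaps surface to whoever maintains the mapping.

```python
class CodeMapping:
    """Maps each source system's local code to one standard code."""

    def __init__(self) -> None:
        self._map: dict[tuple[str, str], str] = {}

    def register(self, system: str, local_code: str, standard_code: str) -> None:
        """Add a mapping; re-registering the same pair to a different code is an error."""
        key = (system, local_code)
        existing = self._map.get(key)
        if existing is not None and existing != standard_code:
            raise ValueError(f"{key} is already mapped to {existing}")
        self._map[key] = standard_code

    def to_standard(self, system: str, local_code: str) -> str:
        """Translate a local code; unmapped codes raise rather than pass through silently."""
        try:
            return self._map[(system, local_code)]
        except KeyError:
            raise KeyError(f"no standard code for {local_code!r} in {system!r}") from None
```

For instance, after `register("core", "0101", "ORG001")` and `register("credit", "BJ-01", "ORG001")`, both local codes resolve to `ORG001`, so facts from the two systems can be aggregated under one organization.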

Once the dimension tables are designed, the fact tables are designed according to the business unit's indicators. All fact tables are classified by business subject to form star schemas. Fact data arriving through the interface layer is then cleaned and loaded: date formats and field names are unified, and, according to business decisions, fine-grained data that is of no use is cleaned out, so that the data is accurate when it is stored. This layer is the persistent layer. It must support queries at the finest granularity, and it needs an offloading mechanism, so that detail data retained for three months or more is offloaded and archived. A partitioned-table design is generally recommended, with tables partitioned by time so that old data can be offloaded partition by partition.

With the design taken this far, it can be seen that the previous layer is designed on the basis of business logic. In this layer we must consider business requirements, or, put simply, how to satisfy the various reports: fine-grained data must be summarized, with each level summarizing the data of the level below it, and business-logic processing is carried out according to the system's business functions. Tables in this layer need a deliberate redundancy design: if answering a query would require joining several tables in a way that hurts the user experience, store the information redundantly. If the data volume is still too large, consider splitting the data into separate tables.
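A summary layer with planned redundancy might look like this. It is a sketch under stated assumptions: the fact rows, product codes, and function names are hypothetical, and in practice these would be database tables. The monthly summary copies the product name from the dimension so reports need no join, and the yearly summary is built from the monthly level rather than from the detail, illustrating "each level summarizes the level below it".

```python
from collections import defaultdict


def monthly_summary(facts: list[dict], products: dict[str, str]) -> list[dict]:
    """Roll daily detail up to (month, product), copying the product name into
    each row (deliberate redundancy) so report queries avoid a dimension join."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for f in facts:
        key = (f"{f['trade_date']:%Y-%m}", f["product_code"])
        totals[key] += f["amount"]
    return [
        {"month": month, "product_code": code, "product_name": products[code], "amount": amount}
        for (month, code), amount in sorted(totals.items())
    ]


def yearly_summary(monthly: list[dict]) -> list[dict]:
    """Next level up, aggregated from the monthly layer instead of the detail."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    names: dict[str, str] = {}
    for r in monthly:
        totals[(r["month"][:4], r["product_code"])] += r["amount"]
        names[r["product_code"]] = r["product_name"]
    return [
        {"year": year, "product_code": code, "product_name": names[code], "amount": amount}
        for (year, code), amount in sorted(totals.items())
    ]
```

Building each level from the one below keeps the heavy scan of detail data to a single pass; the trade-off is that the redundant product name must be refreshed if the dimension is ever corrected.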

---to be continued
