Enterprise-Class Big Data Processing Solution 03: Data Flow

Source: Internet
Author: User

Data processing is divided into data mining and data analysis. Broadly speaking, "data analysis" can refer to the whole process, but the analysis step itself is much the same from case to case.

Data mining usually consists of three steps: filtering, cleaning, and matching.

1. Filtering: Data that is not suitable for analysis is filtered out, like removing defective items from a production line. Filtering works at a coarse, group-level granularity; its rules can be based on properties such as data size or character length.

2. Cleaning: Also known as formatting. The data is split into its parts; a record is typically composed of a time, a data source, a data body, and so on, like a head, a body, and feet. Converting the data into the format we want is also a labeling process, that is, classifying the data.

3. Matching: Matching is field extraction: the useful parts of the data are extracted, usually with regular expressions. Because there are too many categories of data to write matching rules for all of them, this step requires automatic machine recognition. Note that the results of machine learning are not fully accurate, so that data is stored separately.

Data mining is the process of formatting unformatted and semi-formatted data; in other words, it imposes rules on the data.
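The three mining steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log line layout, the length limits `MIN_LEN` and `MAX_LEN`, the `name:NN%` field pattern, and the function names are all hypothetical and not taken from the article. Setting aside records that the pattern cannot parse is likewise just one possible reading of "stored separately".

```python
import re

# Hypothetical raw input: "<date> <time> <source> <body>"
RAW_LINES = [
    "2016-07-09 12:00:00 web01 cpu:90% mem:90%",
    "",
    "garbage",
    "2016-07-09 12:00:05 web02 cpu:40% mem:35%",
    "2016-07-09 12:00:10 web03 service restarted",
]

MIN_LEN, MAX_LEN = 20, 200  # assumed filtering rule: character length


def keep(line: str) -> bool:
    """Filtering: drop lines whose length is outside the accepted range."""
    return MIN_LEN <= len(line) <= MAX_LEN


def clean(line: str) -> dict:
    """Cleaning: split a line into time (head), source, and body."""
    date, time, source, body = line.split(" ", 3)
    return {"time": f"{date} {time}", "source": source, "body": body}


FIELD_RE = re.compile(r"(\w+):(\d+)%")  # assumed field pattern, e.g. "cpu:90%"


def match(record: dict) -> dict:
    """Matching: extract the useful fields from the body with a regex."""
    fields = {name: int(value) for name, value in FIELD_RE.findall(record["body"])}
    return {"time": record["time"], "source": record["source"], **fields}


matched, unmatched = [], []
for line in RAW_LINES:
    if not keep(line):
        continue
    row = match(clean(line))
    # Rows with no extracted fields are kept apart from the structured rows.
    (matched if len(row) > 2 else unmatched).append(row)

print(matched)
print(unmatched)
```

After this pass, `matched` holds rows with fixed columns that could be loaded into a database table, while `unmatched` holds records that need other handling.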

Once data mining is finished, the data analysis phase begins.

Data analysis consists of SQL aggregation operations. The purpose of formatting the data is to make it processable with SQL; in other words, you can analyze it however you like, as long as you know how to operate a database.

However, data analysis is multidimensional: by dimension, it is divided into one-dimensional, two-dimensional, and three-dimensional analysis.

One-dimensional analysis is mainly based on table queries: multiple fields, individual fields, TopN, grouping, and other aggregate functions.
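As a rough illustration of one-dimensional analysis, the sketch below runs a grouped aggregate and a TopN query with Python's built-in `sqlite3` module. The `metrics` table, its columns, and the sample rows are assumptions made up for this example.

```python
import sqlite3

# In-memory database with a hypothetical table of formatted records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (source TEXT, ts TEXT, cpu INTEGER, mem INTEGER)")
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?, ?)",
    [
        ("web01", "2016-07-09 12:00:00", 90, 90),
        ("web01", "2016-07-09 12:05:00", 70, 80),
        ("web02", "2016-07-09 12:00:00", 40, 35),
        ("web03", "2016-07-09 12:00:00", 20, 50),
    ],
)

# Grouping with an aggregate function: average CPU per source.
avg_by_source = conn.execute(
    "SELECT source, AVG(cpu) FROM metrics GROUP BY source ORDER BY source"
).fetchall()

# TopN: the two sources with the highest peak CPU.
top2 = conn.execute(
    "SELECT source, MAX(cpu) AS peak FROM metrics "
    "GROUP BY source ORDER BY peak DESC LIMIT 2"
).fetchall()

print(avg_by_source)
print(top2)
```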

Two-dimensional analysis is mainly based on time. Why? Time-based analysis is more complex and is closely tied to prediction (and the prediction must come from the machine, not from what people think).
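A minimal sketch of time-based analysis follows: samples are bucketed by hour, averaged per bucket, and a deliberately naive forecast is taken as the mean of the last two hourly averages. The sample values and the forecasting rule are assumptions for illustration only; a real system would use a proper prediction model.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical (timestamp, cpu %) samples for one server.
samples = [
    ("2016-07-09 10:15:00", 30),
    ("2016-07-09 10:45:00", 50),
    ("2016-07-09 11:10:00", 60),
    ("2016-07-09 11:50:00", 80),
    ("2016-07-09 12:00:00", 90),
]

# Bucket each sample by the hour it falls in.
buckets = defaultdict(list)
for ts, cpu in samples:
    hour = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(minute=0, second=0)
    buckets[hour].append(cpu)

# Average CPU per hour, in chronological order.
hourly_avg = {hour: mean(values) for hour, values in sorted(buckets.items())}

# Naive forecast for the next hour: mean of the last two hourly averages.
forecast = mean(list(hourly_avg.values())[-2:])

for hour, value in hourly_avg.items():
    print(hour, value)
print("forecast:", forecast)
```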

Three-dimensional analysis is mainly based on objects. That is, the data is modeled: data modeling is like writing a Java class. You construct virtual entities and base the analysis on those entities.

Each dimension builds on the previous one.

Are there four-dimensional and five-dimensional analyses? There must be. To give an example from operations and maintenance:

Example: server operating condition

Server A 2016-07-09 12:00:00 cpu:90% mem:90%

Application a 2016-07-09 12:00:00 cpu:40% mem:40% (mem>60% to run properly)

Application b 2016-07-09 12:00:00 cpu:40% mem:40% (mem>30% to run properly)

Server A system 2016-07-09 12:00:00 cpu:10% mem:10%

So application a will not run properly: its memory requirement (mem>60%) is not met.
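The example above can be checked with a small entity model, in the spirit of the object-based analysis described earlier. The `Sample` class, the `min_mem` field, and the `runs_properly` helper are hypothetical names; the sketch also assumes that "mem>60% to run properly" means the application needs more than 60% of memory for itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One resource reading for an entity running on Server A."""
    name: str
    timestamp: str
    cpu: float  # percent of the server's CPU
    mem: float  # percent of the server's memory
    min_mem: Optional[float] = None  # assumed: memory share needed to run properly


def runs_properly(sample: Sample) -> bool:
    """An entity is healthy if it has no requirement or its memory exceeds it."""
    return sample.min_mem is None or sample.mem > sample.min_mem


samples = [
    Sample("Application a", "2016-07-09 12:00:00", 40, 40, min_mem=60),
    Sample("Application b", "2016-07-09 12:00:00", 40, 40, min_mem=30),
    Sample("Server A system", "2016-07-09 12:00:00", 10, 10),
]

# The per-entity readings add up to the server-level reading (90%).
server_total_mem = sum(s.mem for s in samples)

# Entities whose memory requirement is not met.
starved = [s.name for s in samples if not runs_properly(s)]

print(server_total_mem, starved)
```

Here the server-level figure alone (mem:90%) does not show which entity is in trouble; combining it with the per-application readings and their requirements does.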

[Figure: complete flowchart of the entire data processing process]
