Enterprise-Class Big Data processing solution 03-Data Flow

Last Update:2016-07-09 Source: Internet

Author: User

Data processing is divided into two phases: data mining and data analysis. Broadly speaking, "data analysis" can refer to the whole process, and the overall flow is much the same from one case to the next.

Data mining usually consists of three steps: filtering, cleaning, and matching.

1. Filtering: Data that is not suitable for analysis is filtered out, like removing defective items from a production line. Filtering works at a coarse granularity; its rules can be based on data size or character length.

2. Cleaning (rinsing): Also known as formatting. The data is split into parts: a record is made up of a time, a data source, a data body, and so on, like a head, a body, and feet. Turning the data into the format we want is also a labeling process, that is, classifying the data.

3. Matching: Matching is field extraction: the useful parts of the data are extracted, typically with regular expressions. Because there are too many categories of data to write matching rules for all of them, automatic machine recognition is needed. Note that the results of machine learning are not fully accurate, so that data is stored separately.
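The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log layout ("time source body"), the length threshold `MIN_LENGTH`, and all function names are hypothetical and do not come from the article. Records that the regular expression cannot match are kept in a separate list, mirroring the idea of storing hard-to-match data apart from the cleanly extracted records.

```python
import re

# Hypothetical raw log lines; the "time source body" layout is an assumption.
RAW_LINES = [
    "2016-07-09 12:00:00 serverA cpu:90% mem:90%",
    "",                                             # too short to analyze
    "2016-07-09 12:00:00 serverB cpu:35% mem:50%",
    "garbage",                                      # too short to analyze
    "2016-07-09 12:00:05 serverA disk failure on /dev/sda",  # no metric fields
]

MIN_LENGTH = 20  # assumed filtering rule based on character length

# Cleaning: split a line into time (head), source, and body.
LINE_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\S+) (.*)$")
# Matching: extract the useful fields from the body.
METRIC_PATTERN = re.compile(r"cpu:(\d+)% mem:(\d+)%")


def filter_lines(lines):
    """Filtering: drop lines that are too short to be worth analyzing."""
    return [line for line in lines if len(line) >= MIN_LENGTH]


def clean_line(line):
    """Cleaning: format a line into labeled parts, or None if it does not fit."""
    m = LINE_PATTERN.match(line)
    if m is None:
        return None
    return {"time": m.group(1), "source": m.group(2), "body": m.group(3)}


def match_fields(record):
    """Matching: extract cpu/mem from the body, or None if the rule fails."""
    m = METRIC_PATTERN.search(record["body"])
    if m is None:
        return None
    return {**record, "cpu": int(m.group(1)), "mem": int(m.group(2))}


def mine(lines):
    """Run the three steps; records the rule cannot match are kept separately."""
    matched, unmatched = [], []
    for line in filter_lines(lines):
        record = clean_line(line)
        if record is None:
            continue
        fields = match_fields(record)
        if fields is None:
            unmatched.append(record)
        else:
            matched.append(fields)
    return matched, unmatched
```

Calling `mine(RAW_LINES)` returns the structured records and the leftovers as two lists, so the leftovers can be handed to whatever automatic recognition step is available.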

Data mining is the process of formatting unformatted and semi-formatted data; in other words, it imposes rules on the data.

Once data mining is finished, the data analysis phase begins.

Data analysis consists of SQL aggregation operations. The point of formatting the data is to make it processable with SQL; in other words, you can analyze the data however you like, as long as you know how to operate a database.

However, data analysis is multidimensional: by dimension, it is divided into one-dimensional, two-dimensional, and three-dimensional analysis.

One-dimensional analysis is mainly based on table queries: multiple fields, individual fields, TopN, grouping, and other aggregate functions.
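As a rough illustration of one-dimensional analysis, the sketch below loads a few formatted records into an in-memory SQLite table and runs a grouping query and a TopN query. The table name `metrics`, its columns, and the sample rows are assumptions made for this example only.

```python
import sqlite3

# In-memory database holding already formatted records (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metrics (source TEXT, ts TEXT, cpu INTEGER, mem INTEGER)"
)
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?, ?)",
    [
        ("serverA", "2016-07-09 12:00:00", 90, 90),
        ("serverA", "2016-07-09 12:05:00", 70, 80),
        ("serverB", "2016-07-09 12:00:00", 35, 50),
        ("serverB", "2016-07-09 12:05:00", 45, 60),
        ("serverC", "2016-07-09 12:00:00", 10, 20),
    ],
)

# Grouping with an aggregate function: average CPU per source.
avg_cpu = conn.execute(
    "SELECT source, AVG(cpu) FROM metrics GROUP BY source ORDER BY source"
).fetchall()

# TopN: the two rows with the highest memory usage.
top2_mem = conn.execute(
    "SELECT source, ts, mem FROM metrics ORDER BY mem DESC LIMIT 2"
).fetchall()

print(avg_cpu)
print(top2_mem)
```

Both queries look at a single table without reference to time trends or modeled entities, which is what keeps them one-dimensional in the article's sense.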

Two-dimensional analysis is mainly based on time. Time-based analysis is more complex and is more closely tied to prediction (the prediction must come from the machine, not from what people think).
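A minimal sketch of time-based analysis follows: samples are averaged per hour, and the next hour is extrapolated with a least-squares straight line. The article does not name any prediction method, so the linear fit is purely an assumption chosen for brevity, and the sample values and function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (timestamp, cpu percent) samples.
SAMPLES = [
    ("2016-07-09 10:10:00", 40),
    ("2016-07-09 10:40:00", 50),
    ("2016-07-09 11:15:00", 55),
    ("2016-07-09 11:45:00", 65),
    ("2016-07-09 12:00:00", 75),
]


def hourly_average(samples):
    """Group samples into hour buckets and average each bucket."""
    buckets = defaultdict(list)
    for ts, value in samples:
        parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        buckets[parsed.replace(minute=0, second=0)].append(value)
    return [(hour, sum(vals) / len(vals)) for hour, vals in sorted(buckets.items())]


def linear_forecast(values):
    """Fit y = a + b*x over x = 0..n-1 by least squares; return y at x = n."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * n


hourly = hourly_average(SAMPLES)
forecast = linear_forecast([avg for _, avg in hourly])
print(hourly)
print(forecast)
```

A real system would likely use a more robust model; the point here is only that the time axis adds a second dimension on top of the plain aggregate.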

Three-dimensional analysis is mainly based on objects; that is, it models the data. Data modeling is like defining a Java class: you construct virtual entities and then analyze on the basis of those entities.
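To make the idea of modeling data as entities concrete, here is a small sketch. The article compares modeling to a Java class; a Python dataclass is used here instead as an equivalent stand-in. The `Server` and `Sample` classes, their fields, and the 85% overload limit are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One timestamped measurement."""
    ts: str
    cpu: int
    mem: int


@dataclass
class Server:
    """A virtual entity that groups the samples belonging to one server."""
    name: str
    samples: list = field(default_factory=list)

    def add(self, ts, cpu, mem):
        self.samples.append(Sample(ts, cpu, mem))

    def peak_mem(self):
        return max(s.mem for s in self.samples)

    def is_overloaded(self, limit=85):
        # Assumed rule: any sample above the limit marks the server overloaded.
        return any(s.cpu > limit or s.mem > limit for s in self.samples)


def build_servers(rows):
    """Turn flat (name, ts, cpu, mem) rows into Server entities keyed by name."""
    servers = {}
    for name, ts, cpu, mem in rows:
        servers.setdefault(name, Server(name)).add(ts, cpu, mem)
    return servers
```

Once rows are attached to entities, questions are asked of the object (for example `servers["A"].is_overloaded()`) rather than of individual table rows.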

Each of these dimensions builds on the previous one.

There is no four-dimensional or five-dimensional analysis. Here is an example from operations and maintenance:

Example: server operating condition

Server A 2016-07-09 12:00:00 cpu:90% mem:90%

Application a 2016-07-09 12:00:00 cpu:40% mem:40% (mem>60% to run properly)

Application b 2016-07-09 12:00:00 cpu:40% mem:40% (mem>30% to run properly)

Server A system 2016-07-09 12:00:00 cpu:10% mem:10%

So application a will not run properly: it has only 40% of memory but needs more than 60%.
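The reasoning in this example can be written as a short check. The sketch assumes that "mem>N% to run properly" means the application needs more than N% of the server's memory, and that the percentages for the two applications and the system add up to the server total; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    name: str
    cpu: int         # percent of the server's CPU in use
    mem: int         # percent of the server's memory in use
    mem_needed: int  # runs properly only if mem > mem_needed (0 = no requirement)


# Figures taken from the example for 2016-07-09 12:00:00.
SERVER_TOTAL = Usage("Server A", 90, 90, 0)
PARTS = [
    Usage("Application a", 40, 40, 60),
    Usage("Application b", 40, 40, 30),
    Usage("Server A system", 10, 10, 0),
]


def starved(parts):
    """Return the names of parts whose memory share is below what they need."""
    return [p.name for p in parts if p.mem_needed and not p.mem > p.mem_needed]


# Consistency check: the parts should account for the server-level figures.
mem_accounted = sum(p.mem for p in PARTS)
cpu_accounted = sum(p.cpu for p in PARTS)

print(starved(PARTS))
print(mem_accounted == SERVER_TOTAL.mem, cpu_accounted == SERVER_TOTAL.cpu)
```

The check only works because server, applications, and system are treated as related entities, which is the object-based view described for three-dimensional analysis.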

Complete flowchart of the entire data processing process:

[Figure: Enterprise-Class Big Data processing solution 03-Data Flow]

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
