How to Analyze and Process Big Data

Last Update:2020-05-25 Source: Internet

Author: User

1. Visual analysis

The users of big data analysis include both specialists and ordinary users, and the most basic requirement for both groups is visual analysis, because visualization presents the characteristics of big data intuitively and is easy for readers to grasp.

2. Data mining algorithms

Data mining algorithms are the theoretical core of big data analysis. Different algorithms suit different data types and formats, and matching the algorithm to the data brings out the characteristics of the data itself more rigorously.
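
As one illustration of a data mining algorithm, here is a minimal sketch of k-means clustering on one-dimensional values. The article does not name a specific algorithm, so k-means is only an example; the function name `kmeans_1d`, the starting centers, and the sample page-load times are all assumptions made for illustration.

```python
from statistics import mean


def kmeans_1d(values, centers, iterations=10):
    """Cluster numbers around the given starting centers (plain k-means)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical sample: page-load times in seconds with two visible groups.
load_times = [1.0, 1.2, 0.8, 9.0, 9.5, 10.5]
centers, clusters = kmeans_1d(load_times, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

The fast and slow page loads end up in separate clusters. Real data usually has more dimensions and needs a more careful choice of starting centers.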

3. Predictive analysis

Predictive analysis is one of the ultimate applications of big data analysis: features are mined from big data, a model is built from them, and new data is then fed into the model to predict future values.
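
The build-a-model-then-predict cycle can be sketched with the simplest possible model, a least-squares straight line. This is only an illustration under assumptions: the helper names `fit_line` and `predict` and the daily order counts are hypothetical, and real predictive models are usually far richer.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical history: orders observed on days 1 to 5.
days = [1, 2, 3, 4, 5]
orders = [12, 14, 16, 18, 20]

model = fit_line(days, orders)   # build the model from existing data
forecast = predict(model, 6)     # bring in new data to predict day 6
print(model, forecast)
```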

4. Semantic engine

The diversity of unstructured data brings new challenges to data analysis, so we need a set of tools to parse and refine it. A semantic engine needs to be designed with enough artificial intelligence to extract information from the data actively.
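
A real semantic engine is far more sophisticated than anything that fits in a few lines, but the basic idea of pulling information out of unstructured text can be hinted at with a keyword counter. The stop-word list, the `extract_keywords` helper, and the sample review below are assumptions made for illustration only.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real systems use much larger ones.
STOP_WORDS = {"the", "a", "is", "and", "to", "of", "was", "it", "but"}


def extract_keywords(text, top_n=3):
    """Return the most frequent non-stop-words in a piece of free text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


review = "The delivery was late, and the delivery box was damaged. Late again!"
keywords = extract_keywords(review, top_n=2)
print(keywords)
```

Frequency counting captures what a text talks about, not what it means; that gap is what the artificial intelligence in a semantic engine is meant to close.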

5. Data quality and data management

Big data analysis is inseparable from data quality and data management. Whether in academic research or in commercial applications, high-quality data and effective data management help ensure that analysis results are accurate and valuable.
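
What a data quality check looks like in practice can be sketched as a few simple rules run over every record. The rules chosen here (required fields, unique ids, non-negative amounts), the `check_quality` helper, and the sample orders are assumptions; each data set needs its own rules.

```python
def check_quality(records, required=("id", "amount")):
    """Return a list of (record index, problem) pairs found in the records."""
    issues = []
    seen_ids = set()
    for index, record in enumerate(records):
        # Rule 1: every required field must be present and not None.
        for field in required:
            if record.get(field) is None:
                issues.append((index, f"missing {field}"))
        # Rule 2: ids must be unique across the data set.
        record_id = record.get("id")
        if record_id is not None:
            if record_id in seen_ids:
                issues.append((index, "duplicate id"))
            seen_ids.add(record_id)
        # Rule 3: amounts must not be negative.
        amount = record.get("amount")
        if isinstance(amount, (int, float)) and amount < 0:
            issues.append((index, "negative amount"))
    return issues


# Hypothetical order records with three deliberate problems.
orders = [
    {"id": 1, "amount": 25.0},
    {"id": 2, "amount": None},
    {"id": 1, "amount": 10.0},
    {"id": 3, "amount": -5.0},
]
issues = check_quality(orders)
print(issues)
```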

Big data processing I: Collection

Big data collection refers to using multiple databases to receive data sent from clients (web, app, sensors, and so on); users can run simple queries and processing against these databases. The main characteristic and challenge of collection is high concurrency, because thousands of users may be accessing and operating on the system at the same time.
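
One common way to cope with concurrent writes is to spread them across several stores by key, so that no single store takes all the load. The sketch below stands in for that idea with in-memory lists instead of real databases; the `ShardedCollector` class, the shard count, and the modulo routing rule are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


class ShardedCollector:
    """Route incoming events to one of several in-memory 'databases'."""

    def __init__(self, shard_count):
        self.shards = [[] for _ in range(shard_count)]
        self.locks = [Lock() for _ in range(shard_count)]

    def receive(self, user_id, payload):
        # The same user always lands on the same shard.
        index = user_id % len(self.shards)
        # Writers only contend with others targeting the same shard.
        with self.locks[index]:
            self.shards[index].append((user_id, payload))


collector = ShardedCollector(shard_count=4)

# Simulate 1000 clients sending one event each through 8 worker threads.
with ThreadPoolExecutor(max_workers=8) as pool:
    for user_id in range(1000):
        pool.submit(collector.receive, user_id, "click")

shard_sizes = [len(shard) for shard in collector.shards]
print(shard_sizes)
```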

Big data processing II: Import / Preprocessing

Although the collection side already has many databases, analyzing this massive data effectively still requires importing it from the front end into a centralized large distributed database or distributed storage cluster, and some simple cleaning and preprocessing can be done during the import. The main characteristic and challenge of import and preprocessing is the sheer volume of imported data, which often reaches 100 megabytes or even gigabytes per second.
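
The "simple cleaning and preprocessing" done during import might look like the following sketch, which trims whitespace, normalizes case, and drops rows that cannot be parsed. The three-column `user,event,value` format and the `clean_rows` helper are assumptions; the article does not describe a concrete record layout.

```python
def clean_rows(raw_lines):
    """Parse 'user,event,value' lines, dropping rows that are malformed."""
    cleaned = []
    for line in raw_lines:
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 3:
            continue  # wrong number of columns
        user, event, value = parts
        if not user:
            continue  # missing user name
        try:
            value = float(value)
        except ValueError:
            continue  # value is not numeric
        cleaned.append({"user": user.lower(), "event": event.lower(), "value": value})
    return cleaned


# Hypothetical raw input containing three bad rows.
raw = [" Alice , CLICK , 1.5", "bob,view,abc", "broken line", ",click,2", "Carol,View,3"]
rows = clean_rows(raw)
print(rows)
```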

Big data processing III: Statistics / Analysis

Statistics and analysis mainly use distributed databases or distributed computing clusters to perform routine analysis, classification, and summarization of the massive data stored in them, which meets most common analysis needs. For real-time requirements, options include EMC's Greenplum, Oracle's Exadata, and the MySQL-based columnar store Infobright; batch processing or workloads on semi-structured data can use Hadoop. The main characteristic and challenge of this stage is the large amount of data involved, which consumes system resources heavily, especially I/O.
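
Classification and summary over distributed data generally follows a split-then-merge pattern: each node summarizes its own partition, and the partial results are combined. The sketch below imitates that pattern in a single process with plain Python; it does not use Hadoop or any of the products named above, and the partitions and the `summarize_partition` helper are hypothetical.

```python
from collections import Counter


def summarize_partition(rows):
    """Sum amounts per category within one partition (the 'map' side)."""
    totals = Counter()
    for category, amount in rows:
        totals[category] += amount
    return totals


# Hypothetical sales records, already split across three nodes.
partitions = [
    [("books", 10), ("toys", 5)],
    [("books", 7), ("food", 3)],
    [("toys", 1), ("food", 4)],
]

# Each partition is summarized independently, then the partial
# results are merged into one overall summary (the 'reduce' side).
partials = [summarize_partition(rows) for rows in partitions]
totals = Counter()
for partial in partials:
    totals.update(partial)
print(dict(totals))
```

Because each partition is summarized on its own, only the small partial results have to be moved and merged, which helps with the I/O cost mentioned above.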

Big data processing IV: Mining

Mining mainly runs various algorithms over the existing data to make predictions, which meets some higher-level data analysis needs. A typical tool is Mahout on Hadoop. The main characteristic and challenge of this process is that the mining algorithms are very complex and the amount of data and computation involved is very large; in addition, commonly used data mining algorithms are mainly single-threaded.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness, ownership, or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
