Big Data Summary (zz)

Source: Internet
Author: User
Tags: columnar database

I recently read a big data article in Programmer magazine that summarizes the field very well. To make its content easier for me to look up later, I have condensed it and written it down here.

The article mainly covers the following aspects:
data transmission, data storage, data computing, data presentation, the data development platform, and the data application market.

My previous summary of data covered data storage, data management, and data computing.

 

Data transmission includes real-time synchronization and batch synchronization. Time series are generally used.

Data storage includes kernel-level distributed storage, user-level distributed file storage, and business-level data storage.

The first two categories are easy to understand. The last one is subdivided further into columnar databases, document databases, K/V (key-value) databases, graph databases, and relational databases. All of these databases are built on top of the file system.
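To make the difference between some of these business-level storage models concrete, here is a minimal sketch in plain Python. It is not how any real database stores data; the records and field names are hypothetical and only illustrate how the same data looks in a row-oriented, a columnar, and a K/V layout.

```python
# Hypothetical records; the field names are made up for illustration.
rows = [
    {"user": "a", "city": "Hangzhou", "amount": 10},
    {"user": "b", "city": "Beijing", "amount": 25},
    {"user": "c", "city": "Hangzhou", "amount": 7},
]


def to_columns(records):
    """Pivot row-oriented records into one list per field (columnar layout)."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


# Columnar layout: all values of one field sit together.
columns = to_columns(rows)

# K/V layout: one value per key, looked up by key only.
kv = {record["user"]: record for record in rows}

# An aggregate over a single field only has to read that one column.
total_amount = sum(columns["amount"])
```

In this sketch the columnar form suits scans and aggregates over a few fields, while the K/V form suits fetching one whole record by its key.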

 

Data computing includes offline computing, stream computing, instant computing, and graph computing.

Offline computing is the most commonly used and applies to many scenarios. Its typical form is running the MapReduce (M/R) model on Hadoop, although some offline computing is memory-based.
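The M/R model itself can be sketched in a few lines of single-process Python. This is only an illustration of the map, shuffle, and reduce steps using a word count; it does not use Hadoop, and the function names are my own.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


word_counts = run_job(["big data big", "data storage"])
```

On a real cluster the map and reduce functions run in parallel on many machines and the shuffle moves data over the network; the structure of the job stays the same.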

Stream computing starts from the observation that the value of data decreases rapidly as time passes. It depends on correct and timely data transmission upstream, as well as on high-throughput storage systems downstream. I have not yet had the chance to try such a service myself. (Storm is used for this.)
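As a rough illustration of processing data as it arrives, the sketch below counts events per fixed time window and emits each window as soon as it closes. It is plain Python rather than Storm, and it assumes events arrive in timestamp order; the event data and the function name are hypothetical.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, counts) each time a fixed-size window closes.

    events is an iterable of (timestamp, key) pairs, assumed to be
    ordered by timestamp.
    """
    current_start = None
    counts = Counter()
    for timestamp, key in events:
        window_start = timestamp - (timestamp % window_seconds)
        if current_start is None:
            current_start = window_start
        elif window_start != current_start:
            # The previous window is complete: emit it and start a new one.
            yield current_start, counts
            current_start, counts = window_start, Counter()
        counts[key] += 1
    if current_start is not None:
        # Emit the last, possibly partial, window when the input ends.
        yield current_start, counts


events = [(0, "click"), (3, "view"), (4, "click"), (11, "click"), (12, "view")]
windows = list(tumbling_window_counts(events, window_seconds=10))
```

A late or out-of-order event would be counted in the wrong window here, which is one reason stream computing depends so heavily on correct and timely upstream transmission.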

 

Instant (real-time) computing serves queries that need ranking, sorting, summaries, and similar operations over massive data in real time. Such systems fall into two types: online data applications and online data analysis. Examples of the former are the PowerDrill project used at Google and the Garuda project used at Alibaba Group. The latter is represented by Dremel and by Cloudera Impala (which can analyze data in HBase and HDFS and shares metadata with Hive; I still need to look into this tool later). The Apache Drill open source project is currently under development.
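The kind of query meant here, a summary followed by a ranking, can be sketched on a small in-memory data set. This is plain Python with hypothetical data and says nothing about how Dremel, Impala, or the other systems execute such queries at scale.

```python
import heapq
from collections import defaultdict

# Hypothetical (city, amount) records.
records = [("Hangzhou", 10), ("Beijing", 25), ("Hangzhou", 7), ("Shanghai", 12)]


def summarize(records):
    """Summary: total amount per key, like a GROUP BY with SUM."""
    totals = defaultdict(int)
    for key, amount in records:
        totals[key] += amount
    return dict(totals)


def top_n(totals, n):
    """Ranking: the n keys with the largest totals, largest first."""
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])


totals = summarize(records)
ranking = top_n(totals, 2)
```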

 

Graph computing mines user relationships, webpage relationships, and similar data in depth. It involves large-scale matrix computing, graph computing, and network computing. The MPI or BSP model is used to solve such problems. Currently, the Apache Hama project is applied to these scenarios.
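The BSP (bulk synchronous parallel) model proceeds in supersteps: each vertex computes on the messages it received, sends new messages, and then all vertices wait at a barrier. The sketch below simulates this in a single Python process to find connected components by propagating the smallest vertex label. It is an illustration only, not the Hama API, and the function name is my own.

```python
def bsp_connected_components(edges):
    """Label every vertex with the smallest vertex id in its component."""
    # Build an undirected adjacency map.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    labels = {vertex: vertex for vertex in neighbors}

    # Initial step: every vertex sends its own label to its neighbors.
    inbox = {vertex: [] for vertex in neighbors}
    for vertex, adjacent in neighbors.items():
        for other in adjacent:
            inbox[other].append(labels[vertex])

    # Run supersteps until no messages are in flight.
    while any(inbox.values()):
        outbox = {vertex: [] for vertex in neighbors}
        for vertex, messages in inbox.items():
            # Compute phase: adopt a smaller label if one arrived.
            if messages and min(messages) < labels[vertex]:
                labels[vertex] = min(messages)
                # Communication phase: tell neighbors about the change.
                for other in neighbors[vertex]:
                    outbox[other].append(labels[vertex])
        # Barrier: new messages become visible only in the next superstep.
        inbox = outbox
    return labels


components = bsp_connected_components([(1, 2), (2, 3), (4, 5)])
```

Vertices that end up with the same label belong to the same component. In a distributed setting the vertices would be partitioned across machines, and the barrier would be a real synchronization point between them.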

 

Data presentation, simply put, means displaying data through tables, pie charts, bar charts, and the like. This is also called data visualization.

 

I am still unfamiliar with the data development platform and the data application market, so I will come back to them later.
