Big data, cloud computing and other materials: a collection

Last Update: 2015-09-11 Source: Internet

Author: User

Tags: hadoop ecosystem

Cloud computing and big data

http://www.cstor.cn/textdetail_6067.html

http://wenku.baidu.com/link?url=Kscwhrjrhi2pdbscqvbmtjtcncuqpnik8xfxlknkwnnttrlmypplbav4gp5cmp-h1bqcrcioxkdsp3xnc3xkdogwdfyy1r9gjkd9euyf47q

The difference between big data and cloud computing

http://www.csdn.net/article/2015-09-11/2825674 A survey of the big data ecosystem: those blossoming open source projects

Cloud storage technology

There are many open source distributed file systems, such as GlusterFS, Hadoop's HDFS and FastDFS.

Tachyon http://www.csdn.net/article/2015-06-25/2825056

Ceph and Swift
Ceph is written in C++ and Swift in Python, so Ceph should have the performance advantage. Unlike Ceph, however, Swift focuses on object storage. As one of the OpenStack components, it has been validated in a great deal of production use and integrates well with OpenStack. Many people now use Ceph to provide block storage for OpenStack but still use Swift to provide object storage.
Swift's developers have written articles comparing Ceph and Swift, such as "Ceph and Swift: Why we are not fighting".
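Here is a toy in-memory model in Python that makes the difference between block storage and object storage concrete. It is only a sketch. The class and method names (`ObjectStore`, `BlockDevice`, `put`, `write_block`, ...) are hypothetical and are not the Swift or Ceph APIs. The object store addresses whole objects, together with their metadata, by container and name, which roughly mirrors Swift's container/object model. The block device exposes fixed-size blocks by index, the way a virtual disk attached to a VM does.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ObjectStore:
    """Toy object store: whole objects addressed by (container, name)."""
    objects: Dict[Tuple[str, str], Tuple[bytes, Dict[str, str]]] = field(default_factory=dict)

    def put(self, container: str, name: str, data: bytes, **metadata: str) -> None:
        # An object is written (or replaced) as a whole, together with its metadata.
        self.objects[(container, name)] = (bytes(data), dict(metadata))

    def get(self, container: str, name: str) -> Tuple[bytes, Dict[str, str]]:
        return self.objects[(container, name)]


class BlockDevice:
    """Toy block device: fixed-size blocks addressed by index, like a virtual disk."""

    def __init__(self, num_blocks: int, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.blocks: List[bytes] = [bytes(block_size) for _ in range(num_blocks)]

    def write_block(self, index: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("a block write must be exactly one block long")
        # Single blocks are overwritten in place; a file system built on top
        # of the device decides what the blocks mean.
        self.blocks[index] = bytes(data)

    def read_block(self, index: int) -> bytes:
        return self.blocks[index]
```

A VM's file system would sit on something like `BlockDevice`. Applications that store photos or backups through an HTTP API would use something like `ObjectStore`.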
Ceph and HDFS
Compared with HDFS, Ceph has two advantages: it is easy to scale, and it has no single point of failure. HDFS is designed specifically for cloud computing frameworks such as Hadoop and has an inherent advantage in offline batch processing of big data, whereas Ceph is a general-purpose, real-time storage system. Hadoop can use Ceph as its storage backend. Ceph does not provide this integration itself, but its official documentation gives a short step-by-step guide, Running-hadoop-on-ceph. Computational tasks then run somewhat slower than on HDFS: about 30% slower, according to "HaCeph: Scalable Meta-data Management for Hadoop using Ceph".
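The "no single point" advantage comes from how data is located. Ceph clients compute where an object lives (Ceph uses its CRUSH algorithm for this), whereas HDFS clients ask a central NameNode. The sketch below contrasts the two ideas under stated assumptions. It uses rendezvous hashing as a simplified stand-in for CRUSH, and `place` and `CentralDirectory` are hypothetical names that belong to neither system.

```python
import hashlib
from typing import Dict, List


def _score(node: str, obj: str) -> int:
    # Deterministic pseudo-random weight for one (node, object) pair.
    digest = hashlib.sha256(f"{node}/{obj}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj: str, nodes: List[str], replicas: int = 3) -> List[str]:
    """Rendezvous (highest-random-weight) hashing: any client computes the same
    replica set from the object name and node list alone, without asking a
    metadata server. A simplified stand-in for Ceph's CRUSH, not CRUSH itself."""
    ranked = sorted(nodes, key=lambda node: _score(node, obj), reverse=True)
    return ranked[:replicas]


class CentralDirectory:
    """HDFS-style alternative: one metadata service records where each file
    lives, and every client lookup must go through it."""

    def __init__(self) -> None:
        self.table: Dict[str, List[str]] = {}

    def record(self, path: str, locations: List[str]) -> None:
        self.table[path] = list(locations)

    def lookup(self, path: str) -> List[str]:
        return self.table[path]
```

Because `place` is a pure function, adding clients adds no load on a central server. Removing a node that holds none of an object's replicas also leaves that object's placement unchanged.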

http://www.chinaz.com/program/2015/0504/403143.shtml Ten years on: some thoughts on the current state and future of Ceph

http://www.oschina.net/project/tag/104/storage Open source projects for different types of storage systems

Hadoop ecosystem

http://blog.csdn.net/woshiwanxin102213/article/details/19688393

Hadoop is a software framework for distributed processing of large amounts of data. It is reliable, efficient and scalable.

At the core of Hadoop are HDFS and MapReduce. Hadoop 2.0 also includes YARN.
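Word count is the customary first example of the MapReduce programming model at the core of Hadoop. The sketch below runs in a single process and does not use the Hadoop Java API. In a real cluster, map tasks run in parallel on the nodes that store the HDFS blocks, and the framework shuffles intermediate pairs to reducers over the network. In Hadoop 2.x, YARN schedules these tasks. The function names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all intermediate values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reducer: combine all values that share one key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))
```

For example, `word_count(["the cat", "the dog"])` returns `{"cat": 1, "dog": 1, "the": 2}`.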

Projects in the Hadoop ecosystem:

Spark/Storm

http://www.zhihu.com/question/26568496

http://developer.51cto.com/art/201412/460116.htm

Spark is based on the idea that, when the data volume is large, it is more efficient to move the computation to the data than to move the data to the computation. Each node stores (or caches) its own data set, and tasks are submitted to the nodes, so it is the computation that moves to the data. This is very similar to Hadoop MapReduce. The difference is that Spark makes aggressive use of memory to avoid I/O operations, which makes iterative algorithms (where the output of one step is the input of the next) much more efficient. Shark is simply a query engine built on Spark that supports ad hoc analytical queries.
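A small single-process simulation can illustrate "moving the computation to the data" and why caching helps iterative algorithms. This is a hypothetical sketch, not PySpark. `Node`, `run` and `iterative_mean` are made-up names, and the "disk" is just a dict. The iterative algorithm is gradient descent converging to the mean, chosen only as an example of a loop in which each step's output feeds the next step.

```python
from typing import Callable, Dict, List, Optional


class Node:
    """A worker holding one partition, which it can keep in memory."""

    def __init__(self, partition_id: int, storage: Dict[int, List[float]]) -> None:
        self.partition_id = partition_id
        self.storage = storage              # stands in for the node's local disk
        self.cache: Optional[List[float]] = None
        self.disk_reads = 0                 # counts simulated I/O operations

    def run(self, task: Callable[[List[float]], float], use_cache: bool) -> float:
        # The task (the computation) is shipped to the node; the data stays local.
        if use_cache and self.cache is not None:
            return task(self.cache)
        self.disk_reads += 1
        data = self.storage[self.partition_id]
        if use_cache:
            self.cache = data
        return task(data)


def iterative_mean(nodes: List[Node], n_total: int, iterations: int,
                   use_cache: bool, lr: float = 0.5) -> float:
    """Gradient descent on sum((m - x)^2) / 2. Each step's output m is the next
    step's input, the access pattern that benefits from in-memory caching."""
    m = 0.0
    for _ in range(iterations):
        def partial_gradient(xs: List[float], m: float = m) -> float:
            return sum(m - x for x in xs)
        grad = sum(node.run(partial_gradient, use_cache) for node in nodes)
        m -= lr * grad / n_total
    return m
```

With two partitions and 30 iterations, the cached run reads each partition from "disk" once. The uncached run reads each partition 30 times. Both runs reach the same result.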

Storm's architecture is the exact opposite of Spark's. Storm is a distributed stream computing engine: each node implements a basic processing step, and data items flow in and out of a network of interconnected nodes. Unlike Spark, Storm moves the data to the computation.
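Python generators give a rough sketch of Storm's model, in which data items flow through a chain of processing steps. Storm calls the source of a stream a spout and a processing step a bolt. The functions below borrow those terms, but they are hypothetical names and not the Storm API. They also all run in one process, whereas Storm runs each step on nodes across the cluster and sends tuples between them over the network.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def sentence_spout(sentences: Iterable[str]) -> Iterator[str]:
    # Spout: the source that emits a stream of small data items.
    for sentence in sentences:
        yield sentence


def split_bolt(stream: Iterable[str]) -> Iterator[str]:
    # Bolt 1: one basic processing step, splitting sentences into words.
    for sentence in stream:
        yield from sentence.lower().split()


def count_bolt(stream: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Bolt 2: keeps a running count and emits the updated total for each word.
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]
```

Each item is processed as soon as it arrives. `list(count_bolt(split_bolt(sentence_spout(["to be", "or not to be"]))))` yields `("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 2), ("be", 2)`: a running result rather than one final answer.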

Both frameworks are used to parallelize computations over large amounts of data.

Storm is better at dynamically processing large numbers of small data items as they are generated, for example computing aggregation functions or running analytics in real time on a Twitter stream.

Spark works on a corpus of existing data (such as Hadoop data) that has been imported into the Spark cluster. Thanks to in-memory management it can scan that data quickly, and it minimizes the global number of I/O operations for iterative algorithms.

http://blog.csdn.net/hguisu/article/details/8454368 Using Storm for real-time big data analytics

The big data ecosystem

http://www.csdn.net/article/2012-12-21/2813066-database-road-map One picture to understand the big data ecosystem

http://www.aboutyun.com/thread-11944-1-1.html A summary of open source big data processing tools (Hadoop ecosystem, streaming systems, etc.)

Open Source Cloud

http://www.oschina.net/news/54700/most-popular-opensource-cloud-projects The most popular open source cloud projects of the first half of 2014

http://www.chinacloud.cn/show.aspx?id=19743&cid=22 A survey of open source cloud platforms on Linux

OpenStack Docker KVM

Real-time data stream processing

http://www.csdn.net/article/2014-06-12/2820196-Storm Real-time computation and stream processing systems: an introduction and brief analysis

http://www.csdn.net/article/2014-12-09/2823038 Building a large-scale real-time stream processing system on the cloud

http://tech.it168.com/a2014/0730/1651/000001651470_all.shtml A LinkedIn big data expert's in-depth interpretation of the significance of logs

Appendix

http://storm.apache.org/

http://spark.apache.org/

http://hadoop.apache.org/

https://en.wikipedia.org/wiki/NoSQL

http://docs.openstack.org/developer/swift/

http://wiki.apache.org/hadoop/HDFS

http://ceph.com/

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.