Cloud Computing and Big Data
Http://www.cstor.cn/textdetail_6067.html
Http://wenku.baidu.com/link?url= Kscwhrjrhi2pdbscqvbmtjtcncuqpnik8xfxlknkwnnttrlmypplbav4gp5cmp-h1bqcrcioxkdsp3xnc3xkdogwdfyy1r9gjkd9euyf47q
The difference between big data and cloud computing
http://www.csdn.net/article/2015-09-11/2825674 A survey of the big data ecosystem and its flourishing open source projects
Cloud Storage Technology
There are many open source distributed file systems, including GlusterFS, Hadoop (HDFS), FastDFS, and others.
Tachyon http://www.csdn.net/article/2015-06-25/2825056
Ceph and Swift
Ceph is written in C++ and Swift is written in Python, so Ceph should have the edge in performance. Unlike Ceph, however, Swift focuses solely on object storage. As one of the OpenStack components, it has been validated in many production deployments and integrates well with OpenStack. Many people now use Ceph to provide block storage for OpenStack but still use Swift to provide object storage.
Swift's developers have written an article comparing the two: "Ceph and Swift: Why We Are Not Fighting."
Ceph and HDFS
Compared with HDFS, Ceph has the advantages of being easy to scale and having no single point of failure. HDFS is designed specifically for cloud computing workloads such as Hadoop and has an inherent advantage in offline batch processing of big data, whereas Ceph is a general-purpose, real-time storage system. Hadoop can use Ceph as a storage backend (according to Ceph's official tutorial; I have not done the integration myself, and the tutorial gives a concise set of steps in Running-hadoop-on-ceph), but computational tasks then run somewhat slower than on HDFS (about 30% slower, according to "HaCeph: Scalable Meta-data Management for Hadoop using Ceph").
Http://www.chinaz.com/program/2015/0504/403143.shtml After ten years: some thoughts on the current situation and future of Ceph
Http://www.oschina.net/project/tag/104/storage Open source projects for different types of storage systems
Hadoop ecosystem
http://blog.csdn.net/woshiwanxin102213/article/details/19688393
Hadoop is a software framework that enables distributed processing of large amounts of data. It is reliable, efficient, and scalable.
At the heart of Hadoop are HDFS and MapReduce; Hadoop 2.0 also includes YARN.
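As a rough illustration of the MapReduce model mentioned above, here is a minimal single-process sketch of a word count. It does not use Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and only mirror the map, shuffle, and reduce stages conceptually.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values seen for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence (in-process, not distributed)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Hadoop stores data in HDFS", "MapReduce processes data in HDFS"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps would run on many machines against blocks stored in HDFS; this sketch only shows the shape of the computation.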
For the ecosystem of Hadoop:
Spark/Storm
http://www.zhihu.com/question/26568496
Http://developer.51cto.com/art/201412/460116.htm
Spark is based on the idea that when the data is large, it is more efficient to pass the computation to the data than to pass the data to the computation. Each node stores (or caches) its part of the data set, and tasks are then submitted to the nodes, so the process is passed to the data. This is very similar to Hadoop Map/Reduce, except that Spark aggressively uses memory to avoid I/O operations, so iterative algorithms (where the output of one step is the input to the next) perform better. Shark is simply a Spark-based query engine that supports ad hoc analytical queries.
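The "pass the computation to the data" idea can be sketched in plain Python. This is not Spark's API; the `Node` class and `iterate_toward_mean` function are hypothetical stand-ins, assuming each node keeps its partition in memory and only small task functions and partial results move between nodes.

```python
class Node:
    """A hypothetical worker that caches one partition of the data in memory."""

    def __init__(self, partition):
        self.partition = list(partition)  # loaded once, reused on every iteration

    def run(self, task):
        # The task (a function) travels to the node; the data stays where it is.
        return task(self.partition)


def iterate_toward_mean(nodes, rounds=20, step=0.5):
    """Iteratively refine an estimate; each round's output feeds the next round."""
    estimate = 0.0
    for _ in range(rounds):
        # Ship a small closure to every node and collect small partial results.
        partials = [
            node.run(lambda part, e=estimate: (sum(x - e for x in part), len(part)))
            for node in nodes
        ]
        total_diff = sum(diff for diff, _ in partials)
        count = sum(n for _, n in partials)
        estimate += step * total_diff / count
    return estimate


if __name__ == "__main__":
    cluster = [Node([1, 2, 3]), Node([4, 5, 6])]
    print(iterate_toward_mean(cluster))
```

Because the partitions are never reloaded between rounds, every iteration after the first avoids re-reading the input, which is the benefit the paragraph above attributes to in-memory caching.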
Storm's architecture is the opposite of Spark's. Storm is a distributed stream computing engine. Each node implements a basic computation, and data items flow in and out through a network of interconnected nodes. In contrast to Spark, this passes the data to the process.
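The "pass the data to the process" model can be sketched with Python generators. This is not Storm's API; the names `spout`, `split_bolt`, `count_bolt`, and `run_topology` are hypothetical and only loosely borrow Storm's terms for source and processing nodes.

```python
def spout(events):
    """Source node: emits data items one at a time as they arrive."""
    for event in events:
        yield event


def split_bolt(stream):
    """Processing node: splits each incoming message into lowercase words."""
    for message in stream:
        for word in message.split():
            yield word.lower()


def count_bolt(stream):
    """Processing node: keeps a running count and emits it after every item."""
    counts = {}
    for word in stream:
        counts[word] = counts.get(word, 0) + 1
        yield word, counts[word]


def run_topology(events):
    # Wire the nodes together; each item flows through the whole chain in turn.
    return list(count_bolt(split_bolt(spout(events))))


if __name__ == "__main__":
    for word, running_total in run_topology(["storm spark", "Storm"]):
        print(word, running_total)
```

Each stage here is fixed in place and processes items as they pass through, so results are available per item rather than after a whole batch finishes.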
Both frameworks are used for parallel computation over large amounts of data.
Storm is better at dynamically processing large numbers of small data items as they are generated (such as computing aggregate functions or running analysis on the Twitter data stream in real time).
Spark works on existing data sets (such as Hadoop data) that have already been imported into the Spark cluster. Spark can scan quickly by managing data in memory, and it minimizes the global I/O operations of iterative algorithms.
http://blog.csdn.net/hguisu/article/details/8454368 Using Storm for real-time big data analytics
The Big Data Ecosystem
Http://www.csdn.net/article/2012-12-21/2813066-database-road-map One picture to understand the big data ecosystem
Http://www.aboutyun.com/thread-11944-1-1.html A summary of open source big data processing tools (Hadoop ecosystem, streaming systems, etc.)
Open Source Cloud
Http://www.oschina.net/news/54700/most-popular-opensource-cloud-projects The most popular open source cloud projects of the first half of 2014
Http://www.chinacloud.cn/show.aspx?id=19743&cid=22 A survey of open source cloud platforms on Linux
OpenStack, Docker, KVM
Real-time data stream processing
Http://www.csdn.net/article/2014-06-12/2820196-Storm An introduction to and brief analysis of real-time computation and stream data processing systems
http://www.csdn.net/article/2014-12-09/2823038 Building a large-scale real-time data stream processing system in the cloud
http://tech.it168.com/a2014/0730/1651/000001651470_all.shtml A LinkedIn big data expert's in-depth explanation of the significance of the log
Appendix
http://storm.apache.org/
http://spark.apache.org/
http://hadoop.apache.org/
Https://en.wikipedia.org/wiki/NoSQL
http://docs.openstack.org/developer/swift/
Http://wiki.apache.org/hadoop/HDFS
http://ceph.com/