Spark: the new leader in cloud computing and big data

Source: Internet
Author: User
Keywords: big data, cloud computing

According to relevant data, the number of mobile internet users in China exceeded 500 million in the first half of 2013. By the first quarter of 2014, domestic mobile internet users are expected to outnumber PC internet users, with mobile phone users exceeding 1 billion. The continued growth of 3G users and the strong momentum of 4G have triggered an explosion of mobile big data. New data is emerging all the time, and the mobile internet is affecting every aspect of human life.

This will be an unprecedented era. All companies and institutions are, or are becoming, mobile internet organizations, and all of them will eventually become big data organizations built on cloud computing. The wave of big data driven by the mobile internet and cloud computing is reshaping, and will ultimately revolutionize, the architecture, production, service, and management models of all companies and institutions.

Spark: the rise of a new-generation, all-purpose big data computing platform

As big data technologies and industries mature, a single organization often needs to run several types of big data analysis. Traditional Hadoop MapReduce is best at offline statistical analysis of massive data sets. Because of the characteristics of Hadoop itself, however, results are often delayed by minutes or even hours, which is unacceptable in many scenarios. More importantly, before the advent of Spark, an organization that wanted to cover tasks such as iterative computation, stream processing, general graph computation, relational SQL queries, and interactive ad hoc queries had to run several independent systems. This increased the complexity of operations and maintenance, and it made frequent, costly data transfers between systems unavoidable.

Spark is an in-memory, general-purpose parallel computing framework and the hottest open source project in cloud computing after Hadoop. It offers particularly strong support for interactive queries, stream processing, graph computation, and similar workloads.

Spark has clear advantages in machine learning and is particularly well suited to algorithms that require many iterations. It also has sound fault-tolerance and scheduling mechanisms that keep the system running stably. Spark's current design philosophy is to bring SQL, machine learning, graph computing, stream processing, and other capabilities together in a single computing framework and a single project, which makes it very easy to use.
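To make the point about iterative algorithms concrete, here is a minimal sketch in plain Python. It does not use Spark or any Spark API. The class `CountingSource` and the functions `fit_mean_reload` and `fit_mean_cached` are hypothetical names invented for this illustration, and the "storage read" is only simulated by a counter. The sketch assumes that each iteration of a chain of independent batch jobs re-reads its input, while an in-memory approach reads the input once and reuses it.

```python
class CountingSource:
    """Hypothetical stand-in for a slow storage read; it only counts reads."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def load(self):
        # Every call represents one full pass over external storage.
        self.reads += 1
        return list(self._values)


def fit_mean_reload(source, iterations, lr=0.5):
    """Gradient descent toward the mean, re-reading the input every iteration."""
    estimate = 0.0
    for _ in range(iterations):
        data = source.load()
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= lr * gradient
    return estimate


def fit_mean_cached(source, iterations, lr=0.5):
    """Same algorithm, but the input is read once and kept in memory."""
    data = source.load()
    estimate = 0.0
    for _ in range(iterations):
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= lr * gradient
    return estimate


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0]
    reload_source = CountingSource(values)
    cached_source = CountingSource(values)
    print(fit_mean_reload(reload_source, 20), reload_source.reads)
    print(fit_mean_cached(cached_source, 20), cached_source.reads)
```

Both functions compute the same estimate. The only difference is the number of reads: one per iteration in the first case, and a single read in the second. This is the kind of saving that keeping the working set in memory is meant to deliver for multi-iteration algorithms.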

Spark's advantages give it a dominant position in cloud computing and big data

Spark is a cluster computing platform that originated at the AMPLab of the University of California, Berkeley. It is based on in-memory computing and outperforms Hadoop MapReduce. It is also a rare all-rounder: it began with iterative computation and has since embraced data warehousing, stream processing, and graph computing paradigms. Spark is now a top-level project of the Apache Software Foundation, with strong community support (its number of active developers has surpassed that of Hadoop MapReduce), and the technology is maturing.

As a core technology for the next generation of cloud computing and big data, Spark is presented as the one revolutionary alternative to Hadoop MapReduce: it can do everything Hadoop MapReduce does, at speeds reported to be up to 100 times faster. Even in offline statistical analysis, the area where Hadoop excels, Spark is said to be at least an order of magnitude faster. Spark's other major advantage is "one stack to rule them all": it uses a unified technology stack to address the core problems of cloud computing and big data, such as stream processing, graph processing, machine learning, and NoSQL queries, and it is backed by a complete ecosystem. This is what directly establishes its dominant position in cloud computing and big data.

Application status and future development of Spark

Spark has now built its own big data processing ecosystem, with dedicated components for stream processing, graph processing, machine learning, NoSQL queries, and more, and it is an Apache top-level project. Community activity and commercial adoption can be expected to grow explosively from the second half of 2014 through 2015.

Some large internet companies outside China have already deployed Spark. Even Yahoo, an early major contributor to Hadoop, is now deploying Spark in several projects. In China, Taobao, Youku Tudou, NetEase, Baidu, Tencent, and others have used Spark in their commercial production systems. Adoption at home and abroad is becoming more and more widespread.

Some time ago, the Mahout community made a significant announcement: from now on it will no longer accept algorithms implemented in MapReduce form, although it will continue to maintain the MapReduce implementations of its common algorithms. Mahout also announced that new algorithms will be implemented on Spark, because the community believes that Spark's richer programming model and better performance will play a vital role in Mahout. In addition, Cloudera's machine learning framework Oryx, which has used MapReduce, is also moving to Spark as its execution engine. There are indications that Spark has begun displacing MapReduce across the board and has a strong chance of becoming the de facto standard for a new generation of distributed machine learning. Let's wait and see. Spark is gradually maturing and playing an increasingly important role in this field.
