Apache Spark is an in-memory data processing framework that has now been promoted to an Apache top-level project. The move should help improve Spark's stability and strengthen its position as a replacement for MapReduce in the next generation of big data applications.
Spark has gained strong momentum recently and looks increasingly likely to replace MapReduce. This Tuesday, the Apache Software Foundation announced that Spark had been promoted to a top-level project.
Because it outperforms MapReduce in both speed and ease of use, Spark currently has a large community of users and contributors. This means that Spark is a better fit for the requirements of next-generation big data applications: low latency, real-time processing, and iterative computation.
Spark's creators, from the University of California, Berkeley, have now founded a company called Databricks to promote the commercialization of Spark.
Technically, Spark is a standalone project, but it is designed to work with the Hadoop Distributed File System (HDFS) and can run directly on HDFS. With SIMR (Spark In MapReduce), users can run Spark on a MapReduce cluster without administrator privileges or a separate installation. Thanks to YARN (the next-generation Hadoop resource scheduler and resource manager), Spark can now run on the same cluster as MapReduce. Cloudera, a pioneer in enterprise Hadoop, has started offering enterprise Spark support to its customers.
Many new projects (such as Hortonworks's Stinger) are adopting different processing frameworks. Even so, MapReduce has many tools that Spark lacks (such as Pig and Cascading), and for certain batch tasks MapReduce is still a good choice. As Cloudera co-founder Mike Olson points out, MapReduce carries a lot of legacy workloads that won't be migrated any time soon, even with Spark available.