Hortonworks Improves Integration of Spark and Hadoop
New code from Hortonworks improves the integration of Spark and Hive, and the company plans security and performance upgrades for the Spark in-memory analytics platform.
The Apache Spark in-memory analytics platform is now one of the hottest technologies in big data analytics, and Hadoop distributor Hortonworks recently decided to increase its commitment to Spark. This Wednesday Hortonworks announced that its Spark software will improve Hive integration and add Spark support for the ORC data format. Hortonworks also plans to upgrade Spark's security and performance through YARN, the resource management tool, so that Spark works well alongside Hadoop.
Jim Walker, Hortonworks's director of product marketing, said in a media interview that Hortonworks's goal is to ensure that Spark supports YARN and is optimized for it, with appropriate security and operational tools.
YARN acts as a data operating system within Hadoop: it lets multiple datasets be combined on one cluster and accessed through a variety of processing engines.
Spark, for its part, is an in-memory analytics platform developed and promoted by Databricks that supports machine learning (with MLlib). Spark also supports SQL analysis (with Spark SQL) and streaming analysis (with Spark Streaming), and is expected to support popular R analytics libraries and graph analysis (with SparkR and GraphX, respectively). Spark can run as a standalone distributed cluster, run on Hadoop and Cassandra, and connect to MongoDB and traditional relational database sources.
Hortonworks is committed to making Spark work better with Hadoop. To that end it has contributed a large amount of code to Spark and is promoting integration with Hive, the native open-source SQL query tool for Hadoop that Hortonworks favors. Notably, most of Hortonworks's competitors also use Hive, with Cloudera and MapR as the exceptions: Cloudera promotes its own Impala SQL query engine, while MapR supports the open-source Apache Drill.
According to Hortonworks, the new code, which will be available as a preview on its official website this week, improves Spark's performance when reading and writing Hive data, and includes support for writing data in ORC (Optimized Row Columnar) format. ORC is a columnar storage format optimized for read and compression performance, and it is becoming the de facto storage format for Hive.
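To illustrate why a columnar layout helps with reads and compression, here is a minimal Python sketch. It is a toy model under stated assumptions, not ORC's actual file layout and not Hortonworks's code: the sample rows and the helper names `to_columns` and `run_length_encode` are hypothetical and chosen only for illustration.

```python
from itertools import groupby

# Hypothetical sample table: each dict is one row.
rows = [
    {"user": "a", "country": "US", "clicks": 3},
    {"user": "b", "country": "US", "clicks": 5},
    {"user": "c", "country": "US", "clicks": 2},
    {"user": "d", "country": "DE", "clicks": 7},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# Reading one column touches only that column's values, not whole rows.
total_clicks = sum(columns["clicks"])

# Values within one column share a type and often repeat, so they compress well.
encoded_country = run_length_encode(columns["country"])

print(total_clicks)
print(encoded_country)
```

In this sketch an aggregate over `clicks` never looks at `user` or `country`, and the four `country` values shrink to two pairs. A real columnar format applies the same two ideas at much larger scale and with more elaborate encodings.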
Hortonworks is one of 10 vendors releasing Spark software; the others include BlueData, Cloudera, DataStax, Guavus, IBM, Oracle, Pivotal, SAP, and Stratio. Three vendors, Cloudera, DataStax and MapR, are Databricks-certified service providers. Hortonworks has not yet received its certification, but it has already begun supporting the Spark software it releases.