At the recently concluded Hadoop Summit Europe, Hortonworks announced version 2.1 of the Hortonworks Data Platform (HDP). The new release of the Hadoop distribution adds enterprise features for data governance, security, streaming and search. It also takes the Stinger Initiative's work on interactive SQL queries to a new level.
Jim Walker, director of product marketing at Hortonworks, said: "To make Hadoop truly an enterprise data platform, it has to meet certain needs, and practitioners have a very clear set of them: data governance, data access, data management, security and operations. HDP 2.1 brings these together to make it enterprise-class Hadoop."
HDP 2.1 is built on the latest stable releases of the Apache open source projects. It ships Apache Hive 0.13 for interactive SQL queries on Hadoop. Hive 0.13 is the latest result of the Stinger Initiative, a community effort to deliver interactive SQL queries at petabyte scale on Hadoop. Over the past 13 months, 145 developers from 45 companies, including Microsoft, Teradata and SAP, have contributed more than 390,000 lines of code to Hive.
Walker said that Apache Hive 0.13 delivers a 100x improvement in SQL query performance, making interactive queries possible at petabyte scale. In addition to supporting a wide range of complex queries and connectivity options, Hive 0.13 extends the range of SQL semantics available to analytic applications on Hadoop.
For data governance and security, HDP 2.1 integrates Apache Falcon and Apache Knox. Falcon is a data management framework for managing and orchestrating data flows into, within and out of Hadoop. It provides key controls for ingesting and processing data sets, replicating and retaining data sets, redirecting data sets to non-Hadoop extensions, and maintaining audit trails and lineage. Knox extends Hadoop's security perimeter, integrates fully with frameworks such as LDAP and Active Directory for credential management, and provides a common authorization service across Hadoop and all related projects.
For data processing, the upgraded platform adds two new engines, Apache Storm and Apache Solr. Storm provides real-time event processing for sensor data and business activity monitoring. It is a key component of a data lake architecture because it lets users ingest millions of events per second and query petabyte-scale data quickly.
Solr, for its part, has been integrated with HDP through a deep engineering collaboration with LucidWorks. The integrated Solr provides open source enterprise search, with efficient indexing and sub-second search across billions of documents.

In addition, Apache Ambari, the framework for provisioning, managing and monitoring Apache Hadoop clusters, has been upgraded to version 1.5.1 in HDP 2.1. New features include support for the new data access engines, stack extensions, pluggable views, restarts and maintenance mode.
HDP 2.1 is currently available as a technical preview, and the generally available version is expected at the end of April 2014.