MapR today updated its Hadoop distribution, adding Apache Drill 0.5 to reduce the heavy data-engineering effort that analysis usually requires.
Drill is an open source, distributed ANSI SQL query engine used primarily for self-service data analysis. It is the open source counterpart of Google's Dremel, a system for interactive querying of large datasets that underpins Google's BigQuery service. The goal of the Apache Drill project is to scale to 10,000 or more servers while processing petabytes of data and trillions of records in seconds.
The Drill query engine can:
· Analyze data in its native format, including Parquet files, JSON files, and HBase tables, without the intervention of a database administrator (DBA).
· Analyze evolving semi-structured and nested data from NoSQL data stores such as MongoDB and from online REST APIs.
· Run queries that combine different Hadoop data sources, such as files, HBase tables, and Hive tables.
· Reuse existing SQL skill sets, BI tools, and Apache Hive deployments.
"We're very excited about it because it opens up a new era for SQL-on-Hadoop," says Jack Norris, MapR's chief marketing officer, "one focused on self-service data analysis on Hadoop without the involvement of the IT department."
Because Drill can run SQL queries against data in a variety of formats, it can be used to analyze real-time data without spending weeks preparing and managing schemas and setting up ETL jobs. This makes immediate, self-service data analysis possible across multiple data sources.
"Businesses want users with existing SQL analytics skills to be able to access data stored in Hadoop and NoSQL databases," said Matt Aslett, director of data platforms and analytics at 451 Research. "Apache Drill can provide access to data in Hadoop without the need for a centralized schema, as well as to NoSQL datasets with complex structures."
"Every other SQL-on-Hadoop solution relies on a fixed schema, whether through Hive or Tez," Norris adds. "Whether you're talking about MapReduce, Hive, or some other SQL-on-Hadoop solution, they all require this modeling, data transformation, and pipelining to support analysis. Drill can discover data without waiting, giving you the advantage of speed and flexibility."
MapR is packaging Drill with MapR 4.0.1, also released today. The new version of the Hadoop distribution extends real-time functionality for use cases including operational applications, interactive queries, and stream processing.
This new version includes multiple batch processing frameworks, including MapReduce 1.x and 2.x (YARN-based) and Spark (0.9 and 1.0.2). It also supports five SQL-on-Hadoop technologies: Hive (0.11, 0.12, 0.13), Drill (0.5), Spark SQL (1.0.2), Impala (1.3.1), and certified integration with HP Vertica. It also supports HBase (0.94.21, 0.98.4) and MapR-DB NoSQL technologies, as well as three machine learning and graph libraries: Mahout (0.8, 0.9), MLlib (0.9, 1.0.2), and GraphX.