Introducing IBM's SQL technology for Hadoop to relational DBMS users


This article introduces Big SQL and answers many of the common questions that relational DBMS users have about this IBM technology.

Big data is of great interest to IT professionals who analyze and manage information. But some professionals find it hard to understand how to work with big data, because Apache Hadoop, one of the most popular big data platforms, brings with it many new technologies, including newer query and scripting languages.

Big SQL is the SQL interface to IBM InfoSphere BigInsights, IBM's Hadoop-based platform. Big SQL is designed to make it easy for SQL developers to query data managed by Hadoop. It enables data administrators to create new tables for data stored in Hive, in HBase, or in the BigInsights distributed file system. In addition, the LOAD command lets administrators populate Big SQL tables with data from a variety of sources. And the Big SQL JDBC and ODBC drivers enable many existing tools to query distributed data through Big SQL.

However, Big SQL does not turn Hadoop into a large distributed relational database. If you want to know what Big SQL can do, read on: we'll explain its basics, try to clear up some common misconceptions, and answer many of the questions that relational DBMS users often have about this new technology.

Big SQL Overview

Big SQL is a software layer that enables IT professionals to create tables and query data in BigInsights using familiar SQL statements. To do so, programmers use standard SQL syntax and, in some cases, IBM-created SQL extensions that make it easy to exploit certain Hadoop-based technologies. We'll explain these topics in more detail later.

To give you an idea of what Big SQL is, Figure 1 shows its architecture and how it fits into the BigInsights Enterprise Edition 2.1 platform.

Figure 1. Big SQL architecture

As shown at the top of the figure, Big SQL supports JDBC and ODBC client access from Linux® and Windows® platforms. In addition, the Big SQL LOAD command can read data directly from several relational DBMS systems (IBM PureData™ System for Analytics powered by Netezza technology, DB2®, and Teradata), as well as from files stored locally or in the BigInsights distributed file system. BigInsights EE 2.1 can be configured to support either the Hadoop Distributed File System (HDFS) or IBM's General Parallel File System with the File Placement Optimizer (GPFS-FPO).

The SQL query engine supports joins, unions, grouping, common table expressions, window functions, and other familiar SQL constructs. In addition, you can influence the data access strategy through optimizer hints and configuration options. Depending on the nature of the query, the volume of data, and other factors, Big SQL can either use Hadoop's MapReduce framework to process the query's tasks in parallel or execute the query locally on the Big SQL server on a single node, whichever is better for that query.
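The sketch below illustrates several of the constructs just listed (a common table expression, a join, grouping, and a window function) in a single query. It is only a stand-in: it runs against an in-memory SQLite database through Python's standard `sqlite3` module, not against Big SQL, and the `customers` and `orders` tables are hypothetical. Window functions require SQLite 3.25 or later.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in SQL engine (not Big SQL).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two small hypothetical tables.
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'EMEA'), (2, 'EMEA'), (3, 'APAC');
INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0), (13, 3, 200.0);
""")

# CTE + join + GROUP BY compute each customer's total spend; the window
# function then ranks customers by that total within their region.
query = """
WITH totals AS (
    SELECT c.id AS customer_id, c.region, SUM(o.amount) AS total
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id, c.region
)
SELECT customer_id, region, total,
       RANK() OVER (PARTITION BY region ORDER BY total DESC) AS region_rank
FROM totals
ORDER BY region, region_rank
"""
rows = cur.execute(query).fetchall()
for row in rows:
    print(row)  # (customer_id, region, total, region_rank)

conn.close()
```

The query text itself uses only standard SQL, so in principle it should carry over to other engines that support these constructs, though dialect details would need to be checked against the Big SQL documentation.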

Organizations interested in Big SQL typically have a wealth of SQL skills in-house, as well as a suite of SQL-based business intelligence applications and query/reporting tools. For organizations new to Hadoop, the idea of leveraging existing skills and tools (and possibly reusing some existing applications) can be very appealing. Indeed, some companies with large data warehouses built on relational DBMS systems are looking to Hadoop-based platforms as a potential target for offloading "cold" or infrequently used data while still supporting query access to it. In other cases, organizations rely on Hadoop to analyze and filter non-traditional data (such as logs, sensor data, and social media posts) and ultimately feed a subset or aggregation of this information into their relational warehouses to augment their views of products, customers, or services.

In these and other cases, Big SQL can play an important role. However, it would be a mistake to think of Big SQL as a replacement for relational DBMS technology. Big SQL is designed to complement and leverage the Hadoop-based infrastructure in BigInsights. Some features common in relational DBMS systems do not exist in Big SQL, and some Big SQL features do not exist in most relational DBMS systems. For example, Big SQL supports querying data but does not support the SQL UPDATE or DELETE statements, and INSERT statements are supported only for HBase tables. Big SQL tables may contain columns of complex data types, such as struct and array, rather than only simple "flat" rows. Several underlying storage mechanisms are supported, including:

  • Hive tables stored in HDFS or GPFS-FPO, in formats such as comma-delimited text files, sequence files, and RCFile. (Hive is a data warehouse implementation for Hadoop.)
  • HBase tables. (HBase is a key-value, column-oriented data store for Hadoop.)
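To make the two storage mechanisms and the statement restrictions more concrete, here is a deliberately simplified Python model. Everything in it is hypothetical and for illustration only: `Storage`, `is_supported`, and the `id`/`name`/`city` schema are invented names, the rules in `is_supported` merely restate the ones described in this article, and the HBase layout omits real HBase details such as cell timestamps. It does not use any Big SQL, Hive, or HBase API.

```python
import csv
import io
from enum import Enum


class Storage(Enum):
    HIVE = "hive"    # files in HDFS or GPFS-FPO (e.g. comma-delimited text)
    HBASE = "hbase"  # HBase key-value store


def is_supported(verb: str, storage: Storage) -> bool:
    """Hypothetical helper restating the statement rules described above."""
    verb = verb.upper()
    if verb in ("UPDATE", "DELETE"):
        return False                      # not supported for any table
    if verb == "INSERT":
        return storage is Storage.HBASE   # INSERT only for HBase tables
    if verb in ("SELECT", "CREATE", "LOAD"):
        return True                       # querying, creating, loading
    raise ValueError(f"the article gives no rule for {verb!r}")


COLUMNS = ["id", "name", "city"]  # hypothetical schema

# Hive-style storage: one row per line of a comma-delimited text file.
hive_text = io.StringIO("1,Alice,Paris\n2,Bob,Tokyo\n")
hive_rows = [dict(zip(COLUMNS, fields)) for fields in csv.reader(hive_text)]

# HBase-style storage (simplified): row key -> {"family:qualifier": value}.
hbase_table = {
    row["id"]: {f"cf:{col}": row[col] for col in COLUMNS[1:]}
    for row in hive_rows
}

print(hive_rows)
print(hbase_table)
print(is_supported("INSERT", Storage.HBASE), is_supported("INSERT", Storage.HIVE))
```

The same logical rows appear in both layouts: the Hive-style version is positional text parsed against a schema, while the HBase-style version is addressed by row key and column-family-qualified column names, which is why a key-value store can accept row-level writes more naturally than a set of delimited files.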

Let's look at Big SQL in more detail to get a better idea of its functionality.
