1. Is there an efficient CSV export tool?
Phoenix provides a BulkLoad tool that lets users efficiently import large volumes of data into HBase through Phoenix. Does Phoenix also provide a tool class for efficiently exporting data to CSV?
Some readers may wonder whether they can simply export the data the usual HBase way: writing their own Java code, using HBase's native tool classes, or using the HBase loader provided by Pig. Whether that works depends on the data types of the columns in your Phoenix table. If a column does not use a VARCHAR, CHAR, or UNSIGNED_* type, or if the table is a salted table, exporting around Phoenix will inevitably produce incorrect data. The reason is that Phoenix serializes most data types to bytes differently from native HBase. For example, a Phoenix salted table inserts a hash value into the first byte of the rowkey to distribute data evenly across regions, so exporting with the regular HBase export tool is bound to yield incorrect rowkeys.
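To make the salting point concrete, here is a minimal sketch of a salted table definition (the table and column names are illustrative; SALT_BUCKETS is the Phoenix table option that enables salting):
CREATE TABLE USER (
    ID BIGINT NOT NULL PRIMARY KEY,
    NAME VARCHAR,
    AGE INTEGER
) SALT_BUCKETS = 4;
Every rowkey in this table carries one extra leading salt byte that Phoenix adds transparently; a native HBase export tool knows nothing about it and will emit it as part of the key.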
2. Pig Loader -- the best and only tool for exporting Phoenix data to CSV
Fortunately, Phoenix does officially provide an efficient export tool class, although it depends on Pig. And in our testing, it proved to be the only tool that can fully and correctly export Phoenix table data.
Introducing and using Pig is not the focus of this article; readers who have not encountered it before can look it up on Baidu or Google.
An introduction to the Phoenix Pig integration is available at:
https://phoenix.apache.org/pig_integration.html
That page describes two tools: one for importing large amounts of data, similar to the BulkLoad tool, and another for exporting massive amounts of data. Here we focus on the export side.
The export tool is called Pig Loader. According to the official site:
A Pig data loader allows users to read data from Phoenix backed HBase tables within a Pig script.
In other words, we can write a Pig script and use the loader class provided by the phoenix-pig module (Phoenix's Pig integration) to export massive amounts of data.
Pig Loader supports two forms of export:
2.1 Export using Table
The first is to export an entire table's data by specifying the HBase table name. For example, to export all records of the USER table, use the following script statement:
rows = load 'hbase://table/USER' using org.apache.phoenix.pig.PhoenixHBaseLoader('${zookeeper.quorum}');
${zookeeper.quorum} must be replaced with the ZooKeeper quorum hosts plus the port, e.g. master,slave1,slave2:2181 (or supplied at run time, as sketched below).
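The quorum can also be passed in through Pig's parameter substitution rather than by editing the script. A minimal sketch, assuming the load statement references the parameter as ${zk} (a name chosen here for illustration):
pig -x mapreduce -param zk=master,slave1,slave2:2181 example1.pig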
Of course we can also precisely control which columns of the table are exported:
rows = load 'hbase://table/USER/ID,NAME' using org.apache.phoenix.pig.PhoenixHBaseLoader('${zookeeper.quorum}');
The above statement exports all records of the USER table, but includes only the ID and NAME columns.
2.2 Export using Query
The other is to control the exported data by specifying a query statement:
rows = load 'hbase://query/SELECT ID,NAME FROM USER WHERE AGE > 50' using org.apache.phoenix.pig.PhoenixHBaseLoader('${zookeeper.quorum}');
Note
There are significant restrictions on exporting with a query statement: GROUP BY, LIMIT, ORDER BY, and DISTINCT cannot be specified, nor can aggregate functions such as COUNT or SUM. If you need an aggregate, compute it in Pig instead, as sketched below.
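For instance, since COUNT is not allowed inside the query string, one workaround is to pull the raw rows through the loader and aggregate them in Pig itself. A minimal sketch, reusing the USER table from the examples above:
rows = load 'hbase://query/SELECT ID FROM USER WHERE AGE > 50' using org.apache.phoenix.pig.PhoenixHBaseLoader('${zookeeper.quorum}');
-- GROUP ... ALL collects every row into a single bag so COUNT can run over it
grouped = GROUP rows ALL;
total = FOREACH grouped GENERATE COUNT(rows);
DUMP total;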
3. Usage examples
Here we demonstrate the two export methods through two complete usage examples: Example 1 exports by specifying the table, and Example 2 exports by specifying a query.
3.1 Example 1
vi example1.pig
REGISTER /data/phoenix-default/phoenix-4.6.0-HBase-1.0-client.jar;
-- load the entire USER table through Phoenix
rows = load 'hbase://table/USER' USING org.apache.phoenix.pig.PhoenixHBaseLoader('master,slave1,slave2:2181');
-- write the rows out as a comma-separated file
STORE rows INTO 'USER.csv' USING PigStorage(',');
Execute the shell command:
pig -x mapreduce example1.pig
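Note that because the job runs on MapReduce, 'USER.csv' is an HDFS output directory containing part files, not a single local file. A sketch of merging it into one local CSV (paths are placeholders):
hdfs dfs -getmerge USER.csv ./user.csv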
3.2 Example 2
vi example2.pig
REGISTER /data/phoenix-default/phoenix-4.6.0-HBase-1.0-client.jar;
-- load only the ID and NAME columns via a Phoenix query
rows = load 'hbase://query/SELECT ID,NAME FROM USER' USING org.apache.phoenix.pig.PhoenixHBaseLoader('master,slave1,slave2:2181');
-- write the rows out as a comma-separated file
STORE rows INTO 'USER.csv' USING PigStorage(',');
Execute the shell command:
pig -x mapreduce example2.pig