Phoenix Export CSV file

Source: Internet
Author: User


1. Is there an efficient CSV export tool?


Phoenix provides a Bulkload tool that lets users efficiently import large volumes of data into HBase through Phoenix. Does Phoenix also provide a tool for efficiently exporting data to CSV?



Some readers might wonder whether they can export the data using the usual HBase approaches: writing their own Java code, using HBase's native tool classes, or using the HBase loader provided by Pig. Whether this works depends on the data types of the columns in the Phoenix table. If any column uses a type other than VARCHAR, CHAR, or the UNSIGNED_* types, or if the table is a salted table, exporting through plain HBase will inevitably produce incorrect data. The reason is that Phoenix serializes most data types to bytes differently from native HBase. For example, a Phoenix salted table inserts a hash value into the first byte of the rowkey to distribute data evenly across regions, so exporting with a regular HBase export tool is bound to produce incorrect rowkeys.


2. Pig Loader: the best (and only) tool for exporting Phoenix data to CSV


Fortunately, Phoenix does officially provide an efficient export tool, though it depends on Pig. In testing, it turned out to be the only tool that fully supports exporting Phoenix table data.


Introducing and using Pig itself is beyond the scope of this article; readers unfamiliar with it should consult the Pig documentation or search for it online.


The introduction of the Phoenix Integrated Pig can be viewed in the following website links:



https://phoenix.apache.org/pig_integration.html



The page describes two tools: one for importing large amounts of data, similar to the Bulkload tool, and another for exporting large amounts of data. Here we focus on the export tool.



The export tool is called the Pig Loader. According to the official site's introduction:


A Pig data loader allows users to read data from Phoenix backed HBase tables within a Pig script.


In other words, we can write a Pig script that uses the loader class provided by phoenix-pig (Phoenix's Pig integration module) and run the script to export massive amounts of data.



The Pig Loader supports two forms of export:


2.1 Export using Table


The first is to export an entire table by specifying the table name. For example, to export all records of the USER table, you can use the following script statement:


load 'hbase://table/USER' using org.apache.phoenix.pig.PhoenixHBaseLoader('${zookeeper.quorum}');

 

${zookeeper.quorum} must be replaced with the ZooKeeper cluster hosts and port, e.g. master,slave1,slave2:2181


Of course, we can also control precisely which columns of the table are exported:


load 'hbase://table/USER/ID,NAME' using org.apache.phoenix.pig.PhoenixHBaseLoader('${zookeeper.quorum}');

 


The above script exports all records of the USER table, but includes only the ID and NAME columns.


2.2 Export using Query


The other is to control the exported data by specifying a query statement:


load 'hbase://query/SELECT ID,NAME FROM USER WHERE AGE > 50' using org.apache.phoenix.pig.PhoenixHBaseLoader('${zookeeper.quorum}');

 


Note



There are significant restrictions on exporting with a query statement: the query cannot use GROUP BY, LIMIT, ORDER BY, or DISTINCT, nor aggregate functions such as COUNT and SUM.
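These restrictions need not block aggregation entirely: since the loader produces ordinary Pig relations, the aggregation can be done in the Pig script after loading. The following sketch (which assumes the USER table has an AGE column; adjust names to your schema) counts users per age in Pig rather than in the Phoenix query:

REGISTER /data/phoenix-default/phoenix-4.6.0-HBase-1.0-client.jar;
-- Keep the Phoenix query to simple projection and filtering
rows = LOAD 'hbase://query/SELECT ID,AGE FROM USER' USING org.apache.phoenix.pig.PhoenixHBaseLoader('master,slave1,slave2:2181');
-- Do the aggregation in Pig instead of in the Phoenix query
by_age = GROUP rows BY AGE;
counts = FOREACH by_age GENERATE group AS AGE, COUNT(rows) AS CNT;
STORE counts INTO 'USER_AGE_COUNTS.csv' USING PigStorage(',');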


3. Usage examples


Here we demonstrate the two export methods through two complete examples. Example 1 shows exporting by specifying the table, and Example 2 shows exporting by specifying a query.


3.1 Example 1


vi example1.pig

REGISTER /data/phoenix-default/phoenix-4.6.0-HBase-1.0-client.jar;
rows = LOAD 'hbase://table/USER' USING org.apache.phoenix.pig.PhoenixHBaseLoader('master,slave1,slave2:2181');
STORE rows INTO 'USER.csv' USING PigStorage(',');


Execute the shell command:

pig -x mapreduce example1.pig
3.2 Example 2


vi example2.pig

REGISTER /data/phoenix-default/phoenix-4.6.0-HBase-1.0-client.jar;
rows = LOAD 'hbase://query/SELECT ID,NAME FROM USER' USING org.apache.phoenix.pig.PhoenixHBaseLoader('master,slave1,slave2:2181');
STORE rows INTO 'USER.csv' USING PigStorage(',');


Execute the shell command:

pig -x mapreduce example2.pig
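A practical note (an assumption about a typical Hadoop setup, not something stated in the Phoenix documentation): when Pig runs in mapreduce mode, 'USER.csv' is created as a directory on HDFS containing one part file per reducer. To obtain a single local CSV file, the part files can be merged, for example:

hdfs dfs -getmerge USER.csv USER_local.csv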



