Phoenix uses MapReduce to load large volumes of data

1. Description

In real-world scenarios, more structured data files often need to be imported into HBase. Phoenix provides two ways to load CSV-formatted files into a Phoenix table: a single-threaded psql tool for small batches of data, and a MapReduce job for large volumes. The first way is relatively simple and is not covered here; see the official documentation for details:

http://phoenix.apache.org/bulk_dataload.html
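
For reference, a small-batch load with the single-threaded psql tool looks roughly like this (a sketch, not from the original article; it assumes psql.py is run from PHOENIX_HOME, that the data file is given the .csv extension psql.py expects, and that one of the ZooKeeper nodes used later in this article is reachable):

$ ./bin/psql.py -t USER 192.168.187.128:2181 data_import.csv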

2. Create a table

Create a user table in the CLI interface of Phoenix.
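
If the Phoenix CLI is not already open, it can be started with sqlline.py (a sketch, assuming it is run from PHOENIX_HOME and connects to the same ZooKeeper quorum used in the MapReduce command later in this article):

$ ./bin/sqlline.py 192.168.187.128,192.168.187.129,192.168.187.130:2181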

create table user (id varchar primary key, account varchar, passwd varchar);

3. Add test data

Create data_import.txt in the PHOENIX_HOME directory with the following contents:

001,google,am
002,baidu,bj
003,alibaba,hz

4. Run the MapReduce job

Run the MapReduce job from the PHOENIX_HOME directory (the exact command depends on the Phoenix version).

$ HADOOP_CLASSPATH=/usr/local/cdh-5.2.0/hbase-0.98.6/lib/hbase-protocol-0.98.6-cdh5.2.0.jar:/usr/local/cdh-5.2.0/hbase-0.98.6/conf hadoop jar phoenix-4.2.2-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool -t user -i file:///usr/local/cdh-5.2.0/phoenix-4.2.2/data_import.txt -z 192.168.187.128,192.168.187.129,192.168.187.130:2181

The parameters are: -t specifies the target Phoenix table, -i the path of the input CSV file, and -z the ZooKeeper quorum address of the HBase cluster.

Warning: hbase-protocol-0.98.6-cdh5.2.0.jar must match the version of HBase; if you run a different HBase version, replace it with the corresponding jar from your own installation.
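
The matching jar can usually be found in the lib directory of the HBase installation, for example (assuming the CDH layout used in this article):

$ ls /usr/local/cdh-5.2.0/hbase-0.98.6/lib/hbase-protocol-*.jar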

The data file path is read from HDFS by default. Adding the file:/// prefix forces it to be treated as a local file path, but the MapReduce job still reports an error saying that the file path cannot be found:

Error: java.io.FileNotFoundException: File file:/usr/local/cdh-5.2.0/phoenix-4.2.2/data_import.txt does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)

Nevertheless, the data was eventually loaded into Phoenix successfully. The error is probably because a file:/// path is only visible on the node that submits the job, so tasks running on other nodes cannot find it.

Finally, the test data data_import.txt was uploaded to the /phoenix/test/ directory on HDFS (see the upload sketch below), and the job was re-run against the HDFS path without any error.
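
A minimal sketch of the upload with the HDFS shell, assuming the file is in the current local directory:

$ hdfs dfs -mkdir -p /phoenix/test
$ hdfs dfs -put data_import.txt /phoenix/test/

The bulk load command then points -i at the HDFS path: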

$ HADOOP_CLASSPATH=/usr/local/cdh-5.2.0/hbase-0.98.6/lib/hbase-protocol-0.98.6-cdh5.2.0.jar:/usr/local/cdh-5.2.0/hbase-0.98.6/conf hadoop jar phoenix-4.2.2-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool -t user -i /phoenix/test/data_import.txt -z 192.168.187.128,192.168.187.129,192.168.187.130:2181

The job is submitted to YARN, and the ResourceManager allocates resources for it.
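
Once the job finishes, the loaded rows can be verified from the Phoenix CLI, for example (a sketch using the same ZooKeeper quorum as above):

$ ./bin/sqlline.py 192.168.187.128,192.168.187.129,192.168.187.130:2181
0: jdbc:phoenix> select * from user;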
