Introduction
Using bulkload to load data from HDFS into HBase is a common entry-level HBase skill. Below is a brief record of the key steps. For more details about bulkload, see the official documentation.
Process
- Step 1: run the following command on each machine
ln -s $HBASE_HOME/conf/hbase-site.xml $HADOOP_HOME/etc/hadoop/hbase-site.xml
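This symlink makes hbase-site.xml (ZooKeeper quorum and other connection settings) visible to MapReduce jobs launched through Hadoop. Since the step has to be repeated on every node, one way to script it is sketched below; the hostnames are hypothetical and it assumes the HBase and Hadoop installation paths are identical on all machines, so the variables expand to the same values everywhere:
for host in node1 node2 node3; do
  ssh $host "ln -s $HBASE_HOME/conf/hbase-site.xml $HADOOP_HOME/etc/hadoop/hbase-site.xml"
done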
- Step 2: Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh and copy it to all nodes
Add at the end:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_HOME/lib/*:$ZOOKEEPER_HOME/zookeeper-3.4.6.jar
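To confirm the HBase and ZooKeeper jars are actually picked up after this change (a quick sanity check, not part of the original steps), you can print the resolved classpath on a node:
hadoop classpath | tr ':' '\n' | grep -i hbase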
- Step 3: generate HFiles
hadoop jar $HBASE_HOME/lib/hbase-server-0.98.6-cdh5.2.0.jar importtsv -Dimporttsv.columns=${hbase_columns} -Dimporttsv.bulk.output=${hfile_path} ${hbase_table} ${source_data_path}
Note:
- ${hbase_columns} lists, in order, each column imported into HBase, in the format column_family:qualifier. The column order must match the field order in the ${source_data_path} data. Use HBASE_ROW_KEY for the rowkey field, for example "HBASE_ROW_KEY,service_info:id,service_info:rrank,service_info:service_code"
- Bulkload creates ${hfile_path} automatically; you only need to specify this parameter and do not need to create the directory in advance;
- ${hbase_table} can include a namespace, for example "jilin_sme_sp_recs:sp_t_re_gul_service"
- ${source_data_path} here is the data directory of a Hive external table; a worked example of Step 3 follows this list.
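Putting the notes together, a worked invocation of Step 3 might look like the following. The column list and table name are the examples from the notes above, while the HDFS paths are hypothetical; if the source files are not tab-separated (importtsv's default), the delimiter can be set with -Dimporttsv.separator:
hbase_columns="HBASE_ROW_KEY,service_info:id,service_info:rrank,service_info:service_code"
hfile_path=/tmp/bulkload/hfiles
hbase_table=jilin_sme_sp_recs:sp_t_re_gul_service
source_data_path=/user/hive/warehouse/sp_t_re_gul_service
hadoop jar $HBASE_HOME/lib/hbase-server-0.98.6-cdh5.2.0.jar importtsv -Dimporttsv.columns=${hbase_columns} -Dimporttsv.bulk.output=${hfile_path} ${hbase_table} ${source_data_path}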
- Step 4: import the data into HBase
hadoop jar $HBASE_HOME/lib/hbase-server-0.98.6-cdh5.2.0.jar completebulkload ${hfile_path} ${hbase_table}
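Once completebulkload returns, a quick row count from the HBase shell (an assumed verification step, using the example table name from the notes above) confirms the data landed:
echo "count 'jilin_sme_sp_recs:sp_t_re_gul_service'" | hbase shell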