Use Python to access HBase (Thrift module installation and testing)

Source: Internet
Author: User
Tags: python, script, sqoop

Introduction to the Hadoop environment:

Master server: node1

Slave servers: node2, node3, node4

MySQL server: node29

Thrift is installed on the node1 server.


Related software versions:

Hadoop version: hadoop-0.20.2

Sqoop version: sqoop-1.2.0-CDH3B4

Java version: jdk1.7.0_67

MySQL version: 5.1.65

Thrift Version: thrift-0.9.0

Thrift Installation Links: http://thrift.apache.org/download/

Python version: 2.7.3

Note: Python 2.5 has problems working with Thrift.



One: Pre-test preparation

1) First load the data from the MySQL database into HBase:

MySQL data is as follows:

[Image: MySQL data (MySQL condition.jpg)]


The command format for importing MySQL data into HBase is:

sqoop import --connect jdbc:mysql://mysqlserver_ip/databasename --username username --password password --table datatable --hbase-create-table --hbase-table hbase_tablename --column-family col_fam_name --hbase-row-key key_col_name


Description: databasename and datatable are the MySQL database and table names. hbase_tablename is the name of the table to create in HBase. key_col_name specifies which column of the MySQL table becomes the row key of the new HBase table. col_fam_name is the column family that holds all columns other than the row key.
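If the import is launched from a script, the command format above can be assembled as an argument list. This is only a sketch: the function name build_sqoop_import_args and its parameter names are made up for illustration, and only the Sqoop flags shown in the command format are used.

```python
def build_sqoop_import_args(mysql_host, database, username, password,
                            table, hbase_table, column_family, row_key):
    """Return the sqoop import command as a list of arguments.

    Hypothetical helper: it only mirrors the command format shown in
    the article and does not run anything.
    """
    return [
        "sqoop", "import",
        "--connect", "jdbc:mysql://{0}/{1}".format(mysql_host, database),
        "--username", username,
        "--password", password,
        "--table", table,
        "--hbase-create-table",
        "--hbase-table", hbase_table,
        "--column-family", column_family,
        "--hbase-row-key", row_key,
    ]


# Placeholder values taken from the command format above.
args = build_sqoop_import_args(
    "mysqlserver_ip", "databasename", "username", "password",
    "datatable", "hbase_tablename", "col_fam_name", "key_col_name")
print(" ".join(args))
```

A list like this can be handed to subprocess.call(args) directly, which avoids shell quoting problems with passwords or table names.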


2) Load the MySQL data (on node29) into HBase, running the command on node1:

sqoop import --connect jdbc:mysql://172.16.41.29/sqoop --username sqoop --password Routon --table Students --hbase-create-table --hbase-table students --column-family stuinfo --hbase-row-key ID


Verify that the load succeeded in HBase:

[Image: the imported table in HBase (HBase situation.jpg)]


Two: Thrift software installation

Python version: 2.7.3


The steps are:

1) Install Python 2.7.3

Note: Python 2.7.3 works with Thrift without problems; Python 2.5 does not seem to work.

The generated Hbase.py file also uses syntax that the Python 2.4 shipped with RHEL 5 does not support.

tar xvjf Python-2.7.3.tar.bz2
cd Python-2.7.3
./configure --prefix=/usr/local/python2.7
make && make install

The Python 2.7.3 interpreter is installed at:

/usr/local/python2.7/bin/python

Change the default Python version to 2.7

The system default Python version is 2.4, so replace /usr/bin/python with a link to the new interpreter:

rm -rf /usr/bin/python
ln -s /usr/local/python2.7/bin/python /usr/bin/python


# python -V
Python 2.7.3
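The same check can be made from inside a script before it tries to use Thrift. The sketch below treats 2.7 as the minimum version, which is an assumption based only on the observations in this article (2.7.3 works, 2.4 and 2.5 do not); the name python_is_new_enough is made up for illustration.

```python
import sys

# Assumption: 2.7 is used as the minimum because this article reports that
# 2.7.3 works with Thrift while 2.4 and 2.5 do not.
MINIMUM_VERSION = (2, 7)


def python_is_new_enough(version_info=None):
    """Return True if the given (or running) interpreter is >= MINIMUM_VERSION."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MINIMUM_VERSION


print("Python %d.%d.%d" % tuple(sys.version_info[:3]))
print("new enough for Thrift: %s" % python_is_new_enough())
```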


2) Install Thrift

tar xvzf thrift-0.9.0.tar.gz
cd thrift-0.9.0
./configure
make && make install


The configure step prints a summary like this:

thrift 0.9.0

Building C++ Library ......... : no
Building C (GLib) Library .... : yes
Building Java Library ........ : no
Building C# Library .......... : no
Building Python Library ...... : yes
Building Ruby Library ........ : no
Building Haskell Library ..... : no
Building Perl Library ........ : no
Building PHP Library ......... : no
Building Erlang Library ...... : no
Building Go Library .......... : no
Building D Library ........... : no

Python Library:
   Using Python .............. : /usr/bin/python
......


As the summary shows, Thrift supports many languages; for the current purpose, Python support is enough.


Check the Thrift version:

# thrift -version
Thrift version 0.9.0
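If a setup script needs to confirm the compiler version, the output line shown above can be parsed. This is a rough sketch: parse_thrift_version is a hypothetical helper, and it assumes the output always has the "Thrift version X.Y.Z" shape seen here.

```python
import re


def parse_thrift_version(output):
    """Extract (major, minor, patch) from text like 'Thrift version 0.9.0'.

    Returns None when the text does not contain a version in that shape.
    """
    match = re.search(r"Thrift version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


# The sample string is the output shown in the article.
version = parse_thrift_version("Thrift version 0.9.0")
print(version)
```

In practice the input string would come from running the thrift command, for example through the subprocess module.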


3) Generate the HBase Python bindings with Thrift

Execute the following command:

thrift --gen py /usr/local/hbase-0.90.5/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift

This creates a directory named gen-py in the current directory:

$ ll
total 7056
-rw-rw-r-- 1 hadoop hadoop 3045 Oct 13:55 access_log2.txt
-rw-r--r-- 1 hadoop hadoop 7118627 Feb 1 access_log.txt
-rw-rw-r-- 1 hadoop hadoop 3500 Oct 10:17 derby.log
drwxrwxr-x 3 hadoop hadoop 4096 Oct 15:28 gen-py
-rw-rw-r-- 1 hadoop hadoop 3551 Oct 11:21 pig_1413170429087.log


The gen-py directory structure is as follows:

$ tree gen-py/
gen-py/
|-- __init__.py
`-- hbase
    |-- Hbase-remote
    |-- Hbase.py
    |-- __init__.py
    |-- constants.py
    `-- ttypes.py

1 directory, 6 files
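Before copying the directory, it can be worth confirming that the generator produced everything. The sketch below checks for the six files in the tree listing; it assumes the service module is spelled Hbase.py (Thrift names it after the service in Hbase.thrift), and missing_generated_files is a made-up helper name.

```python
import os

# Expected files, following the tree listing in the article.
# Assumption: the service module is spelled "Hbase.py".
EXPECTED_FILES = [
    "__init__.py",
    os.path.join("hbase", "__init__.py"),
    os.path.join("hbase", "Hbase.py"),
    os.path.join("hbase", "Hbase-remote"),
    os.path.join("hbase", "constants.py"),
    os.path.join("hbase", "ttypes.py"),
]


def missing_generated_files(gen_py_dir):
    """Return the expected files that are absent from gen_py_dir."""
    return [name for name in EXPECTED_FILES
            if not os.path.isfile(os.path.join(gen_py_dir, name))]


# An empty list means the gen-py directory is complete.
print(missing_generated_files("gen-py"))
```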



4) Copy the generated hbase package from gen-py into Python's site-packages directory:

cp -r gen-py/hbase/ /usr/local/python2.7/lib/python2.7/site-packages/.


5) Allow Python to import the thrift module:

The Thrift build installed its Python library under /usr/lib/python2.7/site-packages, so link it into the site-packages directory of the Python 2.7 installed in step 1:

# ln -s /usr/lib/python2.7/site-packages/thrift* /usr/local/python2.7/lib/python2.7/site-packages/.
# ls -l /usr/local/python2.7/lib/python2.7/site-packages/
total 12
drwxr-xr-x 2 root root 4096 Oct 15:32 hbase
-rw-r--r-- 1 root root 119 Oct
lrwxrwxrwx 1 root root the Oct 15:50 thrift -> /usr/lib/python2.7/site-packages/thrift
lrwxrwxrwx 1 root root the Oct 15:50 thrift-0.9.0-py2.7.egg-info -> /usr/lib/python2.7/site-packages/thrift-0.9.0-py2.7.egg-info
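A quick way to confirm that both the Thrift runtime and the generated package are visible to the interpreter is to try importing them. This is a minimal sketch; check_importable is a hypothetical helper, and the module names assume the layout created in steps 4 and 5.

```python
import importlib


def check_importable(module_names):
    """Map each module name to True if it can be imported, else False."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


# "thrift" is the runtime library; "hbase.Hbase" is the generated client
# module (assumed name, following the gen-py layout).
for name, ok in sorted(check_importable(["thrift", "hbase.Hbase"]).items()):
    print("%-12s %s" % (name, "ok" if ok else "MISSING"))
```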



6) Start the Thrift service:

hbase thrift -p 9090 start
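Once the service is started, a plain TCP connection attempt is a simple way to see whether anything is listening on the port. The sketch below uses only the standard socket module; port_is_open is a made-up helper name, and a successful connection only shows that the port accepts connections, not that the HBase Thrift service behind it is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        connection = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    connection.close()
    return True


if __name__ == "__main__":
    # 9090 is the port passed to "hbase thrift -p 9090 start".
    print(port_is_open("127.0.0.1", 9090))
```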


7) Write a Python script on node1 to list the tables in HBase:

#!/usr/bin/env python
# coding=utf-8
import sys

# The py files generated from Hbase.thrift are placed here
sys.path.append('/usr/local/lib/python2.7/site-packages/hbase')

from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase
# Types such as ColumnDescriptor are defined in hbase.ttypes
from hbase.ttypes import *

# Make the socket; the address and port can be changed here
transport = TSocket.TSocket('172.16.41.26', 9090)
# Buffering is critical. Raw sockets are very slow.
# TFramedTransport can also be used; it is an efficient transport as well
transport = TTransport.TBufferedTransport(transport)
# Wrap in a protocol
# The protocol and the transport are separate, so several protocols can be supported
protocol = TBinaryProtocol.TBinaryProtocol(transport)
# The client represents one user
client = Hbase.Client(protocol)
# Open the connection
transport.open()
# Print the table names
print(client.getTableNames())
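The script opens the transport but never closes it. One possible way to make sure the connection is always released, even when a call fails, is a small context manager. This is a sketch, not part of the original script: the name opened is made up, and it relies only on the open() and close() methods that Thrift transports provide.

```python
from contextlib import contextmanager


@contextmanager
def opened(transport):
    """Open a Thrift transport and always close it afterwards."""
    transport.open()
    try:
        yield transport
    finally:
        # Runs on normal exit and when the body raises.
        transport.close()


# Usage with the objects from the script (needs a running Thrift server):
#   with opened(transport):
#       print(client.getTableNames())
```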


Execute script:

[Image: output of the script (Thrift.jpg)]


At this point, Python can communicate with HBase through Thrift.
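As a possible next step, the rows imported into the students table could be read back through the same client. The sketch below assumes the client methods generated from the HBase 0.90 Hbase.thrift, namely scannerOpen(tableName, startRow, columns), scannerGetList(id, nbRows) and scannerClose(id); later HBase releases changed some of these signatures, so check the generated Hbase.py first. The function name scan_rows is made up for illustration.

```python
def scan_rows(client, table, columns, start_row="", batch_size=100):
    """Yield (row_key, {column: value}) pairs from an HBase table.

    Assumes the Thrift interface generated from the HBase 0.90
    Hbase.thrift; signatures differ in later versions.
    """
    scanner_id = client.scannerOpen(table, start_row, columns)
    try:
        while True:
            batch = client.scannerGetList(scanner_id, batch_size)
            if not batch:
                break
            for result in batch:
                # Each result has .row and .columns; each cell has .value
                cells = dict((name, cell.value)
                             for name, cell in result.columns.items())
                yield result.row, cells
    finally:
        # Release the server-side scanner even if iteration fails.
        client.scannerClose(scanner_id)


# Usage with the client from the script (needs a running Thrift server);
# "stuinfo:" is assumed to select the whole column family:
#   for key, cells in scan_rows(client, "students", ["stuinfo:"]):
#       print(key, cells)
```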


This article is from the "Shine_forever blog"; please keep this source attribution: http://shineforever.blog.51cto.com/1429204/1567640
