R language combined with Hadoop and HBase


The installation and use of HBase and rhbase are covered in three chapters:

1. Environment preparation and HBase installation
2. rhbase installation
3. rhbase program use cases

Each chapter has a "Text Description section" and a "Code section", so that the explanation and the code stay together.

Note: The Hadoop and RHadoop environments are covered in the first two articles of this series and are not repeated here.

1. Environment Preparation and HBase installation

Text Description section:

First, prepare the environment. I chose the 64-bit version of Ubuntu Linux 12.04; you can pick whichever Linux distribution you are used to.

The JDK, however, must be the official Oracle (Sun) release, downloaded from the official website; the OpenJDK bundled with the operating system causes various incompatibilities. Choose a JDK 1.6.x release, because JDK 1.7 also causes various incompatibilities.
http://www.oracle.com/technetwork/java/javase/downloads/index.html

For installing the Hadoop environment, see the "Hadoop environment build" article in the RHadoop practice series.

Hadoop and HBase versions: hadoop-1.0.3, hbase-0.94.2

Configure the environment variables used by the HBase start command so that HBase uses the ZooKeeper instance bundled with it:
export HBASE_MANAGES_ZK=true

Configure hbase-site.xml: set the HBase root directory, the number of data replicas, and the ZooKeeper client port.

Copy the class libraries from the Hadoop environment over the ones shipped with HBase.

Once configuration is complete, start the HBase service.
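
Before starting the service, it can be useful to read back the values that were written to hbase-site.xml. The following is only a minimal sketch, not part of HBase: `get_hbase_property` is a hypothetical helper name, and it assumes that each `<name>` and `<value>` element sits on its own line (at most one property per line).

```shell
#!/bin/sh
# get_hbase_property FILE NAME   (hypothetical helper, not an HBase tool)
# Prints the <value> that follows <name>NAME</name> in a Hadoop-style XML file.
# Assumption: at most one <name>/<value> pair per line.
get_hbase_property() {
    file=$1
    name=$2
    awk -v want="$name" '
        /<name>/ {
            # Strip everything around the property name.
            n = $0
            sub(/.*<name>[ \t]*/, "", n)
            sub(/[ \t]*<\/name>.*/, "", n)
            found = (n == want)
        }
        found && /<value>/ {
            # First <value> after the matching <name> is the answer.
            v = $0
            sub(/.*<value>[ \t]*/, "", v)
            sub(/[ \t]*<\/value>.*/, "", v)
            print v
            exit
        }
    ' "$file"
}

# Example (path as used in this article):
#   get_hbase_property ~/hbase-0.94.2/conf/hbase-site.xml hbase.rootdir
```

If the property is missing, the function prints nothing, which makes it easy to test for an unset value in a script.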

Code section:

HBase Installation

1) Download and install HBase

    ~ wget http://www.fayea.com/apache-mirror/hbase/hbase-0.94.2/hbase-0.94.2.tar.gz
    ~ tar xvf hbase-0.94.2.tar.gz

2) Modify the configuration file

    ~ cd hbase-0.94.2/
    ~ vi conf/hbase-env.sh

    export JAVA_HOME=/root/toolkit/jdk1.6.0_29
    export HBASE_HOME=/root/hbase-0.94.2
    export HADOOP_INSTALL=/root/hadoop-1.0.3
    export HBASE_CLASSPATH=/root/hadoop-1.0.3/conf
    export HBASE_MANAGES_ZK=true

    ~ vi conf/hbase-site.xml

    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://master:9000/hbase</value>
      </property>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>master</value>
      </property>
      <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
      </property>
      <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/root/hadoop/hdata</value>
      </property>
    </configuration>

3) Copy the configuration file and class library of the Hadoop environment

    ~ cp ~/hadoop-1.0.3/conf/hdfs-site.xml ~/hbase-0.94.2/conf
    ~ cp ~/hadoop-1.0.3/hadoop-core-1.0.3.jar ~/hbase-0.94.2/lib
    ~ cp ~/hadoop-1.0.3/lib/commons-configuration-1.6.jar ~/hbase-0.94.2/lib
    ~ cp ~/hadoop-1.0.3/lib/commons-collections-3.2.1.jar ~/hbase-0.94.2/lib

4) Start Hadoop and HBase

    ~ ~/hadoop-1.0.3/bin/start-all.sh
    ~ ~/hbase-0.94.2/bin/start-hbase.sh

5) View the HBase processes

    ~ jps
    12041 HMaster
    12209 HRegionServer
    31734 TaskTracker
    31343 DataNode
    31499 SecondaryNameNode
    13328 Jps
    31596 JobTracker
    11916 HQuorumPeer
    31216 NameNode

6) Open the HBase command line client

    ~ ~/hbase-0.94.2/bin/hbase shell
    HBase Shell; enter 'help<RETURN>' for list of supported commands.
    Type "exit<RETURN>" to leave the HBase Shell
    Version 0.94.2, r1395367, Sun Oct 7 19:11:01 UTC 2012

    hbase(main):001:0> list
    TABLE
    0 row(s) in 0.0150 seconds

HBase installation is complete.
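
The jps check in step 5 can also be scripted. The sketch below assumes nothing beyond the process names that jps reports for this setup; `check_daemons` is a hypothetical helper name. It reads jps output on standard input and lists every expected daemon that is not running.

```shell
#!/bin/sh
# check_daemons NAME...   (hypothetical helper)
# Reads `jps` output ("PID NAME" per line) on stdin and prints
# "missing: NAME" for each expected daemon that does not appear.
# Returns 0 when all daemons are present, 1 otherwise.
check_daemons() {
    output=$(cat)
    missing=0
    for name in "$@"; do
        # Compare the second column exactly, so "NameNode" does not
        # match "SecondaryNameNode".
        if ! printf '%s\n' "$output" |
            awk -v n="$name" '$2 == n { ok = 1 } END { exit !ok }'; then
            echo "missing: $name"
            missing=1
        fi
    done
    return $missing
}

# Example:
#   jps | check_daemons HMaster HRegionServer HQuorumPeer NameNode DataNode
```

Matching on the whole process name rather than a substring is deliberate, because several Hadoop daemon names contain each other.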

2. rhbase installation

Text Description section:

With HBase installed, we also need to install Thrift, because rhbase calls HBase through Thrift.

Thrift has to be compiled locally, since no official binary package is provided. First, download thrift-0.8.0.

Run ./configure in the directory where Thrift was extracted; it lists the languages Thrift can support on the current machine. If Thrift is only needed for rhbase, the default configuration is enough.
My configuration supports PHP, Python and C++ in addition to rhbase access, so some extra libraries need to be installed on the system. You can set the Thrift compilation parameters to suit your own requirements.

Compile and install Thrift, then start the HBase ThriftServer service.

Finally, install rhbase.

Code section:
  1. Download Thrift

    ~ wget http://archive.apache.org/dist/thrift/0.8.0/thrift-0.8.0.tar.gz
    ~ tar xvf thrift-0.8.0.tar.gz
    ~ cd thrift-0.8.0/
  2. Download PHP support class library (optional)

    ~ sudo apt-get install php-cli
  3. Download C++ support class library (optional)

    ~ sudo apt-get install libboost-dev libboost-test-dev libboost-program-options-dev libevent-dev automake libtool flex bison pkg-config g++ libssl-dev
  4. Generate the build configuration

    ~ ./configure

    thrift 0.8.0

    Building code generators ..... :
    Building C++ Library ......... : yes
    Building C (GLib) Library .... : no
    Building Java Library ........ : no
    Building C# Library .......... : no
    Building Python Library ...... : yes
    Building Ruby Library ........ : no
    Building Haskell Library ..... : no
    Building Perl Library ........ : no
    Building PHP Library ......... : yes
    Building Erlang Library ...... : no
    Building Go Library .......... : no

    Building TZlibTransport ...... : yes
    Building TNonblockingServer .. : yes

    Using Python ................. : /usr/bin/python
    Using php-config ............. : /usr/bin/php-config
  5. Compiling and installing

    ~ make
    ~ make install
  6. View Thrift Version

    ~ thrift -version
    Thrift version 0.8.0
  7. Start the HBase Thrift server

    ~ ~/hbase-0.94.2/bin/hbase-daemon.sh start thrift
    ~ jps
    12041 HMaster
    12209 HRegionServer
    13222 ThriftServer
    31734 TaskTracker
    31343 DataNode
    31499 SecondaryNameNode
    13328 Jps
    31596 JobTracker
    11916 HQuorumPeer
    31216 NameNode
  8. Installing Rhbase

    ~ R CMD INSTALL rhbase_1.1.1.tar.gz

The installation completes without problems.
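
Step 7 starts the Thrift server as a background daemon, so a script that connects from R immediately afterwards may run before the server is accepting connections. One way to handle this is a small retry loop. This is only a sketch: `retry` is a hypothetical helper, and the probe in the usage comment assumes that `nc` is installed and that the Thrift server listens on its default port 9090.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY COMMAND [ARG...]   (hypothetical helper)
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping
# DELAY seconds between tries. Returns 0 on success, 1 if every try failed.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        # Only sleep when another attempt will follow.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example (assumes nc is available and the default Thrift port 9090):
#   ~/hbase-0.94.2/bin/hbase-daemon.sh start thrift
#   retry 10 2 nc -z localhost 9090 && echo "Thrift server is up"
```

Keeping the probe as an ordinary command argument means the same loop works with any other check, for example one based on the jps output.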

3. Related functions of rhbase

hb.compact.table    hb.describe.table   hb.insert              hb.regions.table
hb.defaults         hb.get              hb.insert.data.frame   hb.scan
hb.delete           hb.get.data.frame   hb.list.tables         hb.scan.ex
hb.delete.table     hb.init             hb.new.table           hb.set.table.mode

4. Basic operation comparison of HBase and rhbase

Create a table
HBase: create 'student_shell','info'
rhbase: hb.new.table("student_rhbase","info")

List all tables
HBase: list
rhbase: hb.list.tables()

Show the table structure
HBase: describe 'student_shell'
rhbase: hb.describe.table("student_rhbase")

Insert a row
HBase: put 'student_shell','Mary','info:age','19'
rhbase: hb.insert("student_rhbase",list(list("Mary","info:age","24")))

Read data
HBase: get 'student_shell','Mary'
rhbase: hb.get('student_rhbase','Mary')

Delete the table (HBase requires two commands, rhbase only one)
HBase: disable 'student_shell'
HBase: drop 'student_shell'
rhbase: hb.delete.table('student_rhbase')

Code section:

HBase Shell

    > create 'student_shell','info'
    > list
    TABLE
    student_shell
    > describe 'student_shell'
    DESCRIPTION                                                  ENABLED
     {NAME => 'student_shell', FAMILIES => [{NAME => 'info',     true
     DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
     REPLICATION_SCOPE => '0', VERSIONS => '3',
     COMPRESSION => 'NONE', MIN_VERSIONS => '0',
     TTL => '2147483647', KEEP_DELETED_CELLS => 'false',
     BLOCKSIZE => '65536', IN_MEMORY => 'false',
     ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
    > put 'student_shell','Mary','info:age','19'
    > get 'student_shell','Mary'
    COLUMN                CELL
     info:age             timestamp=1365414964962, value=19
    > disable 'student_shell'
    > drop 'student_shell'
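
The same HBase shell commands can be run without typing them interactively, since the hbase shell command accepts a script file as an argument. The sketch below only generates such a file; `write_hbase_script` is a hypothetical helper, and the row key and value are the ones used in the session above.

```shell
#!/bin/sh
# write_hbase_script FILE TABLE FAMILY   (hypothetical helper)
# Writes an HBase shell script that creates TABLE with one column family,
# inserts one cell, reads it back, and drops the table again.
write_hbase_script() {
    file=$1
    table=$2
    family=$3
    cat > "$file" <<EOF
create '${table}', '${family}'
put '${table}', 'Mary', '${family}:age', '19'
get '${table}', 'Mary'
disable '${table}'
drop '${table}'
exit
EOF
}

# Example (install path as used in this article):
#   write_hbase_script /tmp/student.rb student_shell info
#   ~/hbase-0.94.2/bin/hbase shell /tmp/student.rb
```

The trailing exit is there so that the shell terminates after the last command instead of waiting for further input.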

rhbase Script

    ~ R
    > library(rhbase)
    > hb.init()
    <pointer: 0x16494a0>
    attr(,"class")
    [1] "hb.client.connection"

    > hb.new.table("student_rhbase","info",opts=list(maxversions=5,x=list(maxversions=1L,compression='GZ',inmemory=TRUE)))
    [1] TRUE

    > hb.list.tables()
    $student_rhbase
          maxversions compression inmemory bloomfiltertype bloomfiltervecsize
    info:           5        NONE    FALSE            NONE                  0
          bloomfilternbhashes blockcache timetolive
    info:                   0      FALSE         -1

    > hb.describe.table("student_rhbase")
          maxversions compression inmemory bloomfiltertype bloomfiltervecsize
    info:           5        NONE    FALSE            NONE                  0
          bloomfilternbhashes blockcache timetolive
    info:                   0      FALSE         -1

    > hb.insert("student_rhbase",list(list("Mary","info:age","24")))
    [1] TRUE

    > hb.get('student_rhbase','Mary')
    [[1]]
    [[1]][[1]]
    [1] "Mary"

    [[1]][[2]]
    [1] "info:age"

    [[1]][[3]]
    [[1]][[3]][[1]]
    [1] "24"

    > hb.delete.table('student_rhbase')
    [1] TRUE

This completes the fourth article in the RHadoop practice series. I hope these four articles are helpful to everyone.
Later on, I may also write articles on rmr algorithm practice, the RHadoop architecture, and the use of Hive.
Questions and discussion are welcome.

