Apache Phoenix Deployment and Testing


Apache Phoenix is an open-source SQL engine for HBase. You can use the standard JDBC API, instead of the HBase client API, to create tables, insert data, and query your HBase data.
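
As a minimal sketch of what that looks like (the table and column names here are invented for illustration; note that Phoenix writes data with UPSERT rather than INSERT):

CREATE TABLE IF NOT EXISTS example_stats (
    host VARCHAR NOT NULL PRIMARY KEY,
    visit_count BIGINT
);

-- UPSERT inserts the row, or updates it if the key already exists
UPSERT INTO example_stats VALUES ('EU', 35);

SELECT host, visit_count FROM example_stats WHERE visit_count > 10;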

For a quick, 15-minute introduction to Apache Phoenix, see the official documentation: http://phoenix.apache.org/Phoenix-in-15-minutes-or-less.html

Blogger lnho2015 has also translated this content (thank you!); I have revised and extended it on the basis of that translation.

1. Doesn't adding an extra layer between my application and HBase just slow things down?

Actually, no. Phoenix achieves the same or often better performance than hand-written HBase client code (not to mention sparing you from writing all that code) by doing the following (the EXPLAIN sketch after this list shows how these optimizations surface in a query plan):

* Compiling your SQL queries into native HBase scans

* Determining the optimal start and stop keys for each scan

* Orchestrating your scans so that they execute in parallel

* Bringing the computation to the data, rather than moving the data to the computation

* Pushing the predicates in your WHERE clause down to server-side filters

* Executing aggregate queries through server-side hooks (called co-processors)
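
You can watch these optimizations at work with EXPLAIN. A minimal sketch, reusing the invented example_stats table from above (the exact plan text varies by Phoenix version, so the comment below only indicates the general shape):

EXPLAIN SELECT host, COUNT(1) FROM example_stats WHERE host LIKE 'EU%' GROUP BY host;
-- A typical plan reports a parallel range scan with server-side
-- aggregation, along the lines of:
--   CLIENT PARALLEL 4-WAY RANGE SCAN OVER EXAMPLE_STATS ['EU'] - ['EV']
--   SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [HOST]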

On top of this, we have also made some interesting enhancements to further optimize performance (see the SQL sketch after this list):

* Secondary indexes to improve the performance of queries on non-primary-key columns

* Statistics gathering to improve parallelization and to help choose the optimal execution plan

* A skip-scan filter to optimize IN, LIKE, and OR queries

* Optional salting of row keys to evenly distribute write load
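
A minimal sketch of the first and last of these (an invented events table; SALT_BUCKETS and CREATE INDEX are standard Phoenix DDL):

-- Salting: Phoenix prefixes the row key with a hash byte so that
-- writes spread evenly across 4 buckets instead of hot-spotting
CREATE TABLE IF NOT EXISTS events (
    event_id BIGINT NOT NULL PRIMARY KEY,
    user_name VARCHAR
) SALT_BUCKETS = 4;

-- Secondary index: speeds up queries filtering on a non-primary-key column
CREATE INDEX idx_events_user ON events (user_name);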

2. OK, it's fast. But why SQL? Isn't that something from the '70s?

One idea is: give people something they already know. What better way to motivate them to use HBase? The best way to do that is with JDBC and SQL, for the following reasons (see the sketch after this list):

* They reduce the amount of code users need to write

* They make performance optimizations transparent to users

* They make it easy to use and integrate with a large number of existing tools
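
As a small illustration of the first point: a grouped aggregate that would require a hand-written scan, a result iterator, and client-side accumulation with the raw HBase API collapses to a single statement (again using the invented example_stats table):

SELECT host, COUNT(1) FROM example_stats GROUP BY host;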

3. But how can SQL support my favorite HBase techniques?

Don't treat adopting Phoenix as the last time you'll ever see HBase. SQL is just a way of expressing the functionality you want; you don't have to work out how to implement that functionality yourself. Check whether the particular HBase usage you like is already supported by a Phoenix feature, or is being worked on. And if you have ideas of your own, we would love to hear them: file an issue, and you can also join our mailing list.

Enough talk. I just want to know how to get started.

Very good. Just follow our installation guide (my HBase cluster runs hbase-1.1.5):

* Download and untar the installation package (apache-phoenix-4.8.0-HBase-1.1-bin.tar.gz):

tar -zxvf apache-phoenix-4.8.0-HBase-1.1-bin.tar.gz

* Copy the Phoenix server-side jar that matches your HBase version into the lib directory of HBase on each cluster node:

cp phoenix-4.8.0-HBase-1.1-server.jar /var/lib/kylin/hbase-1.1.5/lib/

Then copy the same jar to the HBase lib directory on each of the other cluster nodes:

scp phoenix-4.8.0-HBase-1.1-server.jar kylin@szb-l0023776:/var/lib/kylin/hbase/lib
scp phoenix-4.8.0-HBase-1.1-server.jar kylin@szb-l0023777:/var/lib/kylin/hbase/lib
scp phoenix-4.8.0-HBase-1.1-server.jar kylin@szb-l0023778:/var/lib/kylin/hbase/lib
scp phoenix-4.8.0-HBase-1.1-server.jar kylin@szb-l0023779:/var/lib/kylin/hbase/lib

* Restart your HBase cluster:

stop-hbase.sh

start-hbase.sh

* Add the Phoenix client jar to the classpath of your HBase client

* Download and set up SQuirreL as your SQL client, so you can issue ad-hoc SQL queries against your HBase cluster
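
Whichever SQL client you use, the connection string is a standard Phoenix JDBC URL of the form jdbc:phoenix:<zookeeper quorum>:<port>:<znode parent>. For the cluster used in section 4.4 below, that would be:

jdbc:phoenix:szb-l0023780:2181:/hbase114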

4. I don't want to download and install anything else.

Fair enough. You can write your own SQL scripts and execute them with our command-line tools, instead of downloading and installing the client software mentioned above. Let's look at an example.

Navigate to the bin directory of your Phoenix installation to get started.

4.1 First, create a us_population.sql file containing the table-creation statement:

CREATE TABLE IF NOT EXISTS us_population (
    state CHAR(2) NOT NULL,
    city VARCHAR NOT NULL,
    population BIGINT
    CONSTRAINT my_pk PRIMARY KEY (state, city)
);
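
If you want to try a single row by hand before bulk-loading, an UPSERT against this table (using the first row of the CSV data below) looks like this:

UPSERT INTO us_population VALUES ('NY', 'New York', 8143197);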

4.2 Create a us_population.csv file containing the data for the table:

NY,New York,8143197
CA,Los Angeles,3844829
IL,Chicago,2842518
TX,Houston,2016582
PA,Philadelphia,1463281
AZ,Phoenix,1461575
TX,San Antonio,1256509
CA,San Diego,1255540
TX,Dallas,1213825
CA,San Jose,912332

4.3 Finally, create a us_population_queries.sql file containing an aggregate query:

SELECT state AS "State", COUNT(city) AS "City Count", SUM(population) AS "Population Sum"
FROM us_population
GROUP BY state
ORDER BY SUM(population) DESC;

4.4 Run all three files from the command line

Note: I specify the full ZooKeeper connection string for HBase, including the host, port number, and znode parent; if the znode parent is not specified, it defaults to /hbase.
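
In general, psql.py takes the ZooKeeper connection string followed by any mix of files: .sql files are executed as SQL scripts, and each .csv file is loaded into the table whose name matches the file name, which is why the files here are named after the us_population table. Schematically:

./psql.py <zookeeper[:port][:/znode-parent]> <ddl.sql> <data.csv> <queries.sql>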

[kylin@szb-l0023780 bin]$ ./psql.py szb-l0023780:2181:/hbase114 us_population.sql us_population.csv us_population_queries.sql
No rows upserted
Time: 2.845 sec(s)

CSV columns from database.
CSV Upsert complete. 10 rows upserted
Time: 0.129 sec(s)

St   City Count                  Population Sum
--   -------------------------   -------------------------
NY   1                           8143197
CA   3                           6012701
TX   3                           4486916
IL   1                           2842518
PA   1                           1463281
AZ   1                           1461575
Time: 0.077 sec(s)

And that's it: with only a few lines of SQL you have created your first Phoenix table, loaded data into it, and executed an aggregate query.

4.5 The performance test script performance.py

[kylin@szb-l0023780 bin]$ ./performance.py
Performance script arguments not specified. Usage: performance.sh <zookeeper> <row count>
Example: performance.sh localhost 100000

Let's test with 10 million rows and see how it performs.

During execution, an HBase table named performance_10000000 is created, 10 million records are inserted into it, and then several queries are run against it.

[kylin@szb-l0023780 bin]$ ./performance.py szb-l0023780:2181:/hbase114 10000000

Phoenix Performance Evaluation Script 1.0
-----------------------------------------

Creating Performance Table ...
No rows upserted
Time: 2.343 sec(s)

Query # 1 - Count - SELECT COUNT(1) FROM performance_10000000;
Query # 2 - Group By First PK - SELECT HOST FROM performance_10000000 GROUP BY HOST;
Query # 3 - Group By Second PK - SELECT DOMAIN FROM performance_10000000 GROUP BY DOMAIN;
Query # 4 - Truncate + Group By - SELECT TRUNC(DATE,'DAY') DAY FROM performance_10000000 GROUP BY TRUNC(DATE,'DAY');
Query # 5 - Filter + Count - SELECT COUNT(1) FROM performance_10000000 WHERE CORE < 10;

Generating and upserting data ...
CSV columns from database.
CSV Upsert complete. 10000000 rows upserted
Time: 565.593 sec(s)

COUNT(1)
----------------------------------------
10000000
Time: 8.206 sec(s)

HO
--
CS
EU
NA
Time: 0.416 sec(s)

DOMAIN
----------------------------------------
Apple.com
Google.com
Salesforce.com
Time: 13.134 sec(s)

DAY
-----------------------
2016-08-30 00:00:00.000
2016-08-31 00:00:00.000
2016-09-01 00:00:00.000
2016-09-02 00:00:00.000
2016-09-03 00:00:00.000
2016-09-04 00:00:00.000
......
2016-12-18 00:00:00.000
2016-12-19 00:00:00.000
2016-12-20 00:00:00.000
2016-12-21 00:00:00.000
2016-12-22 00:00:00.000
2016-12-23 00:00:00.000
2016-12-24 00:00:00.000
Time: 12.852 sec(s)

COUNT(1)
----------------------------------------
200745
Time: 11.01 sec(s)

One more tip: if you want to count the total number of rows in an HBase table, don't use the HBase shell's count command, which is single-threaded and performs poorly. Use the MapReduce-based row counter instead:

hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'performance_10000000'
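
On the Phoenix side, the same total is just the aggregate that the performance script already ran as Query # 1; Phoenix executes it as parallel scans with server-side aggregation through co-processors:

SELECT COUNT(1) FROM performance_10000000;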
