Phoenix Usage Guide

Source: Internet
Author: User
1 Usage Overview

Phoenix is an open-source, HBase-based SQL middleware developed by Salesforce.com and hosted on GitHub. With Phoenix, developers who are familiar with relational databases can use HBase tables much as they would a relational database such as MySQL. Phoenix also provides a JDBC driver package so that Java programs can access the data. The implementation leverages underlying HBase features such as coprocessors and filters.

2 Environment Configuration

An HBase cluster must be installed first, version 0.94.4 or above, with JDK 1.6 or later. The Phoenix version used in this article is phoenix-2.0.1 (http://phoenix-bin.github.com/client/phoenix-2.0.1-install.tar); if there are any discrepancies or changes, please refer to the official documentation.

Installing Phoenix is very simple: extract phoenix-2.0.1.jar from the downloaded tar package, put it into the lib directory of the HBase installation on every RegionServer (RS) node in the cluster, and remember to remove any older Phoenix jar installed previously. Then restart all RS node services. If you access Phoenix from Java code, use the matching client jar, in this case phoenix-2.0.1-client.jar.

For testing purposes, the downloaded package can be placed on one node machine in the HBase cluster; in this article it is placed on the HMaster machine vnode120 under $HBASE_HOME/phoenix. To verify that the Phoenix installation succeeded, enter the $HBASE_HOME/phoenix/bin directory and give all *.sh scripts executable permission (for example, chmod +x *.sh).

sqlline.sh provides a command-line tool in which you can perform operations such as creating tables and querying. The script must be run with a ZooKeeper (ZK) parameter, e.g. vnode121 for a ZK node. After connecting, executing !tables lists all tables created through Phoenix; the SYSTEM.TABLE table exists by default and holds information such as the data type mappings used by the Phoenix implementation. Once these operations work, Phoenix has been properly integrated into the existing HBase cluster, and you can then experience the amazing journey of executing SQL over HBase.
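A typical session might look like the following sketch; vnode121 is the ZK node from this article's cluster, and the !-prefixed commands are standard sqlline commands:

```
cd $HBASE_HOME/phoenix/bin
./sqlline.sh vnode121
0: jdbc:phoenix:vnode121> !tables
0: jdbc:phoenix:vnode121> !quit
```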

3 Detailed SQL features

1) CREATE: creates a table or view. If a table name is not enclosed in double quotes, it is folded to uppercase by default; a double-quoted name is used exactly as written. If you create a table that already exists in HBase but was not created through the Phoenix create syntax, you can still issue a Phoenix CREATE for the same name without affecting the existing table data, although values in columns that Phoenix does not know about will not appear in Phoenix SELECT results. Creating a view requires that the corresponding table and column family already exist, and data cannot be updated through the view. A CREATE TABLE statement can also carry HBase table and column-family configuration options, such as VERSIONS, MAX_FILESIZE, and so on.

CREATE TABLE IF NOT EXISTS my_table (id CHAR(10) NOT NULL PRIMARY KEY, value INTEGER);

CREATE TABLE IF NOT EXISTS my_schema.my_table (id CHAR(10) NOT NULL PRIMARY KEY, value INTEGER) DATA_BLOCK_ENCODING='NONE', VERSIONS=10, MAX_FILESIZE=20480;

2) ALTER: adds or removes a column, or updates table properties. When a column is removed from a table, the data in it is deleted as well; a primary key column cannot be removed. When a column is removed from a view, the underlying data is unaffected.

ALTER TABLE my_table ADD dept_name VARCHAR(50)

ALTER TABLE my_table DROP COLUMN parent_id

ALTER TABLE my_table SET IMMUTABLE_ROWS=true

3) DROP: deletes a table or view. Dropping a table also deletes the data in the table; dropping a view leaves the data unaffected.

DROP TABLE my_schema.my_table
DROP VIEW my_view

4) UPSERT: updates or inserts data. If the row does not exist in the table it is inserted, otherwise it is updated, which is why Phoenix defines no separate INSERT or UPDATE commands. The column list can be omitted, but then the values must follow the order in which the columns are defined in the table schema; alternatively, you can list the columns and give the values in the corresponding order. A selective form, UPSERT SELECT, is also supported, which inserts the result of another query into the table. If auto-commit is on, the upsert is committed on the server side; otherwise rows are cached on the client until an explicit commit and sent in batches, whose size is set by the phoenix.mutate.upsertBatchSize property (default 10000 rows per batch).

UPSERT INTO TEST VALUES ('foo', 'bar', 3);

UPSERT INTO TEST (name, id) VALUES ('foo', 123);
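The selective UPSERT SELECT form described above can be sketched as follows; TEST_BAK is a hypothetical target table with a compatible schema:

```sql
-- Insert the result of another query into a table.
UPSERT INTO TEST_BAK (name, id)
SELECT name, id FROM TEST WHERE id < 100;
```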

5) DELETE: deletes the specified rows. If auto-commit is on, the delete is performed directly on the server.

DELETE FROM TEST;

DELETE FROM TEST WHERE id=123;

6) INDEX: secondary indexes. You can create a secondary index on a table or view; the current version only supports secondary indexes on tables with the IMMUTABLE_ROWS property. The current implementation inserts the index entries after the data row has been inserted. When an index is created, an HBase table named after the index is created, so you can also specify HBase table parameters for the index. Indexes can likewise be dropped and altered.

CREATE INDEX my_idx ON sales.opportunity (last_updated_date DESC)

DROP INDEX my_idx ON sales.opportunity

ALTER INDEX my_idx ON sales.opportunity DISABLE

7) EXPLAIN: execution plan. Provides a very simple way to view the logical steps required to execute a given statement; each step is output as a single-line string. This makes it easy to locate a query's performance bottleneck, or to check whether a secondary index is being used.

EXPLAIN SELECT * FROM TEST WHERE age > 0;

8) Other

CONSTRAINT defines a primary key constraint; the key columns are sorted in ascending order by default:

CONSTRAINT my_pk PRIMARY KEY (host, created_date)
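In context, such a constraint is declared at the end of a CREATE TABLE column list; the metrics table below is illustrative:

```sql
CREATE TABLE IF NOT EXISTS metrics (
    host         CHAR(50) NOT NULL,
    created_date DATE     NOT NULL,
    cpu_usage    INTEGER,
    -- composite primary key over host + created_date
    CONSTRAINT my_pk PRIMARY KEY (host, created_date)
);
```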

Options, such as the IMMUTABLE_ROWS=true used earlier: by default, only tables with that option set may have secondary indexes. Phoenix applies options by modifying the HBase metadata, for options that correspond to HColumnDescriptor and HTableDescriptor settings. Worth mentioning is the SALT_BUCKETS option, which prepends a salt byte to each rowkey so that rows are distributed across different RegionServers, avoiding a write hot-spot on a single RS.

IMMUTABLE_ROWS=true

SALT_BUCKETS=10
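Both options are supplied after the column list at creation time; a sketch with an illustrative table name:

```sql
CREATE TABLE IF NOT EXISTS events (
    id    CHAR(10) NOT NULL PRIMARY KEY,
    value INTEGER
) IMMUTABLE_ROWS=true, SALT_BUCKETS=10;  -- indexable, salted across 10 buckets
```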

Hints can override the default query behavior and are typically used for SQL tuning. Five hints are currently supported: SKIP_SCAN, RANGE_SCAN, NO_INTRA_REGION_PARALLELIZATION, NO_INDEX, and INDEX.

SELECT /*+ NO_INDEX */ * FROM TEST WHERE age > 0

When creating a table, creating an index, or running an ALTER, names can be delimited with a dot. For a table, the first part is the schema name (default null); for a column, the first part is the column family name (default '_0'). Names are case-sensitive only when enclosed in double quotation marks.

SELECT expressions can use * to select all columns, or <familyName>.* to select all columns under a column family; note that the family name is case-sensitive while column names are case-insensitive.

When defining split points for a table, you can use PreparedStatement.setBinary(int, byte[]) to supply split points consisting of arbitrary bytes.
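Simple split points can also be written directly in the DDL with SPLIT ON; this is a sketch in which the table name and split values are illustrative:

```sql
CREATE TABLE IF NOT EXISTS logs (
    id  CHAR(10) NOT NULL PRIMARY KEY,
    msg VARCHAR
) SPLIT ON ('A', 'H', 'P');  -- pre-split the table into four regions
```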

Joins and subqueries are not currently supported. You can define aliases for tables and columns with AS, or by placing the alias directly after the real name.

Queries currently support sorting a column in ascending or descending order, with optional null ordering, e.g. ORDER BY name ASC NULLS LAST.

The supported logical connectives are AND and OR. With LIKE you can use the wildcards _ (a single character) and % (any number of characters); if the queried string itself contains these characters, they must be escaped with \. BETWEEN denotes a closed interval, e.g. BETWEEN 5 AND 10 covers [5, 10].

The comparison operators are <=, >=, =, <, >, <> and !=, where <> is equivalent to !=. Strings can be concatenated with ||; the +, -, * and / operators apply to number and date types.
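Combining several of these predicates against the TEST table from the earlier examples (the age column is assumed):

```sql
SELECT name, id
FROM TEST
WHERE name LIKE 'fo_%'        -- _ matches one character, % any number
  AND id BETWEEN 5 AND 10     -- closed interval [5, 10]
  AND age <> 0
ORDER BY name ASC NULLS LAST;
```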

4 Built-in functions

The supported aggregation functions are:

• AVG: returns the average of the values, or NULL if there are none

• SUM: returns the sum of the values

• COUNT: returns the number of rows; with a column it counts non-null values, with * or 1 it counts all rows, and with DISTINCT it counts distinct values

• MAX: returns the maximum value

• MIN: returns the minimum value

• PERCENTILE_CONT: returns the value at the specified continuous percentile of a column

• PERCENTILE_DISC: returns the actual column value at the specified discrete percentile

• PERCENT_RANK: returns the percentile rank of the specified value, e.g. PERCENT_RANK(v) WITHIN GROUP (ORDER BY id ASC)

• STDDEV_SAMP: sample standard deviation

• STDDEV_POP: population standard deviation
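A few of these aggregates together, against the TEST table used earlier (the age column is assumed):

```sql
SELECT COUNT(DISTINCT name) AS distinct_names,  -- number of distinct names
       AVG(age)             AS avg_age,
       STDDEV_SAMP(age)     AS age_stddev
FROM TEST;
```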

Supported String Functions:

• SUBSTR: takes a substring; indexing is 1-based by default (specify 0 for 0-based), and a negative position counts from the end of the string

• TRIM: removes leading and trailing spaces from a string

• LTRIM: removes spaces on the left of a string

• RTRIM: removes spaces on the right of a string

• LENGTH: returns the string length

• REGEXP_SUBSTR: extracts a substring matching the given regular expression

• REGEXP_REPLACE: replaces by regular expression

• UPPER: converts to uppercase

• LOWER: converts to lowercase

• REVERSE: reverses the string

• TO_CHAR: formats a date, time, timestamp, or number as a string; the default date format is yyyy-MM-dd HH:mm:ss and the default number format is #,##0.###
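For example (TEST and its columns are illustrative):

```sql
SELECT SUBSTR(name, 1, 3)                  AS prefix,     -- 1-based substring
       REGEXP_REPLACE(name, '[0-9]+', '')  AS no_digits,  -- strip digits
       TO_CHAR(created_date, 'yyyy-MM-dd') AS day         -- format a date
FROM TEST;
```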

Supported time and date functions:

• ROUND: rounds a value (for dates, to the specified time unit)

• TRUNC: truncates a value

• TO_DATE: converts a string to the DATE type

• CURRENT_DATE: returns the current date on the RegionServer

• CURRENT_TIME: returns the current time on the RegionServer

Other supported functions:

• TO_NUMBER: converts a date, time, timestamp, or string to a number; an optional format string is accepted

• COALESCE: returns the specified default value if the corresponding value is NULL
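For example (TEST and the age column are illustrative):

```sql
SELECT id,
       COALESCE(age, 0) AS age_or_zero,  -- default 0 when age is NULL
       CURRENT_DATE()   AS server_date   -- current date on the RegionServer
FROM TEST;
```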

5 Data Types

• INTEGER: range -2147483648 to 2147483647, mapped to java.lang.Integer. Note, however, that in the binary representation the sign bit is flipped, so that negative numbers sort before positive numbers.

• UNSIGNED_INT: range 0 to 2147483647, mapped to java.lang.Integer; the binary representation matches the output of Bytes.toBytes(int).

• BIGINT: range -9223372036854775808 to 9223372036854775807, mapped to java.lang.Long; 8 bytes, also with the sign bit flipped.

• UNSIGNED_LONG: possible values 0 to 9223372036854775807, matching Bytes.toBytes(long).

• TINYINT: -128 to 127, mapped to java.lang.Byte; the sign bit is likewise flipped.

• UNSIGNED_TINYINT: 0 to 127; the binary representation is a single byte, matching Bytes.toBytes(byte).

• SMALLINT: -32768 to 32767, mapped to java.lang.Short; the sign bit is flipped.

• UNSIGNED_SMALLINT: 0 to 32767; the binary representation matches Bytes.toBytes(short).

• FLOAT: -3.402823466E+38 to 3.402823466E+38, mapped to java.lang.Float; the sign bit is flipped.

• UNSIGNED_FLOAT: 0 to 3.402823466E+38; the binary representation matches Bytes.toBytes(float).

• DOUBLE: range -1.7976931348623158E+308 to 1.7976931348623158E+308, mapped to java.lang.Double; the sign bit of the binary form is flipped.

• UNSIGNED_DOUBLE: 0 to 1.7976931348623158E+308; the binary representation matches Bytes.toBytes(double).

• DECIMAL: fixed precision, at most 18 digits, mapped to java.math.BigDecimal; the binary representation is a variable-length format. When used in a rowkey it is followed by a null byte, unless it is the last column.

• BOOLEAN: in binary form, 0 means false and 1 means true.

• TIME: format yyyy-MM-dd HH:mm:ss, with both date and time parts; mapped to java.sql.Time. The binary representation is an 8-byte long holding the number of milliseconds since the epoch.

• DATE: format yyyy-MM-dd HH:mm:ss, with both date and time parts; mapped to java.sql.Date. The binary representation is an 8-byte long holding the number of milliseconds since the epoch.

• TIMESTAMP: format yyyy-MM-dd HH:mm:ss[.nnnnnnnnn], mapped to java.sql.Timestamp. The binary representation is 12 bytes: an 8-byte long for the milliseconds and a 4-byte int for the nanoseconds.

• VARCHAR: variable-length string with an optional maximum byte length. When used in a rowkey, a null byte is appended to terminate it, unless it is the last part of the rowkey.

• CHAR: fixed-length string; the binary representation is the UTF-8 form, matching Bytes.toBytes(string).

• BINARY: raw fixed-length binary byte array, mapped to byte[].

• VARBINARY: raw variable-length binary byte array.

6 Reference pages

HBase official: https://hbase.apache.org/

Phoenix official: https://github.com/forcedotcom/phoenix
