Phoenix Frequently Asked Questions

Frequently Asked Questions

I want to get started. Is there a Phoenix Hello World?
Is there a way to bulk load in Phoenix?
How do I map a Phoenix table to an existing HBase table?
Are there any tips for optimizing Phoenix?
How do I create a secondary index on a table?
Why isn't my secondary index being used?
How fast is Phoenix? Why is it so fast?
How do I connect to a secure HBase cluster?
How do I connect to HBase running on Hadoop-2?
Can Phoenix work on tables with arbitrary timestamps, as flexibly as the HBase API?
Why isn't my query doing a range scan?
Should I pool Phoenix JDBC Connections?
Why does Phoenix add an empty or dummy KeyValue when doing an upsert?

I want to get started. Is there a Phoenix Hello World?

Prerequisites: Download the latest Phoenix from here, copy the phoenix-*.jar to the HBase lib folder, and restart HBase.

1. Using the console

Start sqlline: $ sqlline.py [zookeeper]. Once the sqlline connection is established, execute the following statements:

CREATE TABLE test (mykey integer NOT NULL PRIMARY KEY, mycolumn varchar);
UPSERT INTO test VALUES (1, 'Hello');
UPSERT INTO test VALUES (2, 'world!');
SELECT * FROM test;

You should get the following output
+-------+------------+
| MYKEY |  MYCOLUMN  |
+-------+------------+
| 1     | Hello      |
| 2     | world!     |
+-------+------------+

2. Using Java

Create a Test.java file with the following content:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Test {
	public static void main(String[] args) throws SQLException {
		Statement stmt = null;
		ResultSet rset = null;

		Connection con = DriverManager.getConnection("jdbc:phoenix:[zookeeper]");

		stmt = con.createStatement();
		stmt.executeUpdate("CREATE TABLE test (mykey integer NOT NULL PRIMARY KEY, mycolumn varchar)");
		stmt.executeUpdate("UPSERT INTO test VALUES (1, 'Hello')");
		stmt.executeUpdate("UPSERT INTO test VALUES (2, 'world!')");
		con.commit();

		PreparedStatement statement = con.prepareStatement("SELECT * FROM test");
		rset = statement.executeQuery();
		while (rset.next()) {
			System.out.println(rset.getString("mycolumn"));
		}
		statement.close();
		con.close();
	}
}

Compile and execute from the command line:

$ javac Test.java

$ java -cp "../phoenix-[version]-client.jar:." Test

You should get the following output

Hello
world!

Is there a way to bulk load in Phoenix?

Map Reduce

Look at the example here

CSV

CSV data can be bulk loaded with the built-in utility named psql. Typical upsert rates are 20K to 50K rows per second (depending on how wide the rows are).

Examples of Use:
Create a table using psql:
$ psql.py [zookeeper] ../examples/web_stat.sql

Upsert CSV bulk data:
$ psql.py [zookeeper] ../examples/web_stat.csv

How do I map a Phoenix table to an existing HBase table?

You can create a Phoenix table or view on a pre-existing HBase table with a CREATE TABLE or CREATE VIEW DDL statement. In both cases, the HBase metadata is left as-is. For CREATE TABLE, any metadata (table, column families) that does not already exist will be created. An empty key value is also added for each row so that queries behave as expected (without requiring all columns to be projected during a scan).

Another caveat is that the way the bytes were serialized must match the way Phoenix serializes them. For VARCHAR, CHAR, and UNSIGNED_* types, Phoenix uses the HBase Bytes methods. The CHAR type expects only single-byte characters, and the UNSIGNED types expect values greater than or equal to zero. For signed types (TINYINT, SMALLINT, INTEGER, and BIGINT), Phoenix flips the first bit so that negative values sort before positive values. Because HBase sorts row keys in lexicographic byte order, and the first bit of a negative value is 1 while that of a positive value is 0, a negative value would sort after a positive value if the first bit were not flipped. Therefore, if you stored integers using the HBase native API and want to access them through Phoenix, make sure all your data types are the UNSIGNED types.
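To make the caveat concrete, here is a minimal sketch (not Phoenix's actual serialization code; the helper name is made up for illustration) of how flipping the first bit restores signed ordering for integers serialized with the HBase Bytes utility:

import org.apache.hadoop.hbase.util.Bytes;

public class SignBitDemo {
    // Hypothetical helper mirroring the idea behind Phoenix's signed encoding.
    static byte[] phoenixStyle(int v) {
        byte[] b = Bytes.toBytes(v); // big-endian two's complement
        b[0] ^= (byte) 0x80;         // flip the sign bit
        return b;
    }

    public static void main(String[] args) {
        byte[] rawNeg = Bytes.toBytes(-1), rawPos = Bytes.toBytes(1);
        byte[] flipNeg = phoenixStyle(-1), flipPos = phoenixStyle(1);

        // Raw HBase bytes: -1 compares greater than 1 (wrong order for signed values).
        System.out.println(Bytes.compareTo(rawNeg, rawPos));   // > 0
        // Sign bit flipped: -1 compares less than 1 (correct signed order).
        System.out.println(Bytes.compareTo(flipNeg, flipPos)); // < 0
    }
}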

Our composite row keys are formed by simply concatenating the values together, with a zero byte used as a separator after each variable-length type.
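For illustration, a small sketch (hypothetical values, not a Phoenix API) of such a composite key built from two VARCHAR components:

import org.apache.hadoop.hbase.util.Bytes;

public class CompositeKeyDemo {
    public static void main(String[] args) {
        byte[] separator = new byte[] { 0 }; // zero byte after a variable-length value
        // Two VARCHAR components, e.g. a region code and a host, joined into one row key.
        byte[] rowKey = Bytes.add(Bytes.toBytes("NA"), separator, Bytes.toBytes("example.com"));
        System.out.println(Bytes.toStringBinary(rowKey)); // prints NA\x00example.com
    }
}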

If you create an HBase table like this:

create 't1', {NAME => 'f1', VERSIONS => 5}

then you have an HBase table named "t1" with a column family named "f1". Keep in mind that in HBase, you don't model the possible KeyValues or the structure of the row key. That is the information you specify in Phoenix, above and beyond the table and column family.

So in Phoenix, you'll create a view like this:

CREATE VIEW "T1" (PK varchar PRIMARY KEY, "F1". Val varchar)

The "pk" column declares that your row key is a VARCHAR (i.e., a string), while the "f1".val column declares that your HBase table will contain KeyValues with a column family and column qualifier of "f1":VAL, and that their values will be VARCHAR.

Note that you do not need the double quotes if you created your HBase table with all-caps names (since this is how Phoenix normalizes strings, by upper-casing them). For example:

create 'T1', {NAME => 'F1', VERSIONS => 5}

You can create this Phoenix view:

CREATE VIEW t1 (pk VARCHAR PRIMARY KEY, f1.val VARCHAR)

Or, if you are creating a new HBase table, just let Phoenix do all the work for you (no HBase shell required):

CREATE TABLE t1 (pk VARCHAR PRIMARY KEY, val VARCHAR)

Are there any tips for optimizing Phoenix?

Use salting to improve read/write performance. Salting can significantly increase read/write performance by pre-splitting the data into multiple regions, and in most scenarios it yields better performance.

Example:

CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SALT_BUCKETS = 16

Note: Ideally, for a 16 region-server cluster with quad-core CPUs, choose salt buckets between 32 and 64 for optimal performance.

Pre-split the table. Salting splits the table automatically, but if you want to control exactly where the table splits occur, without adding extra bytes or changing the row key order, you can pre-split the table.

Example:

CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SPLIT ON ('CS', 'EU', 'NA')

Use multiple column families.

A column family stores related data in a separate file. If you query only a subset of the columns, it makes sense to group those columns together in a column family to improve read performance.

Example:

The following CREATE TABLE DDL creates two column families, A and B.

CREATE TABLE TEST (MYKEY VARCHAR NOT NULL PRIMARY KEY, A.COL1 VARCHAR, A.COL2 VARCHAR, B.COL3 VARCHAR)

Use compression. On-disk compression improves performance on large tables.

Example:

CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) COMPRESSION = 'GZ'

To create an index see faq.html#/how_do_i_create_secondary_index_on_a_table

To optimize cluster parameters, see http://hbase.apache.org/book/performance.html

To optimize Phoenix parameters, see tune.html

How do I create a secondary index on a table?

As of Phoenix 2.1, Phoenix supports indexes on both mutable and immutable data. Note that Phoenix 2.0.x only supports indexes on immutable data. Index write performance is slightly better for immutable tables than for mutable tables, but data in an immutable table cannot be updated.

Example: create a table

Immutable table: CREATE TABLE test (mykey varchar primary key, col1 varchar, col2 varchar) IMMUTABLE_ROWS = true;

Mutable table: CREATE TABLE test (mykey varchar primary key, col1 varchar, col2 varchar);

Create an index on col2:

CREATE INDEX idx ON test (col2)

Create an index on col1, with col2 as a covered (included) column:

CREATE INDEX idx ON test (col1) INCLUDE (col2)

Upsert rows into this test table, and the Phoenix query optimizer will choose the correct index to use. You can see in the explain plan whether Phoenix is using an index table. You can also give a hint in a Phoenix query to use a specific index, as in the sketch below.
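For example, a sketch of checking the plan with EXPLAIN and hinting a specific index (it assumes the test table and idx index created above; the connection URL is a placeholder):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class IndexCheck {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:phoenix:[zookeeper]");
             Statement stmt = con.createStatement()) {
            // EXPLAIN prints the chosen plan; a scan over the idx table means the index is used.
            ResultSet plan = stmt.executeQuery("EXPLAIN SELECT col2 FROM test WHERE col2 = 'foo'");
            while (plan.next()) {
                System.out.println(plan.getString(1));
            }
            // The /*+ INDEX(test idx) */ hint asks the optimizer to use the idx index.
            // Only columns covered by idx (mykey, col2) are selected here.
            ResultSet rs = stmt.executeQuery(
                "SELECT /*+ INDEX(test idx) */ mykey, col2 FROM test WHERE col2 = 'foo'");
            while (rs.next()) {
                System.out.println(rs.getString("MYKEY") + " " + rs.getString("COL2"));
            }
        }
    }
}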

Why isn't my secondary index being used?

A secondary index is not used unless all columns referenced in the query are contained in it (as indexed or covered columns). All columns that make up the primary key of the data table are automatically included in the index.

Example DDL: create table usertable (id varchar primary key, firstname varchar, lastname varchar); create index idx_name on usertable (firstname);

Query: select id, firstname, lastname from usertable where firstname = 'foo';

In this case, the index is not used because lastname is not among the indexed or covered columns. This can be verified by looking at the explain plan. To fix this, create an index that has lastname as an indexed or covered column. Example: create index idx_name on usertable (firstname) include (lastname);

How fast is Phoenix? Why is it so fast?

Phoenix is fast. A full table scan of 100M rows usually completes in 20 seconds (a narrow table on a medium-sized cluster). If the query contains a filter on a key column, that time drops to a few milliseconds. For filters on non-key columns or non-leading key columns, you can add an index on those columns and get performance equivalent to filtering on a key column, because the index effectively makes a copy of the relevant part of the table with the indexed columns forming the key.

Phoenix is fast even during a full scan because: Phoenix chunks up your query using the region boundaries and runs the chunks in parallel on the client using a configurable number of threads, and aggregation is done in a server-side coprocessor, so only the aggregated data is returned to the client instead of all of it.

How do I connect to a secure HBase cluster?

Check out this post by Anil Gupta: http://bigdatanoob.blogspot.com/2013/09/connect-phoenix-to-secure-hbase-cluster.html

How do I connect to HBase running on Hadoop-2?

A Hadoop-2 profile exists in the Phoenix pom.xml.

Can Phoenix work on tables with arbitrary timestamps, as flexibly as the HBase API?

By default, Phoenix lets HBase manage the timestamps and simply shows you the latest values for everything. However, Phoenix also allows users to supply an arbitrary timestamp. To do this, specify a "CurrentSCN" property at connection time, as follows:

Properties props = new Properties();
props.setProperty("CurrentSCN", Long.toString(ts));
Connection conn = DriverManager.getConnection(myUrl, props);

conn.createStatement().execute("UPSERT INTO myTable VALUES ('a')");
conn.commit();

The above is equivalent to using the HBase API:

myTable.put(Bytes.toBytes('a'), ts);

By specifying CurrentSCN, you tell Phoenix that you want everything on that connection to be done as of that timestamp. Note that this applies to queries done on the connection as well; for example, a query over myTable above would not see the data it just upserted, since it only sees data that was created before its CurrentSCN property. This provides a way of doing snapshot, flashback, or point-in-time queries.
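Continuing the snippet above, a sketch of such a point-in-time (flashback) read: a connection opened with an earlier CurrentSCN only sees data written before that timestamp (ts, myUrl, and myTable are the placeholders used above):

Properties readProps = new Properties();
readProps.setProperty("CurrentSCN", Long.toString(ts)); // an earlier timestamp
Connection readConn = DriverManager.getConnection(myUrl, readProps);

ResultSet rs = readConn.createStatement().executeQuery("SELECT * FROM myTable");
while (rs.next()) {
    // Only rows whose timestamp is earlier than ts are visible here.
    System.out.println(rs.getString(1));
}
readConn.close();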

Keep in mind that creating a new connection is not an expensive operation. The same underlying HConnection is used for all connections to the same cluster, so it is more or less just the instantiation of a few objects.

Why isn't my query doing a range scan?

DDL: CREATE TABLE TEST (pk1 char(1) not null, pk2 char(1) not null, pk3 char(1) not null, non-pk varchar CONSTRAINT pk PRIMARY KEY (pk1, pk2, pk3));

A RANGE SCAN means that only a subset of the rows in the table is scanned. This happens when you use one or more of the leading columns of the PRIMARY KEY constraint. A query that does not filter on the leading PK columns, e.g. SELECT * FROM TEST WHERE pk2 = 'x' AND pk3 = 'y';, will cause a full scan, whereas the following query will cause a range scan: SELECT * FROM TEST WHERE pk1 = 'x' AND pk2 = 'y';. Note that you could add a secondary index on the pk2 and pk3 columns, which would cause the first query to do a range scan (over the index table).

A DEGENERATE SCAN means that the query cannot possibly return any rows. If we can determine that at compile time, we don't even bother to run the scan.

A FULL SCAN means that all rows of the table will be scanned (a filter may still be applied if the query has a WHERE clause).

A SKIP SCAN means that either a subset or all of the rows in the table will be scanned, but large groups of rows will be skipped based on the conditions in the filter. See this blog for more detail. We do not do a SKIP SCAN if you have no filter on the leading primary key columns, but you can force a SKIP SCAN with the /*+ SKIP_SCAN */ hint (see the sketch below). Under some conditions, namely when the cardinality of your leading primary key columns is low, it will be more efficient than a FULL SCAN.
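A sketch of forcing a skip scan on the TEST table above when the leading PK column is not filtered (the connection URL is a placeholder):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SkipScanExample {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:phoenix:[zookeeper]");
             Statement stmt = con.createStatement()) {
            // Without a filter on pk1 this would normally be a full scan;
            // the SKIP_SCAN hint asks Phoenix to do a skip scan instead.
            ResultSet rs = stmt.executeQuery(
                "SELECT /*+ SKIP_SCAN */ * FROM TEST WHERE pk2 = 'x' AND pk3 = 'y'");
            while (rs.next()) {
                System.out.println(rs.getString("PK1") + rs.getString("PK2") + rs.getString("PK3"));
            }
        }
    }
}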

Should I pool Phoenix JDBC Connections?

No, it is not necessary to pool Phoenix JDBC connections.

Because of the underlying HBase connection, Phoenix's Connection objects differ from most other JDBC Connections. The Phoenix Connection object is designed to be a thin, inexpensive object. If Phoenix Connections are reused, the underlying HBase connection may not always be left in a healthy state by the previous user. It is better to create new Phoenix Connections to ensure that any potential issues are avoided.

If pooling is required, it is suggested to create a delegate Connection that instantiates a new Phoenix connection when it is retrieved from the pool, and then closes that connection when it is returned to the pool (see PHOENIX-2388).
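If your application framework insists on a pooling interface, one possible shape (a hypothetical sketch, not a Phoenix or PHOENIX-2388 API) is a source whose borrow simply opens a fresh Phoenix connection and whose return really closes it:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class PhoenixConnectionSource {
    private final String url; // e.g. "jdbc:phoenix:[zookeeper]"

    public PhoenixConnectionSource(String url) {
        this.url = url;
    }

    // "Borrowing" just opens a new, cheap Phoenix connection.
    public Connection borrow() throws SQLException {
        return DriverManager.getConnection(url);
    }

    // "Returning" really closes it; the heavyweight HBase connection underneath is shared anyway.
    public void giveBack(Connection conn) throws SQLException {
        if (conn != null && !conn.isClosed()) {
            conn.close();
        }
    }
}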

Why does Phoenix add an empty or dummy KeyValue when doing an upsert?

The empty or dummy KeyValue (with a column qualifier of _0) is needed to ensure that a given column is available for all rows.

As you may know, data is stored as KeyValues in HBase, meaning that the full row key is stored for each column value. This also means that the row key is not stored at all unless at least one column is stored.

Now consider a JDBC row with an integer primary key and several columns that are all null. To be able to store the primary key, a KeyValue needs to be stored to show that the row exists at all. That column is represented by the empty column you noticed. This allows a "SELECT * FROM TABLE" to be executed and return a record for every row, even those whose non-PK columns are null.

The same issue arises even if only some columns are null for some (or all) records. A Phoenix scan will include the empty column to ensure that rows consisting only of the primary key (with all non-key columns null) are included in the scan result.


Original: https://phoenix.apache.org/faq.html


