I previously used Python's PLY to write a prototype of an SQL-like compiler for HBase. So far only the lexer and parser (lexical and syntax analysis) are roughly done. When I moved on to the preprocessor, the logical plan generator and the physical plan generator, a problem came up: HBase, like the rest of the Hadoop project, is written in Java, so the Java API is naturally the most direct way to use it. To use it from other languages, there are the following options:
- Thrift API
- RESTful API
- Other JVM-based languages (Jython, Groovy, Scala, etc.)
That leaves three ways to carry on with the HBase SQL-like compiler:
1. Keep writing the compiler front end in Python, use the Java API in the back end to operate HBase, and pass the intermediate results between the two in some form (such as files).
2. Rewrite the lexer and parser in Java, and then use HBase's Java API directly.
3. Keep writing the compiler in Python and run it under Jython.
With solution 1, the intermediate results may be complex data structures that are hard to serialize. Even if they can be saved, reading and writing them back is cumbersome.
Solution 2 means finding and learning a Java lexer/parser generator. I have looked at ANTLR, JavaCC and others before, but most of them are heavyweight and have a steep learning curve. Besides, my requirements for an SQL-like compiler are fairly simple, so such advanced tools are unnecessary.
With solution 3, the PLY-based compiler is already about half done, and PLY itself is very lightweight. If Jython can operate HBase well, progress is assured. I tried Jython and it works well!
Weighing these factors, I have chosen solution 3 for now.
Well, that was a bit long-winded. Let's get to the point and see how Jython operates HBase. (Setting up Jython and HBase is not covered here.)
First, start HBase:
```shell
bin/start-hbase.sh
```
Then get the classpath HBase is running with, because Jython needs the Java classes on that classpath:
```shell
ps auwx | grep java | grep org.apache.hadoop.hbase.master.HMaster | perl -pi -e "s/.*classpath //"
```
ps lists the running processes. A process started by Java looks roughly like this:
```
/usr/lib/jvm/java-6-sun/bin/java -Xmx1000m ... -classpath xxx
```
The trailing perl command extracts the xxx that follows -classpath. -p wraps the code in a loop that reads the input line by line and prints each line; -i edits in place without keeping a backup of the input file; -e executes the code given on the command line, here s/.*classpath //, which deletes "classpath " together with everything before it. (This assumes -classpath is the last argument. What if unrelated arguments follow it? In practice there are some, but only a few, so this is a small bug.)
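The perl substitution keeps everything after "classpath ", so anything that follows the classpath (typically the main class, here org.apache.hadoop.hbase.master.HMaster, and its arguments) leaks into the result. One way around that small bug, sketched in Python rather than perl, is to split the ps line on whitespace and keep only the token right after -classpath (or -cp). The function name extract_classpath and the sample paths are hypothetical, and the sketch assumes no classpath entry contains spaces, since ps output carries no shell quoting:

```python
def extract_classpath(cmdline):
    """Return the value that follows -classpath (or -cp) in a java command line.

    Unlike the perl substitution s/.*classpath //, this keeps only the
    classpath token, so the main class and its arguments are dropped.
    Returns None if no classpath option is found.
    """
    # ps output is not shell-quoted, so a plain whitespace split is used;
    # this assumes no classpath entry contains spaces.
    args = cmdline.split()
    for i, arg in enumerate(args):
        if arg in ("-classpath", "-cp") and i + 1 < len(args):
            return args[i + 1]
    return None


# Hypothetical ps line; the paths are made up for illustration.
sample = ("/usr/lib/jvm/java-6-sun/bin/java -Xmx1000m "
          "-classpath /opt/hbase/conf:/opt/hbase/hbase.jar "
          "org.apache.hadoop.hbase.master.HMaster start")
print(extract_classpath(sample))  # /opt/hbase/conf:/opt/hbase/hbase.jar
```

To use it in place of perl in the pipeline, a few extra lines that read the ps output from sys.stdin and print extract_classpath(line) for each line would be enough.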
Export the classpath obtained this way as an environment variable:
```shell
export CLASSPATH=xxx
```
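Before running anything under Jython, it may be worth checking that the exported CLASSPATH really contains the HBase, Hadoop and ZooKeeper jars. The small sketch below does that by matching keywords against jar file names; missing_jars is a hypothetical helper, not part of HBase or Jython, and the keyword list is an assumption:

```python
import os


def missing_jars(classpath, keywords=("hbase", "hadoop", "zookeeper")):
    """Return the keywords that match no .jar entry on the classpath."""
    # Entries are separated by os.pathsep (':' on Linux); only jars are checked.
    jar_names = [os.path.basename(entry)
                 for entry in classpath.split(os.pathsep)
                 if entry.endswith(".jar")]
    return [kw for kw in keywords
            if not any(kw in name for name in jar_names)]


if __name__ == "__main__":
    missing = missing_jars(os.environ.get("CLASSPATH", ""))
    if missing:
        print("No jar found for: " + ", ".join(missing))
    else:
        print("CLASSPATH contains hbase, hadoop and zookeeper jars")
```

Matching on file names is deliberately loose; it only catches an empty or obviously incomplete CLASSPATH, not version mismatches.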
With that in place, you can run a Python script under Jython to operate HBase. Here is a simple example:
```python
import java.lang
from org.apache.hadoop.hbase import HBaseConfiguration, HTableDescriptor, HColumnDescriptor, HConstants
from org.apache.hadoop.hbase.client import HBaseAdmin, HTable, Put, Get

# Loads hbase-default.xml and hbase-site.xml from the CLASSPATH
conf = HBaseConfiguration()
admin = HBaseAdmin(conf)

tablename = "test_jython_hbase"
desc = HTableDescriptor(tablename)
desc.addFamily(HColumnDescriptor("content"))

# Drop and recreate if it exists
if admin.tableExists(tablename):
    admin.disableTable(tablename)
    admin.deleteTable(tablename)
admin.createTable(desc)

table = HTable(conf, tablename)

# Add content: row 'row_x', column content:some_content
row = 'row_x'
put_row = Put(row)
put_row.add('content', 'some_content', 'some_value')
table.put(put_row)

# Read content back and decode the stored bytes as UTF-8
get = Get(row)
data_row = table.get(get)
data = java.lang.String(data_row.value(), "UTF8")

print "The fetched row contains the value '%s'" % data

# Delete the table (it must be disabled first)
admin.disableTable(desc.getName())
admin.deleteTable(desc.getName())
```
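The script decodes the stored value with java.lang.String(data_row.value(), "UTF8") because HBase hands back raw bytes. Java's byte type is signed (-128 to 127), and, assuming Jython exposes a Java byte[] as a sequence of such signed integers, decoding it by hand in plain Python means mapping each value to 0..255 first. A sketch for Python 2.6+ or 3 (java_bytes_to_str is a hypothetical name):

```python
def java_bytes_to_str(signed_bytes, encoding="utf-8"):
    """Decode Java-style signed bytes (-128..127) into a text string."""
    # b & 0xFF maps a signed byte to its unsigned value, e.g. -61 -> 195.
    raw = bytearray(b & 0xFF for b in signed_bytes)
    return raw.decode(encoding)


# The UTF-8 encoding of u"caf\u00e9" as a Java byte[] would hold these values.
print(java_bytes_to_str([99, 97, 102, -61, -87]) == u"caf\u00e9")  # True
```

Inside Jython itself, java.lang.String as used in the script remains the simpler route.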
The output is as follows:
```
...
12/01/29 23:55:51 DEBUG client.HConnectionManager$HConnectionImplementation: Cached location for .META.,,1.1028785192 is ubuntu2-vmware:60020
12/01/29 23:55:52 DEBUG client.MetaScanner: Scanning .META. starting at row=test_jython_hbase,,00000000000000 for max=2147483647 rows
12/01/29 23:55:52 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ubuntu3-vmware:2181,ubuntu2-vmware:2181 sessionTimeout=180000 watcher=hconnection
12/01/29 23:55:52 INFO zookeeper.ClientCnxn: Opening socket connection to server ubuntu3-vmware/192.168.1.202:2181
12/01/29 23:55:52 INFO zookeeper.ClientCnxn: Socket connection established to ubuntu3-vmware/192.168.1.202:2181, initiating session
12/01/29 23:55:52 INFO zookeeper.ClientCnxn: Session establishment complete on server ubuntu3-vmware/192.168.1.202:2181, sessionid = 0x1352c9556270012, negotiated timeout = 180000
12/01/29 23:55:52 DEBUG client.HConnectionManager$HConnectionImplementation: Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@8a2006; hsa=ubuntu3-vmware:60020
12/01/29 23:55:52 DEBUG client.HConnectionManager$HConnectionImplementation: Cached location for .META.,,1.1028785192 is ubuntu2-vmware:60020
12/01/29 23:55:52 DEBUG client.MetaScanner: Scanning .META. starting at row=test_jython_hbase,,00000000000000 for max=10 rows
12/01/29 23:55:52 DEBUG client.HConnectionManager$HConnectionImplementation: Cached location for test_jython_hbase,,1327910151208.09451a5e064db613641741bd8c896eb7. is ubuntu2-vmware:60020
The fetched row contains the value 'some_value'
12/01/29 23:55:52 INFO client.HBaseAdmin: Started disable of test_jython_hbase
12/01/29 23:55:52 DEBUG client.HBaseAdmin: Sleeping= 1000ms, waiting for all regions to be disabled in test_jython_hbase
12/01/29 23:55:53 DEBUG client.HBaseAdmin: Sleeping= 1000ms, waiting for all regions to be disabled in test_jython_hbase
12/01/29 23:55:54 INFO client.HBaseAdmin: Disabled test_jython_hbase
12/01/29 23:55:55 INFO client.HBaseAdmin: Deleted test_jython_hbase
```