elasticsearch-hadoop is a project that deeply integrates Hadoop with Elasticsearch, and it is an official subproject maintained by the Elasticsearch team. By implementing input and output between Hadoop and Elasticsearch, it lets Hadoop read data from and write data to an Elasticsearch cluster, taking full advantage of MapReduce parallel processing and bringing real-time search to data stored in Hadoop.
Project website: http://www.elasticsearch.org/overview/hadoop/
Environment:
CDH4, Elasticsearch 0.90.2
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Quick-Start/cdh4qs_topic_3_3.html
https://github.com/medcl/elasticsearch-rtf
Hive and Elasticsearch interoperability:
# Installation: add the elasticsearch-hadoop jar path to Hive
# Download the elasticsearch-hadoop jar package: https://download.elasticsearch.org/hadoop/hadoop-latest.zip
# The jar path that Hive loads from is a local filesystem path
[medcl@node-1 ~]$ ls
elasticsearch-hadoop-1.3.0.m1.jar
[medcl@node-1 ~]$ pwd
/home/medcl
[medcl@node-1 ~]$ hive -hiveconf hive.aux.jars.path=/home/medcl/elasticsearch-hadoop-1.3.0.m1.jar
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history file=/tmp/medcl/hive_job_log_94db3616-e210-4aab-b07b-6fb159e217ec_1758848920.txt
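Because hive.aux.jars.path points at a file on the local filesystem, it is cheap to confirm the jar is actually there and readable before starting the CLI. The wrapper below is a minimal sketch, not part of Hive or elasticsearch-hadoop: `ES_HADOOP_JAR`, `check_jar` and `start_hive` are hypothetical names, and the default path is simply the jar used in the session above.

```shell
#!/bin/sh
# Hypothetical wrapper (not shipped with Hive or elasticsearch-hadoop):
# start the Hive CLI with the elasticsearch-hadoop jar on hive.aux.jars.path.
# ES_HADOOP_JAR is an assumed variable; the default is the jar used above.
ES_HADOOP_JAR="${ES_HADOOP_JAR:-/home/medcl/elasticsearch-hadoop-1.3.0.m1.jar}"

# check_jar PATH: succeed only if PATH is a regular file that the *current*
# user can read. Prints a message to stderr and returns 1 otherwise.
check_jar() {
  if [ -f "$1" ] && [ -r "$1" ]; then
    return 0
  fi
  echo "elasticsearch-hadoop jar not found or not readable: $1" >&2
  return 1
}

# start_hive [extra hive args...]: verify the jar, then launch the Hive CLI
# with the jar added through -hiveconf hive.aux.jars.path.
start_hive() {
  check_jar "$ES_HADOOP_JAR" || return 1
  hive -hiveconf hive.aux.jars.path="$ES_HADOOP_JAR" "$@"
}
```

`check_jar` only tests readability for whoever runs the script. If Hive is started as a different user (for example with `sudo -u hdfs`), the jar must be readable by that user as well, which is why the session below ends up copying it to /tmp.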
# The Elasticsearch cluster is named "elasticsearch" and runs on the same machine as Hadoop
# In Hive, create a table (user) and use elasticsearch-hadoop to map it to an Elasticsearch index (/index/user), with the columns id, name and site
CREATE EXTERNAL TABLE user (id INT, name STRING, site STRING)
STORED BY 'org.elasticsearch.hadoop.hive.ESStorageHandler'
TBLPROPERTIES('es.resource' = 'index/user/',
'es.index.auto.create' = 'true');
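With this mapping, rows written through the Hive table end up in the Elasticsearch index `index` under the type `user`, so they can be inspected directly over Elasticsearch's HTTP API. The sketch below assumes Elasticsearch listens on its default HTTP port 9200 on the same machine; `es_search_url` is a hypothetical helper that turns an `es.resource` value into the matching `_search` URL.

```shell
#!/bin/sh
# es_search_url RESOURCE [HOST:PORT]
# Hypothetical helper: turn an es.resource value such as "index/user/" or
# "/index/user/" into the _search URL for that index/type.
# Assumes the default Elasticsearch HTTP endpoint, localhost:9200.
es_search_url() {
  # Strip leading and trailing slashes so both spellings used in this post agree.
  resource=$(printf '%s' "$1" | sed -e 's#^/*##' -e 's#/*$##')
  echo "http://${2:-localhost:9200}/${resource}/_search"
}

# Usage (requires a running cluster):
#   curl -XGET "$(es_search_url index/user/)?q=*&pretty=true"
```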
Running as the medcl user:
CREATE EXTERNAL TABLE user(id INT, name STRING)
STORED BY 'org.elasticsearch.hadoop.hive.ESStorageHandler'
TBLPROPERTIES('es.resource' = '/index/user/', 'es.index.auto.create' = 'true');
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
hive> CREATE EXTERNAL TABLE user(id INT, name STRING)
    > STORED BY 'org.elasticsearch.hadoop.hive.ESStorageHandler'
    > TBLPROPERTIES('es.resource' = 'medcl/',
    > 'es.index.auto.create' = 'false');
FAILED: Error in metadata: MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=medcl, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
# Argh. Let's check the permissions
[medcl@node-1 ~]$ hadoop fs -lsr /
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxrwxrwt   - hdfs  supergroup          0 2013-12-16 22:19 /tmp
drwxr-xr-x   - hdfs  supergroup          0 2013-12-16 22:25 /user
drwxr-xr-x   - medcl supergroup          0 2013-12-17 00:30 /user/medcl
drwxr-xr-x   - medcl supergroup          0 2013-12-16 22:32 /user/medcl/input
-rw-r--r--   1 medcl supergroup    2801897 2013-12-16 22:32 /user/medcl/input/file1.txt
drwxr-xr-x   - medcl supergroup          0 2013-12-17 00:30 /user/medcl/lib
-rw-r--r--   1 medcl supergroup     160414 2013-12-17 00:30 /user/medcl/lib/elasticsearch-hadoop-1.3.0.m1.jar
drwxr-xr-x   - hdfs  supergroup          0 2013-12-16 22:20 /var
drwxr-xr-x   - hdfs  supergroup          0 2013-12-16 22:20 /var/lib
# So the /user directory belongs to hdfs. OK, switch to the hdfs user, and move the jar somewhere the hdfs user can read it; /tmp will do
[root@node-1 medcl]# cp elasticsearch-hadoop-1.3.0.m1.jar /tmp/
[root@node-1 medcl]# ^C
[root@node-1 medcl]# sudo -u hdfs hive -hiveconf hive.aux.jars.path=/tmp/elasticsearch-hadoop-1.3.0.m1.jar
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history file=/tmp/hdfs/hive_job_log_bdad4d7a-f929-43d7-a56e-e026fdd7e3b4_1219802521.txt
hive> CREATE EXTERNAL TABLE user(id INT, name STRING)
    > STORED BY 'org.elasticsearch.hadoop.hive.ESStorageHandler'
    > TBLPROPERTIES('es.resource' = '/index/user/',
    > 'es.index.auto.create' = 'false');
2013-12-16 17:09:29.560 GMT Thread[main,5,main] java.io.FileNotFoundException: derby.log (Permission denied)
----------------------------------------------------------------
2013-12-16 17:09:29.877 GMT: Booting Derby version The Apache Software Foundation - Apache Derby - 10.4.2.0 - (689064): instance a816c00e-0142-fc62-4b5c-000000cec758
on database directory /var/lib/hive/metastore/metastore_db in READ ONLY mode
Database Class Loader started - derby.database.classpath=''
FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
# OK, remove the lock files
[root@node-1 ~]# ls /var/lib/hive/metastore/metastore_db
dbex.lck  db.lck  log  seg0  service.properties  tmp
[root@node-1 ~]# rm /var/lib/hive/metastore/metastore_db/dbex.lck
rm: remove regular file `/var/lib/hive/metastore/metastore_db/dbex.lck'? y
[root@node-1 ~]# rm /var/lib/hive/metastore/metastore_db/db.lck
rm: remove regular file `/var/lib/hive/metastore/metastore_db/db.lck'? y
# Also, I had forgotten to close another Hive instance; no wonder.
[root@node-1 tmp]# ps -aux|grep hive
Warning: bad syntax, perhaps a bogus '-'?
See /usr/share/doc/procps-3.2.8/FAQ
root     10855  0.0  0.1 148024   2064 pts/0    S+   01:09   0:00 sudo -u hdfs hive -hiveconf hive.aux.jars.path=/tmp/elasticsearch-hadoop-1.3.0.m1.jar
hdfs     10856  1.8  5.7 858344 109892 pts/0    Sl+  01:09   0:06 /usr/lib/jvm/java-openjdk/bin/java -Xmx256m -Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/lib/hive/lib/hive-cli-0.10.0-cdh4.5.0.jar org.apache.hadoop.hive.cli.CliDriver -hiveconf hive.aux.jars.path=/tmp/elasticsearch-hadoop-1.3.0.m1.jar
# A permissions problem
[root@node-1 tmp]# ll /var/lib/hive/metastore/metastore_db/
total
drwxrwxr-x 2 medcl medcl 4096 Dec 00:56 log
drwxrwxr-x 2 medcl medcl 4096 Dec 00:56 seg0
-rw-rw-r-- 1 medcl medcl  860 Dec 00:56 service.properties
drwxrwxr-x 2 medcl medcl 4096 Dec 17 01:01 tmp
[root@node-1 tmp]# sudo -u hdfs hive -hiveconf hive.aux.jars.path=/tmp/elasticsearch-hadoop-1.3.0.m1.jar^C
[root@node-1 tmp]# chmod 777 /var/lib/hive/metastore/metastore_db/ -R
[root@node-1 tmp]# sudo -u hdfs hive -hiveconf hive.aux.jars.path=/tmp/elasticsearch-hadoop-1.3.0.m1.jar
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history