There are two ways to access HDFS from a non-Java language: through libhdfs.so, or through the Thrift communication framework. This post covers libhdfs first.
1. Install libhdfs
# Prerequisites: JDK 6 and JRE 6 are installed, and hadoop-0.20 is installed from the Cloudera repository (cloudera.repo)
sudo yum -y install libhdfs*
2. Install python-devel (2.6+) and gcc
sudo yum -y install python-devel gcc
3. Download the libpyhdfs source and stage its dependencies
svn checkout http://libpyhdfs.googlecode.com/svn/trunk/ libpyhdfs
cd libpyhdfs
cp /usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u0.jar lib/hadoop-0.20.1-core.jar
cp /usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar lib/
cp /usr/lib/libhdfs.so.0 lib/
# The link target is resolved relative to lib/, so it is written without the lib/ prefix
ln -s libhdfs.so.0 lib/libhdfs.so
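Before building, it can help to confirm that lib/ really contains everything step 3 staged. The following is a minimal sketch, not part of libpyhdfs: `missing_dependencies` and `EXPECTED` are hypothetical names, and the file list is taken from the copy and link commands above. Because `os.path.exists` follows symlinks, a dangling libhdfs.so link is reported as well.

```python
import os

# File names taken from the copy and link commands in step 3.
EXPECTED = [
    "hadoop-0.20.1-core.jar",
    "commons-logging-1.0.4.jar",
    "libhdfs.so.0",
    "libhdfs.so",
]


def missing_dependencies(lib_dir="lib", names=EXPECTED):
    """Return the expected names that are absent or are dangling symlinks."""
    # os.path.exists follows symlinks, so a link whose target does not
    # resolve is reported as missing too.
    return [n for n in names if not os.path.exists(os.path.join(lib_dir, n))]


if __name__ == "__main__":
    problems = missing_dependencies()
    if problems:
        print("missing or broken in lib/: " + ", ".join(problems))
    else:
        print("lib/ looks complete")
```

Run it from the libpyhdfs checkout directory so that the default `lib` path resolves.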
4. Edit setup.py and change the Java paths to match your JDK
vim setup.py
include_dirs = ['/usr/lib/jvm/java-6-sun/include/']
-> include_dirs = ['/usr/java/jdk1.6.0_24/include/']
runtime_library_dirs = ['/usr/local/lib/pyhdfs', '/usr/lib/jvm/java-6-sun/jre/lib/i386/server']
-> runtime_library_dirs = ['/usr/local/lib/pyhdfs', '/usr/java/jdk1.6.0_24/jre/lib/i386/server']
5. Edit jdk1.6.0_24/include/jni.h so that it finds jni_md.h, which lives in the include/linux/ subdirectory
#include "jni_md.h"
-> #include "linux/jni_md.h"
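An alternative that leaves the JDK header untouched is to put the linux subdirectory on the compiler's include path in setup.py. This is a minimal sketch under stated assumptions: `jni_include_dirs` is a hypothetical helper, not something libpyhdfs ships, and the fallback path is simply the JDK location used in step 4.

```python
import os


def jni_include_dirs(java_home):
    """Return the JDK header directories needed to compile against JNI on Linux."""
    include = os.path.join(java_home, "include")
    # jni_md.h sits in the platform subdirectory, which is "linux" here.
    return [include, os.path.join(include, "linux")]


# Assumed fallback: the JDK path used in step 4 of this post.
java_home = os.environ.get("JAVA_HOME", "/usr/java/jdk1.6.0_24")
include_dirs = jni_include_dirs(java_home)
```

The resulting `include_dirs` list would take the place of the hard-coded list in setup.py, so the unmodified `#include "jni_md.h"` resolves through the second entry.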
6. Install libpyhdfs
sudo python setup.py install --prefix="/usr/local"
# Test
python pyhdfs_test.py
# Sadly, it failed with an error:
>>> import pyhdfs
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: libjvm.so: cannot enable executable stack as shared object requires: Permission denied
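# After a long search, this turned out to be an SELinux problem. I have no better fix for now, so just turn it off.
sudo vim /etc/selinux/config
# Set SELINUX=disabled (takes effect at the next reboot), then stop enforcement right away without rebooting.
# Note that setenforce 0 switches SELinux to permissive mode; it does not fully disable it.
sudo setenforce 0
# Check the SELinux status
sudo getenforce
Try again!!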
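To double-check that the edit to /etc/selinux/config took, the persistent setting can be read back from the file. This is a minimal sketch: `selinux_mode_from_config` is a hypothetical helper written for this post, and it only reads the `SELINUX=` line; it says nothing about the mode currently in effect, which is what getenforce reports.

```python
import os

CONFIG_PATH = "/etc/selinux/config"


def selinux_mode_from_config(text):
    """Return the value of the SELINUX= line in the config text, or None."""
    for line in text.splitlines():
        line = line.strip()
        # Comment lines start with "#", and "SELINUXTYPE=..." does not
        # start with "SELINUX=", so neither of them matches here.
        if line.startswith("SELINUX="):
            return line.split("=", 1)[1].strip()
    return None


if __name__ == "__main__" and os.path.exists(CONFIG_PATH):
    with open(CONFIG_PATH) as f:
        print(selinux_mode_from_config(f.read()))
```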