Python Access Secured Hadoop Cluster Through Thrift API

Contents: Apache Thrift Python Kerberos Support; Typical way to connect kerberos secured thrift server; Example - Hive; Example - HBase
Apache Thrift Python Kerberos Support
Both kinds of support are only available on Linux.

Native support
Dependency: kerberos(python package) >> pure-sasl(python package) >> thrift (python package)
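Before trying the native transport, it can help to confirm that each package in the chain is importable. The sketch below is only an illustration: the helper name `missing_modules` is hypothetical, and it assumes the pure-sasl distribution is imported under the module name `puresasl`.

```python
import importlib.util

# Import names for the chain kerberos >> pure-sasl >> thrift.
# Assumption: the pure-sasl distribution is imported as "puresasl".
NATIVE_CHAIN = ["kerberos", "puresasl", "thrift"]


def missing_modules(names):
    """Return the entries of `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(NATIVE_CHAIN)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All modules for native SASL support are importable.")
```

The check only looks the modules up; it does not import them, so it will not catch a package that is installed but fails to load its native Kerberos libraries.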
Source: https://github.com/apache/thrift/blob/0.9.3/lib/py/src/transport/TTransport.py
    class TSaslClientTransport(TTransportBase, CReadableTransport):
        """
        SASL transport
        """

        START = 1
        OK = 2
        BAD = 3
        ERROR = 4
        COMPLETE = 5

        def __init__(self, transport, host, service, mechanism='GSSAPI',
                     **sasl_kwargs):
            """
Cloudera’s new API
Dependency: cyrus-sasl-*, saslwrapper, python-saslwrapper(Linux libs) >> sasl(python package) >> thrift-sasl(python package) >> thrift(python package)
Source: https://github.com/cloudera/thrift_sasl/blob/master/thrift_sasl/__init__.py
    class TSaslClientTransport(TTransportBase, CReadableTransport):
        START = 1
        OK = 2
        BAD = 3
        ERROR = 4
        COMPLETE = 5

        def __init__(self, sasl_client_factory, mechanism, trans):
            """
            @param sasl_client_factory: a callable that returns a new sasl.Client object
            @param mechanism: the SASL mechanism (e.g. "GSSAPI")
            @param trans: the underlying transport over which to communicate.
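Unlike the native transport, this constructor takes a factory callable instead of `host` and `service` arguments. A minimal sketch of how such a factory might be built is shown below. The helper names `make_sasl_client_factory` and `open_cloudera_transport` are hypothetical, and the third-party imports are deferred into the functions so the module loads even where `sasl` and `thrift_sasl` are not installed.

```python
def make_sasl_client_factory(host, service):
    """Return a zero-argument callable that builds a configured sasl.Client."""

    def factory():
        # Imported lazily: the sasl package needs the cyrus-sasl system libs.
        import sasl

        client = sasl.Client()
        client.setAttr("host", host)        # host part of the service principal
        client.setAttr("service", service)  # e.g. "hive" or "hbase"
        client.init()
        return client

    return factory


def open_cloudera_transport(host, port, service):
    """Open a GSSAPI-secured transport using Cloudera's thrift_sasl package."""
    from thrift.transport import TSocket
    from thrift_sasl import TSaslClientTransport

    socket = TSocket.TSocket(host, port)
    transport = TSaslClientTransport(
        make_sasl_client_factory(host, service), "GSSAPI", socket
    )
    transport.open()
    return transport
```

With the server details used later in this post, a call would look like `open_cloudera_transport('quickstart.cloudera', 10000, 'hive')`.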
Typical way to connect thrift server

Non-secured Thrift Server
    transport = TSocket.TSocket(thrift_server_url, thrift_server_port)
    transport = TTransport.TBufferedTransport(transport)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
Secured Thrift Server
Use the TSaslClientTransport transport:
    transport = TSocket.TSocket(thrift_server_url, thrift_server_port)
    # host and service together name the server's Kerberos principal
    # (service/host@REALM).
    transport = TTransport.TSaslClientTransport(
        transport,
        host='krb5_server',
        service='service_name',
        mechanism='GSSAPI'
    )
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
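The snippet above only constructs the transport; the connection still has to be opened with `transport.open()` and closed when finished. One possible way to pair the two calls is a small context manager. The name `opened` is hypothetical, and the sketch assumes only that the transport exposes `open()` and `close()`, as Thrift transports do.

```python
from contextlib import contextmanager


@contextmanager
def opened(transport):
    """Open `transport`, yield it, and always close it afterwards."""
    transport.open()
    try:
        yield transport
    finally:
        # Runs even if the body raises, so the socket is not leaked.
        transport.close()
```

Usage would then read `with opened(transport) as t: protocol = TBinaryProtocol.TBinaryProtocol(t)`, with the client calls placed inside the `with` block.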
Example - Server information
/etc/krb5.conf
    [libdefaults]
    default_realm = CLOUDERA
    dns_lookup_kdc = false
    dns_lookup_realm = false
    ticket_lifetime = 86400
    renew_lifetime = 604800
    forwardable = true
    default_tgs_enctypes = rc4-hmac
    default_tkt_enctypes = rc4-hmac
    permitted_enctypes = rc4-hmac
    udp_preference_limit = 1
    kdc_timeout = 3000

    [realms]
    CLOUDERA = {
        kdc = quickstart.cloudera
        admin_server = quickstart.cloudera
    }
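The `default_realm` value is what completes a service principal name. As a rough illustration, the sketch below reads the `[libdefaults]` section from an abbreviated copy of the file and assembles a principal, assuming the conventional `service/host@REALM` form. The helpers `parse_libdefaults` and `service_principal` are hypothetical and handle only simple `key = value` lines, not the full krb5.conf syntax.

```python
def parse_libdefaults(text):
    """Collect key = value pairs from the [libdefaults] section only."""
    settings = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "libdefaults" and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def service_principal(service, host, realm):
    """Build a principal in the conventional service/host@REALM form."""
    return "{}/{}@{}".format(service, host, realm)


# Abbreviated copy of the configuration shown above.
SAMPLE = """\
[libdefaults]
default_realm = CLOUDERA
ticket_lifetime = 86400
[realms]
CLOUDERA = {
kdc = quickstart.cloudera
admin_server = quickstart.cloudera
}
"""

realm = parse_libdefaults(SAMPLE)["default_realm"]
print(service_principal("hive", "quickstart.cloudera", realm))
# prints: hive/quickstart.cloudera@CLOUDERA
```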
Hive Thrift Server: quickstart.cloudera:10000
HBase Thrift Server: quickstart.cloudera:9090

Example - Hive
    import sys

    from hive import ThriftHive
    from hive.ttypes import HiveServerException
    from thrift import Thrift
    from thrift.transport import TSocket
    from thrift.transport import TTransport
    from thrift.protocol import TBinaryProtocol

    transport = TSocket.TSocket('quickstart.cloudera', 10000)
    transport = TTransport.TSaslClientTransport(
        transport,
        host='quickstart.cloudera',
        service='hive',
        mechanism='GSSAPI'
    )
    transport.open()
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = ThriftHive.Client(protocol)
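Once `client` exists, a query can be run through it. The sketch below is a hypothetical helper, `fetch_rows`, which assumes the `ThriftHive` client exposes `execute()` and `fetchAll()` and that rows come back as tab-delimited strings. Treat both as assumptions to verify against your generated bindings.

```python
def fetch_rows(client, query):
    """Run `query` and return each result row split into its columns."""
    client.execute(query)
    # Assumption: fetchAll() returns one tab-delimited string per row.
    return [row.split("\t") for row in client.fetchAll()]
```

For example, `fetch_rows(client, 'SHOW TABLES')` would return one single-element list per table, after which `transport.close()` releases the connection.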
Example - HBase
    from thrift.transport import TSocket
    from thrift.transport import TTransport
    from thrift.protocol import TBinaryProtocol
    from hbase import Hbase

    transport = TSocket.TSocket('quickstart.cloudera', 9090)
    transport = TTransport.TSaslClientTransport(
        transport,
        host='quickstart.cloudera',
        service='hbase',
        mechanism='GSSAPI'
    )
    transport.open()
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = Hbase.Client(protocol)
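A quick way to confirm the HBase connection works is to list the tables. The helper below, `table_names`, is hypothetical. It assumes the generated `Hbase` client provides `getTableNames()` and normalizes the result because the names may arrive as bytes under Python 3.

```python
def table_names(client):
    """Return HBase table names as text, decoding bytes where needed."""
    names = client.getTableNames()
    return [n.decode("utf-8") if isinstance(n, bytes) else n for n in names]
```

Calling `table_names(client)` right after the setup above should return the cluster's table names if the SASL handshake succeeded.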