Getting started with the Python series -- HDFS
HDFS (Hadoop Distributed File System) is Hadoop's distributed filesystem. It is highly fault-tolerant and designed to run on inexpensive machines. Python offers two client interfaces for it: HdfsCLI (which calls the RESTful API) and PyHDFS (which uses RPC calls). This article focuses on using HdfsCLI.
Code examples
Installation
pip install hdfs
Introduction of related modules
from hdfs import Client, InsecureClient
Create Client
"""
There are two kinds of client: Client and InsecureClient.
Client: cannot set the file owner.
InsecureClient: can set the file owner (defaults to None).
"""
hdfs_root_path = 'http://localhost:50070'
fs = Client(hdfs_root_path)
fs = InsecureClient(hdfs_root_path, user='hdfs')
Create a Directory
"""Set the directory permission to 777; the default is None."""
fs.makedirs('/test', permission=777)
Write a file
"""
Whether to write or append depends on whether the file already exists.
strict: if False, return None instead of raising an exception when the path doesn't exist.
"""
content = fs.content(hdfs_file_path, strict=False)
if content is None:
    fs.write('/test/test.txt', data=data, permission=777)
else:
    fs.write('/test/test.txt', data=data, append=True)
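The write-or-append decision above can be wrapped in a small helper so the branching logic is testable on its own. This is only a sketch: `write_or_append` and the stub client below are hypothetical, and just the `content()`/`write()` call shapes mirror the HdfsCLI calls used in this article.

```python
def write_or_append(client, path, data, permission=777):
    """Write a new file, or append if the file already exists.

    Mirrors the branch above: content(path, strict=False) returns None
    for a missing path, so None means "create", anything else means "append".
    """
    if client.content(path, strict=False) is None:
        client.write(path, data=data, permission=permission)
    else:
        client.write(path, data=data, append=True)


# A tiny in-memory stub standing in for an HDFS client, so the logic
# can be exercised without a running cluster.
class StubClient:
    def __init__(self):
        self.files = {}

    def content(self, path, strict=True):
        if path not in self.files:
            if strict:
                raise IOError(path)
            return None
        return {'length': len(self.files[path])}

    def write(self, path, data=None, permission=None, append=False):
        if append:
            self.files[path] += data
        else:
            self.files[path] = data


stub = StubClient()
write_or_append(stub, '/test/test.txt', 'hello ')   # creates the file
write_or_append(stub, '/test/test.txt', 'world')    # appends to it
print(stub.files['/test/test.txt'])                 # hello world
```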
Uploading files
"""
overwrite defaults to False; if you don't set it to True and the file already
exists in HDFS, the upload raises a "file exists" exception.
"""
fs.upload(hdfs_path, local_path, overwrite=True)
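Instead of always forcing overwrite=True, one option is to guard the upload with the same existence check used earlier. `upload_if_absent` and the stub client are hypothetical illustrations; only the `content()` and `upload()` call shapes come from the article.

```python
def upload_if_absent(client, hdfs_path, local_path):
    """Upload local_path only when hdfs_path does not already exist.

    Avoids both the "file exists" exception (overwrite=False) and
    silently clobbering data (overwrite=True).
    """
    if client.content(hdfs_path, strict=False) is None:
        client.upload(hdfs_path, local_path, overwrite=False)
        return True
    return False


# Stub client: just enough of the interface to run the helper locally.
class StubClient:
    def __init__(self):
        self.files = {}

    def content(self, path, strict=True):
        if path not in self.files:
            if strict:
                raise IOError(path)
            return None
        return {'length': 0}

    def upload(self, hdfs_path, local_path, overwrite=False):
        if hdfs_path in self.files and not overwrite:
            raise IOError('file exists: ' + hdfs_path)
        self.files[hdfs_path] = local_path


stub = StubClient()
print(upload_if_absent(stub, '/data/report.csv', 'report.csv'))  # True
print(upload_if_absent(stub, '/data/report.csv', 'report.csv'))  # False
```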
Summarize
I have not found a direct way to test whether a file exists; the code above uses fs.content() as a workaround. If anyone knows a better approach, please share it with me.
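For what it's worth, the library also exposes a status() call that accepts the same strict flag, which makes a direct existence check possible. `hdfs_exists` is a hypothetical helper name; status(path, strict=False) returning None for a missing path matches the HdfsCLI documentation, but treat this as a sketch to verify against your installed version.

```python
def hdfs_exists(client, path):
    """Return True if path exists in HDFS.

    status(path, strict=False) returns the file's metadata dict when the
    path exists and None when it does not, so no exception handling is needed.
    """
    return client.status(path, strict=False) is not None


# Stub with the same status() shape, to exercise the helper offline.
class StubClient:
    def __init__(self, files):
        self.files = files

    def status(self, path, strict=True):
        if path not in self.files:
            if strict:
                raise IOError(path)
            return None
        return {'type': 'FILE', 'length': 0}


stub = StubClient({'/test/test.txt'})
print(hdfs_exists(stub, '/test/test.txt'))  # True
print(hdfs_exists(stub, '/missing.txt'))    # False
```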