Getting started with the Python series--hdfs

Source: Internet
Author: User


HDFS (Hadoop Distributed File System) is a highly fault-tolerant distributed filesystem that is suitable for deployment on inexpensive machines. Python
has two interfaces to it: HdfsCLI (REST API calls) and pyhdfs (RPC calls). This section focuses on the use of HdfsCLI.

Code example
  1. Installation

    pip install hdfs
  2. Import the related modules

    from hdfs import Client, InsecureClient
  3. Create a client

    """There are two different kinds of client, Client and InsecureClient.
    Client: cannot define file owner
    InsecureClient: can define file owner, default None"""
    hdfs_root_path = 'http://localhost:50070'
    fs = Client(hdfs_root_path)
    fs = InsecureClient(hdfs_root_path, user='hdfs')
  4. Create a Directory

    """Change file permission to 777, default None"""
    fs.makedirs('/test', permission=777)
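  5. Write a file

    """Whether to append depends on whether the file already exists.
    strict: If `False`, return `None` rather than raise an exception if
    the path doesn't exist."""
    hdfs_file_path = '/test/test.txt'
    data = 'hello hdfs\n'  # placeholder payload, replace with your own data
    content = fs.content(hdfs_file_path, strict=False)
    if content is None:
        # The file does not exist yet: create it
        fs.write(hdfs_file_path, data=data, permission=777)
    else:
        # The file already exists: append to it
        fs.write(hdfs_file_path, data=data, append=True)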
  6. Uploading files

    """overwrite defaults to False. If it is not set to True, uploading a file
    that already exists in HDFS raises a "file already exists" exception."""
    # hdfs_path is the remote target, local_path the local file to upload
    fs.upload(hdfs_path, local_path, overwrite=True)
  7. Summary
    I have not found a dedicated way to check whether a file exists, so the code example above uses fs.content() as a substitute. If anyone knows a better way, please share it with me.
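
One possible answer to the existence question: as far as I know, the client's status() method also accepts strict=False and then returns None for a missing path, so it can act as an existence check. The sketch below assumes that behaviour. The names path_exists, write_or_append and _StubClient are hypothetical helpers made up for this example, not part of the hdfs package, and the stub only imitates a real client so the logic can be tried without a running cluster.

```python
def path_exists(client, hdfs_path):
    """Return True if hdfs_path exists.

    Assumes client.status(path, strict=False) returns None for a
    missing path instead of raising an exception.
    """
    return client.status(hdfs_path, strict=False) is not None


def write_or_append(client, hdfs_path, data):
    """Create hdfs_path if it is missing, otherwise append to it.

    Returns 'created' or 'appended' (a convention of this sketch).
    """
    if path_exists(client, hdfs_path):
        client.write(hdfs_path, data=data, append=True)
        return 'appended'
    client.write(hdfs_path, data=data)
    return 'created'


class _StubClient:
    """In-memory stand-in for a real client, for illustration only."""

    def __init__(self):
        self.files = {}

    def status(self, hdfs_path, strict=True):
        if hdfs_path in self.files:
            return {'type': 'FILE', 'length': len(self.files[hdfs_path])}
        if strict:
            raise FileNotFoundError(hdfs_path)
        return None

    def write(self, hdfs_path, data=None, append=False):
        if append:
            self.files[hdfs_path] += data
        else:
            self.files[hdfs_path] = data


stub = _StubClient()
before = path_exists(stub, '/test/test.txt')
first = write_or_append(stub, '/test/test.txt', 'hello\n')
second = write_or_append(stub, '/test/test.txt', 'world\n')
after = path_exists(stub, '/test/test.txt')
```

With a real client the same helpers would be called as path_exists(fs, '/test/test.txt') or write_or_append(fs, '/test/test.txt', data), where fs is the client created in step 3.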
