HDFS short circuit local reads


One basic principle of Hadoop is that moving computation is cheaper than moving data, so Hadoop tries its best to schedule computation on the nodes that already hold the data. As a result, the DFSClient that reads the data and the DataNode that serves it often live on the same node, which produces many "local reads".

In the original design, local reads and remote reads (where the DFSClient and the DataNode are on different nodes) were handled the same way: the DataNode read the data first, then streamed it to the DFSClient over the network. The process is simple, but performance suffers because the DataNode sits in the middle as an extra hop. This article introduces the optimizations made to address this problem.

Since the DFSClient and the data are on the same machine, the natural idea is to let the DFSClient bypass the DataNode and read the data by itself. There are two concrete implementations.

HDFS-2246

In this JIRA, the approach is: since the DFSClient reading the data is on the same machine as the data, the DataNode simply tells the DFSClient the path of the block file in the local file system, the offset at which to start reading, and how many bytes to read; the DFSClient then opens the file and reads the data by itself. The idea is sound; the problems are complicated configuration and security.

The first problem is configuration. Because the DFSClient opens the block files itself, you have to maintain a whitelist defining which users may access the DataNode's data directory, and every time a new user is added the whitelist must be updated. The second problem is security: allowing a client to access the DataNode data directory means that any whitelisted user can also read every other block in that directory, which is a security hole. For these reasons this implementation is no longer recommended.
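
To make the old mechanism concrete, here is a minimal sketch of what the client-side read amounts to under HDFS-2246, assuming the DataNode has already handed over the block file path, the offset, and the length. The class name, path, and values below are hypothetical placeholders, not the actual DFSClient code.

import java.io.IOException;
import java.io.RandomAccessFile;

// Minimal sketch of the HDFS-2246 idea: the DataNode tells the client where the
// block file lives, plus an offset and length, and the client reads it directly.
// The path, offset and length below are hypothetical placeholders.
public class LocalBlockReadSketch {
  public static void main(String[] args) throws IOException {
    String blockPath = "/hadoop/hdfs/data/current/blk_1073741962"; // hypothetical block file path
    long offset = 0L;   // where the DataNode said to start reading
    int length = 4096;  // how many bytes the DataNode said to read

    byte[] buf = new byte[length];
    try (RandomAccessFile raf = new RandomAccessFile(blockPath, "r")) {
      raf.seek(offset);               // jump to the offset provided by the DataNode
      raf.readFully(buf, 0, length);  // read exactly 'length' bytes
    }
    System.out.println("Read " + buf.length + " bytes directly from the block file");
  }
}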

HDFS-347

Linux provides a mechanism called the UNIX domain socket. A UNIX domain socket is an inter-process communication channel that lets two processes on the same machine talk to each other in socket style. Another major benefit of this mechanism is that, besides ordinary data, the two processes can also pass file descriptors to each other.

Assume two users A and B on the same machine, where A has permission to access a file and B does not, yet B needs to read it. With the help of a UNIX domain socket, A can open the file, obtain a file descriptor, and pass the descriptor to B; B can then read the content of the file even though it has no permission of its own. In the HDFS scenario, A is the DataNode, B is the DFSClient, and the file to be read is a block file in the DataNode's data directory.

This solution is more secure than the previous one: at worst, the DFSClient can only read the files it actually asks for.

If you want to learn more about UNIX domain sockets, see: http://www.thomasstover.com/uds.html and http://troydhanson.github.io/misc/Unix_domain_sockets.html
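
For a feel of what a UNIX domain socket looks like from Java, below is a minimal, self-contained sketch using the JDK's channel API (available only since JDK 16, well after this article was written). It only exchanges ordinary data over a socket file; the JDK exposes no way to pass file descriptors, which is exactly why HDFS relies on the native libhadoop library described in the next section. The class name and socket path are made up for illustration.

import java.io.IOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal UNIX domain socket round trip (JDK 16+): two "processes" (threads here)
// talk through a socket file on the same machine. Plain data only; file
// descriptor passing needs native code that the JDK does not expose.
public class UdsEchoSketch {
  public static void main(String[] args) throws Exception {
    Path socketPath = Path.of("/tmp/uds_demo.sock"); // hypothetical socket path
    Files.deleteIfExists(socketPath);
    UnixDomainSocketAddress addr = UnixDomainSocketAddress.of(socketPath);

    // "DataNode" side: bind the socket file and echo one message back.
    ServerSocketChannel listener = ServerSocketChannel.open(StandardProtocolFamily.UNIX);
    listener.bind(addr);
    Thread server = new Thread(() -> {
      try (SocketChannel peer = listener.accept()) {
        ByteBuffer buf = ByteBuffer.allocate(256);
        peer.read(buf);
        buf.flip();
        peer.write(buf);
      } catch (IOException e) {
        e.printStackTrace();
      }
    });
    server.start();

    // "DFSClient" side: connect to the same socket file and send a request.
    try (SocketChannel client = SocketChannel.open(StandardProtocolFamily.UNIX)) {
      client.connect(addr);
      client.write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8)));
      ByteBuffer reply = ByteBuffer.allocate(256);
      client.read(reply);
      reply.flip();
      System.out.println("Echoed back: " + StandardCharsets.UTF_8.decode(reply));
    }

    server.join();
    listener.close();
    Files.deleteIfExists(socketPath);
  }
}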

How to configure

Because Java cannot directly operate UNIX domain sockets, you need the Hadoop native library libhadoop.so installed. If your cluster runs a major Hadoop distribution (such as Pivotal HD or CDH), the native packages are usually installed along with Hadoop. You can run the following command to check whether they are present:

[[email protected] ~]$ hadoop checknative
hadoop: true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib:   true /lib64/libz.so.1
snappy: true /usr/lib64/libsnappy.so.1
lz4:    true revision:99
bzip2:  true /lib64/libbz2.so.1

The configuration items related to short-circuit local reads (in hdfs-site.xml) are as follows:

  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>

Specifically, dfs.client.read.shortcircuit turns the feature on, and dfs.domain.socket.path is the local path of the socket used for communication between the DataNode and the DFSClient.
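
These two properties normally live in hdfs-site.xml on the client side, but as a minimal sketch, the same client-side settings can also be applied programmatically on a Hadoop Configuration object. The class name, NameNode URI, and file path below are hypothetical.

import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch: enable short-circuit reads on the client side in code.
// The NameNode URI and file path are hypothetical; normally these values
// come from hdfs-site.xml on the client's classpath.
public class ShortCircuitClientSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");

    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf); // hypothetical URI
    try (InputStream in = fs.open(new Path("/tmp/some-file"))) {              // hypothetical path
      byte[] buf = new byte[4096];
      int n = in.read(buf);
      System.out.println("Read " + n + " bytes");
    }
  }
}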

How can I confirm that the configuration has taken effect?

With the above configuration in place, how can you check whether short-circuit local reads are used when reading data from HDFS? There are two ways:

  1. View datanode logs

In the DataNode's startup log, you can see the following entries indicating that the UNIX domain socket and file descriptor passing are enabled.

2014-10-17 08:18:59,789 INFO  datanode.DataNode (DataNode.java:<init>(277)) - File descriptor passing is enabled.
...
2014-10-17 08:18:59,867 INFO  datanode.DataNode (DataNode.java:initDataXceiver(579)) - Listening on UNIX domain socket: /var/lib/hadoop-hdfs/dn_socket

Next, let's read a file. In my test cluster, the file /tmp/hive-0.13.1.phd.3.0.0.0-1.el6.src.rpm has the following information:

[[email protected] ~]$ hdfs dfs -ls /tmp/hive-0.13.1.phd.3.0.0.0-1.el6.src.rpm
-rw-r--r--   3 hdfs hdfs  109028097 2014-10-17 08:31 /tmp/hive-0.13.1.phd.3.0.0.0-1.el6.src.rpm
[[email protected] ~]$ hdfs fsck /tmp/hive-0.13.1.phd.3.0.0.0-1.el6.src.rpm -files -blocks
Connecting to namenode via http://c6404.ambari.apache.org:50070
FSCK started by hdfs (auth:SIMPLE) from /192.168.64.102 for path /tmp/hive-0.13.1.phd.3.0.0.0-1.el6.src.rpm at Fri Oct 17 08:40:47 UTC 2014
/tmp/hive-0.13.1.phd.3.0.0.0-1.el6.src.rpm 109028097 bytes, 1 block(s):  OK
0. BP-1796216370-192.168.64.104-1413533983834:blk_1073741962_1138 len=109028097 repl=3

This file has one block, with the ID blk_1073741962.

Now copy this file to the local file system:

hadoop fs -get /tmp/hive-0.13.1.phd.3.0.0.0-1.el6.src.rpm /tmp

Then open the DataNode log on that node. The following entry shows that a short-circuit local read was used to read block 1073741962:

2014-10-17 08:32:53,983 INFO  DataNode.clienttrace (DataXceiver.java:requestShortCircuitFds(334)) - src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1073741962, srvID: 4ff4d539-1bca-480d-91e3-e5dc8c6bc4a8, success: true

  2. ReadStatistics API

Another method is to use the getReadStatistics() method of HdfsDataInputStream to obtain statistics about how the data was read. Sample code is as follows:

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSInputStream;
import org.apache.hadoop.hdfs.client.HdfsDataInputStream;

public class FileSystemCat {
  public static void main(String[] args) throws IOException {
    String uri = args[0];
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    OutputStream out = new FileOutputStream("/tmp/out");
    FSDataInputStream in = null;
    try {
      in = fs.open(new Path(uri));
      // Copy the whole file; afterwards the stream can report how the bytes were read.
      IOUtils.copy(in, out);
      if (in instanceof HdfsDataInputStream) {
        HdfsDataInputStream hdfsIn = (HdfsDataInputStream) in;
        DFSInputStream.ReadStatistics readStatistics = hdfsIn.getReadStatistics();
        System.out.println("Total Bytes Read Bytes: " + readStatistics.getTotalBytesRead());
        System.out.println("Short Circuit Read Bytes: " + readStatistics.getTotalShortCircuitBytesRead());
        System.out.println("Local Read Bytes:" + readStatistics.getTotalLocalBytesRead());
      }
    } finally {
      IOUtils.closeQuietly(in);
      IOUtils.closeQuietly(out);
    }
  }
}
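
Note that FileSystemCat has to be compiled against the Hadoop client libraries and placed on the classpath (for example via the HADOOP_CLASSPATH environment variable) before it can be launched with the hadoop command as shown below.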

Let's run it:

[[email protected] classes]$ hdfs dfs -ls /tmp/hive-0.13.1.phd.3.0.0.0-1.el6.src.rpm
-rw-r--r--   3 hdfs hdfs  109028097 2014-10-17 08:31 /tmp/hive-0.13.1.phd.3.0.0.0-1.el6.src.rpm
[[email protected] classes]$ hadoop FileSystemCat /tmp/hive-0.13.1.phd.3.0.0.0-1.el6.src.rpm
Total Bytes Read Bytes: 109028097
Short Circuit Read Bytes: 109028097
Local Read Bytes:109028097

We can see that all of the data was read through short-circuit local reads.

Summary

This article introduced the two implementations of short-circuit local reads in HDFS, and described in detail the configuration and underlying mechanism of the UNIX domain socket based approach.

