"Hadoop Learning" HDFS short-circuit local read

Source: Internet
Author: User
Tags: unix domain socket

Hadoop version: 2.6.0

This article is translated from the official documentation. If you reproduce it, please respect the translator's work and cite the following link:

http://www.cnblogs.com/zhangningbo/p/4146296.html

Background

In HDFS, data is normally read through a DataNode: when a client asks a DataNode to read a file, the DataNode reads the file from disk and sends the data to the client over a TCP socket. A so-called "short-circuit" read bypasses the DataNode, allowing the client to read the file directly. Obviously, this is only possible when the client is co-located with the data (translator's note: on the same host). Short-circuit reads provide a substantial performance boost to many applications.

Setup

To configure short-circuit local reads, you need to enable libhadoop.so. See Native Libraries for details.

Short-circuit reads make use of a UNIX domain socket. This is a special path in the filesystem that allows the client and the DataNode to communicate. You will need to set a path to this socket, and the DataNode must be able to create it. On the other hand, no user other than the HDFS user or root should be able to create this path. For this reason, paths under /var/run or /var/lib are often used.
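The role of the domain socket path can be illustrated, independently of Hadoop, with a minimal UNIX domain socket round trip. This is just a sketch of the mechanism: the socket path and message below are hypothetical stand-ins, not anything Hadoop actually sends.

```python
import os
import socket
import tempfile
import threading

# Hypothetical socket path; Hadoop would use dfs.domain.socket.path instead.
SOCK_PATH = os.path.join(tempfile.mkdtemp(), "dn_socket")

# The "DataNode" side: bind the socket to a filesystem path. Only a user
# with write access to the parent directory can create this path, which is
# why the article recommends directories under /var/run or /var/lib.
srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
srv.bind(SOCK_PATH)
srv.listen(1)

def serve_one():
    # Accept a single connection and echo back whatever arrives.
    conn, _ = srv.accept()
    with conn:
        conn.sendall(conn.recv(1024))
    srv.close()

threading.Thread(target=serve_one, daemon=True).start()

# The "client" side: connect through the filesystem path, not a TCP port.
with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
    cli.connect(SOCK_PATH)
    cli.sendall(b"hello")
    reply = cli.recv(1024)

print(reply.decode())  # hello
```

Because the endpoint is a filesystem path, ordinary file permissions on the parent directory control who may create or connect to it, which is the security property the configuration relies on.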

The client and the DataNode exchange information via a segment of shared memory on /dev/shm.
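The shared-memory handoff can likewise be sketched with Python's standard library. This only illustrates the mechanism of a named shared segment that two parties attach to; it is not Hadoop's actual data layout, and the "READY" payload is hypothetical.

```python
from multiprocessing import shared_memory

# "DataNode" side: create a named shared-memory segment. On Linux this is
# backed by a file under /dev/shm, the location mentioned in the article.
seg = shared_memory.SharedMemory(create=True, size=16)
seg.buf[:5] = b"READY"

# "Client" side: attach to the same segment by name and read the state,
# without any data passing through a socket.
view = shared_memory.SharedMemory(name=seg.name)
status = bytes(view.buf[:5])
print(status.decode())  # READY

# Both sides detach; the creator also unlinks the segment.
view.close()
seg.close()
seg.unlink()
```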

Short-circuit local reads must be configured on both the DataNode and the client.

Relevant configuration parameters

The five main parameters related to this feature are:

dfs.client.read.shortcircuit (default: false)
This parameter turns on short-circuit local reads.

dfs.domain.socket.path (default: none; optional)
The path to the UNIX domain socket used for communication between the DataNode and local HDFS clients. If the string "_PORT" appears in the path, it is replaced by the DataNode's TCP port.

dfs.client.read.shortcircuit.skip.checksum (default: false)
If this parameter is set, short-circuit local reads skip checksum verification. This is generally not recommended, but it may be useful in special situations. If you perform your own checksumming outside of HDFS, you might consider setting this parameter.

dfs.client.read.shortcircuit.streams.cache.size (default: 256)
The DFSClient maintains a cache of recently opened file descriptors; this parameter controls its capacity. A larger cache consumes more file descriptors, but may yield better performance on workloads involving many seek operations.

dfs.client.read.shortcircuit.streams.cache.expiry.ms (default: 300000)
This parameter controls the minimum time a file descriptor must remain in the client cache before it can be closed due to prolonged inactivity.

The following is an example configuration.

<configuration>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>


<property>
<name>dfs.domain.socket.path</name>
<value>/var/run/hadoop-hdfs/dn._PORT</value> <!-- the official example uses /var/lib/hadoop-hdfs/dn_socket -->
</property>


<property>
<name>dfs.client.read.shortcircuit.skip.checksum</name>
<value>false</value>
</property>


<property>
<name>dfs.client.read.shortcircuit.streams.cache.size</name>
<value>1000</value>
</property>


<property>
<name>dfs.client.read.shortcircuit.streams.cache.expiry.ms</name>
<value>10000</value>
</property>
</configuration>

HDFS short-circuit local reads in older versions

The legacy implementation of short-circuit local reads, in which the client opens the HDFS block files directly, is still available for platforms other than Linux. Enable this feature by setting both dfs.client.use.legacy.blockreader.local and dfs.client.read.shortcircuit to true.

You also need to set the property dfs.datanode.data.dir.perm to 750 instead of the default 700, and use chmod/chown to make the directory tree under dfs.datanode.data.dir readable by both the client and the DataNode. Be careful: doing this means the client can read all of the block files, bypassing HDFS permissions.
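The permission change described above can be sketched as follows; the directory here is a temporary stand-in for the dfs.datanode.data.dir tree, not a real DataNode directory.

```python
import os
import stat
import tempfile

# Temporary stand-in for the dfs.datanode.data.dir directory tree.
data_dir = tempfile.mkdtemp()

# The default DataNode directory mode is 700: owner-only access.
os.chmod(data_dir, 0o700)

# For legacy short-circuit reads, relax the mode to 750 so members of the
# DataNode's group (i.e. permitted clients) can enter and read the tree.
os.chmod(data_dir, 0o750)

mode = stat.S_IMODE(os.stat(data_dir).st_mode)
print(oct(mode))  # 0o750
```

Mode 750 grants the owner full access and the group read and traverse access, which is exactly what lets a co-located client open the block files directly, and why this configuration bypasses HDFS-level permission checks.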

<configuration>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>


<property>
<name>dfs.client.use.legacy.blockreader.local</name>
<value>true</value>
</property>


<property>
<name>dfs.datanode.data.dir.perm</name>
<value>750</value>
</property>


<property>
<name>dfs.block.local-path-access.user</name>
<value>foo,bar</value>
</property>
</configuration>

