Centralized Cache Management in HDFS

Last Update:2014-08-21 Source: Internet

Author: User


Overview

Centralized cache management in HDFS is an explicit caching mechanism that lets users specify paths to be cached by HDFS. The NameNode communicates with the DataNodes that hold the desired blocks on disk and instructs them to cache those blocks in off-heap caches.

Centralized cache management in HDFS has several significant advantages.

1. Explicit pinning prevents frequently used data from being evicted from memory. This is especially important when the working set is larger than main memory, which is common for HDFS workloads.

2. Because DataNode caches are managed by the NameNode, applications can query the set of cached block locations when deciding where to place tasks. Co-locating a task with a cached block replica improves read performance.

3. When a block has been cached by a DataNode, clients can use a new, more efficient zero-copy read API. Because checksum verification of cached data is done once by the DataNode, clients incur essentially zero overhead when using this API.

4. Centralized caching can improve overall cluster memory utilization. When relying on the operating system buffer cache on each DataNode, repeated reads of a block pull all N replicas of the block into the buffer cache. With centralized cache management, a user can explicitly pin only M of the N replicas, saving the memory of N-M replicas.
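The saving described in point 4 is simple arithmetic. The sketch below is only an illustration: the function name and the numbers (a 128 MiB block with 3 replicas, 1 of them pinned) are assumptions, not values taken from HDFS.

```python
def pinned_memory_saved(block_size_bytes: int, n_replicas: int, m_cached: int) -> int:
    """Bytes of memory saved for one block by pinning only M of its N replicas."""
    if not 0 <= m_cached <= n_replicas:
        raise ValueError("m_cached must be between 0 and n_replicas")
    # With the OS buffer cache, all N replicas may end up in memory;
    # with explicit pinning only M do, so N - M copies are saved.
    return (n_replicas - m_cached) * block_size_bytes


# Hypothetical example values.
saved = pinned_memory_saved(128 * 1024 * 1024, n_replicas=3, m_cached=1)
print(saved // (1024 * 1024), "MiB saved for this block")
```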

Use Cases

Centralized cache management is useful for files that are accessed repeatedly. For example, a small fact table in Hive that is often used in joins is a good candidate for caching. On the other hand, caching the input of a one-year reporting query is probably less useful, because the historical data might be read only once.

Centralized cache management is also useful for mixed workloads with performance SLAs. Caching the working set of a high-priority workload ensures that it does not contend for disk I/O with a low-priority workload.

Architecture

In this architecture, the NameNode is responsible for coordinating all the DataNode off-heap caches in the cluster. The NameNode periodically receives a cache report from each DataNode, which describes all the blocks cached on that DataNode. The NameNode manages DataNode caches by piggybacking cache and uncache commands on the DataNode heartbeat.

The NameNode queries its set of cache directives to determine which paths should be cached. Cache directives are stored persistently in the fsimage and edit log, and can be added, removed, and modified through Java and command-line APIs. The NameNode also stores a set of cache pools, which are administrative entities used to group cache directives for resource management and permission enforcement.

The NameNode periodically rescans the namespace and the active cache directives to determine which blocks need to be cached or uncached, and assigns caching work to DataNodes. A rescan is also triggered by user actions such as adding or removing a cache directive or removing a cache pool.
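The reconciliation step can be pictured as a set difference between the blocks that should be cached on a DataNode and the blocks its latest cache report lists. The following is a simplified model for illustration only; the function name and the use of plain integer block IDs are assumptions, and this is not how the NameNode is actually implemented.

```python
def plan_cache_commands(desired_blocks: set, cache_report: set) -> tuple:
    """Return (blocks to cache, blocks to uncache) for one DataNode."""
    to_cache = desired_blocks - cache_report    # wanted but not yet cached
    to_uncache = cache_report - desired_blocks  # cached but no longer wanted
    return sorted(to_cache), sorted(to_uncache)


# Hypothetical block IDs: the rescan wants 1, 2 and 3; the report lists 2 and 4.
cache_cmds, uncache_cmds = plan_cache_commands({1, 2, 3}, {2, 4})
print("cache:", cache_cmds, "uncache:", uncache_cmds)
```

In this model the two lists are what would travel back to the DataNode with its next heartbeat.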

Currently, blocks that are under construction, corrupt, or otherwise incomplete are not cached. If a cache directive covers a symbolic link, the target of the symbolic link is not cached.

Caching is currently done at the file or directory level. Block and sub-block caching is an item of future work.

Concepts

Cache directive

A cache directive defines a path that should be cached. The path can be either a directory or a file. Directories are cached non-recursively, which means only the files in the first-level listing of the directory are cached.

Directives also specify additional parameters, such as the cache replication factor and the expiration time. The replication factor specifies the number of block replicas to cache. If multiple cache directives refer to the same file, the maximum cache replication factor is applied.

The expiration time is specified on the command line as a time-to-live (TTL), a relative expiration time in the future. After a cache directive expires, the NameNode no longer considers it when making caching decisions.
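To make the TTL notation concrete, here is a minimal sketch that converts a relative TTL string into seconds, using the unit letters s, m, h, and d that the cacheadmin help text lists. It is not Hadoop's own parser: the function name is hypothetical, and accepting upper-case input is a leniency of this sketch.

```python
import re

# Seconds per unit letter (s, m, h, d).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def ttl_to_seconds(ttl: str):
    """Return the TTL in seconds, or None when the directive never expires."""
    text = ttl.strip().lower()
    if text == "never":
        return None
    match = re.fullmatch(r"(\d+)([smhd])", text)
    if match is None:
        raise ValueError(f"unrecognized TTL: {ttl!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


for example in ("120s", "30m", "4h", "2d", "never"):
    print(example, "->", ttl_to_seconds(example))
```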

Cache pool

A cache pool is an administrative entity used to manage groups of cache directives. Cache pools have UNIX-like permissions, which restrict which users and groups have access to the pool. Write permission allows users to add and remove cache directives in the pool. Read permission allows users to list the cache directives in a pool, as well as additional metadata. Execute permissions are unused.

Cache pools are also used for resource management. A pool can enforce a maximum limit, which restricts the number of bytes that can be cached in aggregate by the directives in the pool. Normally, the sum of the pool limits approximately equals the aggregate memory reserved for HDFS caching on the cluster. Cache pools also track a number of statistics to help cluster users determine what is cached and what should be cached.
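The pool limit can be thought of as a budget check made when a directive is added. The sketch below is an assumed model, not Hadoop code: the function and parameter names are hypothetical, `None` stands for a pool with no limit set, and `force` mirrors the documented -force option, which skips checking of cache pool resource limits.

```python
def fits_in_pool(pool_limit_bytes, bytes_already_needed, new_directive_bytes, force=False):
    """Return True if a new directive may be added under the pool's byte limit."""
    if force or pool_limit_bytes is None:
        # The check is skipped on request, or the pool has no limit at all.
        return True
    return bytes_already_needed + new_directive_bytes <= pool_limit_bytes


# Hypothetical pool with a 1 GiB limit and 900 MiB already requested.
GIB = 1024 ** 3
MIB = 1024 ** 2
print(fits_in_pool(GIB, 900 * MIB, 100 * MIB))              # fits within the limit
print(fits_in_pool(GIB, 900 * MIB, 200 * MIB))              # exceeds the limit
print(fits_in_pool(GIB, 900 * MIB, 200 * MIB, force=True))  # check skipped
```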

A pool can also enforce a maximum time-to-live, which restricts the maximum expiration time of directives added to the pool.

cacheadmin command-line interface

On the command line, administrators and users can interact with cache pools and directives through the hdfs cacheadmin subcommand.

Cache directives are identified by a unique, non-repeating 64-bit integer ID. IDs are not reused even if a cache directive is later removed.

Cache pools are identified by a unique string name.

Cache directive commands

addDirective

Usage: hdfs cacheadmin -addDirective -path <path> -pool <pool-name> [-force] [-replication <replication>] [-ttl <time-to-live>]

Add a new cache directive.

<path>	A path to cache. The path can be a directory or a file.
<pool-name>	The pool to which the directive will be added. You must have write permission on the cache pool in order to add new directives.
-force	Skips checking of cache pool resource limits.
<replication>	The cache replication factor to use. Defaults to 1.
<time-to-live>	How long the directive is valid. Can be specified in minutes, hours, and days, e.g. 30m, 4h, 2d. Valid units are [smhd]. "never" indicates a directive that never expires. If unspecified, the directive never expires.

removeDirective

Usage: hdfs cacheadmin -removeDirective <id>

Remove a cache directive.

<id>	The ID of the cache directive to remove. You must have write permission on the pool of the directive in order to remove it. To see a list of cache directive IDs, use the -listDirectives command.

removeDirectives

Usage: hdfs cacheadmin -removeDirectives <path>

Remove every cache directive with the specified path.

<path>	The path of the cache directives to remove. You must have write permission on the pool of the directive in order to remove it. To see a list of cache directives, use the -listDirectives command.

listDirectives

Usage: hdfs cacheadmin -listDirectives [-stats] [-path <path>] [-pool <pool>]

List cache directives.

<path>	List only cache directives with this path. Note that if there is a cache directive for <path> in a cache pool that we don't have read access for, it will not be listed.
<pool>	List only path cache directives in that pool.
-stats	List path-based cache directive statistics.

Cache pool commands

addPool

Usage: hdfs cacheadmin -addPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-limit <limit>] [-maxTtl <maxTtl>]

Add a new cache pool.

<name>	Name of the new pool.
<owner>	Username of the owner of the pool. Defaults to the current user.
<group>	Group of the pool. Defaults to the primary group name of the current user.
<mode>	Unix-style permissions for the pool. Permissions are specified in octal, e.g. 0755. By default, this is set to 0755.
<limit>	The maximum number of bytes that can be cached by directives in this pool, in aggregate. By default, no limit is set.
<maxTtl>	The maximum allowed time-to-live for directives being added to the pool. This can be specified in seconds, minutes, hours, and days, e.g. 120s, 30m, 4h, 2d. Valid units are [smhd]. By default, no maximum is set. A value of "never" specifies that there is no limit.

modifyPool

Usage: hdfs cacheadmin -modifyPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-limit <limit>] [-maxTtl <maxTtl>]

Modify the metadata of an existing cache pool.

<name>	Name of the pool to modify.
<owner>	Username of the owner of the pool.
<group>	Groupname of the group of the pool.
<mode>	Unix-style permissions of the pool in octal.
<limit>	Maximum number of bytes that can be cached by this pool.
<maxTtl>	The maximum allowed time-to-live for directives being added to the pool.

removePool

Usage: hdfs cacheadmin -removePool <name>

Remove a cache pool. This also uncaches the paths associated with the pool.

<name>	Name of the cache pool to remove.

listPools

Usage: hdfs cacheadmin -listPools [-stats] [<name>]

Display information about one or more cache pools, such as name, owner, group, and permissions.

-stats	Display additional cache pool statistics.
<name>	If specified, list only the named cache pool.

help

Usage: hdfs cacheadmin -help <command-name>

Get detailed help about a command.

<command-name>	The command for which to get detailed help. If no command is specified, print detailed help for all commands.

Configuration

Native libraries

To lock block files into memory, the DataNode relies on native JNI code found in libhadoop.so. If you use HDFS centralized cache management, be sure to enable JNI.

Configuration properties

Required

Be sure to configure the following:

1. dfs.datanode.max.locked.memory

This determines the maximum amount of memory that a DataNode will use for caching. On Unix-like systems, the "locked-in-memory size" ulimit (ulimit -l) of the DataNode user must also be increased to match this parameter (see the OS limits section). When setting this value, remember that you also need memory for other things, such as the DataNode and application JVM heaps and the operating system page cache.

Optional

The following properties are not required, but can be specified for tuning:

1. dfs.namenode.path.based.cache.refresh.interval.ms

The NameNode uses this as the number of milliseconds between consecutive path cache rescans. Each rescan calculates the blocks to cache and which DataNodes holding a replica of each block should cache it.

2. dfs.datanode.fsdatasetcache.max.threads.per.volume

The DataNode uses this as the maximum number of threads per volume to use for caching new data.

By default, this value is set to 4.

3. dfs.cachereport.intervalMsec

The DataNode uses this as the number of milliseconds between sending full reports of its cache state to the NameNode.

The default value is 10000, that is, 10 seconds.

4. dfs.namenode.path.based.cache.block.map.allocation.percent

The percentage of the Java heap that is allocated to the cached blocks map. The cached blocks map is a hash map that uses chained hashing. A smaller map may be slower to access if the number of cached blocks is large; a larger map consumes more memory. The default value is 0.25 percent.

OS limits

If you get the error "Cannot start datanode because the configured max locked memory size... is more than the datanode's available RLIMIT_MEMLOCK ulimit," the operating system is imposing a lower limit on the amount of memory you can lock than what you have configured. To fix this, you must adjust the ulimit -l value that the DataNode runs with. Usually, this value is configured in /etc/security/limits.conf. However, it varies depending on the operating system and distribution you are using.

You will know that you have configured this value correctly when you can run ulimit -l from the shell and get back either a value higher than what you have configured in dfs.datanode.max.locked.memory, or the string "unlimited," which indicates that there is no limit. Note that ulimit -l typically outputs the memory lock limit in KB, but dfs.datanode.max.locked.memory must be specified in bytes.
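Because the two values use different units, a quick conversion helps when comparing them. The sketch below is a hypothetical helper, not part of Hadoop: it assumes the ulimit output is in KB (1024 bytes) and treats a limit exactly equal to the configured value as sufficient, which is an assumption of this example.

```python
def memlock_limit_is_sufficient(ulimit_l_output: str, max_locked_memory_bytes: int) -> bool:
    """Compare the output of `ulimit -l` with dfs.datanode.max.locked.memory."""
    text = ulimit_l_output.strip()
    if text == "unlimited":
        return True
    limit_bytes = int(text) * 1024  # assumption: ulimit -l reports KB
    return limit_bytes >= max_locked_memory_bytes


# Hypothetical setting: dfs.datanode.max.locked.memory = 64 MiB, given in bytes.
configured = 64 * 1024 * 1024
for output in ("64", "65536", "unlimited"):
    print(output, "->", memlock_limit_is_sufficient(output, configured))
```

A bare "64" here means 64 KB, far below the configured 64 MiB, which is the kind of mismatch the unit difference tends to hide.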

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
