HDFS Data Integrity
To ensure data integrity, checksum-based verification techniques are commonly used:
1. Parity checking
2. Hash digests such as MD5 and SHA-1
3. CRC-32 cyclic redundancy checking
4. ECC memory error detection and correction
How HDFS Ensures Data Integrity
1. HDFS transparently checksums all data written to it. The io.bytes.per.checksum property controls how many bytes of data get a separate checksum; the default is 512 bytes. If a datanode detects an error in the data, it reports a ChecksumException.
2. In addition to verifying data as it is read, each datanode runs a background thread, the DataBlockScanner, which periodically verifies all the blocks stored on that datanode.
3. Once a corrupt block is detected, the datanode receives a block command from the namenode during the heartbeat and copies a new replica of the block from another datanode that holds a good copy.
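The per-chunk checksumming described above can be sketched in plain Java. This is a minimal illustration using java.util.zip.CRC32, not Hadoop's actual implementation; the 512-byte chunk size mirrors the io.bytes.per.checksum default, and the class and method names are my own:

```java
import java.util.Arrays;
import java.util.zip.CRC32;

public class ChunkedChecksum {
    // Mirrors the io.bytes.per.checksum default of 512 bytes.
    static final int BYTES_PER_CHECKSUM = 512;

    // Compute one CRC-32 value per 512-byte chunk of the data.
    static long[] checksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int from = i * BYTES_PER_CHECKSUM;
            int to = Math.min(from + BYTES_PER_CHECKSUM, data.length);
            crc.reset();
            crc.update(data, from, to - from);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Recompute and compare; throws on mismatch, analogous to a ChecksumException.
    static void verify(byte[] data, long[] stored) {
        long[] actual = checksums(data);
        for (int i = 0; i < stored.length; i++) {
            if (actual[i] != stored[i]) {
                throw new IllegalStateException("checksum error at chunk " + i);
            }
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[1300]; // spans three 512-byte chunks
        Arrays.fill(data, (byte) 'a');
        long[] sums = checksums(data);
        verify(data, sums);           // clean data passes
        data[600] ^= 1;               // flip one bit in the second chunk
        try {
            verify(data, sums);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because only the corrupt chunk's CRC mismatches, verification pinpoints which 512-byte region went bad, which is what lets HDFS re-replicate just the affected block.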
Local File System
If you use the local file system (file://), a hidden .filename.crc file is created implicitly in the same directory whenever a file named filename is written; it contains the checksum for each chunk of the file.
Call FileSystem.setVerifyChecksum(false) before reading a file to disable checksum verification. You can also pass the -ignoreCrc option to the shell commands.
You can also use RawLocalFileSystem to disable checksums entirely, in one of two ways:
1. Set fs.file.impl to org.apache.hadoop.fs.RawLocalFileSystem, which disables checksums globally for file: URIs.
2. Create a RawLocalFileSystem instance directly.
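The first approach is a configuration change; a sketch of what the entry might look like in core-site.xml:

```xml
<property>
  <name>fs.file.impl</name>
  <value>org.apache.hadoop.fs.RawLocalFileSystem</value>
</property>
```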
LocalFileSystem inherits from ChecksumFileSystem, which provides client-side checksumming on top of another (raw) file system.
package org.apache.hadoop.fs;

/****************************************************
 * Implement the FileSystem API for the raw local filesystem.
 ****************************************************/
public class RawLocalFileSystem extends FileSystem {
}

public abstract class ChecksumFileSystem extends FilterFileSystem {
}

public class LocalFileSystem extends ChecksumFileSystem {
}
The following example prints the path of the hidden .crc checksum file for a given file:
package compress;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ChecksumFileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class CheckpointFileSystem {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "DeCodec");
        // Needed so the jar containing this class can be located when run as a package
        job.setJarByClass(CheckpointFileSystem.class);
        LocalFileSystem localFileSystem = ChecksumFileSystem.getLocal(conf);
        // Print the path of the hidden checksum file for the given file
        System.out.println(
            localFileSystem.getChecksumFile(new Path("/liguodong/data")));
    }
}
[root@master liguodong]# yarn jar checksum.jar
/liguodong/.data.crc