HDFS Data Integrity


HDFS Data Integrity

To ensure data integrity, the following verification techniques are commonly used:
1. Parity checking
2. Digest algorithms such as MD5 and SHA-1
3. CRC-32 cyclic redundancy checking (a small standalone example follows this list)
4. ECC memory error detection and correction
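
As a concrete illustration of the CRC-32 technique listed above, here is a minimal sketch using the JDK's built-in java.util.zip.CRC32 class. This is a standalone illustration of the idea, not HDFS code: the checksum computed when data is written is compared with the checksum recomputed when the data is read back.

import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class Crc32Demo {
    public static void main(String[] args) {
        byte[] data = "hello hdfs".getBytes(StandardCharsets.UTF_8);

        // Checksum computed when the data is written (stored alongside the data).
        CRC32 writeCrc = new CRC32();
        writeCrc.update(data);
        long storedChecksum = writeCrc.getValue();

        // Checksum recomputed when the data is read back.
        CRC32 readCrc = new CRC32();
        readCrc.update(data);
        long recomputedChecksum = readCrc.getValue();

        // A mismatch would indicate that the data was corrupted in between.
        System.out.println("data intact = " + (storedChecksum == recomputedChecksum));
    }
}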

HDFS Data Integrity
1. HDFS transparently checksums all data written to it. The io.bytes.per.checksum property sets the number of bytes covered by each checksum; the default is 512 bytes, and a separate checksum is stored for each such chunk. If a datanode detects corrupted data, a ChecksumException is reported (see the read sketch after this list).
2. In addition to verifying data as it is read, each datanode also runs a background thread, the DataBlockScanner, which periodically verifies all blocks stored on that datanode.
3. Once a corrupt block is detected, the datanode receives a block command from the namenode during the heartbeat phase, and a new replica is copied from another datanode that holds a good copy of the block.
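
The following minimal read sketch illustrates point 1. It assumes a reachable HDFS cluster configured via fs.defaultFS and reuses the /liguodong/data path from later in this article purely as an example; the key point is that checksums are verified transparently while reading, and a mismatch surfaces as a ChecksumException.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ChecksumException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VerifiedRead {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/liguodong/data"); // example path, adjust as needed

        try (FSDataInputStream in = fs.open(file)) {
            byte[] buffer = new byte[4096];
            // Checksums are verified transparently for every chunk read here.
            while (in.read(buffer) != -1) {
                // consume the data
            }
            System.out.println("read complete, all checksums verified");
        } catch (ChecksumException e) {
            // Raised when the data read does not match its stored checksum.
            System.err.println("corruption detected: " + e.getMessage());
        }
    }
}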

Local File System
If you use the local file system (file://), a hidden .crc file is created implicitly when a file is written; it contains the checksums for each chunk of the file.
Use FileSystem.setVerifyChecksum(false) to disable checksum verification, or pass the -ignoreCrc option to shell commands such as -get and -copyToLocal.
You can also use RawLocalFileSystem to disable checksums altogether, so that no .crc files are created or checked:
1. Set fs.file.impl to org.apache.hadoop.fs.RawLocalFileSystem, or
2. Create a RawLocalFileSystem instance directly.
A sketch of both approaches follows.
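
This is a minimal sketch of the options above; FileSystem.getLocal, setVerifyChecksum, the fs.file.impl property, and RawLocalFileSystem are standard Hadoop APIs, and the file:/// URI is only an illustrative value.

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.RawLocalFileSystem;

public class DisableChecksums {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();

        // Option A: keep LocalFileSystem, but skip checksum verification on read.
        LocalFileSystem localFs = FileSystem.getLocal(conf);
        localFs.setVerifyChecksum(false);

        // Option B1: map the file:// scheme to RawLocalFileSystem globally,
        // so no .crc files are written or checked at all.
        conf.set("fs.file.impl", "org.apache.hadoop.fs.RawLocalFileSystem");

        // Option B2: create a RawLocalFileSystem instance directly.
        RawLocalFileSystem rawFs = new RawLocalFileSystem();
        rawFs.initialize(URI.create("file:///"), conf);
    }
}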

LocalFileSystem inherits from ChecksumFileSystem, and ChecksumFileSystem provides the checksummed file system functionality, as the class hierarchy below shows.

package org.apache.hadoop.fs;

/****************************************************************
 * Implement the FileSystem API for the raw local filesystem.
 ****************************************************************/
public class RawLocalFileSystem extends FileSystem {
}

public abstract class ChecksumFileSystem extends FilterFileSystem {
}

public class LocalFileSystem extends ChecksumFileSystem {
}

The following example verifies that the hidden .crc checksum file exists:

package compress;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ChecksumFileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class CheckpointFileSystem {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "DeCodec");
        // Required so the containing jar can be located when the job is run from a package
        job.setJarByClass(CheckpointFileSystem.class);
        LocalFileSystem localFileSystem = ChecksumFileSystem.getLocal(conf);
        // Print the path of the hidden .crc file associated with /liguodong/data
        System.out.println(
                localFileSystem.getChecksumFile(new Path("/liguodong/data")));
    }
}

[root@master liguodong]# yarn jar checksum.jar
/liguodong/.data.crc
