HDFS Data Integrity
To ensure data integrity, checksum-based verification techniques are commonly used:
1. Parity checking
2. Hash digests such as MD5 and SHA-1
3. CRC-32 cyclic redundancy checking
4. ECC memory error detection and correction
How HDFS Ensures Data Integrity
1. HDFS transparently checksums all data written to it. The io.bytes.per.checksum property controls how many bytes of data get a separate checksum; the default is 512 bytes. If a datanode detects an error in the data, it reports a ChecksumException.
2. In addition to verifying data as it is read, each datanode runs a background thread, the DataBlockScanner, which periodically verifies all the blocks stored on that datanode.
3. Once a corrupt block is detected, the datanode receives a block command from the namenode during the heartbeat and copies a new replica of the block from another datanode that holds a good copy.
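The per-chunk checksumming described above can be sketched in plain Java. This is a minimal illustration using java.util.zip.CRC32, not Hadoop's actual implementation; the 512-byte chunk size mirrors the io.bytes.per.checksum default, and the class and method names are my own:

```java
import java.util.Arrays;
import java.util.zip.CRC32;

public class ChunkedChecksum {
    // Mirrors the io.bytes.per.checksum default of 512 bytes.
    static final int BYTES_PER_CHECKSUM = 512;

    // Compute one CRC-32 value per 512-byte chunk of the data.
    static long[] checksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int from = i * BYTES_PER_CHECKSUM;
            int to = Math.min(from + BYTES_PER_CHECKSUM, data.length);
            crc.reset();
            crc.update(data, from, to - from);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Recompute and compare; throws on mismatch, analogous to a ChecksumException.
    static void verify(byte[] data, long[] stored) {
        long[] actual = checksums(data);
        for (int i = 0; i < stored.length; i++) {
            if (actual[i] != stored[i]) {
                throw new IllegalStateException("checksum error at chunk " + i);
            }
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[1300]; // spans three 512-byte chunks
        Arrays.fill(data, (byte) 'a');
        long[] sums = checksums(data);
        verify(data, sums);           // clean data passes
        data[600] ^= 1;               // flip one bit in the second chunk
        try {
            verify(data, sums);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because only the corrupt chunk's CRC mismatches, verification pinpoints which 512-byte region went bad, which is what lets HDFS re-replicate just the affected block.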
Local File System
If you use the local file system (file://), a hidden .filename.crc file is created implicitly in the same directory whenever a file named filename is written; it contains the checksum for each chunk of the file.
Call FileSystem.setVerifyChecksum(false) before reading a file to disable checksum verification. You can also pass the -ignoreCrc option to the shell commands.
You can also use RawLocalFileSystem to disable checksums entirely, in one of two ways:
1. Set fs.file.impl to org.apache.hadoop.fs.RawLocalFileSystem, which disables checksums globally for file: URIs.
2. Create a RawLocalFileSystem instance directly.
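The first approach is a configuration change; a sketch of what the entry might look like in core-site.xml:

```xml
<property>
  <name>fs.file.impl</name>
  <value>org.apache.hadoop.fs.RawLocalFileSystem</value>
</property>
```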
LocalFileSystem inherits from ChecksumFileSystem, which provides client-side checksumming on top of another (raw) file system.
package org.apache.hadoop.fs;

/****************************************************
 * Implement the FileSystem API for the raw local filesystem.
 ****************************************************/
public class RawLocalFileSystem extends FileSystem {
}

public abstract class ChecksumFileSystem extends FilterFileSystem {
}

public class LocalFileSystem extends ChecksumFileSystem {
}
The following example prints the path of the hidden .crc checksum file for a given file:
package compress;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ChecksumFileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class CheckpointFileSystem {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "DeCodec");
        // Needed so the jar containing this class can be located when run as a package
        job.setJarByClass(CheckpointFileSystem.class);
        LocalFileSystem localFileSystem = ChecksumFileSystem.getLocal(conf);
        // Print the path of the hidden checksum file for the given file
        System.out.println(
            localFileSystem.getChecksumFile(new Path("/liguodong/data")));
    }
}
[root@master liguodong]# yarn jar checksum.jar
/liguodong/.data.crc