HDFS Data Encryption Space: the Encryption Zone

Tags: hadoop, fs
Preface

I have written many articles about data migration and introduced many HDFS-related tools and features, such as DistCp, ViewFileSystem, and so on. But today's topic moves to a different field: data security. Data security has always been a key concern for users, so data administrators must follow these principles:

Data must not be lost or damaged, and data content must not be accessed without authorization.

This article focuses on the last of these principles: preventing unauthorized access to data. HDFS has a dedicated feature for this, the encryption zone, a data encryption space.

Encryption zone Overview

HDFS encryption zones provide end-to-end encryption, and the encryption/decryption process is completely transparent to the client. Data is encrypted when the client writes it and decrypted when the client reads it, so HDFS itself is not a major participant: all it stores is a stream of encrypted bytes.

How encryption zone works

Understanding how HDFS data encryption works helps us use encryption zones correctly. An encryption zone is an abstract concept in HDFS: content in the zone is transparently encrypted on write and transparently decrypted on read. That is the core idea; the specific details are as follows:

  • 1. Each encryption zone is associated with a single encryption zone key, which is specified when the zone is created.
  • 2. Each file in an encryption zone has its own unique data encryption key, abbreviated DEK.
  • 3. HDFS never handles a DEK directly. It only handles the encrypted data encryption key, abbreviated EDEK.
  • 4. The client asks the KMS service to decrypt the EDEK, and then uses the resulting DEK to read and write data.
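The four steps above amount to envelope encryption. The following is a minimal, self-contained sketch of that scheme using only the JDK; the class and method names are hypothetical, and this is not HDFS or KMS code. The zone key stands in for the KMS-held encryption zone key, the per-file DEK is wrapped into an EDEK, and only the unwrapped DEK can be used for file data.

```java
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

/**
 * Toy illustration of the DEK/EDEK envelope-encryption scheme (not HDFS
 * code): the zone key never leaves the "KMS", HDFS only ever stores the
 * EDEK, and the client unwraps the EDEK to obtain the DEK for file I/O.
 */
public class EnvelopeSketch {

  // Stand-in for the KMS holding the encryption zone key.
  static SecretKey zoneKey;

  static byte[] wrapDek(SecretKey dek) throws Exception {
    Cipher c = Cipher.getInstance("AESWrap");
    c.init(Cipher.WRAP_MODE, zoneKey);
    return c.wrap(dek);            // the EDEK: all that HDFS ever sees
  }

  static SecretKey unwrapEdek(byte[] edek) throws Exception {
    Cipher c = Cipher.getInstance("AESWrap");
    c.init(Cipher.UNWRAP_MODE, zoneKey);
    return (SecretKey) c.unwrap(edek, "AES", Cipher.SECRET_KEY);
  }

  public static void main(String[] args) throws Exception {
    KeyGenerator kg = KeyGenerator.getInstance("AES");
    kg.init(128);
    zoneKey = kg.generateKey();          // created with the encryption zone
    SecretKey dek = kg.generateKey();    // per-file data encryption key

    byte[] edek = wrapDek(dek);          // NameNode stores this with the file
    SecretKey recovered = unwrapEdek(edek); // client asks the "KMS" for this

    System.out.println(Arrays.equals(
        dek.getEncoded(), recovered.getEncoded())); // true
  }
}
```

Note how the unwrapped DEK round-trips exactly, while the stored EDEK is useless without the zone key.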

There is an important detail in step 4:

When the client sends the request, the KMS performs its own permission checks, and clients that fail them cannot decrypt the EDEK. KMS permission checking is independent of HDFS permissions; it is a separate authorization layer.

The following figure shows the corresponding principle:

The KeyProvider can be understood as a key store, of which KMS is one implementation.

Encryption zone source code implementation

In this section, we trace the principles above at the source-code level, in two main directions: (1) creating a file and writing file data; (2) reading file data.

Writing files in an encryption zone

First, the client issues a create-file request. On the NameNode side, the startFile method is called, and the EDEK is generated there.

    private HdfsFileStatus startFileInt(final String src,
        PermissionStatus permissions, String holder, String clientMachine,
        EnumSet<CreateFlag> flag, boolean createParent, short replication,
        long blockSize, CryptoProtocolVersion[] supportedVersions,
        boolean logRetryCache) throws IOException {
      ...
      FSDirWriteFileOp.EncryptionKeyInfo ezInfo = null;
      // Determine whether a key provider is configured
      if (provider != null) {
        readLock();
        try {
          checkOperation(OperationCategory.READ);
          // If so, generate the EncryptionKeyInfo for this path
          ezInfo = FSDirWriteFileOp.getEncryptionKeyInfo(this, pc, src,
              supportedVersions);
        } finally {
          readUnlock();
        }

        // Generate EDEK if necessary while not holding the lock
        if (ezInfo != null) {
          // Generate the EDEK from the zone key name carried in ezInfo
          ezInfo.edek = FSDirEncryptionZoneOp
              .generateEncryptedDataEncryptionKey(dir, ezInfo.ezKeyName);
        }
        EncryptionFaultInjector.getInstance().startFileAfterGenerateKey();
      }
      ...
      try {
        // Continue into the startFile method
        stat = FSDirWriteFileOp.startFile(this, pc, src, permissions,
            holder, clientMachine, flag, createParent, replication,
            blockSize, ezInfo, toRemoveBlocks, logRetryCache);
      ...

The call then continues into FSDirWriteFileOp.startFile:

    static HdfsFileStatus startFile(FSNamesystem fsn, FSPermissionChecker pc,
        String src, PermissionStatus permissions, String holder,
        String clientMachine, EnumSet<CreateFlag> flag, boolean createParent,
        short replication, long blockSize,
        FSDirWriteFileOp.EncryptionKeyInfo ezInfo,
        INode.BlocksMapUpdateInfo toRemoveBlocks, boolean logRetryEntry)
        throws IOException {
      assert fsn.hasWriteLock();
      ...
      CipherSuite suite = null;
      CryptoProtocolVersion version = null;
      KeyProviderCryptoExtension.EncryptedKeyVersion edek = null;
      // Retrieve the key information carried in ezInfo
      if (ezInfo != null) {
        edek = ezInfo.edek;
        suite = ezInfo.suite;
        version = ezInfo.protocolVersion;
      }
      ...
      FileEncryptionInfo feInfo = null;
      final EncryptionZone zone = FSDirEncryptionZoneOp.getEZForPath(fsd, iip);
      if (zone != null) {
        // The path is now within an EZ, but we're missing encryption
        // parameters
        if (suite == null || edek == null) {
          throw new RetryStartFileException();
        }
        // Path is within an EZ and we have provided encryption parameters.
        // Make sure that the generated EDEK matches the settings of the EZ.
        final String ezKeyName = zone.getKeyName();
        if (!ezKeyName.equals(edek.getEncryptionKeyName())) {
          throw new RetryStartFileException();
        }
        // Build the FileEncryptionInfo; it will be set on the INodeFile
        feInfo = new FileEncryptionInfo(suite, version,
            edek.getEncryptedKeyVersion().getMaterial(),
            edek.getEncryptedKeyIv(), ezKeyName,
            edek.getEncryptionKeyVersionName());
      }
      ...

After these operations complete, the FileEncryptionInfo is returned to the client in the HdfsFileStatus object, which is then used when the DFSOutputStream is wrapped. The following code shows how the client decrypts the EDEK and encrypts the data:

    public HdfsDataOutputStream createWrappedOutputStream(DFSOutputStream dfsos,
        FileSystem.Statistics statistics, long startPos) throws IOException {
      // Retrieve the file encryption info of the stream
      final FileEncryptionInfo feInfo = dfsos.getFileEncryptionInfo();
      if (feInfo != null) {
        // File is encrypted, wrap the stream in a crypto stream.
        // Currently only one version, so no special logic based on the
        // version #
        getCryptoProtocolVersion(feInfo);
        final CryptoCodec codec = getCryptoCodec(conf, feInfo);
        // Ask the KeyProvider to decrypt the EDEK carried in feInfo
        final KeyVersion decrypted = decryptEncryptedDataEncryptionKey(feInfo);
        // Use the decrypted DEK to construct the encrypted output stream
        final CryptoOutputStream cryptoOut =
            new CryptoOutputStream(dfsos, codec, decrypted.getMaterial(),
                feInfo.getIV(), startPos);
        return new HdfsDataOutputStream(cryptoOut, statistics, startPos);
      } else {
        // No FileEncryptionInfo present so no encryption.
        return new HdfsDataOutputStream(dfsos, statistics, startPos);
      }
    }

We can follow the decryptEncryptedDataEncryptionKey method to verify that the KeyProvider service is indeed the one being asked to decrypt:

    private KeyVersion decryptEncryptedDataEncryptionKey(
        FileEncryptionInfo feInfo) throws IOException {
      try (TraceScope ignored = tracer.newScope("decryptEDEK")) {
        // Get the KeyProvider
        KeyProvider provider = getKeyProvider();
        if (provider == null) {
          throw new IOException("No KeyProvider is configured, cannot access" +
              " an encrypted file");
        }
        // Build the EncryptedKeyVersion to be decrypted
        EncryptedKeyVersion ekv = EncryptedKeyVersion.createForDecryption(
            feInfo.getKeyName(), feInfo.getEzKeyVersionName(), feInfo.getIV(),
            feInfo.getEncryptedDataEncryptionKey());
        try {
          KeyProviderCryptoExtension cryptoProvider =
              KeyProviderCryptoExtension
                  .createKeyProviderCryptoExtension(provider);
          // Ask the provider to decrypt the EDEK and return the result
          return cryptoProvider.decryptEncryptedKey(ekv);
        } catch (GeneralSecurityException e) {
          throw new IOException(e);
        }
      }
    }

After the encrypted output stream object, the CryptoOutputStream, has been constructed, every subsequent write operation passes through an extra encryption step. The procedure call diagram is as follows:
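The cipher work above is done by Hadoop's CryptoOutputStream. As a rough JDK-only analogue (a sketch with assumed names, not the Hadoop API), the following shows the same idea: the writer keeps writing plaintext, while AES/CTR ciphertext is what actually reaches the underlying stream, and wrapping the read side undoes it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/**
 * JDK-only sketch (not the Hadoop CryptoOutputStream API): once the DEK is
 * unwrapped, the writer is just the plain stream wrapped in a cipher
 * stream, so every byte passes through AES/CTR on its way out.
 */
public class WrappedStreamSketch {

  static byte[] encrypt(byte[] dek, byte[] iv, byte[] plain) throws Exception {
    Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
    c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(dek, "AES"),
        new IvParameterSpec(iv));
    ByteArrayOutputStream sink = new ByteArrayOutputStream(); // "DataNode"
    try (CipherOutputStream out = new CipherOutputStream(sink, c)) {
      out.write(plain); // the client writes plaintext; ciphertext lands
    }
    return sink.toByteArray();
  }

  static byte[] decrypt(byte[] dek, byte[] iv, byte[] cipherText)
      throws Exception {
    Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
    c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(dek, "AES"),
        new IvParameterSpec(iv));
    try (CipherInputStream in =
        new CipherInputStream(new ByteArrayInputStream(cipherText), c)) {
      return in.readAllBytes(); // the reader sees plaintext again (Java 9+)
    }
  }

  public static void main(String[] args) throws Exception {
    byte[] dek = new byte[16]; // toy all-zero DEK, for illustration only
    byte[] iv = new byte[16];
    byte[] stored = encrypt(dek, iv, "hello hdfs".getBytes());
    System.out.println(new String(decrypt(dek, iv, stored)));
  }
}
```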

Reading files in an encryption zone

Reading a file is similar to writing a file.

First, the HdfsFileStatus object of the target file is constructed and the FileEncryptionInfo is retrieved. In this process, the FileEncryptionInfo is set into the LocatedBlocks:

    private static HdfsLocatedFileStatus createLocatedFileStatus(
        FSDirectory fsd, byte[] path, INodeAttributes nodeAttrs,
        byte storagePolicy, int snapshot, boolean isRawPath,
        INodesInPath iip) throws IOException {
      ...
      // The FileEncryptionInfo is set into the LocatedBlocks here
      loc = fsd.getBlockManager().createLocatedBlocks(
          fileNode.getBlocks(snapshot), fileSize, isUc, 0L, size, false,
          inSnapshot, feInfo, ecPolicy);
      ...

Then the block information is passed to DFSInputStream as a parameter, and in fetchLocatedBlocksAndGetLastBlockLength it is saved into a member variable:

    private long fetchLocatedBlocksAndGetLastBlockLength(boolean refresh)
        throws IOException {
      LocatedBlocks newInfo = locatedBlocks;
      ...
      // Save the encryption info from the LocatedBlocks into a member
      // variable
      fileEncryptionInfo = locatedBlocks.getFileEncryptionInfo();

      return lastBlockBeingWrittenLength;
    }

This information is then retrieved and used when the client wraps the input stream:

    public HdfsDataInputStream createWrappedInputStream(DFSInputStream dfsis)
        throws IOException {
      // Get the file encryption information
      final FileEncryptionInfo feInfo = dfsis.getFileEncryptionInfo();
      if (feInfo != null) {
        // File is encrypted, wrap the stream in a crypto stream.
        // Currently only one version, so no special logic based on the
        // version #
        getCryptoProtocolVersion(feInfo);
        final CryptoCodec codec = getCryptoCodec(conf, feInfo);
        // Decrypt the EDEK
        final KeyVersion decrypted = decryptEncryptedDataEncryptionKey(feInfo);
        // Construct the encrypted input stream
        final CryptoInputStream cryptoIn =
            new CryptoInputStream(dfsis, codec, decrypted.getMaterial(),
                feInfo.getIV());
        return new HdfsDataInputStream(cryptoIn);
      } else {
        // No FileEncryptionInfo so no encryption.
        return new HdfsDataInputStream(dfsis);
      }
    }

Similar to the write path, data read through the encrypted input stream is decrypted, so the client sees normal data. The process diagram is also given:

Encryption zone management

Having finished the source-code analysis, we briefly describe how HDFS manages encryption zones centrally. A dedicated class, EncryptionZoneManager, does this work; note, however, that the objects it stores are not EncryptionZone instances but EncryptionZoneInt instances.

    public class EncryptionZoneManager {

      public static Logger LOG = LoggerFactory
          .getLogger(EncryptionZoneManager.class);
      ...
      // A TreeMap holds the list of encryption zones
      private final TreeMap<Long, EncryptionZoneInt> encryptionZones;
      private final FSDirectory dir;
      private final int maxListEncryptionZonesResponses;
      ...

Here, the key of the TreeMap is the inode id of the encryption zone's root directory.
So what is the relationship between EncryptionZoneInt and EncryptionZone?

EncryptionZoneInt is used to construct EncryptionZone objects, as the following code shows:

    EncryptionZone getEZINodeForPath(INodesInPath iip) {
      final EncryptionZoneInt ezi = getEncryptionZoneForPath(iip);
      if (ezi == null) {
        return null;
      } else {
        return new EncryptionZone(ezi.getINodeId(), getFullPathName(ezi),
            ezi.getSuite(), ezi.getVersion(), ezi.getKeyName());
      }
    }

Whether a target path is in an encryption zone, and therefore whether a file is encrypted, is determined by probing this map with the inode id as the key.
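A toy model of this bookkeeping (hypothetical names, not the HDFS implementation) makes the inode-id-keyed map concrete: a path belongs to a zone if the inode id of any ancestor directory appears as a key, and the nearest enclosing zone wins.

```java
import java.util.TreeMap;

/**
 * Toy model (not HDFS code) of EncryptionZoneManager's bookkeeping: zones
 * are kept in a TreeMap keyed by the inode id of the zone root directory,
 * and a path is "in" a zone if any ancestor's inode id is a key.
 */
public class ZoneMapSketch {
  // inode id of the zone root directory -> key name of that zone
  static final TreeMap<Long, String> zones = new TreeMap<>();

  /** Walk from the path's own inode up toward the root; nearest zone wins. */
  static String zoneKeyFor(long[] inodeIdsLeafToRoot) {
    for (long id : inodeIdsLeafToRoot) {
      String keyName = zones.get(id);
      if (keyName != null) {
        return keyName;
      }
    }
    return null; // not inside any encryption zone
  }

  public static void main(String[] args) {
    zones.put(1001L, "mykey");            // suppose /zone has inode id 1001
    // /zone/sub/file: inode chain file(3003) -> sub(2002) -> zone(1001)
    System.out.println(zoneKeyFor(new long[] {3003L, 2002L, 1001L}));
    System.out.println(zoneKeyFor(new long[] {4004L}));
  }
}
```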

The following shows the structure of encryption zone management:

Using an encryption zone

Finally, we introduce the configuration and use of the encryption zone feature. In general, several related configuration items need to be completed.

Step 1: complete the KeyProvider configuration

Configure the URI of an existing KeyProvider in the following property:

dfs.encryption.key.provider.uri
Step 2: encryption algorithm Configuration

The following configurations are available:

hadoop.security.crypto.codec.classes.EXAMPLECIPHERSUITE
hadoop.security.crypto.codec.classes.aes.ctr.nopadding
hadoop.security.crypto.cipher.suite
hadoop.security.crypto.jce.provider
hadoop.security.crypto.buffer.size

These items all have defaults, so no additional configuration is strictly required.

Step 3: configure the number of listZones responses

This configuration takes effect for the listZones command.

dfs.namenode.list.encryption.zones.num.responses
Step 4: create an encryption zone (encrypted space)

An encryption zone is directory-level, and a key name must be specified. The command is as follows:

hdfs crypto -createZone -keyName <keyName> -path <path>

The path here is an existing directory. This command turns the target directory into an encryption zone; files under it are encrypted and decrypted during writing and reading.

After the preceding operations, the encryption zone is created. You can use the listZones command to view the currently created zones:

hdfs crypto -listZones

From then on, encryption and decryption of file data in this directory is completely transparent to the client.

Example of encryption Zone

The following is an official example.

    # Create an encryption key as a normal user
    hadoop key create mykey

    # As the superuser, create an empty directory and make it an encryption zone
    hadoop fs -mkdir /zone
    hdfs crypto -createZone -keyName mykey -path /zone

    # Change the directory owner to the normal user
    hadoop fs -chown myuser:myuser /zone

    # As the normal user, upload a file and cat it back
    hadoop fs -put helloWorld /zone
    hadoop fs -cat /zone/helloWorld
Reference

1. http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html

