Ten Measures for Securing Hadoop Data

Dataguise recently released a list of ten Hadoop data security measures covering privacy risk, data management, and information security. The measures can help practitioners reduce the risk of data leaks and policy violations in big data applications, and they are a useful reference for companies considering a Hadoop deployment.

Dataguise provides Hadoop security services to several Fortune 200 companies, and from that work it has distilled a set of big data security practices and processes suited to large-scale, heterogeneous environments.

Big data analytics has always been accompanied by privacy concerns and disputes. The huge volumes of data being analyzed inevitably include personally identifiable information (PII) such as names, addresses, and identity numbers.

Financial data such as credit card and bank account numbers also tends to carry this kind of personal information, and access to it can be highly controversial. Most privacy issues, however, can be mitigated through careful planning, testing, preparation for production, and sensible use of big data technologies.

The following best practices for Hadoop projects come from Dataguise's implementations and apply especially to the initial planning phase of a project:

1. Address data privacy as early as possible. Define the data privacy policy during planning, preferably before any data is imported into Hadoop.

2. Identify which data elements in your business are sensitive, taking full account of the company's privacy policy, applicable industry regulations, and government regulations.

3. Consider whether sensitive data may be hidden in, or carried along with, other data while you analyze the environment and assemble the Hadoop system.

4. Collect enough information to identify compliance risks.

5. Determine whether business analysis needs access to the real data or whether desensitized (de-identified) data is sufficient. Then choose the appropriate remediation technique, masking or encryption, to protect the sensitive information. Masking provides the strongest security, while encryption is more flexible; which one to use depends on future needs.

6. Ensure that the data protection solution supports both masking and encryption, especially when masked and unmasked versions of the data must be stored in different Hadoop directories.

7. Ensure that the data protection technology applies masking consistently across all data files, so that analyses remain accurate across every aggregation dimension.

8. Determine whether particular datasets require a customized protection scheme, and consider dividing Hadoop directories into smaller groups so that security can be managed per data unit.

9. Ensure that the encryption solution you choose interoperates with the enterprise's access control technology, so that users with a given level and identity can access only a specific range of data in the Hadoop cluster.

10. When encryption is required, ensure that appropriate technologies (Java, Pig, etc.) are deployed to provide seamless encryption while keeping the data accessible.
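Items 2, 3, 5 and 7 can be illustrated with a small sketch. The Python below is a hypothetical example, not Dataguise's method: it flags candidate credit card numbers (13 to 16 consecutive digits that pass the Luhn checksum) and identity numbers in an assumed NNN-NN-NNNN format, then replaces each with a deterministic token computed as a keyed HMAC-SHA256. Because the same value and key always yield the same token, joins and aggregations over masked data stay consistent, which is the property item 7 asks for; because HMAC is one-way, the tokens cannot be turned back into the original values, unlike encryption. The function names, detection patterns, `tok_` prefix, and key handling are all assumptions made for illustration.

```python
import hashlib
import hmac
import re

# Hypothetical detection rules, for illustration only.
CARD_CANDIDATE = re.compile(r"\b\d{13,16}\b")       # 13-16 consecutive digits
NATIONAL_ID = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # assumed NNN-NN-NNNN format


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (kind, value) pairs for sensitive values found in text."""
    hits = []
    for m in CARD_CANDIDATE.finditer(text):
        if luhn_valid(m.group()):  # skip digit runs that are not plausible cards
            hits.append(("credit_card", m.group()))
    for m in NATIONAL_ID.finditer(text):
        hits.append(("national_id", m.group()))
    return hits


def mask_token(value: str, key: bytes) -> str:
    """Deterministically replace a value with a keyed, one-way token.

    The same (key, value) pair always yields the same token, so joins and
    GROUP BY results on masked data match those on the original data
    (barring a collision in the truncated 64-bit digest).
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def mask_text(text: str, key: bytes) -> str:
    """Replace every detected sensitive value in text with its token."""
    for _kind, value in find_sensitive(text):
        text = text.replace(value, mask_token(value, key))
    return text


if __name__ == "__main__":
    key = b"demo-key"  # hypothetical; load from a key manager in practice
    print(mask_text("alice,4111111111111111,123-45-6789", key))
    print(mask_text("refund,4111111111111111", key))
```

Running it prints both rows with the card number replaced by the same `tok_` value, so a join on that column across the two files still matches. In practice the key would come from a key management system rather than source code, and production discovery tools use far richer detection rules than two regular expressions.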
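Items 6, 8 and 9 describe keeping masked and clear copies of data in separate directories and letting only users with the right role reach each one. The Python below is a hypothetical sketch of that policy logic only: the directory layout, role names, `DirectoryPolicy` class, and `can_read` function are invented for illustration. A real cluster would enforce such rules through HDFS permissions and ACLs or a policy engine such as Apache Ranger, not application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectoryPolicy:
    prefix: str               # HDFS directory prefix (hypothetical layout)
    allowed_roles: frozenset  # roles permitted to read under this prefix


# Hypothetical layout: clear-text and masked copies live in separate directories.
POLICIES = [
    DirectoryPolicy("/data/pii/clear/", frozenset({"pii_officer"})),
    DirectoryPolicy("/data/pii/masked/", frozenset({"pii_officer", "analyst"})),
    DirectoryPolicy("/data/", frozenset({"pii_officer", "analyst", "guest"})),
]


def can_read(user_roles: set, path: str, policies=POLICIES) -> bool:
    """Grant read access if the most specific matching policy allows a user role.

    Paths covered by no policy are denied (default deny).
    """
    matches = [p for p in policies if path.startswith(p.prefix)]
    if not matches:
        return False
    most_specific = max(matches, key=lambda p: len(p.prefix))
    return bool(most_specific.allowed_roles & set(user_roles))


if __name__ == "__main__":
    print(can_read({"analyst"}, "/data/pii/masked/2024/part-00000"))  # True
    print(can_read({"analyst"}, "/data/pii/clear/2024/part-00000"))   # False
```

Longest-prefix matching lets a narrow rule for the clear-text directory override the broader `/data/` rule, which mirrors item 8's suggestion of splitting directories into smaller groups that each carry their own protection.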

By starting sensitive-data planning early, enterprises can discover sensitive data in their Hadoop environments as soon as possible, assess compliance risk, and apply data protection technologies appropriately. This greatly reduces the risk of data leaks and compliance violations and also improves the return on investment of big data projects.
