The concept of "security and privacy" has accompanied big data since its inception.
In this era of data explosion, big data lets companies observe and anticipate consumer behavior and industry trends more efficiently, but it also brings security problems with it. How to protect privacy and information security has become the first hard question an enterprise must answer before deploying big data.
Hadoop, the preferred platform for big data, has been dogged by security issues since the beginning of its development.
Developers have pointed out that when Hadoop development began in 2004, no thought was given to how to create a secure distributed computing environment: the framework did very little to authenticate and authorize users and services, any user could impersonate any HDFS or MapReduce user, and malicious code could be submitted to the JobTracker by anyone. They also noted that by 2009 the discussion of Hadoop security had intensified, and security had been raised to a high-priority issue.
Although Hadoop is very efficient at aggregating and analyzing data from multiple sources, the security risks that come with it cannot be ignored.
Foreign experts point out that Hadoop gives enterprises the flexibility to cope with massive data analysis, but it also introduces a series of new problems that cannot be ignored, including security, data access, data monitoring, high availability (HA), and business data continuity. These are problems that enterprises must face.
In the view of Yiu Xiang, general manager of Hewlett-Packard's security products division in Asia, big data is now a major trend, and the big data market in China is expected to grow by an average of 51% over the next five years. Within that growth, security is an unavoidable topic. While big data is still in its infancy, security must be considered when building databases and data-center clouds; without it, building larger business systems on top may ultimately prove costly.
Although the industry is broadly concerned about security, it is often addressed only during or after implementation.
In fact, security should be considered before a big data project starts, so that precautions can be taken in advance rather than waiting for a data-breach incident before acting to protect the data.
Security issues for big data should be considered before deployment
The analysis firm Dataguise recently released its top ten data security measures for Hadoop, the first of which points out that the sooner data privacy measures are taken, the better. According to the analysis, the earlier an enterprise discovers sensitive data in its Hadoop environment, analyzes compliance risk, and applies data protection technology, the more it reduces the risk of data leakage and non-compliance, and the better the return on investment of the big data project.
The following are the top ten data security measures listed by Dataguise:
1. The sooner data privacy measures are taken, the better. Define the data privacy policy at the planning stage, preferably before any data is imported into Hadoop.
2. Identify which data elements in your business are sensitive. Take full account of the company's privacy policy, relevant industry regulations, and government regulations.
3. Check whether sensitive data is hidden in, or carried along with, the data as the environment is analyzed and data is assembled into the Hadoop system.
4. Collect enough information to identify compliance risks.
5. Determine whether the business analysis needs access to the real data or whether "desensitized" data is sufficient, then select the appropriate remediation technique (masking or encryption) to obscure or encrypt the sensitive information. Masking provides the strongest protection, while encryption is more flexible, depending on future needs.
6. Ensure that the data protection scheme supports both remediation techniques, masking and encryption, especially when the masked and the unmasked versions of the data need to be stored in separate Hadoop directories (see the directory sketch after this list).
7. Ensure that the data protection technology applies a consistent masking method across all data files, so that analyses remain accurate across the various data aggregation dimensions (see the masking sketch after this list).
8. Determine whether particular datasets require a customized protection scheme; consider dividing the Hadoop directory structure into smaller groups to meet the need for unit-level data security management.
9. Ensure that the chosen encryption scheme interoperates with the enterprise's access control technology, so that users of a given level and identity can access only the appropriate range of data in the Hadoop cluster.
10. When encryption is needed, ensure that the appropriate technologies (Java, Pig, etc.) are deployed so that encryption is seamless while data remains accessible.
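To make item 7 concrete, here is a minimal sketch of consistent (deterministic) masking: a sensitive field is hashed with a keyed HMAC so that the same input always produces the same token in every file, which keeps joins and aggregation dimensions accurate while hiding the raw value. The field values, key handling, and the choice of HMAC-SHA256 are illustrative assumptions, not part of the Dataguise guidance.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/**
 * Sketch of consistent masking (item 7): the same sensitive value always
 * maps to the same opaque token, so joins and aggregations across data
 * files remain accurate after masking. Key management and field selection
 * are illustrative assumptions.
 */
public class ConsistentMasker {
    private final Mac hmac;

    public ConsistentMasker(byte[] secretKey) throws Exception {
        // A keyed hash (HMAC-SHA256) is used so tokens cannot be reversed
        // or recomputed without the key.
        hmac = Mac.getInstance("HmacSHA256");
        hmac.init(new SecretKeySpec(secretKey, "HmacSHA256"));
    }

    /** Deterministically mask one sensitive field value. */
    public synchronized String mask(String sensitiveValue) {
        byte[] digest = hmac.doFinal(sensitiveValue.getBytes(StandardCharsets.UTF_8));
        return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
    }

    public static void main(String[] args) throws Exception {
        // In practice the key would come from an enterprise key store,
        // not from a literal in the code.
        ConsistentMasker masker =
                new ConsistentMasker("demo-key-only".getBytes(StandardCharsets.UTF_8));
        // The same customer ID yields the same token in every file.
        System.out.println(masker.mask("customer-4711"));
        System.out.println(masker.mask("customer-4711")); // identical token
    }
}
```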
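For items 6, 8, and 9, the following sketch uses the standard Hadoop FileSystem API to keep the masked and unmasked copies of the data in separate HDFS directories and to restrict access to the unmasked one. The paths, owner, group name, and permission bits are assumptions chosen for illustration; a real deployment would integrate this with the enterprise's access control infrastructure rather than rely on directory permissions alone.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;

/**
 * Sketch of items 6, 8, and 9: separate HDFS directories for masked and
 * unmasked data, with tighter permissions on the unmasked copy. Paths,
 * owner, and group names are illustrative placeholders.
 */
public class SecureDirectoryLayout {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Masked data: readable by the owning group (rwxr-x---),
        // suitable for general analytics users.
        Path masked = new Path("/data/customers/masked");
        fs.mkdirs(masked, new FsPermission(FsAction.ALL, FsAction.READ_EXECUTE, FsAction.NONE));

        // Unmasked (raw) data: owner-only access (rwx------); only the
        // data-protection service account should be able to read it.
        Path raw = new Path("/data/customers/raw");
        fs.mkdirs(raw, new FsPermission(FsAction.ALL, FsAction.NONE, FsAction.NONE));

        // Assigning a dedicated owner and group to the raw directory
        // requires HDFS superuser privileges; names here are placeholders.
        fs.setOwner(raw, "dataprotect", "dataprotect-admins");

        fs.close();
    }
}
```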