9 Tips for Protecting Big Data Security in Hadoop

Source: Internet
Author: User
Keywords: security, big data security

When an enterprise becomes a data-driven operation, its potential is enormous: the data it owns may be the key to gaining a competitive advantage. As a result, securing the enterprise's data and infrastructure has become more important than ever.

In many cases, companies or organizations end up holding what Forrester calls "toxic data". For example, a wireless carrier may collect data about who connects to which antenna tower, how long they stay online, how much data they use, and whether they are moving or stationary; that data can be used to understand user behavior.

The same carrier may also hold a great deal of user-generated data: credit card numbers, Social Security numbers, purchasing-habit data, and other information about how users behave. The ability to correlate such data and draw inferences from it is valuable, but it is also dangerous: if the correlated data leaks outside the organization and falls into the wrong hands, it can cause catastrophic damage to individuals and institutions alike.

When applying big data, do not forget compliance and controls. Here are nine tips for securing big data in Hadoop.

1. Consider security before starting a big data project. You should not wait for a data breach before taking measures to secure your data. The organization's IT security team and everyone else involved in the big data project should discuss security issues carefully before installing anything and before loading data into the Hadoop cluster.

2. Consider what data you will store. If you plan to use Hadoop to store and process data that is subject to regulation, you may need to comply with specific security requirements. Even if the stored data is not regulated, assess the risk: if data such as personally identifiable information were lost, the consequences would include loss of reputation and loss of revenue.

3. Centralize accountability. Today, an enterprise's data may sit in silos and data sets spread across multiple departments. Centralizing responsibility for data security ensures consistent policies and access controls across all of those silos.

4. Encrypt data at rest and in motion. Add transparent data encryption at the file layer, and use SSL (Secure Sockets Layer) encryption to protect big data as it moves between nodes and applications. Adrian Lane, chief technology officer and analyst at Securosis, a security research and consulting firm, says file encryption addresses two kinds of attack that bypass normal application security controls: it protects data when a malicious user or administrator gains access to a data node and inspects files directly, and it renders stolen files or copied disk images unreadable. It is a cost-effective way to address several data security threats.
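As a minimal sketch of file-layer encryption, the example below encrypts a data file with AES-GCM before it is loaded into the cluster, using only the standard Java crypto libraries. The file names and key handling here are hypothetical; a real deployment would more likely rely on HDFS transparent encryption or a commercial file-encryption product. The point is simply that ciphertext on disk is useless to anyone who copies the files or the disk image.

    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.GCMParameterSpec;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.security.SecureRandom;

    // Sketch: encrypt a local file with AES-GCM before loading it into the cluster.
    public class FileEncryptExample {
        public static void main(String[] args) throws Exception {
            KeyGenerator keyGen = KeyGenerator.getInstance("AES");
            keyGen.init(256);                        // 256-bit data-encryption key
            SecretKey key = keyGen.generateKey();    // in practice, fetch from a KMS (see tip 5)

            byte[] iv = new byte[12];                // 96-bit nonce recommended for GCM
            new SecureRandom().nextBytes(iv);

            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));

            byte[] plaintext = Files.readAllBytes(Paths.get("customer-records.csv")); // hypothetical input
            byte[] ciphertext = cipher.doFinal(plaintext);

            // Store the IV alongside the ciphertext so the file can be decrypted later.
            Files.write(Paths.get("customer-records.csv.enc"), ciphertext);
            Files.write(Paths.get("customer-records.csv.iv"), iv);
        }
    }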

5. Keep keys separate from the encrypted data. Storing encryption keys on the same server as the encrypted data is like locking your door and leaving the key hanging in the lock. A key management system lets the organization store encryption keys securely and keeps them isolated from the data they protect.
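The sketch below shows one way this separation can look in practice with Hadoop's KeyProvider API: the client asks an external KMS for key material instead of reading a key that sits next to the data. The KMS address and key name are placeholders for illustration.

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.crypto.key.KeyProvider;
    import org.apache.hadoop.crypto.key.KeyProviderFactory;

    // Sketch: resolve encryption keys from an external Hadoop KMS rather than
    // from the data nodes themselves.
    public class ExternalKeyLookup {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Point clients at a KMS running on separate, hardened hardware (placeholder address).
            conf.set("hadoop.security.key.provider.path", "kms://https@kms.example.internal:9600/kms");

            List<KeyProvider> providers = KeyProviderFactory.getProviders(conf);
            KeyProvider kms = providers.get(0);

            // The key material never lives on the same servers as the encrypted data.
            KeyProvider.KeyVersion kv = kms.getCurrentKey("hdfs-zone-key"); // hypothetical key name
            System.out.println("Fetched key version: " + kv.getVersionName());
        }
    }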

6. Use the Kerberos network authentication protocol. Enterprises need to manage which people and which processes can access data stored in Hadoop, and Kerberos is an effective way to keep rogue nodes and applications out of the cluster. Ryan says it helps protect network access to the cluster and makes administrative functions hard to breach. Yes, setting up Kerberos is difficult, but validating new nodes and applications before they are admitted is worth it: without that two-way trust, it is easy to spoof Hadoop into letting a malicious application into the cluster or into accepting a malicious node, which can later add, modify, or extract data. Kerberos is one of the most effective security controls you can deploy, and it is built into the Hadoop infrastructure, so use it.
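For example, a client service on a kerberized cluster typically authenticates with a keytab through Hadoop's UserGroupInformation API before touching HDFS, roughly as sketched below; the principal, keytab path, and HDFS path are placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    // Sketch: a service logging in to a Kerberos-secured cluster with a keytab.
    public class KerberosLoginExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Only holders of a valid keytab (i.e. trusted services) get in.
            UserGroupInformation.loginUserFromKeytab(
                    "etl-service@EXAMPLE.COM", "/etc/security/keytabs/etl-service.keytab");

            FileSystem fs = FileSystem.get(conf);
            System.out.println("Authenticated as: " + UserGroupInformation.getLoginUser());
            System.out.println("Path exists: " + fs.exists(new Path("/data/raw")));
        }
    }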

7. Use security automation. An enterprise Hadoop deployment is a multi-node environment, so deployment consistency is hard to guarantee by hand. Automation tools such as Chef and Puppet help organizations apply patches, configure applications, update the Hadoop stack, maintain consistent machine images and certificates, and catch platform inconsistencies. Building these scripts takes some time up front, but it pays off later in reduced administration time and in the added assurance that every node has a security baseline.
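Chef and Puppet manifests are outside the scope of a short Java example, but the consistency goal they serve can be illustrated with a small check that compares a node's effective Hadoop configuration against a security baseline; the baseline values below are examples rather than a recommended policy.

    import java.util.LinkedHashMap;
    import java.util.Map;
    import org.apache.hadoop.conf.Configuration;

    // Sketch: flag configuration drift on a node against a security baseline.
    public class ConfigBaselineCheck {
        public static void main(String[] args) {
            Map<String, String> baseline = new LinkedHashMap<>();
            baseline.put("hadoop.security.authentication", "kerberos");
            baseline.put("hadoop.security.authorization", "true");
            baseline.put("hadoop.rpc.protection", "privacy");

            // new Configuration() loads core-site.xml from this node's classpath.
            Configuration actual = new Configuration();
            for (Map.Entry<String, String> e : baseline.entrySet()) {
                String value = actual.get(e.getKey(), "<unset>");
                String status = e.getValue().equals(value) ? "OK   " : "DRIFT";
                System.out.printf("%s %s expected=%s actual=%s%n",
                        status, e.getKey(), e.getValue(), value);
            }
        }
    }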

8. Add logging to your Hadoop cluster. Big data is a natural fit for collecting and managing log data; many web companies started with big data precisely to manage log files. So why not feed logs into the existing cluster? Doing so lets a company see when something fails, or whether it has been compromised by an attacker. Without a record of events you are operating blind. Logging MapReduce requests and other cluster activity is easy and increases storage and processing requirements only slightly, and the data is indispensable when you need it.
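As a minimal sketch, the example below records application-side audit events (who did what to which path) with the standard java.util.logging API; the log path and user names are hypothetical. In practice you would also enable and retain the audit logs Hadoop itself produces, such as the HDFS audit log, and feed them back into the cluster for analysis.

    import java.util.logging.FileHandler;
    import java.util.logging.Logger;
    import java.util.logging.SimpleFormatter;

    // Sketch: append structured audit records for cluster requests to a log file.
    public class AuditLogExample {
        private static final Logger AUDIT = Logger.getLogger("cluster.audit");

        public static void main(String[] args) throws Exception {
            FileHandler handler = new FileHandler("/var/log/bigdata/audit.log", true); // hypothetical path
            handler.setFormatter(new SimpleFormatter());
            AUDIT.addHandler(handler);

            logAccess("analyst01", "READ", "/data/raw/cdr-2013-06.csv");
            logAccess("etl-service", "WRITE", "/data/curated/usage-summary");
        }

        static void logAccess(String user, String action, String path) {
            AUDIT.info(String.format("user=%s action=%s path=%s", user, action, path));
        }
    }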

9. Use secure communication between nodes, and between nodes and applications. To do this, deploy SSL/TLS (Secure Sockets Layer/Transport Layer Security) to protect all of the enterprise's network traffic, not just a single subnet. Like many cloud service providers, Hadoop vendors such as Cloudera already do this. If your distribution does not provide this capability, you will need to integrate these services into your application stack. (Compiled by Populus euphratica)
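A sketch of commonly used Hadoop settings for wire encryption follows. The property names are standard Hadoop configuration keys, but the exact set required depends on the distribution, and in a real cluster they would normally be set in core-site.xml and hdfs-site.xml rather than in code.

    import org.apache.hadoop.conf.Configuration;

    // Sketch: settings that turn on encrypted traffic between nodes and
    // between clients and cluster services.
    public class WireEncryptionConfig {
        public static Configuration secureWireConfig() {
            Configuration conf = new Configuration();
            conf.set("hadoop.rpc.protection", "privacy");      // encrypt RPC between clients and services
            conf.set("dfs.encrypt.data.transfer", "true");     // encrypt block transfers to/from DataNodes
            conf.set("dfs.http.policy", "HTTPS_ONLY");         // serve web UIs and WebHDFS over TLS only
            return conf;
        }

        public static void main(String[] args) {
            Configuration conf = secureWireConfig();
            System.out.println("RPC protection: " + conf.get("hadoop.rpc.protection"));
        }
    }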

