Big Data Application Security: Hadoop, IBM, Microsoft

Source: Internet
Author: User
Tags big data data encryption data transfer big data security data security practice

Hadoop big data security practice

The Hadoop-based open source big data ecosystem is now widely used. Hadoop was originally designed on the assumption that it would be deployed only in a trusted environment. As more departments and users joined shared clusters, any user could access or delete data, which put data at serious risk. In addition, without corresponding controls over the internal network environment and the data destruction process, big data deployments are prone to major data leaks.

To meet these security challenges, the Hadoop open source community began to focus on big data security in 2009 and has since added important security functions such as authentication, access control, data encryption, and log auditing.

Authentication is the process of confirming the identity of a requester and is the basis of data access control. For strong authentication, Hadoop offers Kerberos as its only option and builds its secure access control environment on top of it. Based on the result of authentication, Hadoop applies different access control mechanisms at different system levels.

HDFS (Hadoop Distributed File System) provides POSIX permissions and access control lists. Hive (data warehouse) provides role-based access control, and HBase (distributed database) provides access control lists and tag-based access control. Data encryption, the main means of protecting data and avoiding leaks, is widely used in big data application systems to prevent leaks caused by network sniffing or by improper disposal of physical storage media.
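
As a rough illustration of how POSIX permission bits and an access control list combine in a check of the kind HDFS performs, here is a minimal Python sketch. The FileMeta class, the allowed function, and the sample users are hypothetical names made up for this illustration. The model is deliberately simplified: it ignores the ACL mask, named group entries, and superuser handling.

```python
from dataclasses import dataclass, field

# Permission bits, as in POSIX rwx triples.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass
class FileMeta:
    """Hypothetical, simplified file metadata: owner, group, mode, named-user ACL."""
    owner: str
    group: str
    mode: int  # e.g. 0o640
    acl_users: dict = field(default_factory=dict)  # named user -> permission bits


def allowed(meta: FileMeta, user: str, groups: list, want: int) -> bool:
    """Return True if `user` holds every permission bit in `want`.

    Evaluation order (simplified): owner, named ACL user, owning group, other.
    """
    if user == meta.owner:
        bits = (meta.mode >> 6) & 7
    elif user in meta.acl_users:
        bits = meta.acl_users[user]
    elif meta.group in groups:
        bits = (meta.mode >> 3) & 7
    else:
        bits = meta.mode & 7
    return (bits & want) == want


# Usage: mode 0o640 plus a read-only ACL entry for one extra user.
report = FileMeta(owner="alice", group="analysts", mode=0o640,
                  acl_users={"carol": READ})
print(allowed(report, "alice", [], READ | WRITE))   # owner may read and write
print(allowed(report, "bob", ["analysts"], WRITE))  # group is read-only
print(allowed(report, "carol", [], READ))           # granted through the ACL entry
```

The ACL entry lets one additional user read the file without widening the group or "other" permissions, which is the main reason to layer ACLs on top of plain mode bits.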

For data in transit, Hadoop provides encryption options for the various transfer paths, including traffic between clients and service processes and traffic between service processes. Hadoop also provides encryption at the storage layer, so that data is written to disk in encrypted form. Finally, the components of the Hadoop ecosystem record logs and audit files for data access, which provide the basis for tracking data flows, optimizing data processes, and discovering violations.
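
To make the authentication and transfer encryption options more concrete, the following Python sketch renders a few Hadoop configuration properties as site XML. The property names are standard Hadoop settings, but the chosen values, the split into two dictionaries, and the render_site_xml helper are assumptions for illustration only. A real cluster needs further settings (Kerberos principals, keytabs, a key management service for storage-layer encryption), so check the documentation for your Hadoop version before relying on any of this.

```python
import xml.etree.ElementTree as ET

# Assumed example values for core-site.xml: Kerberos authentication,
# service-level authorization, and encrypted RPC ("privacy").
CORE_SITE = {
    "hadoop.security.authentication": "kerberos",
    "hadoop.security.authorization": "true",
    "hadoop.rpc.protection": "privacy",
}

# Assumed example value for hdfs-site.xml: encrypt block data transfer.
HDFS_SITE = {
    "dfs.encrypt.data.transfer": "true",
}


def render_site_xml(props: dict) -> str:
    """Render a dict of properties in Hadoop's <configuration> XML layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


print(render_site_xml(CORE_SITE))
print(render_site_xml(HDFS_SITE))
```

The sketch covers only authentication and wire encryption. Storage-layer encryption is configured separately and is not shown here.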

With these security mechanisms, Hadoop provides an open source big data environment that meets basic security requirements. Kerberos is widely adopted as the de facto strong authentication method. However, because Kerberos uses symmetric-key algorithms to implement mutual authentication, deploying and managing a Kerberos-based distributed authentication system at large scale can be challenging. A common solution is to use third-party tools to simplify deployment and management.

In terms of access control, the complexity in big data environments lies not only in the variety of access control forms, but also in the fact that big data systems allow data to be widely shared at different system levels. Centralized, unified access control is needed to simplify policies and deployment. In terms of data encryption, hardware-based encryption schemes can greatly improve encryption and decryption performance, achieving end-to-end and storage-layer encryption with minimal performance loss.

However, effective encryption requires secure and flexible key management, which is still weak in the open source solutions and often requires commercial key management products. Log auditing is indispensable for data management, data traceability, and attack detection. However, open source systems such as Hadoop provide only basic log and audit records, which are stored on each cluster node. Centrally managing and analyzing logs and audit records still requires third-party tools.
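
The value of centralizing audit records can be sketched in a few lines of Python. The example below assumes audit lines in a key=value layout modeled on the HDFS audit log (allowed=..., ugi=..., cmd=..., src=...). The sample lines, node names, and function names are invented for illustration, and a production setup would normally ship logs to a dedicated log management or SIEM tool rather than use a script like this.

```python
import re
from collections import Counter

# Matches key=value pairs such as "allowed=false" or "src=/secure/a.csv".
AUDIT_FIELD = re.compile(r"(\w+)=(\S+)")


def parse_audit_line(line: str) -> dict:
    """Extract the key=value fields of one audit line into a dict."""
    return dict(AUDIT_FIELD.findall(line))


def denied_by_user(lines_per_node: dict) -> Counter:
    """Count denied operations per user across the logs of all nodes."""
    denied = Counter()
    for lines in lines_per_node.values():
        for line in lines:
            record = parse_audit_line(line)
            if record.get("allowed") == "false":
                denied[record.get("ugi", "unknown")] += 1
    return denied


# Invented sample data: audit lines collected from two cluster nodes.
collected = {
    "node-1": [
        "INFO FSNamesystem.audit: allowed=true ugi=alice ip=/10.0.0.4 cmd=open src=/data/a.csv",
        "INFO FSNamesystem.audit: allowed=false ugi=bob ip=/10.0.0.5 cmd=open src=/secure/a.csv",
    ],
    "node-2": [
        "INFO FSNamesystem.audit: allowed=false ugi=bob ip=/10.0.0.5 cmd=delete src=/secure/b.csv",
    ],
}
print(denied_by_user(collected))
```

A user who is denied repeatedly on several nodes is visible only once the per-node records are merged, which is the gap that third-party tools fill.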

IBM big data security practice

IBM Security Guardium is a complete data security platform. Its capabilities include discovery, classification, and grading of sensitive data, security assessment, data and file activity monitoring, and protection of sensitive data through masking, blocking, alerting, and quarantine.

Guardium protects not only databases but also data warehouses, ECM systems, file systems, and big data environments. In addition to the security platform, IBM provides architectural practices for building applications on the cloud. IBM developed the Customer Cloud Architecture for Big Data Analytics and Security, which was released through the CSCC as a reference architecture and industry standard. It describes vendor-neutral best practices for hosting big data analytics solutions on cloud computing, along with the details of all the components that make up the architecture. All components of this reference architecture can be implemented using open source technology.

IBM Security Reference Architecture and Data Security

As shown in Figure B-12, the IBM Security Reference Architecture provides an overview of the security components that protect deployment, development, and operations on the cloud.

When it comes to data security, we usually need to distinguish between static data (data at rest) and dynamic data (data in motion). Data security aims to discover, classify, and protect cloud data and information assets, with a focus on protecting both static and dynamic data.

The IBM Data Security Architecture includes all data types, such as traditional enterprise data and any form of data (structured and unstructured) in a big data environment. The IBM Data Security Architecture encompasses the various modules required for data security based on governance, risk, and compliance. The following summarizes the key modules related to data security that need to be considered in cloud computing solutions.

Data protection

A complete cloud computing data protection solution needs to consider offering the following service options to customers:

  • Static data encryption in a cloud environment

  • Block storage and file storage encryption services

  • Object storage encryption using IBM Cleversafe

  • Data Encryption Service using IBM Cloud Data Encryption Services (ICDES)

  • Cloud-based hardware security module (HSM)

  • Key management and certificate management using IBM Key Protect

For each of the above service options, a specific set of processes, control plans and implementation strategies need to be developed for implementation.

Data integrity

Data integrity means maintaining and guaranteeing the accuracy and consistency of data throughout its lifecycle. In the context of this article, data integrity refers to preventing data from being tampered with. A hash value of the data can be used to detect whether the data has been modified without authorization. This method can be used to protect both static and dynamic data.
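
The hash-based check described above can be sketched with Python's standard library. The function names and sample data are made up for this illustration. A plain hash detects tampering only if the reference digest itself is stored where an attacker cannot change it. For data in motion, a keyed hash (HMAC) is the usual choice, because an attacker without the key cannot recompute a valid tag.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """SHA-256 digest of the data, to be stored separately from the data."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, expected_digest: str) -> bool:
    """Check stored data against a previously recorded digest."""
    return hmac.compare_digest(digest(data), expected_digest)


def sign(data: bytes, key: bytes) -> str:
    """Keyed tag (HMAC-SHA-256) for data sent over an untrusted channel."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, key: bytes, tag: str) -> bool:
    """Check a received message against its tag using the shared key."""
    return hmac.compare_digest(sign(data, key), tag)


# Usage with invented sample data.
record = b"account=42;balance=100"
reference = digest(record)
print(is_unmodified(record, reference))                     # unchanged data
print(is_unmodified(b"account=42;balance=900", reference))  # tampered data

shared_key = b"example-key-for-illustration-only"
tag = sign(record, shared_key)
print(verify(record, shared_key, tag))
```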

Data classification and data activity monitoring

Data classification is an effective way to help protect critical information. Before sensitive information can be protected, it must be located and identified. Automated discovery and classification processes are key components of a data protection strategy that prevents leakage of sensitive information. Guardium provides integrated data classification capabilities and a seamless approach to discovering, identifying, and protecting the most critical data, whether in the cloud or in the data center.

Guardium also provides data activity monitoring, as well as cognitive analysis to discover anomalous activity against sensitive data, prevent unauthorized data access, provide alerts for suspicious activity, automate compliance processes, and protect against internal and external attacks.

Data privacy and laws and regulations

Data privacy determines how information (especially personal information) is collected, used, shared, and disposed of within the scope of relevant policies, laws, and regulations.

According to IBM's policy, every cloud service needs to implement technical and organizational security and privacy protection measures. These measures are based on the architecture, purpose of use, and type of service of the cloud service. Regardless of the type of service, IBM's specific management responsibilities for each cloud service are listed in the relevant agreements.

IBM big data smart security

IBM Big Data Smart Security combines the real-time security correlation and anomaly detection capabilities of the IBM QRadar Intelligent Security Platform with forensic capabilities and with the ability to analyze and explore large-scale structured and unstructured data, including the customized, large-scale structured data analysis provided by BigInsights.

Microsoft big data security practice

HDInsight is Microsoft's big data service running on Microsoft Azure. Azure HDInsight deploys and provisions Apache Hadoop clusters in the cloud, providing a software framework designed to manage, analyze, and report on big data.

Azure HDInsight supports a variety of data technologies, including the Hadoop distributed file system HDFS, the non-relational wide-table database HBase, the SQL-like query engine Hive, and MapReduce and YARN for distributed processing and resource management.

HDInsight is part of Azure's cloud services, which provide security in several ways, including:

Use Azure Blob storage

Azure Blob storage is a Hadoop-compatible option and a robust, general-purpose storage solution that integrates seamlessly with HDInsight. Through the HDFS interface, the full set of components in HDInsight can operate directly on structured or unstructured data in Blob storage. Because the data lives in Blob storage, you can safely delete the HDInsight clusters used for computation without losing user data.

Key vault

Secure key management is essential to protecting data in the cloud. With Azure Key Vault, you can encrypt keys and small secrets such as passwords by using keys stored in hardware security modules (HSMs). For added assurance, you can import or generate keys in HSMs. If you choose to do so, Microsoft processes your keys in FIPS 140-2 Level 2 validated HSMs.

Key Vault is designed so that Microsoft does not see or extract the user's keys. You can monitor and audit key usage with Azure logging and deliver the logs to Azure HDInsight or a SIEM for additional analysis and threat detection.

Multi-factor authentication

Azure Multi-Factor Authentication verifies a user's identity with more than one method, not just a username and password. It provides an additional layer of security for user logins and transactions. Azure Multi-Factor Authentication helps protect access to data and applications while meeting the user's need for a simple login process. It provides strong authentication through a range of simple verification options such as phone call, SMS, mobile app notification, or verification code.
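
To show what a verification code can look like in practice, here is a minimal Python sketch of the standard time-based one-time password algorithm (TOTP, RFC 6238), which many authenticator apps use. This is an assumption for illustration: the article does not say which algorithm Azure Multi-Factor Authentication uses internally, and the function names and the secret below are hypothetical.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238, HMAC-SHA-1 variant)."""
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation as defined in RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_second_factor(secret: bytes, submitted: str, unix_time: int,
                         step: int = 30, window: int = 1) -> bool:
    """Accept the code for the current time step or its direct neighbors."""
    for i in range(-window, window + 1):
        t = unix_time + i * step
        if t < 0:
            continue
        if hmac.compare_digest(totp(secret, t, step), submitted):
            return True
    return False


# Usage with the shared secret from the RFC 6238 test vectors.
secret = b"12345678901234567890"
code = totp(secret, 59)
print(code)
print(verify_second_factor(secret, code, 59))
```

The small acceptance window tolerates clock drift between the user's device and the server, at the cost of keeping each code valid slightly longer.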

Azure Active Directory (Azure AD)

Azure AD is a multi-tenant, cloud-based directory and identity management service from Microsoft. Azure AD includes a full suite of identity management features such as multi-factor authentication, device registration, self-service password management, self-service group management, privileged account management, role-based access control, application usage monitoring, rich auditing, and security monitoring and alerting.
