Although big data applications were not yet in wide use in 2014, more and more industry users were introducing big data technologies to manage and exploit their ever-growing data, and security concerns were growing with them. To guard against hackers stealing data, companies that want to take full advantage of big data must also take security measures to protect the integrity of their data assets. Matthew, a well-known SSH communications security expert, recently wrote an analysis of the current security problems, pointing out that the risk of ignoring machine-to-machine authentication is alarming and that mismanaged authorizations can lead to serious data leaks. He also offers some solutions, and closes with a call for stronger identity management to keep big data secure.
The following is the original text:
Big data is no longer a daydream. Organizations across industries are sifting through network data to draw actionable conclusions at ever-increasing speed. Ninety percent of the world's data has been generated in the past two years, and behind that data lie insights into user behavior and market trends that might never be reached through other channels. Even the White House has gotten involved, recently investing 200 million dollars in big data research projects.
As big data becomes easier to use, there is also growing concern about secure access to sensitive datasets and other areas of the network. If companies want to profit from big data without exposing themselves to data leaks, these problems must be addressed effectively.
Ensure machine-to-machine identity security
For big data analysis, large datasets are partitioned into smaller, more manageable sections, processed separately across the Hadoop cluster, and finally recombined to produce the required analysis. The process is highly automated and involves a large amount of machine-to-machine interaction across the cluster.
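The split/process/recombine pattern can be illustrated with a toy example. The sketch below uses plain Python with multiprocessing rather than Hadoop itself, and the dataset, function names, and worker count are invented for illustration only.

```python
from collections import Counter
from multiprocessing import Pool

def process_partition(lines):
    """Process one partition of the data set (here: count words)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def analyze(dataset, workers=4):
    """Split the data set, process partitions in parallel, then recombine."""
    size = max(1, len(dataset) // workers)
    partitions = [dataset[i:i + size] for i in range(0, len(dataset), size)]
    with Pool(workers) as pool:
        partial = pool.map(process_partition, partitions)
    total = Counter()
    for counts in partial:
        total.update(counts)  # recombine the partial results
    return total

if __name__ == "__main__":
    sample = ["big data security", "data access control", "big data"]
    print(analyze(sample))
```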
Several levels of authorization occur in a Hadoop infrastructure, including:
Accessing the Hadoop cluster
Inter-cluster communication
Cluster access to data sources
These authorizations are often based on SSH (Secure Shell) keys, which are well suited to Hadoop because their level of security supports automated machine-to-machine communication.
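As a concrete illustration of key-based, non-interactive machine-to-machine access, the sketch below uses the third-party paramiko library to run a command on a cluster node with an SSH private key. The host name, account, key path, and command are placeholder values, not details from the article.

```python
import paramiko

def run_remote(host, user, key_path, command):
    """Connect with an SSH private key (no interactive password) and run a command."""
    client = paramiko.SSHClient()
    client.load_system_host_keys()  # trust only hosts already known to this machine
    client.connect(host, username=user, key_filename=key_path)
    try:
        _stdin, stdout, _stderr = client.exec_command(command)
        return stdout.read().decode()
    finally:
        client.close()

if __name__ == "__main__":
    # Placeholder host, account, key, and command for illustration.
    print(run_remote("hadoop-node-01", "hdfs", "/etc/keys/cluster_id_rsa",
                     "hdfs dfsadmin -report"))
```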
Many popular cloud-based Hadoop services also use SSH to access the Hadoop cluster. Ensuring that identities in a big data environment are granted and managed properly is a high priority, but it is also challenging, especially for companies that want to use big data analysis tools such as Hadoop. Some of the basic questions include:
Who sets up the authorizations needed to run big data analysis?
What happens when the person who set up an authorization leaves the company?
Are the authorized access levels based on sound security guidelines?
Who can access these authorizations?
How are these authorizations managed?
Big data is not the only technology that needs attention. As more and more business processes are automated, these issues spread across the data center. Automated machine-to-machine transactions account for 80% of all communication in the data center, yet most administrators focus on the 20% of traffic associated with employee accounts. Big data will be the next killer application, and comprehensive management of machine-centered identities is becoming urgent.
Risk
Well-known data leaks have involved the misuse of machine-oriented credentials, which reflects the real risk of ignoring machine-to-machine authentication. While enterprises have made great strides in managing end-user identities, they ignore the need to handle machine-oriented authentication to the same standards. The result is that risk spreads across the entire IT environment.
However, applying centralized identity and access management, as far as possible, to millions of machine-based identities across changing operating systems is a big challenge. Migrating without disrupting running systems is a complex task, so it is not surprising that companies have hesitated.
The poor state of key management
The state of key management has long been poor. To manage the authentication keys that secure machine-to-machine communication, many system administrators use spreadsheets or custom scripts to control the allocation, monitoring, and inventory of keys. This approach misses many keys, and because there is no regular scanning, unauthorized keys can be added to systems without ever appearing on the list.
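To make the weakness concrete, here is a minimal sketch of the kind of ad-hoc allocation script described above: it generates a key pair and appends a row to a spreadsheet-style CSV. The file names, host, and account are invented for illustration; any key created or copied outside this script never reaches the inventory, which is exactly how keys go missing.

```python
import csv
import datetime
import subprocess

INVENTORY = "key_inventory.csv"  # hypothetical "spreadsheet" of record

def allocate_key(host, account, key_path):
    """Generate a key pair and record it in the inventory file."""
    subprocess.run(["ssh-keygen", "-t", "ed25519", "-N", "", "-f", key_path],
                   check=True)
    with open(INVENTORY, "a", newline="") as f:
        csv.writer(f).writerow([host, account, key_path,
                                datetime.date.today().isoformat()])

if __name__ == "__main__":
    # Placeholder host, account, and key path.
    allocate_key("hadoop-node-01", "hdfs", "./cluster_id_ed25519")
```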
The lack of centralized control over keys severely affects compliance. The financial industry, for example, requires strict control over who can access sensitive data; the recently tightened PCI standard requires any place that accepts payment cards, whether bank, retailer, restaurant, or hospital, to meet the same standards, without exception. As these industries rush to implement big data strategies and claim their share of the tide of user-driven data, they become increasingly prone to violating regulations and facing regulatory sanctions.
Security steps
Organizations must acknowledge and respond to these risks. The following steps are the best way to start:
Few IT staff know where machine identities are stored, what access rights they grant, or which business processes they support. The first step, therefore, is passive, non-invasive discovery.
Monitoring the environment is then needed to determine which identities are active and which are not. Fortunately, in many businesses, unused, and therefore unwanted, identities tend to dominate. Once these unused identities are located and removed, the overall workload is greatly reduced.
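A minimal sketch of these discovery and monitoring steps is shown below: it lists the key fingerprints authorized for one account and flags those that never appear in the SSH authentication log. The file paths and log format are assumptions (they vary by distribution), so treat this only as an illustration of the idea.

```python
import re
import subprocess

AUTHORIZED_KEYS = "/home/hdfs/.ssh/authorized_keys"  # placeholder path
AUTH_LOG = "/var/log/auth.log"                       # Debian/Ubuntu-style location

def authorized_fingerprints(path):
    """Discovery: fingerprints of every key authorized for this account."""
    out = subprocess.run(["ssh-keygen", "-lf", path],
                         capture_output=True, text=True, check=True).stdout
    return {line.split()[1] for line in out.splitlines() if line.strip()}

def used_fingerprints(log_path):
    """Monitoring: fingerprints that actually logged in, per the auth log."""
    pattern = re.compile(r"Accepted publickey .* (SHA256:\S+)")
    with open(log_path) as log:
        return {m.group(1) for line in log if (m := pattern.search(line))}

if __name__ == "__main__":
    unused = authorized_fingerprints(AUTHORIZED_KEYS) - used_fingerprints(AUTH_LOG)
    for fp in sorted(unused):
        print("candidate for removal:", fp)
```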
The next step is to centralize control over adding, changing, and deleting machine identities. Policies can then govern how identities are used, ensure that no unmanaged identities are added, and provide effective proof of compliance.
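One hedged sketch of what such centralized control could look like: every addition passes a policy check and is written to an auditable registry. The registry file, policy fields, and limits are assumptions made up for this example, not a description of any particular product.

```python
import datetime
import json

REGISTRY = "identity_registry.json"  # hypothetical central store
POLICY = {"allowed_accounts": {"hdfs", "yarn"}, "max_keys_per_account": 5}

def load_registry():
    try:
        with open(REGISTRY) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def add_identity(account, fingerprint):
    """Add a machine identity only if policy allows it; record when for audit."""
    if account not in POLICY["allowed_accounts"]:
        raise PermissionError(f"account {account} not permitted by policy")
    registry = load_registry()
    keys = registry.setdefault(account, [])
    if len(keys) >= POLICY["max_keys_per_account"]:
        raise PermissionError(f"too many keys for {account}")
    keys.append({"fingerprint": fingerprint,
                 "added": datetime.datetime.now().isoformat()})
    with open(REGISTRY, "w") as f:
        json.dump(registry, f, indent=2)  # audit-friendly record of every change

if __name__ == "__main__":
    add_identity("hdfs", "SHA256:example-fingerprint")
```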
Once visibility and management control are established, identities that are necessary but violate policy can be corrected without disrupting business processes; centralized administration can adjust the permission level of such an identity.
Security Policy
The rise of big data is accompanied by new risks in data access control. Machine-to-machine identity management is essential, but the traditional manual IAM approach is inefficient and risky. Taking stock of every key and applying best practices saves time and money while improving security and compliance. As big data raises the stakes around access to sensitive information, organizations must take proactive steps to introduce a comprehensive and consistent identity and access management strategy.