1. Introduction
The explosive growth of big data makes data analysis and application more complicated and harder to manage. According to statistics, more data has been produced worldwide over the past 3 years than in the previous 400 years, including documents, pictures, videos, web pages, e-mail, microblog posts and other types; only 20% of it is structured data, and 80% is unstructured. As data volumes grow, data security and privacy protection problems become increasingly prominent, and security incidents of all kinds have sounded the alarm for enterprises and users. Throughout the data lifecycle, enterprises must comply with stricter security standards and confidentiality requirements, and the demands on the security and privacy of data storage and use keep rising; traditional data protection methods often fail to keep up with these changes. Networked, digital life also makes it easier for hackers to obtain other people's information and gives them criminal methods that are harder to trace and prevent, while existing laws, regulations and technical means struggle to solve such problems. Data security and privacy protection are therefore a major challenge in the big data environment.
However, it should also be recognized that in the big data age, combining business data with security requirements can effectively raise an enterprise's level of security protection. By collecting, filtering and integrating large amounts of business data, and then applying careful business analysis and association rule mining, enterprises can perceive their own network security situation, forecast trends in business data, and understand the operational security of the business, which is of revolutionary significance to the enterprise. At present, the business units of some operators have begun to use security baselines and big data analysis technology to detect abnormal behavior and security threats in the network in a timely manner and then take appropriate security measures. According to Gartner, by 2016 40% of companies (mainly in the banking, insurance, pharmaceutical, telecom, finance and defense industries) will actively analyze at least 10 TB of data to identify potential security risks.
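The idea of comparing current activity against a security baseline can be illustrated with a minimal sketch. It assumes the baseline is simply the mean and standard deviation of past hourly event counts, and that an observation more than three standard deviations away counts as abnormal. The function names, the threshold and the sample numbers are hypothetical and chosen for illustration only; real operator systems are likely to use far richer models.

```python
from statistics import mean, stdev

def build_baseline(history):
    """Summarize normal behavior as the mean and standard deviation of past counts."""
    return mean(history), stdev(history)

def find_anomalies(baseline, observations, threshold=3.0):
    """Return (index, value) pairs whose z-score against the baseline exceeds threshold."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is treated as abnormal.
        return [(i, v) for i, v in enumerate(observations) if v != mu]
    return [(i, v) for i, v in enumerate(observations)
            if abs(v - mu) / sigma > threshold]

# Hypothetical hourly counts of failed logins during normal operation (sample data).
normal_hours = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]
baseline = build_baseline(normal_hours)

# New observations: the fourth hour contains a burst of failed logins.
today = [14, 12, 15, 95, 13]
anomalies = find_anomalies(baseline, today)
print(anomalies)  # [(3, 95)]
```

A fixed statistical threshold like this is easy to reason about but sensitive to seasonal patterns, so in practice the baseline would probably need to be computed per time window or per user group.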
As big data attracts widespread attention, research and practice in big data security have gradually got under way. Scientific research institutions, government organizations, enterprises and institutions, security vendors and other parties are actively promoting standards and product development related to big data security, laying a more secure and solid foundation for the large-scale application of big data.
2. Big data security requirements in different areas
Before examining what big data security means and formulating corresponding strategies, it is necessary to understand the security requirements of big data in various fields, in order to analyze the security characteristics and problems of the big data environment.
(1) Internet industry
When Internet enterprises apply big data, data security and user privacy issues are often involved. With the growth of e-commerce and mobile Internet activity, Internet companies are more vulnerable to attack than before. Attackers no longer aim only to bring a server down; they also aim to penetrate systems by means of APT (advanced persistent threat) attacks. The task of preventing data from being corrupted, tampered with, leaked or stolen is therefore daunting. At the same time, user privacy and trade secrets span a wide range of technical areas, and the mechanisms involved are complex. It is hard to find experts versed in both legal theory and professional technology who can define the losses caused by the spread of personal private information and trade secrets, and it is also hard to determine whether the infringing party acted as an individual or as a company. The big data security requirements of Internet enterprises are therefore: reliable data storage, secure mining and analysis, and strict operational supervision. These enterprises call for privacy protection standards, laws and regulations, and industry norms, and look forward to reasonably discovering business opportunities and business value in massive data.
(2) Telecommunications industry
Producing, storing and analyzing large amounts of data confronts operators with a series of problems, such as data confidentiality, user privacy and business cooperation, when they apply data externally and open it up. Operators need to use enterprise platforms, systems and tools to carry out sound data modeling in order to determine or classify the value of this data. Because the data is often scattered across many systems and the information sources are very complex, operators need effective data collection and analysis to ensure data integrity and security. In cooperation with external parties, operators need to translate external business requirements accurately into actual data requirements and establish a sound mechanism for granting outside parties access to data. In this process, how to protect users' privacy effectively and prevent the leakage of core enterprise data becomes an important issue that operators must consider when developing big data applications. The big data security requirements of telecom operators are therefore: to ensure the confidentiality, integrity and availability of core data and resources, and to realize the full value of data while safeguarding the interests, experience and privacy of users.
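As one illustration of the integrity requirement, records collected from scattered systems can carry a keyed hash so that later tampering is detectable. The sketch below uses HMAC-SHA256 from the Python standard library; the record fields, the key and the helper names `sign_record` and `verify_record` are hypothetical, and key management, which is the hard part in practice, is deliberately left out.

```python
import hashlib
import hmac
import json

def sign_record(record, key):
    """Compute an HMAC-SHA256 tag over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_record(record, tag, key):
    """Return True only if the record still matches the tag it was collected with."""
    return hmac.compare_digest(sign_record(record, key), tag)

# Hypothetical shared key and record, for illustration only.
key = b"demo-key-not-for-production"
record = {"system": "billing", "subscriber": "u-1001", "usage_mb": 512}

tag = sign_record(record, key)            # computed when the record is collected
ok = verify_record(record, tag, key)      # unchanged record verifies

tampered = dict(record, usage_mb=5)       # a field is altered after collection
bad = verify_record(tampered, tag, key)   # verification fails

print(ok, bad)  # True False
```

This addresses integrity only; confidentiality and availability would need separate measures such as encryption and redundancy.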
(3) Financial industry
Financial industry systems are interdependent, serve diverse users, face multiple security risks, and have high requirements for information reliability and confidentiality. The financial industry also places higher demands on network security and stability. Systems should be able to process data at high speed, provide redundant backup and fault tolerance, and offer good manageability and the flexibility to handle complex applications. Although the financial industry has made additional investment in data security and in technology research and development, the lengthening of the financial sector's business chain, the spread of the cloud computing model, the growing complexity of its own systems and the improper use of data have all increased the security risks of financial big data. The financial industry's big data security requirements are therefore: security requirements for data access control, processing algorithms, network security, and data management and application. The industry expects to use big data security technology to strengthen the internal controls of financial institutions, improve financial supervision and service levels, and prevent and defuse financial risks.
(4) Medical industry
As medical data grows geometrically, the pressure on data storage keeps increasing. The safety and reliability of data storage bear directly on the continuity of hospital business, because once a system fails, the first things to be tested are data storage, disaster preparedness and recovery capabilities. If data cannot be recovered quickly and operations cannot resume from the point of interruption, the hospital's business and patient satisfaction suffer directly. At the same time, medical data is highly private; most owners of medical data are unwilling to provide it directly to other organizations or individuals for research and use, and limited data processing technology and methods have also left valuable resources wasted. The big data security requirements of the healthcare industry are therefore: data privacy takes priority over security and confidentiality, and data storage must be safe and reliable, with sound data backup and management, in order to support doctors and patients in disease diagnosis, drug development and management decision-making, improve hospital services, raise patient satisfaction and reduce patient turnover.
(5) Government organizations
National government organizations have recognized the potential of big data analysis in security, whose role is to help countries build a more secure network environment. For example, the U.S. Import Safety Reporting Commission recently announced, on the basis of 6 critical survey findings, that big data analysis not only provides strong data analysis capabilities but also helps ensure data security. The U.S. Department of Defense has been actively deploying big data initiatives, mining massive amounts of data for high-value information, improving rapid response capabilities and automating decision-making. The United States Central Intelligence Agency (CIA) enhances national security by using big data technology to improve its ability to extract knowledge and insight from large and complex digital data sets. Government organizations' requirements for big data security are therefore: security supervision of privacy protection, a safe network environment, the development of big data security standards, and well-regulated security management mechanisms.
3. Big data environment security
The above analysis shows that security requirements in every field are changing: data collection, data integration, data extraction, data mining, security analysis, security situation assessment and security detection for discovering threats have formed a new, complete chain. Along this chain, data may be lost, leaked, accessed without authorization or tampered with, and user privacy and corporate secrets may even be involved. In general, big data security has the following 6 characteristics and problems.
(1) Mobile data security is facing high pressure
The rise of new applications such as social media, e-commerce and the Internet of Things has broken down the walls of the original value chain; analyzing only the data from each link of the original value chain can no longer meet demand. A big data strategy is needed to break through data boundaries so that enterprises can gain a more comprehensive panorama of their operations and operating environment. This, however, clearly places higher demands on an enterprise's ability to protect mobile data. In addition, the increase in the value of data leads to more sensitive analytical data being transferred between mobile devices. Some malware even has data upload and monitoring functions and can track a user's location and steal data or confidential information, which seriously threatens personal information security and raises the severity of security incidents. As threats to mobile devices and mobile platforms grow rapidly, how to track mobile malware samples and their originators and analyze the relationships between samples has become a problem that mobile big data security must solve.
(2) Networked society makes big data an easy target
In cyberspace, big data is a large target that is more easily discovered. On the one hand, convenient network access and the resulting data flows provide the basis for rapid, flexible delivery of resources and for personalized services; because the platform is exposed, big data with potential value is more likely to attract hacker attacks. On the other hand, in an open networked society, the volume of big data is huge and the data is interrelated, so hackers can obtain more data in a single attack, which lowers their attack cost and raises their yield. For example, hackers can use big data to launch botnet attacks, controlling millions of zombie machines, or use big data technologies to collect as much useful information as possible.
(3) User privacy protection becomes a problem
The pooling of big data inevitably increases the risk that users' private data will be disclosed. Because the data contains a large amount of user information, exploiting big data can easily violate citizens' privacy, and the technical threshold for malicious use of citizens' private information is greatly lowered. In a big data application environment, data is dynamic, and the attributes and forms of expression in a database change unpredictably, so traditional privacy protection techniques based on static data sets are challenged. User privacy protection has many different requirements and characteristics across fields, with complex correlations and sensitivities between them, and most existing privacy protection models and algorithms target only traditional relational data and cannot be transplanted directly into big data applications.
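To make the difficulty with static data sets concrete, consider k-anonymity, a well-known privacy model that is used here purely as an illustrative example and is not named in the text above. A release is k-anonymous if every combination of quasi-identifier values is shared by at least k records. The sketch below, with hypothetical field names and sample records, shows how a release that satisfies k = 2 can drop to k = 1 as soon as one new record arrives.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records sharing the same
    quasi-identifier values (0 for an empty data set)."""
    if not records:
        return 0
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Hypothetical generalized records, as they might be published from a static data set.
static_release = [
    {"age_band": "30-39", "zip_prefix": "100", "diagnosis": "flu"},
    {"age_band": "30-39", "zip_prefix": "100", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip_prefix": "200", "diagnosis": "flu"},
    {"age_band": "40-49", "zip_prefix": "200", "diagnosis": "diabetes"},
]
quasi_ids = ["age_band", "zip_prefix"]
k_before = k_anonymity(static_release, quasi_ids)

# A single new record arrives; its quasi-identifier combination is unique.
updated = static_release + [
    {"age_band": "50-59", "zip_prefix": "300", "diagnosis": "flu"},
]
k_after = k_anonymity(updated, quasi_ids)

print(k_before, k_after)  # 2 1
```

The guarantee computed for the static release no longer holds after the update, which is the kind of problem that continuously changing data poses for models designed around a fixed table.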
(4) Secure storage of massive data
With the continuous growth of structured and unstructured data and the diversification of data sources, earlier storage systems can no longer meet the needs of big data applications. For unstructured data, which accounts for more than 80% of all data, NoSQL storage technology is used to capture, manage and process big data. Although NoSQL data stores are easy to scale, highly available and perform well, problems remain, for example in access control and privacy management models, technical vulnerabilities and maturity, authorization and authentication security, and data management and confidentiality. The security protection of structured data also has gaps: physical failure, human error, software problems, viruses, trojans, hacker attacks and other factors can all seriously threaten data security. The storage capacity, latency, concurrent access, security and cost problems brought by big data challenge both the storage system architecture and the security protection of big data.
(5) Big data lifecycle changes promote data security evolution
Traditional data security is usually deployed around the data lifecycle, that is, the generation, storage, use and destruction of data. As big data applications become more widespread, data owners and data managers are separated, and the original data lifecycle gradually becomes one of data generation, transmission, storage and use. Because the size of big data has no upper limit and the lifecycle of much data is extremely short, traditional security products that are to remain effective must cope with the dynamic and parallel nature of big data storage and processing, track data boundaries dynamically, and manage data operations.
(6) Trust security of big data
The biggest hurdle for big data is not how successful it is, but how to make people truly believe in and trust big data, including trust in other people's data and in the use of their own data. For example, in recent years people have said that wages have been "increased", the CPI has been "dropped", house prices have been "lowered" and unemployment has been "reduced", because official statistics differ from people's personal experience and because national GDP figures are seriously inconsistent with the figures reported by localities. These discrepancies have caused the market to question the statistics. At the same time, the trust problem of big data is not only a matter of believing in the big data itself, but also of believing that results can be obtained through data. However, it is not easy to believe and trust the insights gained through big data models, and proving the value of big data itself is harder than completing a project successfully. It is therefore very important to build trust in big data, which requires government agencies, enterprises, individuals and others to jointly build and maintain a safe, trustworthy big data environment.
4. Big data security implications
Ensuring big data security, that is, the security of big data itself
Big data security differs from relational data security, because big data differs significantly from relational data in data volume, structure type, processing speed, value density, data storage, query mode and analytical application. Big data implies that data and the systems carrying it are distributed: the value of any individual piece of data or system is reduced, the data spans wide ranges of space and time, and value is sparse, which makes it harder for outsiders to find valuable points of attack. However, complete decentralization is very difficult in a big data environment. Wherever a center exists, it can become a vulnerable point for attack, and the process of refining low-density value is itself something that attracts attack. For these reasons, the monitoring, log file analysis, data discovery and vulnerability assessment techniques used by traditional security products do not work effectively in big data environments. In many traditional security technology scenarios, the size of the data affects security controls or the proper functioning of supporting operations. Most security products cannot be adapted to big data or fully understand the information they face. Moreover, in the big data age more and more data will be opened up and cross-used, and how to protect user privacy in this process is the issue that most needs consideration.
To solve the security problems of big data, it is necessary to redesign and build a big data security architecture and open data services, and to deploy a complete security solution covering network security, data security, disaster backup, security risk management, security operations management, security incident management and security management, so as to guarantee the security of the big data computation process, data forms and application value.