As national strategy promotes big data as a driver of economic and social transformation, building big data platforms has become the focus of government informatization, with provincial governments, backed by strong information systems, taking the lead. Because security underpins the stable, continuous operation of the entire platform, the security construction plan should be an important part of the project from the very start of business system construction.
The overall security construction of a big data platform spans data collection, data asset management, platform access security management, data storage security, and copyright protection during data sharing and distribution. How the security solution can form a closed loop over data access and use, and how security policies can be issued and coordinated in a unified way, are thorny questions to settle before platform construction begins. This article takes the security construction plan of one big data platform as a reference and explores effective approaches to security construction. The plan has received initial approval from the construction party and has a basis for implementation.
The complete security construction approach
I) Business requirement for information resource inventory: data inventory
At the start of security construction, the information resources that need protection must first be surveyed. This calls for the ability to:
1) Manage and maintain the department's organizational structure, business roles, information resource categories, and information systems;
2) Manage business process diagrams and data flow diagrams, identify collaborative relationships and information sharing needs, clarify responsibilities, mine and integrate data resources, and standardize data representation;
3) Manage the database's subject libraries, logical entities, entity relationship diagrams, data maps, data element standards, and information classification codes.
Technical implementation: database vulnerability scanning, data asset inventory
Database vulnerability scanning: perform automated security assessment of the database system, expose its current security issues, and continuously monitor its security status. Database vulnerability scanning products cover the traditional vulnerability detection items and add advanced capabilities such as weak password scanning, sensitive data discovery, dangerous program scanning, and penetration testing. Predefined and custom security policies make security status scans efficient and targeted, and detailed reports present the security status of the database system from multiple angles and by topic.
Data asset inventory: a combined "static + dynamic" inventory of data assets
Static inventory: automatic database discovery searches the enterprise network for databases within a specified IP segment and port range and collects basic information about each one. Sensitive data is then identified automatically, based on the characteristics of sensitive data or on predefined sensitive data features, and discovery runs continuously. Common sensitive data is classified by its characteristics, and each data type is assigned its own sensitivity level.
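As a rough illustration of feature-based sensitive data identification, the sketch below classifies a column by matching sampled values against predefined patterns and attaches a sensitivity level to each data type. The pattern set, the level names, the 0.8 match threshold, and the function names are all assumptions made for this example; they are not taken from any particular product.

```python
import re

# Hypothetical feature patterns and sensitivity levels, for illustration only.
PATTERNS = {
    "email": (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "medium"),
    "mobile_phone": (re.compile(r"1[3-9]\d{9}"), "medium"),
    "id_number": (re.compile(r"\d{17}[\dXx]"), "high"),
}


def classify_column(values, threshold=0.8):
    """Return (data_type, level) if enough sampled values match one pattern."""
    samples = [str(v) for v in values if v is not None]
    if not samples:
        return None
    for data_type, (pattern, level) in PATTERNS.items():
        hits = sum(1 for s in samples if pattern.fullmatch(s))
        if hits / len(samples) >= threshold:
            return data_type, level
    return None


def scan_table(columns):
    """columns maps column name -> sampled values; returns the sensitive columns."""
    found = {}
    for name, values in columns.items():
        result = classify_column(values)
        if result is not None:
            found[name] = result
    return found


sample = {
    "contact": ["alice@example.com", "bob@example.org"],
    "phone": ["13812345678"],
    "note": ["hello"],
}
print(scan_table(sample))
```

Sampling values rather than reading whole tables keeps such a scan cheap enough to repeat, which is one way to approach the continuous discovery described above.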
Dynamic inventory: sort out the permissions that different users hold on different objects in the platform database system and monitor permission changes. Monitor basic information such as account activation status, privilege assignment, and role membership; summarize user access, paying particular attention to how access rights to sensitive objects are assigned. Data flows are also mapped: operations on sensitive data by applications, operation and maintenance tools, scripts, and so on are monitored and analyzed to produce a flow map showing how sensitive data is processed and transferred. Abnormal flows are monitored so that the risk of data misuse is detected promptly.
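Permission change monitoring can be pictured as comparing two snapshots of who holds which privilege on which object. The following is a minimal sketch under that assumption; the snapshot layout (user mapped to a set of (object, privilege) pairs) and the function names are hypothetical.

```python
def diff_permissions(previous, current):
    """Compare two snapshots mapping user -> set of (object, privilege) pairs."""
    changes = []
    for user in sorted(set(previous) | set(current)):
        before = previous.get(user, set())
        after = current.get(user, set())
        for obj, privilege in sorted(after - before):
            changes.append(("granted", user, obj, privilege))
        for obj, privilege in sorted(before - after):
            changes.append(("revoked", user, obj, privilege))
    return changes


def sensitive_changes(changes, sensitive_objects):
    """Keep only the changes that touch objects marked as sensitive."""
    return [change for change in changes if change[2] in sensitive_objects]


yesterday = {"ops": {("citizen", "SELECT")}}
today = {
    "ops": {("citizen", "SELECT"), ("citizen", "UPDATE")},
    "etl": {("logs", "INSERT")},
}
changes = diff_permissions(yesterday, today)
print(sensitive_changes(changes, {"citizen"}))
```

Filtering the change list by sensitive object reflects the emphasis above on access rights to sensitive objects.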
In summary, security risk scanning, detection, and asset inventory clarify the inputs, outputs, and data relationships of information resources; identify the business, data, and integration requirements of each department; automatically generate documentation for information resources (database design documents, information resource catalogs, entity relationship diagrams, etc.); and display information resources as mind maps.
II) Business requirement for government and Internet data collection: data sharing during collection
Government functional units (public security, civil affairs, human resources and social security, etc.) pool their information, and public information is collected from the Internet (government websites, WeChat, social and academic libraries, corporate information, etc.). This data needs to be shared, but sensitive data cannot be fully opened.
Technical implementation 1: dynamic desensitization
The dynamic desensitization system is deployed across the data sharing, exchange, application, and operation and maintenance areas and the database, forming an automated, asymmetric data anonymization boundary that prevents private data from leaving the data area without desensitization.
Policies can be based on factors such as the source IP of the database access, the application system, the application system account, and the time. For sensitive data that must be shared, the dynamic desensitization policy can be configured flexibly according to the sensitivity level of the data and the needs of the application, so that external applications can use shared sensitive data in a secure, controlled way without leaking it. A rich set of efficient built-in desensitization algorithms covers different data characteristics, including shielding, deformation, replacement, and randomization; custom algorithms are also supported and can be defined by users as needed.
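The policy idea above can be sketched as a rule table that is consulted for each row returned to a caller. This is only an illustrative sketch: the `Rule` fields, the example networks and application names, and the two algorithms shown (shielding and replacement) are assumptions, and a real system would apply such rules inside a database proxy rather than on Python dictionaries.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Callable


def shield(value, keep=3):
    """Shielding: keep the first `keep` characters and star out the rest."""
    return value[:keep] + "*" * max(len(value) - keep, 0)


def replace_with(token):
    """Replacement: swap the whole value for a fixed token."""
    return lambda value: token


@dataclass
class Rule:
    network: str                      # source IP range the rule applies to
    application: str                  # application name, or "*" for any
    column: str                       # column to desensitize
    algorithm: Callable[[str], str]   # desensitization algorithm


def mask_row(row, source_ip, application, rules):
    """Return a copy of row with the first matching rule applied per column."""
    address = ip_address(source_ip)
    masked = dict(row)
    for column, value in row.items():
        for rule in rules:
            if (rule.column == column
                    and address in ip_network(rule.network)
                    and rule.application in ("*", application)):
                masked[column] = rule.algorithm(str(value))
                break  # first matching rule wins
    return masked


# Hypothetical policy: shield phone numbers for the 10.0.0.0/8 sharing zone,
# and redact names for the "reporting" application from any address.
rules = [
    Rule("10.0.0.0/8", "*", "phone", shield),
    Rule("0.0.0.0/0", "reporting", "name", replace_with("REDACTED")),
]
row = {"name": "Zhang San", "phone": "13812345678"}
print(mask_row(row, "10.1.2.3", "portal", rules))
```

In this sketch a column with no matching rule is returned unchanged. A production policy would more likely mask by default and expose plaintext only to explicitly trusted callers.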
Technical implementation 2: static desensitization
Static desensitization: applying static desensitization to data effectively prevents abuse of private data within the big data platform and prevents private data from leaving without desensitization. It protects private data while still meeting the data needs of development, testing, model training, and similar work, and it helps maintain both regulatory and corporate compliance.
III) Business requirement for big data platform management (infrastructure and services): unified resource management and control
Data usage control on the big data platform needs to provide resource management, security management, operation and maintenance management, cluster deployment and monitoring, task scheduling, and similar functions, together with a user-friendly management interface.
Technical implementation: database auditing, database firewall, secure operation and maintenance control
Database audit: by collecting, parsing, filtering, analyzing, and storing all network traffic that accesses the database, the system comprehensively audits every operation on the database, meeting the big data platform's need to monitor, collect, and record data processing.
Database firewall: deploy the database firewall between the application system and the database to stop attackers from exploiting web application or application framework vulnerabilities to attack the database and steal sensitive data, and to ensure that the core data assets of the big data platform are shared securely.
Database security operation and maintenance system: provides fine-grained, role-based control of database operation and maintenance, down to the individual SQL statement, to keep the handling of core data assets compliant. It provides operation permissions and access control for different database users and restricts UPDATE and DELETE statements that have no WHERE clause, avoiding large-scale data leakage and tampering. It provides two-factor authentication and login control to prevent database accounts from being leaked or abused, and fine-grained management of user rights to strictly control operations on sensitive data. Controlled actions are audited, and comprehensive, detailed audit analysis provides real-time statistical charts of access.
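The restriction on UPDATE and DELETE statements without a WHERE clause can be approximated with a simple check like the one below. It is a deliberately naive sketch that relies on regular expressions rather than a SQL parser: it handles a single statement, ignores SQL comments, and the function name is hypothetical. A real control point would parse the statement properly.

```python
import re


def lacks_where(sql):
    """True for a single UPDATE or DELETE statement that has no WHERE clause."""
    # Blank out string literals so a "where" inside quoted text is not counted.
    stripped = re.sub(r"'(?:[^']|'')*'", "''", sql)
    statement = stripped.strip().rstrip(";").strip()
    if not re.match(r"(update|delete)\b", statement, re.IGNORECASE):
        return False
    return re.search(r"\bwhere\b", statement, re.IGNORECASE) is None


for sql in (
    "DELETE FROM citizen",
    "delete from citizen where id = 1;",
    "UPDATE citizen SET note = 'no where here'",
):
    print(sql, "->", "block" if lacks_where(sql) else "allow")
```

A statement flagged this way could be rejected outright or routed to an approval step, depending on the role of the user who issued it.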
IV) Business requirement for big data storage security hardening: storage security
For data resources that land on the big data platform, access management and control are not enough: storage hardening must be added for highly classified data as the last line of data security protection.
Technical implementation: database encryption
Strengthen the data security of the big data platform, harden data security as a whole, and prevent data leakage. Encrypt sensitive data both in storage and on access, and encrypt key fields when sensitive data is presented. Anwar King's database encryption product DBCoffer supports tablespace-level encryption, which encrypts all data in a tablespace, as well as table-level encryption, which adds flexibility. Without affecting the database's own permissions, the system strengthens access control at several layers, including database user, client IP, and application system, to prevent unauthorized access and data leakage. A security service component implements key management so that users hold the keys; even if the data is stolen, the plaintext cannot be read.
V) Business requirement for big data operation and maintenance analysis: supporting big data analysis operations
Big data analysis on the operation and maintenance side provides efficient analysis and computation over massive data. The data analysis and mining engine supports libraries of parallelized statistical algorithms and basic machine learning algorithms that can process large data sets. Specific functional requirements include query, association analysis, statistical analysis, report display, data mining, and secondary development.
Technical implementation: static desensitization
Analysis and computation over massive data is a typical scenario for a database desensitization system. In this scenario, sensitive fields in production data are desensitized, which effectively prevents abuse of private data within the big data platform and prevents private data from leaving without desensitization. For data analysis, the system supports desensitizing only part of the data in the target database: the data sources are filtered by specified conditions to form a subset. Because data and data structures in the production environment change frequently, the desensitization strategy must be adjusted promptly so that sensitive data does not slip through unprotected and cause a leak.
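A minimal sketch of subset-based static desensitization follows. It assumes, purely for illustration, that sensitive values are replaced with keyed HMAC-SHA256 pseudonyms, so equal inputs map to equal outputs and joins still work on the desensitized copy; the source does not say which algorithm a given product uses. The column names, the key, and the helper names are hypothetical. The last helper illustrates the point about structure changes by listing columns that the current strategy has never reviewed.

```python
import hashlib
import hmac


def pseudonymize(value, key):
    """Deterministic keyed pseudonym: equal inputs give equal outputs."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def build_subset(rows, keep, sensitive_columns, key):
    """Filter rows with `keep`, then desensitize the sensitive columns of a copy."""
    subset = []
    for row in rows:
        if not keep(row):
            continue
        clean = dict(row)  # never modify the production row in place
        for column in sensitive_columns:
            if clean.get(column) is not None:
                clean[column] = pseudonymize(str(clean[column]), key)
        subset.append(clean)
    return subset


def unreviewed_columns(row, sensitive_columns, reviewed_columns):
    """Columns present in the data that the strategy has not classified yet."""
    return sorted(set(row) - set(sensitive_columns) - set(reviewed_columns))


production = [
    {"id_number": "A1", "region": "north", "age": 30},
    {"id_number": "B2", "region": "south", "age": 41},
    {"id_number": "A1", "region": "north", "age": 31},
]
subset = build_subset(
    production,
    keep=lambda r: r["region"] == "north",
    sensitive_columns=["id_number"],
    key=b"demo-key",  # hypothetical key; keep real keys outside the code
)
print(len(subset), subset[0]["id_number"] == subset[1]["id_number"])
```

Running a check like `unreviewed_columns` before each export is one way to notice that a new column has appeared in production before it reaches the analysis environment.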
VI) Business requirement for big data presentation: public downloads and services from government departments
Government departments can open various kinds of data for download and as services, providing enterprises and individuals with data to support the development and use of government information resources, and promoting both the value-added information services industry and related data analysis and research.
Technical implementation: data desensitization (dynamic and static), data watermarking
Data desensitization: a combination of dynamic and static desensitization secures data while it is publicly downloaded and while external parties develop and use it.
Data watermarking: the system manages the whole process of releasing data outside the platform, including sensitive data discovery before release, request approval, adding data tags, automatic watermark generation, file encryption afterward, auditing of the release, and data source tracing. This avoids the situation where released data leaks and the incident cannot be traced, and it improves the security and traceability of data transmission. An intelligent automatic discovery function helps the user find sensitive data and complete the inventory of data to be released. The watermark is applied by adding pseudo-rows and pseudo-columns and by desensitizing the original sensitive data and embedding a mark, in a way that keeps the distributed data usable. The watermarked data remains highly usable, and the watermark is transparent to users and well concealed, so it is not easily discovered or cracked by outsiders. Once information leaks, the watermark mark is extracted from the leaked data immediately. Reading the mark makes it possible to trace the data flow, pinpoint the unit and person responsible for the leak, and pursue accountability.
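The pseudo-row technique can be illustrated with a small sketch. It assumes, for this example only, that each recipient's pseudo-rows are derived from a secret with HMAC-SHA256, so the issuer can regenerate them later and check which recipient's rows appear in a leaked copy. The table layout (`record_id`, `amount`), the recipient names, and the function names are hypothetical, and a production watermark would need rows that are statistically indistinguishable from real data.

```python
import hashlib
import hmac


def pseudo_rows(recipient, secret, count=3):
    """Derive recipient-specific pseudo-rows that the issuer can regenerate."""
    rows = []
    for i in range(count):
        message = f"{recipient}:{i}".encode("utf-8")
        digest = hmac.new(secret, message, hashlib.sha256).hexdigest()
        rows.append({"record_id": "R" + digest[:10], "amount": int(digest[10:14], 16)})
    return rows


def embed(rows, recipient, secret):
    """Return a copy of the data with the recipient's pseudo-rows mixed in."""
    marked = list(rows) + pseudo_rows(recipient, secret)
    # Sort so the pseudo-rows are not simply appended at the end.
    marked.sort(key=lambda r: r["record_id"])
    return marked


def trace(leaked_rows, recipients, secret):
    """Count, per recipient, how many of its pseudo-rows appear in the leak."""
    leaked_ids = {r["record_id"] for r in leaked_rows}
    scores = {}
    for recipient in recipients:
        hits = sum(1 for r in pseudo_rows(recipient, secret) if r["record_id"] in leaked_ids)
        if hits:
            scores[recipient] = hits
    return scores


secret = b"issuer-secret"  # hypothetical; held only by the issuing platform
data = [
    {"record_id": "R0000000001", "amount": 10},
    {"record_id": "R0000000002", "amount": 20},
]
released = embed(data, "bureau-a", secret)
print(trace(released, ["bureau-a", "bureau-b"], secret))
```

Because tracing only needs the secret and the list of recipients, the issuer does not have to keep a copy of every released file in order to identify the source of a leak.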
Security construction for a big data platform is not a matter of simply stacking security products. It requires a complete data security protection system, built on a sound professional approach to security construction, that meets business needs while satisfying security requirements.