Big data architectures and platforms are new and are evolving at an extraordinary rate. Commercial vendors and open source development teams release new platform features almost every month. Today's big data clusters will look very different from the clusters we see in the future, and the security tools built to cope with these new challenges will change as well. The industry is still at an early stage in the big data life cycle, but the sooner companies start addressing big data security, the easier the work will be. If security is treated as an important requirement while a big data cluster is being developed, the cluster is less easily compromised by attackers. In addition, companies can avoid putting immature security features into critical production environments.
The term "big data" is often misunderstood. In fact, it is used so frequently that it has lost much of its meaning. Big data does involve storing and processing very large data sets, but its characteristics go well beyond that.
When solving big data problems, it is useful to view big data as an idea rather than as a specific scale or technology. In its simplest form, the big data phenomenon is driven by the intersection of three major trends: large amounts of data containing valuable information, cheap computing resources, and almost free analysis tools.
Today there are many big data management systems, some of which specialize in particular data types, such as geographic data. These systems use a variety of query models, data storage models, task management and coordination mechanisms, and resource management tools. Although big data is often described as "anti-relational", that label does not capture its nature. To avoid performance problems, big data systems did discard many core functions of relational databases, but make no mistake: some big data environments do provide relational structures, transactional consistency, and structured query processing.
Because traditional definitions fail to capture the nature of big data, it is more useful to define it by the key elements that make up a big data environment. These environments use many distributed nodes for data storage and management. They store multiple copies of the data and split it into "shards" spread across multiple nodes. This means that when a single node fails, queries are redirected to whichever nodes still hold the data. It is this cluster of distributed data nodes, cooperating to handle data management and data queries, that makes big data so different.
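The replication and failover behavior described above can be illustrated with a minimal sketch. It is a toy in-memory model and not how Hadoop or any specific platform places data: the `Cluster` class, its method names, and the hash-based placement rule are all assumptions made for illustration.

```python
import hashlib


class Cluster:
    """Toy model (hypothetical): each key is copied to `replicas` nodes."""

    def __init__(self, node_names, replicas=2):
        if replicas > len(node_names):
            raise ValueError("cannot keep more replicas than there are nodes")
        self.order = list(node_names)                # fixed node ordering
        self.nodes = {name: {} for name in node_names}  # per-node storage
        self.alive = set(node_names)                 # nodes currently reachable
        self.replicas = replicas

    def _owners(self, key):
        # Hash the key to pick a starting node, then take the next
        # `replicas` nodes in order. This placement rule is an assumption.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.order)
        return [self.order[(start + i) % len(self.order)]
                for i in range(self.replicas)]

    def put(self, key, value):
        # Write one copy to every owner node.
        for name in self._owners(key):
            self.nodes[name][key] = value

    def fail(self, name):
        # Simulate a node going offline.
        self.alive.discard(name)

    def get(self, key):
        # Try each replica in turn; skip nodes that are down.
        for name in self._owners(key):
            if name in self.alive and key in self.nodes[name]:
                return self.nodes[name][key]
        raise RuntimeError("no live replica holds " + key)
```

With three nodes and two replicas, a value written with `put` can still be read with `get` after one of its owner nodes is marked as failed; only when every owner is down does the read raise an error.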
The diagram above shows the architecture of the Hadoop file system and how data nodes and clients interact.
The loose coupling of nodes brings many performance advantages, but it also poses unique security challenges. Big data databases do not use a centralized "walled garden" model (a walled garden, as opposed to the "fully open" Internet, is an environment that controls user access to Web content or related services), in which internal databases are hidden and inaccessible to other applications. There is no concept of "inside", and big data does not rely on a central point of data access. Big data exposes its schema to the applications that use it, and clients communicate with many different nodes during an operation.
Scale, real-time, and distributed processing: The essential characteristics of big data (capacity, real-time operation, distributed architecture, and parallel processing, which let it meet data management and processing requirements beyond what earlier data management systems could handle) also make these systems harder to secure. Big data clusters are open and self-organizing, and they let users communicate with multiple data nodes simultaneously. It is difficult to verify which data nodes and which clients should have access to the information. Remember that, by its nature, big data lets new nodes join the cluster automatically, sharing data and query results and working on client tasks.
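One way to picture the verification problem is a cluster that admits a new node only if the node can prove it knows a shared secret. The sketch below uses Python's standard `hmac` module for that proof. The function names, the shared-secret scheme, and the `members` set are hypothetical and are not taken from any particular big data platform; a real deployment would more likely rely on an established authentication system.

```python
import hashlib
import hmac


def sign_join_request(cluster_secret: bytes, node_id: str) -> str:
    """Compute the tag a node presents when asking to join (hypothetical scheme)."""
    return hmac.new(cluster_secret, node_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def admit_node(cluster_secret: bytes, node_id: str,
               presented_tag: str, members: set) -> bool:
    """Add node_id to members only if its tag matches the expected value."""
    expected = sign_join_request(cluster_secret, node_id)
    # compare_digest avoids leaking information through comparison timing.
    if hmac.compare_digest(expected.encode("utf-8"),
                           presented_tag.encode("utf-8")):
        members.add(node_id)
        return True
    return False
```

A node that presents a tag computed for a different node ID, or computed with the wrong secret, is rejected and never added to `members`. The sketch does not address how the secret is distributed or rotated, which is the harder part in a self-organizing cluster.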
Embedded security: In the frantic race around big data, most development resources go into improving scalability, ease of use, and analytical capabilities. Only a small share goes into adding security features. Yet you want security features embedded in the big data platform. You want developers to be able to support the functionality they need during the design and deployment phases. You want security features to be as scalable, high-performing, and self-organizing as the big data cluster itself. The problem is that open source systems and most commercial systems generally do not include security products, and many security products cannot be embedded in Hadoop or other non-relational databases. Most systems provide minimal security features, which are not enough to cover all common threats. To a large extent, you need to build your own security strategy.
Applications: Most applications that target big data clusters are Web applications. They take advantage of web-based technologies and stateless, REST-based APIs. A full discussion of this aspect of big data security is beyond the scope of this article, but web-based applications and APIs pose one of the most significant threats to big data clusters. Once attacked or compromised, they can provide unrestricted access to the data stored in the cluster. Application security, user access management, and authorization controls are essential, as are security measures that focus on protecting the cluster itself.
Data security: Data in big data clusters is essentially stored in files. Each client application can maintain its own design for the data it holds, but that data is stored across a large number of nodes. Data stored in the cluster is vulnerable to all the threats that affect ordinary files, and the files need to be protected from unauthorized viewing and copying.
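Because the data ends up as ordinary files on each node, one basic check is whether those files can be read or written by any user on the machine. The sketch below scans a local directory for such files. It assumes a POSIX system with standard permission bits, and the function names are illustrative only; it says nothing about the permission model of any particular distributed file system.

```python
import os
import stat


def is_world_accessible(path):
    """Return True if 'other' users may read or write the file (POSIX bits)."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IROTH | stat.S_IWOTH))


def find_exposed_files(root):
    """Walk a directory tree and list files that any local user can access."""
    exposed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            if is_world_accessible(full_path):
                exposed.append(full_path)
    return sorted(exposed)
```

Running `find_exposed_files` against a node's local data directory gives a starting list of files whose permissions deserve review. Permission checks address unauthorized viewing by local users only; protecting against copying of the underlying storage would additionally call for measures such as encryption.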