The IT world is rapidly embracing "big data". As data sets keep growing, large-scale data storage will be the next topic of discussion in big data analytics; startups, for example, are using these systems to analyze the history of human evolution from trillions of DNA tests. Although big data (and its underlying technology, NoSQL) has become a buzzword in information systems, there has been little discussion of its security implications.
Big Data Overview
NoSQL refers to non-relational databases, a broad category covering many different kinds of stores for structured and unstructured data. Because the data is so diverse, these stores are not accessed through the standard SQL language. Previously, we usually divided data storage into two types: relational databases (RDBMS) and file servers. NoSQL broadens that view: unlike the traditional relational model, it does not impose a fixed structure on the data. The main advantages of the NoSQL approach are scalability, availability, and flexibility of storage. Each data store is mirrored in several locations to keep the data available and prevent data loss. Such systems are typically used for trend analysis. They are not suited to financial transactions that require real-time updates, but a financial institution could use them to analyze, for example, which branch offices are the most efficient or the busiest.
NoSQL equals no security?
Many would say that the developers of the various NoSQL systems deliberately left security out of their systems. Cassandra, for example, offers only basic built-in authentication; the thinking is that database administrators should not have to worry about security, which should be handed to a dedicated team. In our view, NoSQL poses the following security challenges:
★ Model maturity. Today's standard SQL technology includes strict access control and privacy management tools, which NoSQL does not require. In fact, NoSQL cannot simply follow the SQL model; it needs a new model of its own. For example, column-level and row-level security matter more in NoSQL data stores than in traditional SQL data stores. In addition, NoSQL lets you keep adding attributes to data records, so security has to be proactive: businesses need to define security for these future attributes in advance.
★ Software maturity. Over the years, database and file server systems have matured by working through a succession of security problems. While NoSQL can learn from these systems, and NoSQL data stores are less complex, we believe NoSQL will remain vulnerable for at least the next five years; after all, it is new code.
★ Staff maturity. Even the most experienced database administrator is a novice when it comes to NoSQL. These people will first focus on getting the system to work (which is hard enough) and will perhaps find time to consider security later. When they do, they are bound to make plenty of integration mistakes.
★ Client software. Because NoSQL server software does not have enough security built in, security must be built into the applications that access it, which in turn leads to a number of security issues:
☆ Adding authentication and authorization to the application. This demands more security consideration and makes the application more complex. For example, the application has to define users and roles, and decide on that basis whether to grant a user access to the system.
☆ Input validation. Once again, problems that have plagued relational database applications come back to plague NoSQL databases. At last year's Black Hat conference, for example, researchers showed how attackers can use "NoSQL injection" to access restricted information. Although the schedule for Black Hat 2012 has not yet been confirmed, we expect to see more about NoSQL this year.
☆ Application awareness. When each application has to manage security itself, it must be aware of all the other applications. Only then can it block access to data that does not belong to it.
☆ When new data types are added to the data store, the data store administrator must work out which applications must not be allowed to access that data.
☆ Vulnerability-prone code. There are many NoSQL products on the market, but far more applications and application server products. The more applications there are, the more code there is that can contain vulnerabilities.
★ Data redundancy and dispersion. A basic principle of relational database security is data normalization: storing each piece of data in a single location. Big data systems completely change this pattern. Their inherent design is to replicate data into many tables to optimize query processing. Data is dispersed across different data stores in different geographic locations, which makes it difficult for a business to locate and protect all of its confidential information.
★ Privacy issues. Here the problem is driven not by security as such but by the way big data is used: correlating data from different activities, different applications, and different systems. Take Google, for example. It changed its privacy policy a few months ago, and the new terms allow Google to combine information from all of its services. For us as individuals, this seriously limits our ability to avoid corporate tracking, even if we use multiple identities. But the companies themselves now face risks too. On the one hand, they try to keep the data in-house, mainly for ownership and regulatory reasons. On the other hand, scientists have recently begun to voice concern about this practice and are asking companies to disclose their data sets so that findings can be validated.
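The application-side authorization described under "Client software" above (the application defines users and roles, and security must also cover attributes added in the future) can be sketched roughly as follows. This is a minimal illustration, not a feature of any particular NoSQL product: the `FIELD_POLICY` table, the `User` class, the `readable_view` function, and the sample record are all hypothetical names introduced for this example. The one design choice worth noting is the default-deny rule: an attribute that nobody has classified yet is hidden from everyone.

```python
from dataclasses import dataclass, field

# Hypothetical policy: which roles may read which attributes of a record.
# Attributes that are not listed here (including ones added later) are denied.
FIELD_POLICY = {
    "name": {"analyst", "admin"},
    "branch": {"analyst", "admin"},
    "ssn": {"admin"},
}


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)


def readable_view(user, record):
    """Return only the attributes of `record` that `user` is allowed to read."""
    view = {}
    for key, value in record.items():
        allowed_roles = FIELD_POLICY.get(key, set())  # unknown attribute: deny
        if user.roles & allowed_roles:
            view[key] = value
    return view


# A schemaless record that has gained a new, not yet classified attribute.
record = {
    "name": "Alice",
    "branch": "Downtown",
    "ssn": "000-00-0000",
    "dna_marker": "rs123",
}

analyst = User("bob", {"analyst"})
admin = User("carol", {"admin"})

print(readable_view(analyst, record))  # "ssn" and "dna_marker" are filtered out
print(readable_view(admin, record))    # "dna_marker" stays hidden until classified
```

Because the check lives in the application rather than in the data store, every application that touches the store would need to apply the same policy, which is exactly the added complexity the list above warns about.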
Summary
NoSQL is still in its infancy, and we may not see any NoSQL security solutions for the next year or so. Enterprises that develop their own NoSQL solutions should first choose their development team carefully; the team should include industry veterans with a security mindset. In addition, code reviews should be carried out to ensure the security of the software.
Finally, rigorous input validation and network isolation should be used to minimize the platform's exposure to users. The good news is that we are now in the big data age: storage costs have fallen, and technology lets us access and analyze data easily.
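To make the input validation advice concrete, here is a minimal sketch of one common form of "NoSQL injection" and a simple defense. It assumes a document store whose queries are expressed as nested documents in which keys beginning with `$` act as operators (MongoDB-style query documents work this way); the function name `build_login_query` and the field names are hypothetical, and no real database is contacted.

```python
import json


def build_login_query(username, password_hash):
    """Build a query document from untrusted input, accepting plain strings only.

    If a nested object were passed through unchecked, an attacker could
    smuggle in an operator such as {"$ne": ""} and match any stored value.
    """
    for name, value in (("username", username), ("password_hash", password_hash)):
        if not isinstance(value, str):
            raise ValueError(f"{name} must be a string, got {type(value).__name__}")
    return {"username": username, "password_hash": password_hash}


# A JSON request body an attacker might send: the password field is an
# operator object rather than a string.
malicious = json.loads('{"username": "admin", "password_hash": {"$ne": ""}}')

try:
    build_login_query(malicious["username"], malicious["password_hash"])
except ValueError as exc:
    print("rejected:", exc)

# Well-formed input passes through unchanged.
print(build_login_query("alice", "5f4dcc3b"))
```

Type checks like this are only one layer; they complement, rather than replace, the network isolation recommended above.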
(Responsible editor: The good of the Legacy)