Some readers have asked how cloud computing, big data, databases, and data warehouses relate to one another. Here is a brief explanation of my understanding:
First, let's take a brief look at the concepts of cloud computing and big data.
1) Cloud computing: cloud computing is essentially a utility computing model in which computing resources are centralized, distributed, and fully shared. Centralization allows intensive management of computing resources, and distribution makes it easy to expand computing capacity. Centralization and distribution are the concern of the cloud service provider. Full sharing is the user's side: although each cloud user appears to have a supercomputer, in essence all users share the computing services provided by the cloud service provider. Utility computing is more of a business model, that is, users pay for the services they need.
2) Big data: I discussed big data in a previous blog post. Simply put, big data is characterized by large data volume, high application demands, and large computing volume. (Many people define big data as anything above the terabyte level, but I think this is problematic. Big data is a relative concept: it is relative to current storage technology and computing capability.) Large data volume is the most basic characteristic. High demand covers quantity, diversity, and real-time requirements. Large computing volume comes from the large data volume combined with many complicated algorithms (search, recommendation, pattern recognition, and so on). These characteristics make it hard to find one common processing model that solves every big data problem. We can only adopt different processing methods for different needs, and this is the crux of the difficulty. Neither traditional databases nor the more recently emerged NoSQL databases go very far in storing and processing big data, which is why distributed computing has come to the fore in big data processing. Hadoop provides a complete set of processing modes, but compared with the diversity of application requirements that big data faces, the range of problems it can handle is also quite limited.
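To make the Hadoop-style processing mode mentioned above a little more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the task. The function names (map_phase, shuffle, reduce_phase, word_count) are my own illustration and are not Hadoop's API; a real cluster would run the map and reduce steps on many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map -> shuffle -> reduce over a list of documents (illustrative only)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data needs big storage", "cloud computing serves big data"]
    print(word_count(docs))
```

The sketch also hints at the limitation noted above: the pattern fits problems that can be split into independent per-record work plus a per-key merge, and many big data requirements do not have that shape.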
You can look up the concepts of database and data warehouse on Google. Next, let's look at how the four relate to one another:
1) Databases and data warehouses are both ways of storing data. Big data processing is more of a requirement (a problem), while cloud computing is a comprehensive solution to requirements (problems).
2) By its nature, cloud computing is born to face big data problems (storage, computing, and so on). The basic architecture of cloud computing is the C/S (client/server) model, in which S is relatively concentrated and C is widely distributed. All user data and the vast majority of computation sit on the S side (large data volume, large computing volume), and users are naturally diverse (region, culture, needs, personalization, and so on), so the demand (including the demand for computation) is very large.
3) Cloud computing certainly involves data storage technology, but whether it needs database technology has to be analyzed case by case:
A) For IaaS, database technology is not necessary;
B) For PaaS, database functionality should be essential;
C) For SaaS, database technology (including traditional relational databases and NoSQL databases) is required.
Data warehouse technology is not necessary for cloud computing. However, cloud data has great information value, much like a gold mine, and I think it is impossible that cloud service providers will not try to extract gold from it.
4) The first problem big data faces is storage. Generally, several storage technologies (file storage and database storage) are used together, although using only file storage or only database storage is also fine. As with cloud computing, data warehouse technology is not necessary, but it is very useful for mining structured data. Of course, data warehouse technology can also be used here, for example in the Hadoop model.
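As a rough illustration of the difference between database-style storage and warehouse-style mining of structured data, the sketch below uses Python's built-in sqlite3 module. The orders table, its columns, and the sample rows are all made up for this example, and a single in-memory SQLite file is of course neither a cloud database nor a real data warehouse; it only shows the two kinds of query.

```python
import sqlite3


def build_store():
    """Create an in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (region, amount) VALUES (?, ?)",
        [("north", 10.0), ("south", 25.0), ("north", 5.0), ("east", 40.0)],
    )
    conn.commit()
    return conn


def lookup_order(conn, order_id):
    """Database-style access: fetch one record by its key."""
    cursor = conn.execute(
        "SELECT region, amount FROM orders WHERE id = ?", (order_id,)
    )
    return cursor.fetchone()


def revenue_by_region(conn):
    """Warehouse-style access: aggregate over all records to find a pattern."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
    return dict(cursor.fetchall())


if __name__ == "__main__":
    store = build_store()
    print(lookup_order(store, 2))
    print(revenue_by_region(store))
    store.close()
```

The first query serves an application that needs one user's data right now; the second is the kind of question a provider asks when digging value out of everything it has stored.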
In both cloud computing and big data processing, the most fundamental technology is distributed computing. To build a distributed computing system, process management and communication are the basic technical points: multithreading, synchronization, and remote invocation (RPC, RMI, and so on). Distributed programming is a kind of integrated application programming. It requires not only these basic technical points but also knowledge of organization and management.
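The three basic points named above (multithreading, synchronization, remote invocation) can be seen together in a small sketch built on Python's standard xmlrpc and threading modules. The Counter class, the helper names start_server and run_clients, and the use of XML-RPC on the local machine are assumptions chosen for illustration; a production system would more likely use a dedicated RPC framework across several hosts.

```python
import threading
from socketserver import ThreadingMixIn
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class ThreadedRPCServer(ThreadingMixIn, SimpleXMLRPCServer):
    """Multithreading: each incoming remote call is handled in its own thread."""
    daemon_threads = True


class Counter:
    """Synchronization: shared state guarded by a lock against concurrent calls."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def add(self, amount):
        with self._lock:
            self._value += amount
            return self._value


def start_server():
    """Start the server on a free local port and serve in a background thread."""
    server = ThreadedRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_instance(Counter())
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def run_clients(port, n_clients=4, calls_each=25):
    """Remote invocation: several clients call add(1) over RPC at the same time."""
    url = f"http://127.0.0.1:{port}"

    def worker():
        proxy = ServerProxy(url)  # one proxy per thread; proxies are not shared
        for _ in range(calls_each):
            proxy.add(1)

    threads = [threading.Thread(target=worker) for _ in range(n_clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return ServerProxy(url).add(0)  # read the final total


if __name__ == "__main__":
    server = start_server()
    print(run_clients(server.server_address[1]))
    server.shutdown()
    server.server_close()
```

With the default arguments, 4 clients each add 1 twenty-five times, so the total read back at the end is 100. Without the lock in Counter.add, concurrent handler threads could interleave their updates, which is exactly the kind of problem synchronization exists to prevent.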
At present, cloud computing and big data processing have no unified standard or definition. The above is only my own understanding, formed during my work and study, and I welcome discussion of these issues. I also hope this reply helps.