Poptest is the only training institute in China dedicated to developing test and development engineers, aiming to make trainees competent in automated testing, performance testing, and testing-tool development. If you are interested in the course, please contact QQ 908821478 or call 010-84505200.
Let's start with a brief look at the concepts of cloud computing and big data.
1) Cloud computing: cloud computing is essentially a utility computing model in which computing resources are centrally deployed yet fully shared. The focus is on the intensive management of computing resources, and the distributed deployment makes it easy to scale computing power. Centralized deployment serves the cloud provider, while full sharing serves the users: although each cloud user appears to have a supercomputer at their disposal, in essence all users are sharing the computing services provided by the cloud provider. Utility computing is more of a business model, in which users pay only for the services they need.
2) A previous blog post discussed big data. In short, big data is characterized by a large volume of data (although many people define big data as starting above the terabyte level, I think this is problematic: big data should be a relative concept, relative to current storage technology and computing power), a large demand for data applications, and a large amount of computation. The large data volume is the most basic characteristic; the demand dimension in fact includes quantity, diversity, and real-time requirements. The computational complexity stems from the large data volume, the large demand, and complex algorithms (retrieval, recommendation, pattern recognition). These characteristics make it difficult to find a single, general processing model for big data; we can only apply different processing methods to different needs, which is precisely why big data processing is hard. Whether with traditional databases or the more recent NoSQL databases, there are major limitations in big data storage and processing, so distributed computing plays a large part in the process. Hadoop provides a relatively complete set of processing patterns, but the problem domains it can handle are very limited relative to the diversity of application requirements that big data faces.
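To make the Hadoop processing pattern mentioned above concrete, here is a minimal sketch of the map/reduce model in plain Python. This is a single-process illustration only; real Hadoop distributes the map and reduce phases across many machines and handles the shuffle step for you.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    # Shuffle + reduce: group pairs by key and sum the counts per word
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data cloud", "cloud computing", "big cloud"]
result = reduce_phase(map_phase(docs))
print(result["cloud"])  # 3
```

The key point of the pattern is that the map step is embarrassingly parallel (each document can be processed independently on any machine), while the reduce step only needs all pairs sharing a key to end up in the same place.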
You can Google the concepts of database and data warehouse; here let's look at the relationship between them:
1) Databases and data warehouses are ways of storing data; big data processing is more of a demand (a problem), and cloud computing is a more comprehensive solution to such demands (problems).
2) Due to its very nature, cloud computing inherently faces the problem of processing large volumes of data (storage, computation, etc.), because its basic architecture follows the client/server (C/S) pattern, in which the server side is relatively concentrated and the clients are widely distributed. All of the users' data and most of the computation reside on the server side (large data volume, large computational load), and users also exhibit natural diversity (geography, culture, needs, personalization, etc.), so the demand, including the computational load, is very large.
3) Cloud computing certainly involves data storage technology, but whether it depends on database technology must be analyzed case by case:
A) For IaaS, database technology is not required, nor is it a necessary function;
B) For PaaS, database functionality should be a must-have feature;
C) For SaaS, database technology (including both traditional relational databases and NoSQL databases) is bound to be used.
As for data warehousing technology, it is not necessary for cloud computing; but because the information in cloud data is very valuable, like a gold mine, I think cloud service providers cannot pass up extracting the gold from it.
4) The first problem big data faces is storage. In general a variety of storage technologies are used (file storage, database storage), although solving it entirely with file storage or entirely with database storage is also fine. As with cloud computing, data warehousing technology is not required, but it is very useful for panning gold out of structured data; of course, you can also do without data warehousing technology, as in the Hadoop model.
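As a toy illustration of combining the two storage approaches mentioned above (file storage and database storage), the sketch below appends raw records to a plain file and then loads the structured fields into SQLite for querying. The file name, schema, and records are all invented for the example:

```python
import os
import sqlite3
import tempfile

# File storage: append raw event lines to a plain text file
raw_path = os.path.join(tempfile.mkdtemp(), "events.log")
with open(raw_path, "w") as f:
    f.write("2015-01-01 login alice\n")
    f.write("2015-01-01 login bob\n")
    f.write("2015-01-02 logout alice\n")

# Database storage: load the structured fields into SQLite for querying
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (day TEXT, action TEXT, user TEXT)")
with open(raw_path) as f:
    rows = [line.split() for line in f]
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

# A query that the raw file alone cannot answer without a full scan
(n_logins,) = conn.execute(
    "SELECT COUNT(*) FROM events WHERE action = 'login'"
).fetchone()
print(n_logins)  # 2
```

The raw file keeps every record in its original form (cheap, append-only), while the database copy supports indexed, structured queries; real systems split responsibilities between the two in essentially this way, just at much larger scale.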
In both cloud computing and big data processing, the most fundamental technology is actually distributed computing. Building distributed computation rests on a few basic technical points: multithreading, synchronization, remote calls (RPC, RMI, etc.), and process management and communication. Distributed programming is a kind of comprehensive application programming: it requires not only these basic technical points but also a certain amount of organizational and management knowledge.
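As one small example of the technical points above (multithreading plus synchronization), the sketch below uses Python's standard library to have several worker threads update a shared counter under a lock. It is only a single-machine illustration of the primitives that distributed systems build upon:

```python
import threading

counter = 0
lock = threading.Lock()

def worker(increments):
    global counter
    for _ in range(increments):
        # The lock makes the read-modify-write atomic across threads;
        # without it, concurrent increments can be lost
        with lock:
            counter += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 4000
```

The same discipline, protecting shared state so concurrent updates do not corrupt it, reappears at every level of distributed computing, from in-process locks to distributed coordination services.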
For the time being, cloud computing and big data processing have not actually formed unified standards or definitions. The above is just my understanding of these topics from my own work and study, and I would also like to discuss these questions with everyone. Of course, I hope my reply is helpful to you.
Lao Li shares: what is the relationship between big data, databases, and data warehouses?