From: http://cloud.csdn.net/a/20111117/307657.html
One reason for MapReduce's success is that it provides a simple programming model for writing code that requires large-scale parallel processing. The model is inspired by the functional programming features of Lisp and other functional languages, and it fits cloud computing well. The key feature of MapReduce is that it hides the mechanics of parallel execution from developers: the specific details of how work is distributed and run in parallel.
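The functional roots of the model are easiest to see in a minimal sketch: a word count expressed as a map phase that emits key/value pairs and a reduce phase that aggregates them. This is a single-process Python illustration of the programming model only, not a distributed implementation, and the function names are ours.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the input."""
    for word in document.split():
        yield word.lower(), 1

def reduce_phase(pairs):
    """Reduce: sum the counts for each distinct key."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["the quick brown fox", "the lazy dog", "the fox"]
# A real framework would shuffle the map output across machines by key;
# here the two phases are simply chained in one process.
pairs = (pair for doc in docs for pair in map_phase(doc))
result = reduce_phase(pairs)
```

Because both phases are pure functions over key/value pairs, the framework is free to run many map tasks and many reduce tasks in parallel, which is exactly the parallel semantics the model hides from the developer.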
Today, however, MapReduce can hardly serve as a way for business people to discuss big data, because working with it requires at least four skills:
1. Translating a business problem into a problem that can be analyzed and solved
2. Translating that analyzable problem into the MapReduce model
3. Coding, debugging, and optimizing MapReduce jobs to process the data
4. Deep experience with Hadoop and MapReduce, including the ability to debug and deploy code on a Hadoop cluster
In the big data era, traditional databases fall short at querying, sorting, defining, and extracting data at this scale. Processing big data with systems such as MapReduce inherently demands more skills, yet hiring large numbers of such highly skilled people is unrealistic.
Combining the traditional and the modern: SQL and MapReduce
SQL is a familiar model through which both programming experts and business analysts query data. The appeal of MapReduce lies in its ability to handle complex analytical queries programmatically. What happens if we combine the two?
Aster provides a framework called SQL-MapReduce, which enables data scientists and business analysts to quickly investigate and analyze complex information. It allows programs written in languages such as Java, C#, Python, C++, and R to run concurrently on a computer cluster, and those programs can then be invoked (called) through standard SQL.
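The division of labor this implies can be sketched as follows: a developer writes an analytic function once, and an analyst later invokes it from a SQL query while the engine parallelizes it across the cluster. As a hedged illustration only (the function name, arguments, and session-gap parameter are our invention, not Aster's actual API), here is the kind of row-by-row logic, sessionizing clickstream events, that such frameworks let you push into the database:

```python
def sessionize(events, gap=1800):
    """Assign a session id to (user, timestamp) click events.

    A new session begins when the gap since the user's previous
    event exceeds `gap` seconds (30 minutes by default).
    Returns (user, timestamp, session_id) rows.
    """
    last_seen = {}  # user -> (last timestamp, current session id)
    out = []
    for user, ts in sorted(events, key=lambda e: (e[0], e[1])):
        prev = last_seen.get(user)
        if prev is None:
            session = 0                       # first event for this user
        elif ts - prev[0] > gap:
            session = prev[1] + 1             # gap too large: new session
        else:
            session = prev[1]                 # continue current session
        last_seen[user] = (ts, session)
        out.append((user, ts, session))
    return out

rows = sessionize([("alice", 0), ("alice", 100), ("alice", 5000), ("bob", 10)])
```

This is exactly the sort of computation that is awkward to express in plain SQL but trivial in a procedural language; a SQL-MapReduce framework lets the analyst call it from an ordinary SELECT statement without knowing how it is parallelized.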
Greenplum supports parallel processing of both SQL and MapReduce, and can process terabytes to petabytes of enterprise data at low cost. It integrates MapReduce and SQL by running both directly in Greenplum's parallel dataflow engine, at the core of the Greenplum Data Engine. Greenplum MapReduce lets programmers analyze petabyte-scale datasets stored inside and outside the Greenplum Data Engine. The advantage is that an increasingly standard programming model gains the reliability and familiarity of a relational database.
Leading vendors such as Microsoft are also getting involved. Microsoft has released a connector between Hadoop and SQL Server, so customers can exchange data among Hadoop, SQL Server, and parallel data warehouse environments. Microsoft is also cooperating closely with Hortonworks, with the aim of combining Hortonworks' Hadoop expertise with the ease of use of Microsoft products, and of simplifying the download, installation, and configuration of several Hadoop-related technologies.
In the future, as the combination of SQL and MapReduce technologies continues to improve, MapReduce will become easier to use and will attract wider attention. Time will prove it.