In fact, the McKinsey Global Institute predicts that within the next six years, the United States alone may face a shortage of 140,000 to 190,000 people with deep data-analysis skills, as well as a gap of 1.5 million managers and analysts who can analyze big data and make effective decisions for their enterprises.
David Menninger, an analyst at Ventana Research, pointed out that a recent survey by his company showed that three quarters of the 169 executives surveyed believe a lack of technical staff is an important reason their companies cannot cope with the challenges of big data.
In addition to its core design elements, MapReduce and HDFS (Hadoop Distributed File System), Hadoop also includes the SQL-like query language HQL (HiveQL, from Apache Hive), the NoSQL database HBase (NoSQL databases are typically used to process unstructured data, including audio and video), and the machine learning library Mahout. Cloudera, Hortonworks, and MapR have all built Hadoop into their own distributions.
The MapReduce programming model can be considered the soul of cloud computing technology. MapReduce is a programming model, with an associated implementation, for processing and generating large and very large datasets. Its main ideas are borrowed from functional programming languages, along with features borrowed from vector programming languages.
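To make the map and reduce idea concrete, here is a minimal single-process sketch in Python, assuming the classic word-count task. The function names are hypothetical and this is not Hadoop's Java API: a real job would run many map and reduce tasks across a cluster, and the framework itself would perform the shuffle that groups intermediate pairs by key. The functional-programming roots show in the shape of the code: a per-record map function and a per-key fold.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all intermediate values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: fold all values for one key into a single result."""
    return key, sum(values)


def run_word_count(records: list[str]) -> dict[str, int]:
    # Each record is mapped independently; this independence is what lets
    # a real framework run map tasks in parallel on many machines.
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    lines = ["big data needs Hadoop", "Hadoop runs MapReduce", "big clusters"]
    print(run_word_count(lines))
```

With the sample lines, `big` and `hadoop` are counted twice each and every other word once.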
Beth Stackpole, a special editor at TechTarget, points out that the teams who manage today's traditional structured-data environments are indeed highly professional, but they are unable to cope with open-source big data technologies such as Hadoop and MapReduce. The reason is that skills for working with traditional relational databases do not carry over to handling the massive unstructured data of the big data world. NoSQL database technology is built around the core of this new platform.
Hot Occupations in the Big Data Age
Big Data Processing System Administrator
The administrator of a big data processing system is responsible for the day-to-day operation of Hadoop clusters. This includes managing hardware, directly or indirectly, and ensuring that the cluster keeps running stably when hardware needs to be added. The administrator is also responsible for system monitoring and configuration, so that Hadoop integrates well with other systems.
Big Data Processing Platform Developers
Big data processing platform developers are responsible for building big data processing platforms and data analysis applications. Because they have development experience and are familiar with the relevant tools and algorithms, they are well placed to write, optimize, and deploy complex MapReduce jobs. Among practitioners of big data technologies, their role is similar to that of a DBA in the traditional database world.
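One common optimization when writing MapReduce jobs is a combiner, which pre-aggregates map output on each node before the shuffle so that less data crosses the network. The following is a hedged single-process sketch, not Hadoop code: it assumes a word-count job over two hypothetical input splits and simply counts how many intermediate pairs would be shuffled with and without local aggregation. All function names are illustrative.

```python
from collections import Counter


def map_split(split: list[str]) -> list[tuple[str, int]]:
    """Map one input split (a list of lines) to (word, 1) pairs."""
    return [(word, 1) for line in split for word in line.lower().split()]


def combine(pairs: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Combiner: pre-sum counts per word on the node that ran the map.

    Valid here only because addition is associative and commutative,
    so the reducer can safely add the partial sums together again.
    """
    local: Counter = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def run_job(splits: list[list[str]], use_combiner: bool) -> tuple[dict[str, int], int]:
    """Return (final word counts, number of pairs sent through the shuffle)."""
    totals: Counter = Counter()
    shuffled = 0
    for split in splits:
        pairs = map_split(split)
        if use_combiner:
            pairs = combine(pairs)
        shuffled += len(pairs)  # pairs this split would send over the network
        for word, count in pairs:
            totals[word] += count  # reduce: add up (partial) counts per word
    return dict(totals), shuffled


if __name__ == "__main__":
    splits = [["hadoop hadoop hadoop", "data"], ["data data hadoop"]]
    for flag in (False, True):
        counts, shuffled = run_job(splits, use_combiner=flag)
        print(f"combiner={flag}: {shuffled} pairs shuffled -> {counts}")
```

With the sample splits, both runs produce the same counts (`hadoop` 4, `data` 3), but the combiner cuts the shuffled pairs from 7 to 4. The design choice matters: a combiner is only safe when the reduce operation tolerates being applied to partial results, as summation does.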
Data Analysts and Data Scientists
Data analysts and data scientists belong to the same category of work. With professional knowledge in their domain, they study problems of algorithmic analysis, and data mining is another important technique they should master. Their work helps create big data products and solutions that drive business development.
Data Manager
To improve data quality, enterprises should consider appointing a data manager. The data manager uses Hadoop to collect large amounts of data from across the enterprise, cleans and standardizes it through an ETL process, and loads it into the data warehouse as a usable version. The data is then sliced and diced and delivered to thousands of people through reporting and analysis technologies. The data manager ensures the integrity, accuracy, uniqueness, authenticity, and non-redundancy of market data.
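The clean, standardize, load, and slice-and-dice flow described above can be sketched in plain Python, assuming a handful of hypothetical sales records. This is not Hadoop code: at enterprise scale these steps would run as distributed jobs, and real validation and de-duplication rules depend on the organization's own data. The record fields, the rejection rules, and the function names are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical raw records collected from different enterprise systems;
# the field names and formats are illustrative assumptions only.
RAW_RECORDS = [
    {"customer": " Alice ", "region": "north", "amount": "120.50"},
    {"customer": "alice", "region": "NORTH", "amount": "120.50"},  # duplicate
    {"customer": "Bob", "region": "South ", "amount": "80"},
    {"customer": "", "region": "south", "amount": "15"},  # missing customer
    {"customer": "Carol", "region": "north", "amount": "n/a"},  # bad amount
]


def clean(record: dict[str, str]) -> Optional[dict]:
    """Transform: trim and standardize fields; return None to reject a row."""
    customer = record.get("customer", "").strip().title()
    region = record.get("region", "").strip().lower()
    try:
        amount = round(float(record.get("amount", "")), 2)
    except (TypeError, ValueError):
        return None  # unparseable amount: reject instead of guessing
    if not customer or not region:
        return None  # incomplete record
    return {"customer": customer, "region": region, "amount": amount}


def load(records: list[dict[str, str]]) -> list[dict]:
    """Load: keep each valid row once, giving a single usable version."""
    seen = set()
    warehouse = []
    for raw in records:
        row = clean(raw)
        if row is None:
            continue
        key = (row["customer"], row["region"], row["amount"])
        if key in seen:  # drop exact duplicates (uniqueness, non-redundancy)
            continue
        seen.add(key)
        warehouse.append(row)
    return warehouse


def slice_by(rows: list[dict], dimension: str) -> dict[str, float]:
    """Slice and dice: total amount for each value of one dimension."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    warehouse = load(RAW_RECORDS)
    print(warehouse)
    print(slice_by(warehouse, "region"))
```

With the sample data, two of the five raw rows survive: the second Alice row is dropped as a duplicate once its casing is standardized, and the rows with a missing customer or an unparseable amount are rejected.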
Although we face a shortage of technical staff today, there is no cause for despair. Cloudera's Omer Trajman points out that learning Hadoop as a big data solution is not as hard as learning to build a rocket. A few years ago few people knew Hadoop, but more and more people are now learning it. Enterprises should encourage and train their technical staff to learn Hadoop. (Compiled by Li Zhi)
Source: http://cloud.csdn.net/a/20120115/310794.html