I remember that back in 2011, a search for Hadoop on Baidu Zhidao (Baidu Knows) turned up only a handful of scattered questions, and I checked almost every day to see whether there was one I could answer. Today the same search returns more than 8 million questions. In this post I would like to talk about the kinds of Hadoop work that exist now, in the hope of helping today's beginners.
What is Hadoop? Hadoop is a storage system plus a computing framework! It mainly solves the problem of storing and processing massive amounts of data. Eric Baldeschwieler, chief technology officer at Hortonworks, said at the 2012 Hadoop and Big Data summit that by 2015 half of the world's data would be handled through Hadoop. We have already seen more and more data migrating to Hadoop.
Hadoop-related work can now be broadly divided into three categories:
1. Hadoop applications:
The main task is to write MapReduce jobs and Pig, Hive, and similar scripts for data analysis or data mining. Hadoop is only a tool here; delivering what the business needs is still the main goal. You must know at least one programming language, such as Java or Python. Most Hadoop books and training courses today are aimed at this area. You need a basic understanding of the Hadoop framework, a grasp of the MapReduce programming model, and some tuning techniques; with those you have mastered the tool. Start with WordCount! Recommended Introductory Book <
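To make the MapReduce programming model mentioned above a little more concrete, here is a minimal WordCount sketch in plain Python. It does not use Hadoop at all; it only imitates the three stages (map, shuffle, reduce) in a single process, and the function names map_phase, shuffle_phase, reduce_phase, and word_count are made up for this illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, which a real framework
    does for you between the map and reduce stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reducer: sum all the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run the three stages in order on an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hadoop computes data"]
    print(word_count(sample))
```

On a real cluster the mapper and reducer run on many machines and the shuffle moves data across the network, but the shape of the logic you write is roughly the same.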
2. Hadoop operations and maintenance:
This role is mainly responsible for building clusters, tuning the various configuration parameters, handling common faults, and keeping the whole cluster running stably. People with these skills are relatively scarce, and employers need them badly. In theory, someone doing this work does not have to know Java, but the job does call for a careful, rigorous temperament. If you enjoy digging into things, you can work out some tuning solutions on your own, and of course you can also pick up solutions by talking with experts in the industry. How you get there does not matter; what matters is that you can find a solution quickly when a problem comes up. Experience in this area builds up over time, and it also depends on the size of the clusters you operate. If you have the chance to do this work at a large company, you will grow quickly. If you try to get by on conclusions gleaned from a few posts, people will see through you easily. Recommended books: <<Pro Hadoop>>, and the Hadoop website.
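As a small illustration of the configuration side of this work, the sketch below reads a Hadoop-style *-site.xml file and compares a few properties with the values you expect. The XML layout (configuration / property / name / value) is the one Hadoop site files use; the sample values, the helper names load_properties and find_mismatches, and the choice of properties are assumptions for illustration only, so check property names against the documentation for your Hadoop version.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; the values here are not tuning advice.
SAMPLE_SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
</configuration>
"""


def load_properties(xml_text):
    """Parse a Hadoop-style site file into a {name: value} dict of strings."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def find_mismatches(props, expected):
    """Return {name: (actual, expected)} for every property that differs."""
    return {
        name: (props.get(name), want)
        for name, want in expected.items()
        if props.get(name) != want
    }


if __name__ == "__main__":
    props = load_properties(SAMPLE_SITE_XML)
    print(find_mismatches(props, {"dfs.replication": "2"}))
```

A check like this is handy for spotting nodes whose configuration has drifted from what the rest of the cluster uses.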
3. Hadoop framework modification:
Not every company has a position like this. The main tasks are patching the Hadoop framework itself, fixing bugs, researching new features, and planning version upgrades. This requires you to dig deep into the Hadoop source code, keep a constant eye on the Hadoop website, understand the features of the latest releases, and follow where Hadoop development is heading. Recommended Books <
Finally, we recommend the Hadoop Learning Roadmap and Hadoop-related training such as the Hadoop Big Data Best Practices workshop, which will help you: become familiar with the Hadoop distributed file system; understand how MapReduce works; become familiar with hardware and configuration planning for Hadoop clusters; understand Hadoop cluster configuration and optimization; learn how to maintain and monitor Hadoop clusters; learn how to use Sqoop to connect to relational databases for data import and export; understand the development and use of the Hive data warehouse; do database development through Hue's web interface; become proficient in HBase database development; grasp the basics of big data mining and analysis; understand several common data mining tools and how they compare; learn the principles behind several commonly used mining algorithms; and learn about big data application scenarios at large companies and where the field is heading.
These three kinds of work can be entirely independent of one another. Everyone's energy is limited, and everyone's personality and interests are different, so work out what you like to do and then study with that goal in mind.
I hope all of you find a position you enjoy!