PS: In this series I break the material into small modules so it is easier to learn from and discuss, and I will attach the relevant code along the way. Let's work through it together! I had three years of big data theory that I had never put into practice. Recently, while preparing to change jobs, I decided to actually practice everything I had learned about big data instead of leaving it as pure theory. When facing hands-on work, the first thing is an empty-cup mindset: only after emptying yourself can you take in more. Keep at it! I also hope readers will focus on combining practice with theory.
Environment Setup
For big data, the focus is the underlying architecture of Hadoop. Even though the Spark architecture is now more widely used, Hadoop is still the foundation. As for why everything is Linux-based: most web servers today run on Linux, and personally I feel Linux still has many advantages in file reading and data access. Because Linux is open source, it is easier for most programmers to understand how the system works, and it is also a great help for Python programming. As the popular saying goes: life is short, use Python. My understanding is that programming in Python is very simple, and all you need is a machine running Linux.
During the environment setup, the biggest difficulty was the NAT configuration; this is the one step where I did not follow the video tutorial exactly. I work over a wireless connection, and most personal PCs on Wi-Fi get an IP on the 192.168 segment. When the virtual machines communicate with the host over NAT, they must sit on a different segment, so I changed the VM network to the 172.20 segment. This does not affect any of the later steps.
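As a minimal sketch of what that change looks like (assuming a CentOS guest, an eth0 interface, and a hypothetical 172.20.10.x subnet; match the addresses to the NAT settings of your own hypervisor), the guest's static IP goes in its interface config file:

```sh
# /etc/sysconfig/network-scripts/ifcfg-eth0 on the CentOS guest
# (hypothetical addresses -- use your own NAT subnet and gateway)
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=172.20.10.11       # the VM's address on the 172.20 segment
NETMASK=255.255.255.0
GATEWAY=172.20.10.2       # the NAT gateway exposed by the hypervisor
```

After saving the file, `service network restart` applies the change, and a `ping` between host and guest confirms the two segments can talk.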
Java Setup
For this part, I had already learned about setting environment variables in an earlier course on R, so it was familiar ground. Configuring the JVM on Linux, setting the environment variables and pointing the launcher at the correct path, is also a good way to understand how the virtual machine works.
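As a minimal sketch (assuming the JDK is unpacked to a hypothetical /usr/local/jdk1.7.0 directory), the usual variables are appended to /etc/profile (or ~/.bashrc for a single user):

```sh
# Hypothetical install path -- replace with wherever your JDK actually lives.
export JAVA_HOME=/usr/local/jdk1.7.0
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

# Reload the profile and verify the setup:
source /etc/profile
java -version
```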
The main difficulty in this part was understanding the Vim editor and interpreting its commands. My biggest puzzle at the time was how, after opening a file, to edit it and then save and exit. Only later, by searching Baidu and reading some technical posts, did I gradually understand how to use Vim. Understanding the principles helps, and so does knowing English: when you hit an error, you know where to look for a solution and how to put it into practice. When you get stuck somewhere, try to solve it within a day; otherwise it takes a real toll on your enthusiasm for learning.
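For reference, these are the handful of Vim commands that resolve that edit-save-exit puzzle:

```sh
vim /etc/profile    # open a file in Vim
# i     -- enter insert mode to start editing
# Esc   -- leave insert mode, back to normal mode
# :w    -- write (save) the file
# :wq   -- write and quit
# :q!   -- quit without saving changes
```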
Hadoop Setup
This section involves the most Vim work: setting the relevant parameters under hadoop-1.2.1 and then checking whether the critical processes (NameNode, DataNode, JobTracker, TaskTracker) come up. In my case, the services only started successfully after formatting the NameNode and attempting the startup several times. This part also involves interconnecting the three virtual machines and passing configuration between them, which is why it went the slowest. Another difficulty is that your virtual machine may have no public key file (mine did not); it has to be created with touch, and chmod is used when fixing user permissions. In short, this part is somewhat difficult: you need to be comfortable writing Vim edits while also knowing Hadoop's relevant processes.
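A minimal sketch of that sequence (assuming a hypothetical install at /usr/local/hadoop-1.2.1 with the conf/*.xml files already edited, and a master plus two slave VMs that can reach each other; slave1 and slave2 are placeholder host names):

```sh
# 1. Passwordless SSH from the master to itself and the slaves
#    (start-all.sh needs it to launch remote daemons):
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
touch ~/.ssh/authorized_keys                     # create the file if it is missing
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys
ssh-copy-id slave1 && ssh-copy-id slave2         # push the key to the slaves

# 2. Format HDFS once, then start all daemons (Hadoop 1.x commands):
cd /usr/local/hadoop-1.2.1
bin/hadoop namenode -format
bin/start-all.sh

# 3. Verify the critical processes are up:
jps   # expect NameNode, SecondaryNameNode, JobTracker on the master;
      # DataNode, TaskTracker on each slave
```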
Summary
Working with Python commands now, I find that theory and practice really are very different. Through continuous learning you not only overcome the ingrained flaws in your code, but also gain a deeper understanding of the underlying principles. Fortunately, one good habit I have built up is recording the operations I perform as I work, which makes follow-up study and review much easier. You are also welcome to discuss all of this with me.
This article is from the "Data Mining and Visualization" blog; please contact the author before reproducing it.