When it comes to Hadoop, you have to mention cloud computing. The concept of cloud computing below is honestly just copied from Baidu Encyclopedia, so that my Hadoop posts don't look quite so monotonous and bare. Cloud computing has been particularly hot this year, and I'm a beginner, so I'm writing down some of the experiences and steps from teaching myself Hadoop.
Cloud computing is a model for adding, using, and delivering Internet-based services, usually involving dynamically scalable and often virtualized resources provided over the Internet. The "cloud" is a metaphor for the network and the Internet: in the past the cloud was used to represent the telecommunications network, and later it came to represent the abstraction of the Internet and its underlying infrastructure. Cloud computing can give you access to something like ten trillion computations per second, enough computing power to simulate nuclear explosions or predict climate change and market trends. Users reach the data center through PCs, notebooks, mobile phones and other devices, and run the computations they need. There are many ways to define cloud computing; you can find at least a hundred explanations of what it is. The now widely accepted definition from the National Institute of Standards and Technology (NIST) is: cloud computing is a pay-per-use model that provides available, convenient, on-demand network access to a configurable shared pool of computing resources (including networks, servers, storage, applications, and services), which can be provisioned quickly with minimal management effort and little interaction with the service provider. (The above is copied from Baidu Encyclopedia, so please don't roast me for it!)
With cloud computing out of the way, let's talk about the concept of Hadoop. Hadoop is a distributed infrastructure developed under the Apache Foundation. It implements a distributed file system, the HDFS that later articles will cover. HDFS is highly fault tolerant, is designed to be deployed on inexpensive hardware, and provides high-throughput access to application data, which suits applications with very large datasets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream. The two most central pieces of the Hadoop framework are HDFS and MapReduce: HDFS provides storage for massive amounts of data, and MapReduce provides computation over that data.
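To make that division of labor concrete, here is a minimal sketch of what using the two together looks like once an environment like the one below is running. The file names, paths, and the examples jar name are my assumptions for a Hadoop 1.0.4 install, not something from this post:

    hadoop fs -mkdir /input                                           # HDFS: create a directory
    hadoop fs -put words.txt /input                                   # HDFS: store the data
    hadoop jar hadoop-examples-1.0.4.jar wordcount /input /output     # MapReduce: compute over it
    hadoop fs -cat /output/part-*                                     # read the result back from HDFS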
OK, that was wordy enough; all those concepts were excerpted from Baidu Encyclopedia and I'm tired of them. Here is how to configure and deploy a Hadoop environment. First of all, Hadoop has already reached version 2, but as a beginner I prefer to learn step by step, so today we'll cover the deployment and configuration of Hadoop 1, and later chapters will also stick to the Hadoop 1 line.
Hadoop can be installed on many operating systems, such as Windows, Linux, and Apple's Mac OS X. Since the common production environment today is Linux, this chapter covers how to configure and deploy Hadoop on Linux. There are three Hadoop deployment modes: standalone mode, pseudo-distributed mode, and fully distributed mode. Because my computer's configuration is limited and can't host too many Linux virtual machines, today we'll only cover pseudo-distributed deployment. Standalone mode won't be covered here, and I'll walk through the steps for configuring fully distributed mode in a later post.
Environment to prepare: Ubuntu 12, jdk-6u45-linux-i586.bin, hadoop-1.0.4.tar.gz (as programmers you can grab these from the official websites yourselves, so I won't post download links... If you're reading this you may complain: Hadoop isn't just this, where did HBase go, where did the Hive data warehouse go? Well, one step at a time; please don't roast me, I'm a rookie. I'll write about those in later chapters, I'm prepared for a protracted war...)
First go into the Ubuntu operating system and check to see if Ubuntu has the Java JDK installed.
OK, open an Ubuntu terminal and run java -version to see whether Sun's JDK is installed, as shown in the figure:
OK, mine is already installed here; for those who don't have the JDK yet, read on.
Copy the jdk-6u45-linux-i586.bin file to /home/zhaolong/ on Ubuntu. OK, that's my user; follow your own path. Installing the JDK on Linux is really just a matter of extracting the file, but the file must be given execute permission before installation.
Use the cd command to enter the directory where the JDK file lives, i.e. /home/zhaolong/, then type the command chmod 777 jdk-6u45-linux-i586.bin.
Once the file permission is set, extract the JDK with the ./jdk-6u45-linux-i586.bin command.
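Put together, the whole JDK "install" is just these few commands, a sketch of what I ran (adjust the path to your own user):

    cd /home/zhaolong                       # directory the .bin installer was copied to
    chmod 777 jdk-6u45-linux-i586.bin       # give it execute permission (chmod +x also works)
    ./jdk-6u45-linux-i586.bin               # self-extracting installer; creates ./jdk1.6.0_45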
After the extraction finishes, use the vi .bashrc command to open the .bashrc file for editing. We are going to configure the environment variables; add the following at the end of the .bashrc file:
export JAVA_HOME=/home/zhaolong/jdk1.6.0_45 and export PATH=/bin:$JAVA_HOME/bin:/usr/bin:$PATH:.
When you finish editing, press ESC and then type :wq to save and exit. Use the . .bashrc command (or source .bashrc) to make it take effect. At this point run java -version again to verify that the JDK installed successfully; barring accidents you should see the same output as in the figure above.
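For reference, the tail of my .bashrc ends up looking roughly like this, a sketch assuming the JDK was extracted to /home/zhaolong/jdk1.6.0_45 as above:

    # appended to the end of ~/.bashrc
    export JAVA_HOME=/home/zhaolong/jdk1.6.0_45
    export PATH=/bin:$JAVA_HOME/bin:/usr/bin:$PATH:.

    # then reload the file and verify
    . ~/.bashrc
    java -version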
Next we'll configure passwordless SSH login. If SSH requires a password, you have to retype it for every node on every start, which is a real nuisance.
First install SSH with the command sudo apt-get install ssh. The installation needs a network connection and the download is a bit slow, so be patient...
After the installation succeeds, run the ssh-keygen command; don't worry about the prompts, just keep pressing Enter. When it finishes, two files are generated in the ~/.ssh/ directory: id_dsa and id_dsa.pub. The two come as a pair, like a lock and its key. Then copy the id_dsa.pub file to authorized_keys in the ~/.ssh/ directory: cp id_dsa.pub authorized_keys
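End to end, the passwordless SSH setup is roughly the following sketch. I simply pressed Enter through the prompts; the -t dsa / -P '' flags below are just the non-interactive equivalent of that, and an RSA key would work the same way:

    sudo apt-get install ssh                       # install the SSH client and server
    ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa       # generate a key pair with an empty passphrase
    cp ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys    # authorize our own public key
    ssh localhost                                  # should now log in without asking for a password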
With SSH installed and configured, we move on to installing Hadoop. Copy the downloaded hadoop-1.0.4.tar.gz to /home/zhaolong/. As with the JDK, we need to set the file's permissions first or it can't be extracted; set the permissions on hadoop-1.0.4.tar.gz with the command: chmod 777 hadoop-1.0.4.tar.gz
Unpack the hadoop-1.0.4.tar.gz file with the command: tar xzvf hadoop-1.0.4.tar.gz
After extraction, add Hadoop's bin directory to the environment variables. The concrete operation is again to modify the $PATH environment variable at the bottom of the .bashrc file. The steps are the same as configuring the JDK environment variables above, so I won't repeat them; a sketch follows below.
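Something like the following line does the job, assuming the tarball was extracted into /home/zhaolong/hadoop-1.0.4 (adjust the path to wherever yours landed):

    # appended to ~/.bashrc, below the JDK lines
    export PATH=/home/zhaolong/hadoop-1.0.4/bin:$PATH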
After the Hadoop environment variable is configured, we need to edit several Hadoop configuration files. Go into the hadoop/conf directory:
1. Modify the hadoop-env.sh configuration file and add the following to it:
export JAVA_HOME=/home/zhaolong/jdk1.6.0_45
2. Modify the core-site.xml configuration file and add the following inside the <configuration> element:
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/hadoop</value>
    </property>
3. Modify the hdfs-site.xml configuration file and add the following inside the <configuration> element:
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
4. Modify the mapred-site.xml configuration file and add the following inside the <configuration> element:
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
After all four steps are configured, we need to start Hadoop to verify that the environment is deployed successfully. The file system must be formatted before it can be started, but only before the first start; later starts don't need reformatting. The command to format the file system is: hadoop namenode -format
Formatting the file system takes a little while, but not very long. After formatting completes we can try starting Hadoop; for convenience we use the start-all.sh command to start all the services.
With that, the services are started.
All right, let's go over the common commands for starting and stopping Hadoop (a quick walkthrough follows the list):
(1) Format the file system, command: hadoop namenode -format
(2) Start/stop all services: start-all.sh / stop-all.sh
(3) Start/stop HDFS: start-dfs.sh / stop-dfs.sh
(4) Start/stop MapReduce: start-mapred.sh / stop-mapred.sh
(5) Use the jps command to view the processes and make sure NameNode, DataNode, and JobTracker are there
(6) JobTracker admin interface: http://localhost:50030
(7) HDFS admin interface: http://localhost:50070
(8) HDFS communication port: 9000
(9) MapReduce communication port: 9001
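Putting the list above into a quick walkthrough of the happy path (a sketch, not a captured session; in pseudo-distributed mode I would expect jps to also show SecondaryNameNode and TaskTracker alongside the three daemons listed above):

    hadoop namenode -format     # only before the very first start
    start-all.sh                # bring up the HDFS and MapReduce daemons
    jps                         # check that the expected daemons are running
    # browse http://localhost:50070 (HDFS) and http://localhost:50030 (JobTracker)
    stop-all.sh                 # shut everything down when done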
I'll stop here for now; it's late at night, time to sleep... This was fairly general, but that's it for today. Good night...
Original link: http://blog.csdn.net/cooldragon_x/article/details/37775079