Install and deploy Apache Hadoop 2.6.0
Note: This document follows the official documentation; refer to it for the original text.
1. Hardware environment
There are three machines in total, all running Linux. Java is jdk1.6.0. The configuration is as follows:
Hadoop1.example.com: 172.20.115.1 (NameNode)
Hadoop2.example.com: 172.20.115.2 (DataNode)
Hadoop3.example.com: 172.20.115.3 (DataNode)
Hadoop4.example.com: 172.20.115.4
Correct resolution
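A host list like the one above usually ends up in /etc/hosts on every node, and a single mistyped address is enough to break name resolution. The following is a minimal sketch of a sanity check, not part of the original article. The helper name `parse_hosts` and the sample entries are assumptions made for illustration, and one entry is malformed on purpose.

```python
import ipaddress


def parse_hosts(text):
    """Split /etc/hosts-style lines into valid entries and invalid raw lines."""
    valid, invalid = [], []
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        if len(parts) < 2:
            invalid.append(raw.strip())
            continue
        addr, names = parts[0], parts[1:]
        try:
            ipaddress.ip_address(addr)  # raises ValueError on a bad address
        except ValueError:
            invalid.append(raw.strip())
            continue
        valid.append((addr, names))
    return valid, invalid


# Hypothetical sample; the third address is missing a dot on purpose.
SAMPLE = """\
172.20.115.1  hadoop1.example.com   # NameNode
172.20.115.2  hadoop2.example.com   # DataNode
172.20.1153   hadoop3.example.com   # DataNode
"""

valid, invalid = parse_hosts(SAMPLE)
for addr, names in valid:
    print("ok ", addr, names)
for line in invalid:
    print("bad", line)
```

Running a check like this before starting the daemons is cheaper than debugging a DataNode that cannot find the NameNode.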
High-availability Hadoop platform: Hadoop scheduling with Oozie Workflow
1. Overview
In the "High-availability Hadoop platform: Oozie Workflow" article, I shared how to integrate a standalone plug-in such as Oozie. Today, we will show how to use Oozie to create workflows and run them with Hadoop. You mu
Several Hadoop daemons
After Hadoop is installed, several processes appear in the output of jps.
The master has:
NameNode
SecondaryNameNode
JobTracker
The slaves have:
TaskTracker
DataNode
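The daemon lists above can be turned into a quick check on jps output. This is a rough sketch only: the function name `missing_daemons` and the sample output are made up for illustration, and it assumes each jps line has the form "<pid> <class name>".

```python
# Daemons expected per role, taken from the lists above.
EXPECTED = {
    "master": {"NameNode", "SecondaryNameNode", "JobTracker"},
    "slave": {"TaskTracker", "DataNode"},
}


def missing_daemons(jps_output, role):
    """Return the expected daemons for a role that are absent from jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2:  # "<pid> <class name>"
            running.add(parts[1])
    return sorted(EXPECTED[role] - running)


# Hypothetical jps output captured on a master node.
sample = """\
4021 NameNode
4250 SecondaryNameNode
4333 Jps
"""

print(missing_daemons(sample, "master"))  # ['JobTracker']
```

In practice the text would come from running jps on the node itself, for example through `subprocess.run(["jps"], capture_output=True, text=True)`.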
1. NameNode
It is the master server in Hadoop, managing the file system namespace and access to the files stored in the
resources
Master-slave structure
Master node (there can be 2): ResourceManager
Slave nodes (there are a number of them): NodeManager
ResourceManager is responsible for:
Allocation and scheduling of cluster resources
Applications such as MapReduce, Storm, and Spark must implement the ApplicationMaster interface to be managed by the RM
NodeManager is responsible for:
Management of single-node resources
VII: The architecture of MapReduce
Batch computing model that depends on disk I/O
Master-slave structure
Mas
Big data: massive data
Structured data: data that can be stored in a two-dimensional table
Unstructured data: data that cannot be represented with two-dimensional table logic, such as Word, PPT, and pictures
Semi-structured data: self-describing data between structured and unstructured that stores its structure with the data itself, such as XML, JSON, HTML
Google paper: MapReduce: Simplified Data Processing on Large Clusters
Map: maps big data onto multiple nodes as small, segmented pieces of data
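To make the map idea concrete, here is a toy word count that mimics the map, shuffle, and reduce phases in a single Python process. It illustrates the programming model only and is not Hadoop code; all function names are assumptions chosen for this sketch.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    """Run map over every chunk, then shuffle and reduce the results."""
    pairs = []
    for chunk in chunks:  # on a cluster, each chunk would go to a different node
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


counts = word_count(["big data big", "data nodes"])
print(counts)  # {'big': 2, 'data': 2, 'nodes': 1}
```

On a real cluster the chunks are processed in parallel and the shuffle moves data between machines, but the shape of the computation is the same.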
First, the display of a correct run:
Error 1: The variable is IntWritable but is receiving LongWritable, such as:
Reason: an extra parameter, reporter, was written, such as:
Error 2: The array is out of bounds, such as:
Cause: The Combine class is set, such as:
Error 3: NullPointerException, such as:
Cause: The static variable is null and can be assigned a value, such as:
Error 4: Execution enters map but cannot enter reduce; the map data is output directly, with no error prompt
Cause: The new and older version of
1. Go to the Apache Hadoop website: http://hadoop.apache.org/
2. Click the image to download
We download 2.6.0, the third entry among the stable versions. There is a mistake here: we should download the second entry from the bottom, which I did not notice, so I downloaded the 17 MB one above it.
3. Install Linux in the virtual machine
For details, see other
4. Installing the Hadoop environment in Linux
1. Installing the
Build a Hadoop development environment for Fedora 20
1. Configuration information:
Operating system: Fedora 20 x86
Eclipse version: eclipse-jee-helios-SR2-linux-gtk.tar.gz (preferably use Galileo or Helios, otherwise there may be compatibility issues)
Hadoop version: hadoop-1.1.2.tar.gz
Ant: apache-ant-1.9.3-bin.tar.gz
2. Compile the
First, prepare the jar packages required to run:
1) avro-1.7.4.jar
2) commons-cli-1.2.jar
3) commons-codec-1.4.jar
4) commons-collections-3.2.1.jar
5) commons-compress-1.4.1.jar
6) commons-configuration-1.6.jar
7) commons-io-2.4.jar
8) commons-lang-2.6.jar
9) commons-logging-1.2.jar
10) commons-math3-3.1.1.jar
11) commons-net-3.1.jar
12) curator-client-2.7.1.jar
13) curator-recipes-2.7.1.jar
14) gson-2.2.4.jar
15) guava-20.0.jar
16) hadoop-annotations-2.8.0.jar
When Hadoop was started today, the DataNode failed to start, and the following error was found in the log:
java.io.IOException: File /opt/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBl
This morning, I helped a newcomer remotely build a Hadoop cluster (version 1.x, or earlier than 0.22), and the experience left a deep impression on me. Here I will write down the simplest way to build Apache Hadoop to help new users, and I will try my best to explain it in detail. Click here to view the avatorhadoop construction steps.
1. Environment preparation:
1) Machine preparation: the target machine must b
What is hadoop?
Before doing something, the first step is to know what, then why, and finally how. However, after many years of project development, many developers get used to asking how first, then what, and finally why. This only makes them impetuous, and technologies are often misused in unsuitable scenarios.
The core designs in the Hadoop framework are MapReduce and HDFS. The idea of mapre
hadoop fs: has the widest scope and can operate on any file system.
hadoop dfs and hdfs dfs: can only operate on the HDFS file system (including operations involving the local FS); the former has been deprecated, so the latter is generally used.
The following is quoted from StackOverflow:
Following are the three commands which appear the same but have minute differences: hadoop
processing results ==============>> mapreduce !!!
2. Basic nodes
Hadoop has the following five types of nodes:
(1) JobTracker
(2) TaskTracker
(3) NameNode
(4) DataNode
(5) SecondaryNameNode
3. Fragmentation theory
(1) Hadoop divides MapReduce input into fixed-size slices, called input splits. In most cases, the split size is equal to the HDFS block size (64 MB by default).
(2)
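The fixed-size slicing described in item (1) can be sketched as simple offset arithmetic. This is an illustrative approximation under the assumption that the split size equals the 64 MB block size; the function name `input_splits` is hypothetical, and the real Hadoop logic also takes minimum and maximum split settings into account.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default HDFS block size cited above


def input_splits(file_size, split_size=BLOCK_SIZE):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    splits = []
    offset = 0
    while offset < file_size:
        # The last split may be shorter than split_size.
        length = min(split_size, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits


MB = 1024 * 1024
splits = input_splits(200 * MB)
for offset, length in splits:
    print(f"offset={offset // MB} MB, length={length // MB} MB")
```

For a 200 MB file this yields three full 64 MB splits followed by one 8 MB split, and each split is processed by one map task.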
4. Local data is
Run Hadoop WordCount.jar in Linux
Run Hadoop WordCount in Linux
Shortcut key to open the Ubuntu terminal: Ctrl + Alt + T
Hadoop launch command: start-all.sh
The normal execution results are as follows:
hadoop@HADOOP:~$ start-all.sh
Warning: $HADOOP_HOME is deprecate
Original article by Inkfish. Not for commercial reprint; when reproducing, please cite the source (http://blog.csdn.net/inkfish).
Hadoop is an open source cloud computing platform project under the Apache Foundation. Currently, the latest version is Hadoop 0.20.1. The following uses Hadoop 0.20.1 as the baseline to describe how to install
The four compression formats most commonly used in Hadoop today are lzo, gzip, snappy, and bzip2. Based on practical experience, the author introduces the advantages, disadvantages, and application scenarios of these four formats, so that you can choose the right one for your actual situation.
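Of the four formats, gzip and bzip2 are available in Python's standard library, so a rough size comparison can be sketched without a cluster. The sample data below is a made-up, highly repetitive byte string, so the ratios it produces are not representative of real workloads; lzo and snappy are left out because they need third-party packages.

```python
import bz2
import gzip


def compressed_sizes(data):
    """Return the byte size of data uncompressed, gzip-compressed, and bzip2-compressed."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data)),
        "bzip2": len(bz2.compress(data)),
    }


# Hypothetical sample input: 22 bytes repeated 5000 times.
sample = b"hadoop mapreduce hdfs " * 5000

sizes = compressed_sizes(sample)
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

Measuring on a sample of your own data, and timing compression and decompression as well, gives a better basis for choosing a format than any general ranking.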
1. gzip compression
Advantages: The compression ratio is high, and the compression/decompression speed is relatively fas
To make it easier for MapReduce to access relational databases (MySQL, Oracle) directly, Hadoop offers two classes, DBInputFormat and DBOutputFormat. Through the DBInputFormat class, database table data is read into HDFS, and through the DBOutputFormat class, the result set generated by MapReduce is written to a database table.
Error when executing MapReduce: java.io.IOException: com.mysql.jdbc.Driver, usually because the program cannot find
I. Compile the Hadoop plugin
Before you can install the plugin, you first need to compile it: hadoop-eclipse-plugin-2.6.0.jar
Third-party compilation tutorial: https://github.com/winghc/hadoop2x-eclipse-plugin
II. Place the plugin and restart Eclipse
Put the compiled plugin hadoop-eclipse-plugin-2.6.0.jar int
Now that namenode and datanode1 are available, add the node datanode2.
Step 1: Modify the host name of the node to be added
hadoop@datanode1:~$ vim /etc/hostname
datanode2
Step 2: Modify the hosts file
hadoop@datanode1:~$ vim /etc/hosts
192.168.8.4 datanode2
127.0.0.1 localhost
127.0