Hadoop has been quite popular lately, and I'm interested in it, so I decided to learn a little. To learn Hadoop, you first need to build a Hadoop pseudo-distributed environment on your own computer. The first step of the pseudo-distributed installation is configuring the Linux environment. My own Linux is Ubuntu, but as long as t
A number of configuration parameters are listed below; the ones marked in red are required.
Parameter | Value | Notes
fs.default.name | URI of the NameNode. | hdfs://hostname/
dfs.hosts / dfs.hosts.exclude | List of permitted/excluded DataNodes. | If necessary, use these files to control the list of permitted DataNodes.
dfs.replication | Default value: 3 | Dat
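To make the table concrete, here is a minimal sketch of how these parameters are written into the Hadoop XML configuration files; the paths, the hostname:port, and the exclude-file location are assumptions for a single-node (pseudo-distributed) setup, not values taken from the article.

# Sketch: write the parameters above into core-site.xml and hdfs-site.xml
# (paths, hostname:port and the exclude-file location are assumed values)
cat > $HADOOP_HOME/etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>            <!-- URI of the NameNode -->
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
cat > $HADOOP_HOME/etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>            <!-- default is 3; 1 is enough on a single node -->
    <value>1</value>
  </property>
  <property>
    <name>dfs.hosts.exclude</name>          <!-- DataNodes listed in this file are denied -->
    <value>/usr/local/hadoop/etc/hadoop/excludes</value>
  </property>
</configuration>
EOF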
I. Preparatory work:
1. JDK 1.7 or above (Hadoop apparently only supports 1.6 and later; I am not sure, so to be safe I use 1.8 rather than 1.7).
2. Hadoop 2.7.3: from https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/ download the package that is over 200 MB.
II. Configure SSH password-free login:
1. Turn on
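The excerpt is cut off here; for reference, a typical password-free SSH setup for a single-node installation looks roughly like the following (the key type, the empty passphrase, and the use of localhost are assumptions, not steps quoted from the article).

# Enable the SSH server (Ubuntu), generate a key pair, and authorize it for the current user
sudo apt-get install -y openssh-server
mkdir -p ~/.ssh && chmod 700 ~/.ssh
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa         # empty passphrase, no prompts
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost                                    # should now log in without a password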
Launch Eclipse and open Window --> Preferences to configure the installation path of Hadoop MapReduce, which in the lab environment is /usr/local/hadoop, as shown below. 2.2.5 Open the MapReduce view: click the Eclipse menu Window --> Show View --> Other, and select Map/Reduce Locations, as shown below. Once added, a MapReduce view appears in the view area; the Add button is the blue elephant icon.
1. Copy the plugin hadoop-eclipse-plugin-2.6.2.jar into the plugins folder of the Eclipse installation directory.
2. Restart Eclipse and prepare Hadoop.
3. Switch to the Map/Reduce view mode.
4. Store the data in the Hadoop distributed storage system.
5. Connect to Hadoop.
6. Create a Hadoop project.
7. Create a class MyWordCount.java:
package com.yc.hadoop;
import java.io.IOException;
import org.apach
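The listing in the excerpt stops mid-import. For reference, a self-contained sketch of what such a MyWordCount class typically looks like is given below; it follows the standard Hadoop WordCount example (only the package and class names come from the article, the rest is the stock example rather than the author's exact code).

package com.yc.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyWordCount {

    // Mapper: emit (word, 1) for every token in the input line
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sum the counts for each word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "my word count");
        job.setJarByClass(MyWordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}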
wget -c $CDH5_rpms_noarch -P $CDH5_localpath -O $noarch_html
wget -c $CDH5_rpms_x86_64 -P $CDH5_localpath -O $x86_64_html
wget -c $CDH_gpgkey -P $CDH5_localpath
wget -c $CDH_repo -P $CDH5_localpath
# Download repodata
# cdh5_repodata
repodata_dir=$CDH5_localpath"/5/repodata"
mkdir -p $repodata_dir
echo -e "process file: '$repodata_html'"
while read line
do
    # start with:
The above script can be run multiple times and will not download anything repeatedly; all of the CDH5 content sits inside Path_must_be_exsited. Fin
The version of Hadoop my cluster uses is hadoop-1.1.2, and the matching Eclipse plugin is hadoop-eclipse-plugin-1.1.2_20131021200005. (1) Create a hadoop-plugin folder under Eclipse's dropins folder and put the plugin inside. Restart Eclipse, open the view, and the MapReduce view will appear. (2) Configure the host name; my hostname is
rationale is the same, but this one lists the host names that are forbidden to access the NN. This is useful for removing a DN from the cluster.
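To make this concrete, removing a DataNode through the exclude file roughly works as sketched below; the exclude-file path and the dfsadmin commands describe a typical setup and are not taken from the article.

# Add the host to the exclude file referenced by dfs.hosts.exclude (path assumed)
echo "datanode-to-remove.example.com" >> /usr/local/hadoop/conf/excludes
# Ask the NameNode to re-read its hosts/exclude files; the listed node is then decommissioned
hadoop dfsadmin -refreshNodes
# Watch the decommissioning progress
hadoop dfsadmin -report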
The Hadoop environment was set up in the previous chapters; this section focuses on building the Spark platform on top of Hadoop. 1. Download the required installation packages
1) Download the Spark installation package. 2) Download the Scala installation package and unzip it. This example uses the following versions as an example
2. Configuring environment variables
Use the command sudo ge
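The excerpt is cut off here; configuring the environment variables usually amounts to something like the following (the file edited, the install paths, and the variable values are assumptions, not the article's exact commands).

# Assumed install locations; append to /etc/profile (root) or ~/.bashrc
cat >> ~/.bashrc <<'EOF'
export SCALA_HOME=/usr/local/scala
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SCALA_HOME/bin:$SPARK_HOME/bin
EOF
# Reload so the variables take effect in the current shell
source ~/.bashrc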
access the NN. This is useful for removing a DN from the cluster.
dfs.max.objects | 0 | Maximum number of DFS objects; each file, directory, and block in HDFS counts as one object. 0 means no limit.
dfs.replication.interval | 3 | Interval at which the NN recomputes block replication; there is usually no need to write it into the configuration file, the default is fine.
dfs.support.append | t
http://10.18.51.52:9999/hwi/ is the web address configured for browsing this installation (the Hive Web Interface). Hive is based on Hadoop, so install Hadoop first, then set the environment variables:
export HIVE_HOME=/usr/hive
export HIVE_CONF_DIR=$HOME/hive-conf
export CLASSPATH=$HIVE_HOME/lib:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME
export PATH=$HIVE_HOME/bin:$
1. dfs.hosts records the list of machines that may join the cluster as DataNodes.
2. mapred.hosts records the list of machines that may join the cluster as TaskTrackers.
3. dfs.hosts.exclude and mapred.hosts.exclude contain the lists of machines to be removed.
4. The masters file records the list of machines that run the secondary NameNode.
5. The slaves file records the list of machines running DataNodes and TaskTrackers.
6. hadoop-env.sh records the environment
When running a Hadoop program, the output directory it specifies (for example, output) must not already exist, to prevent results from being overwritten; otherwise an error is reported, so the output directory has to be deleted before each run. When you develop your own application, consider adding code like the following to the program so that it automatically deletes the output directory on every run, avoiding tedious command-line operations:
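The excerpt does not include the code itself; below is a minimal sketch of the usual approach with the HDFS FileSystem API (the helper class name OutputCleaner is made up for illustration, and the configuration and output path are assumed to come from the surrounding job-setup code).

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OutputCleaner {
    // Delete the job's output directory if it already exists, so the job can recreate it.
    public static void deleteIfExists(Configuration conf, String outputDir) throws IOException {
        Path outputPath = new Path(outputDir);
        FileSystem fs = outputPath.getFileSystem(conf);
        if (fs.exists(outputPath)) {
            fs.delete(outputPath, true);   // true = delete recursively
        }
    }
}

// Typical use in the driver, before FileOutputFormat.setOutputPath(job, new Path(args[1])):
//     OutputCleaner.deleteIfExists(conf, args[1]);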
Reason: when the pseudo-distributed setup was configured on the original machine, the hostname was bound to its IP address, so when the VM is copied to another computer the restart fails, because the new computer's IP differs from the original one. On a different network, in NAT mode, the Linux guest's IP is bound to fall in a different network segment. Solution: edit /etc/hosts with vi and change the original machine's IP to the new machine's IP. Also: when reformatting
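For example, the fix amounts to something like the following (the hostname hadoop-master and both IP addresses are made-up placeholders):

# Point the hostname at the new machine's IP in /etc/hosts (placeholder values)
sudo sed -i 's/192.168.10.100/192.168.20.100/' /etc/hosts
# Verify that the hostname now resolves to the new address
ping -c 1 hadoop-master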
When we use Ubuntu Linux as the OS of a Hadoop node, we need to do some configuration on the Ubuntu OS. PS: the following has only been tried on Ubuntu 14.04; other versions may differ. Install the tools you will use:
sudo apt-get install vim
sudo apt-get install git
sudo apt-get install subversion
...
Common configuration: 1. Add a user (nave) and group (
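The excerpt is cut off here; creating the user and group typically looks like the following (the group name hadoop is an assumption, the user name nave comes from the text).

# Create a dedicated group and user for Hadoop
sudo addgroup hadoop
sudo adduser --ingroup hadoop nave
# Optionally let the new user run sudo
sudo adduser nave sudo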
installer will present a separate dialog box for each disk whose partition table it cannot read. Click the Ignore All button, or the Reinitialize All button, to apply the same answer to all devices. 2.8 Setting the host name and network: the installer prompts you to supply this computer's hostname and domain name, in hostname.domain format, and sets them. Many networks have a DHCP (Dynamic Host Configuration Protocol) service that
". Java.lang.NullPointerExceptionWe hadoop-eclipse-plugin-2.6.0.jar into Eclipse's plugins directory, our Eclipse directory is F:\tool\eclipse-jee-juno-SR2\ Eclipse-jee-juno-sr2\plugins, restart Eclipse, and then, open window-->preferens, you can see the Hadoop map/reduc option, and then click an internal Error occurredduring: "Map/reduce location Status Updater". Java.lang.NullPointerException,:Solve:We fo
Service: a Hadoop component that can be deployed on the cluster, such as HDFS, YARN, or HBase.
Role: created by Cloudera Manager when a service is configured; for example, NameNode is a role of the HDFS service.
Role group: to simplify management, roles of the same kind (such as DataNodes) can be divided into different role groups, and each role group can have its own set of configurations.
Role instance: a single instance (which can be thought of as a process) of the most basic unit that makes up a service. An H
PriviledgedActionException as:man (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.2014-09-24 12:57:41,567 ERROR [RunService.java:206] - [thread-id:17 thread-name:Thread-6] threadId:17,Excpetion:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.frame
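This error usually means the client cannot determine which MapReduce framework to run against; a common check, sketched below under the assumption of a YARN-based Hadoop 2.x installation, is that mapred-site.xml declares the framework (the file path and value are assumptions, not part of the original log).

# Assumed Hadoop 2.x / YARN setup: make sure mapred-site.xml declares the framework
cat > $HADOOP_HOME/etc/hadoop/mapred-site.xml <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF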