1.0 Hadoop HDFS: Introduction and Installation

Tags: hdfs, dfs, metadata
HDFS is a distributed storage system that delivers highly reliable, highly scalable, high-throughput data storage services.

HDFS advantages:
- High fault tolerance: data is automatically saved as multiple replicas, and lost replicas are recovered automatically.
- Suited to batch processing: the computation is moved to the data rather than the data to the computation, and block locations are exposed to the computing framework.
- Suited to large-scale data processing and can be built on cheap commodity machines.

HDFS disadvantages:
- Not suited to low-latency data access (for example, millisecond-level); it trades low latency for high throughput.
- Not suited to large numbers of small files, whose metadata occupies a large amount of NameNode memory and whose seek time exceeds the read time.
- No concurrent writes to a file and no random modification; a file is best not modified once written.

HDFS roles: NameNode | Secondary NameNode | DataNode (blocks)
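Once the cluster from the installation steps below is running, a quick way to see the blocks and replicas described above is hdfs fsck. A minimal sketch; the file path is just the Tomcat tarball uploaded later in this article:

    # cluster-wide summary: total blocks, default replication factor, missing replicas
    hdfs fsck /

    # per-file view: how one file is split into blocks and where each replica lives
    hdfs fsck /home/apache-tomcat-7.0.61.tar.gz -files -blocks -locations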
NameNode
Main functions: accepts the client's read and write requests and stores the metadata.
- The NameNode loads the metadata into memory after startup.
- The metadata is persisted to a disk file named "fsimage".
- Block location information is not saved into fsimage.
- "edits" records the operation log for the metadata.

Secondary NameNode (SNN)
- It is not a backup of the NameNode (although it can serve as one); its main job is to help the NameNode merge the edits log, which reduces NameNode startup time.
- The SNN performs the merge either when fs.checkpoint.period elapses (default 3,600 seconds, set in the configuration file) or when the edits log reaches the size set by fs.checkpoint.size.
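As a concrete illustration (a sketch only, assuming hadoop.tmp.dir is set to /hadoop as in the core-site.xml below, so the NameNode keeps its metadata under /hadoop/dfs/name), the fsimage and edits files can be inspected directly on the NameNode's disk:

    # NameNode metadata directory (dfs/name under hadoop.tmp.dir by default)
    ls /hadoop/dfs/name/current/
    # typically contains fsimage_<txid>, edits_... files, seen_txid and VERSION

    # optional: dump an fsimage file to XML with the offline image viewer
    # (replace <txid> with an actual transaction id from the listing above)
    hdfs oiv -p XML -i /hadoop/dfs/name/current/fsimage_<txid> -o /tmp/fsimage.xml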
DataNode
- Stores the data (blocks). When the DataNode thread starts, it reports its block information to the NameNode, and it keeps in contact with the NameNode by sending a heartbeat every 3 seconds. If the NameNode receives no heartbeat from a DataNode for 10 minutes, it considers that DataNode lost and copies the blocks it held to other DataNodes.
- By default a block has three copies including itself, and a block is best kept no larger than 1 GB.

HDFS supports 2 kinds of authentication: simple, the default, authenticates only the user name and does not check passwords; Kerberos authenticates the user and the password as well, but when a machine is added, the new machine's user name and password are invalid.

The HDFS NameNode automatically enters safe mode at startup; while in safe mode the file system is read-only.

HDFS installation
Prerequisites:
- Three machines (or more) whose clocks agree to within 30 seconds.
- A hostname-to-IP mapping must exist; HDFS recognizes only host names, not IPs (a sketch of the hosts file and checks follows this list).
- JDK 1.7 must be installed and its environment variables configured.
- Configure environment variables with vi ~/.bash_profile (global variables go in /etc/profile) and add at the end of the file:
      export JAVA_HOME=/usr/java/default
      export PATH=$PATH:$JAVA_HOME/bin
  Then run source ~/.bash_profile to refresh the environment variable file.
- Temporarily shut down the firewall.
- Upload the Hadoop tar archive and unpack it (tar -zxvf <tar package name>), then configure the Hadoop environment variables:
      export HADOOP_HOME=/opt/local/hadoop-2.5.2
      export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
- Edit the Hadoop configuration files under $HADOOP_HOME/etc/hadoop: hadoop-env.sh, core-site.xml, hdfs-site.xml, slaves.
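The hostname mapping and prerequisite checks above might look like the following sketch; the IP addresses are placeholders, and the firewall command assumes a CentOS 6-style system:

    # /etc/hosts on every node (example addresses only)
    192.168.1.101   node1
    192.168.1.102   node2
    192.168.1.103   node3

    # quick checks on each machine
    date                   # clocks must agree to within 30 seconds
    ping -c 1 node2        # hostname resolution works
    service iptables stop  # temporarily stop the firewall (adjust for your distro)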
hadoop-env.sh: change line 25 to
    export JAVA_HOME=/usr/java/default

core-site.xml (main node, NameNode):
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/hadoop</value>
    </property>

hdfs-site.xml (Secondary NameNode):
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node2:50090</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.https-address</name>
        <value>node2:50091</value>
    </property>

slaves: list the host names of all DataNode nodes, one per line:
    node1
    node2
    node3

Set up passwordless sshd login. On the chosen master node, start the service and generate a key pair:
    ssh-keygen        (cd /root; ls -al; cd .ssh)
Copy the master node's public key to every node:
    ssh-copy-id -i id_rsa.pub root@node2
After that, node1 can log in to node2 without a password; repeat for every server that needs passwordless login.

Copy the hosts file, .bash_profile, and the Hadoop directory to the other machines (an scp sketch follows at the end of this section).

Format HDFS (must be done on the master node):
    bin/hdfs namenode -format
Then start the cluster on the master node:
    sbin/start-dfs.sh
After startup, jps lists the Hadoop processes running on each machine.

Web UI: http://localhost:50070/ (replace localhost with the NameNode host). The web page is served on port 50070; other (file system) access uses port 9000.

Basic commands:
    hdfs dfs -ls /                                       view whether the HDFS root directory contains any folders
    hdfs dfs -mkdir /home                                create a home folder under the root directory
    hdfs dfs -put apache-tomcat-7.0.61.tar.gz /home/     upload the Apache Tomcat tarball to /home
    hdfs dfs                                             view the help documentation
    hdfs dfs -chown -R zhangsan /test                    change the owner of the /test folder to zhangsan
    hadoop-daemon.sh restart datanode                    restart this DataNode node
    hadoop-daemons.sh restart datanode                   restart all DataNode nodes (typically not used)

Use Eclipse to access the HDFS service: place hadoop-eclipse-plugin-2.5.1.jar in D:\java tool\eclipse\plugins and restart Eclipse. In the upper right corner choose Open Perspective and find Map/Reduce; at the bottom find Map/Reduce Locations; in the lower right corner choose New Hadoop Location. For DFS Master, set the host to your NameNode's IP and the port to 9000; the location name can be anything.

Eclipse programming: note that the Windows user name should be changed to root.
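The "copy files to the other machines" step and a post-start check can be sketched as follows, assuming the paths used earlier (/opt/local/hadoop-2.5.2) and root SSH access to node2 and node3:

    # distribute the hosts file, the shell profile, and the configured Hadoop directory
    for host in node2 node3; do
        scp /etc/hosts                  root@$host:/etc/hosts
        scp ~/.bash_profile             root@$host:~/
        scp -r /opt/local/hadoop-2.5.2  root@$host:/opt/local/
    done

    # after sbin/start-dfs.sh on node1, confirm that all DataNodes have registered
    hdfs dfsadmin -report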
