Hadoop 2.8.x Distributed Storage: HDFS Basic Features, with a Java Sample Connecting to HDFS


Note 02: Distributed File System HDFS principles and operation, HDFS API programming; new HDFS features in 2.x: high availability, federation, snapshots

HDFS Basic Features

/home/henry/app/hadoop-2.8.1/tmp/dfs/name/current - on the NameNode

cat ./VERSION

namespaceID (namespace identifier, similar to a cluster identifier)

/home/henry/app/hadoop-2.8.1/tmp/dfs/data - on the DataNode

ls -lR

blk_1073741844xx (data block files; the default block size is 128 MB in Hadoop 2.x, 64 MB in 1.x)

/home/henry/app/hadoop-2.8.1/etc/hadoop/hdfs-site.xml

<name>dfs.replication</name> (sets the number of block replicas; every block is replicated)
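The replication factor can also be set per file from the Java API. The following is a minimal sketch (not part of the original note); it reuses the NameNode address and file path that appear later in these notes, and the class name is illustrative:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Default replication for files created by this client (overrides dfs.replication).
        conf.set("dfs.replication", "3");
        FileSystem fs = FileSystem.get(URI.create("hdfs://master.henry:9000"), conf);
        // Change the replication factor of an already existing file to 2.
        fs.setReplication(new Path("/user/henry/in/readme.txt"), (short) 2);
        fs.close();
    }
}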

Rack awareness (replicas are placed on more than one rack, so the failure of an entire rack does not lose all copies, while keeping some replicas on the same rack for network efficiency)

Configured in core-site.xml

<name>topology.script.file.name</name> (specifies the rack-awareness script, e.g. rackaware.py)

<name>topology.script.number.args</name> (maximum number of host arguments passed to the script per invocation)

Heartbeat mechanism

DataNodes periodically send heartbeats and block reports to the NameNode

If heartbeats stop arriving, the NameNode marks the DataNode as dead and stops sending it I/O requests; if the failure drops the replica count of some blocks below the configured threshold, the NameNode schedules re-replication of those blocks at an appropriate time

Safe Mode

Namenode "Safe Mode" when booting

The data block is collected during Safe mode, and when the data block with insufficient number of replicas is detected, the minimum number of replicas is replicated to end the safe mode

Hadoop Dfsadmin-safemode Enter-Force in Safe mode

Hadoop Fs-put and other additions and deletions in Safe mode will be error

Hadoop Dfsadmin-safemode Leave-off Safe mode
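The safe-mode state can also be queried from Java. A minimal sketch (not in the original note), assuming the DistributedFileSystem API and the NameNode address used elsewhere in these notes:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction;

public class SafeModeCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DistributedFileSystem dfs = (DistributedFileSystem)
                FileSystem.get(URI.create("hdfs://master.henry:9000"), conf);
        // SAFEMODE_GET only queries the current state; it does not change it.
        boolean inSafeMode = dfs.setSafeMode(SafeModeAction.SAFEMODE_GET);
        System.out.println("NameNode in safe mode: " + inSafeMode);
        dfs.close();
    }
}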

Checksum

blk_1073741844xx.meta (when a block is written, a CRC checksum is generated and stored in a .meta file with the same name as the block; the checksum is verified on read to detect corrupted blocks)
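Checksums can also be used from the Java API. A minimal sketch (not part of the original note) that fetches a whole-file checksum and shows how read-time verification could be disabled; the path and address are reused from later in these notes:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksumExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://master.henry:9000"), new Configuration());
        Path file = new Path("/user/henry/in/readme.txt");
        // Whole-file checksum, derived from the per-block CRCs.
        FileChecksum checksum = fs.getFileChecksum(file);
        System.out.println(file + " checksum: " + checksum);
        // Checksums are verified on read by default; this call would turn verification off.
        fs.setVerifyChecksum(false);
        fs.close();
    }
}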

Enabling the trash (recycle bin)

Configured in core-site.xml

<name>fs.trash.interval</name>

<value>10080</value> - retention time in minutes; files removed with -rm are moved to .Trash and are permanently deleted after this interval

A value of 0 disables the trash

To recover a file, -mv it from .Trash back to the desired directory

hadoop fs -expunge - empty the trash

hadoop fs -rm -r -skipTrash - deletes directly, bypassing the trash

By default only the Hadoop shell moves deleted files to the trash; programmatic operations such as WebHDFS or the Java API bypass it unless the code uses the trash explicitly (see the sketch below)
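A minimal sketch (not from the original note) of deleting through the trash from Java, using the org.apache.hadoop.fs.Trash helper; the file path and NameNode address are reused from later in these notes:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class DeleteToTrash {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://master.henry:9000"), conf);
        Path target = new Path("/user/henry/in/readme.txt");
        // Move the file into the user's .Trash directory instead of deleting it outright.
        boolean movedToTrash = Trash.moveToAppropriateTrash(fs, target, conf);
        if (!movedToTrash) {
            // Trash unavailable (e.g. fs.trash.interval = 0): fall back to a permanent delete.
            fs.delete(target, true);
        }
        fs.close();
    }
}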

Metadata protection

The fsimage and the edit log (transaction log) are the NameNode's core data; they can be configured to be written to multiple locations

This improves safety but slows down NameNode processing

The NameNode is still a single point of failure; on failure, switching to a second NameNode must be done manually

Snapshot feature - supported since 2.x (a critical requirement)

(http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html)

A snapshot is not a full backup; snapshots are created per directory, and the directory must first be made snapshottable: hdfs dfsadmin -allowSnapshot ./in (enables snapshots on the ./in directory)

hdfs dfsadmin -allowSnapshot <path> - enable snapshots on each directory that needs to be backed up; can be run for multiple paths

hdfs dfsadmin -disallowSnapshot <path>

hdfs dfs -createSnapshot <path> [<snapshotName>]

<snapshotName> is optional; if omitted, a default name is generated with the format "'s'yyyyMMdd-HHmmss.SSS", e.g. "s20130412-151029.033"

hdfs dfs -deleteSnapshot <path> <snapshotName>

hdfs dfs -renameSnapshot <path> <oldName> <newName>

hdfs lsSnapshottableDir - list all snapshottable directories the current user has permission to see

hdfs dfs -ls ./in/.snapshot - snapshots are accessed through the .snapshot path under the snapshottable directory

hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot> - get the change records between two snapshots:

+ : the file/directory has been created.

- : the file/directory has been deleted.

M : the file/directory has been modified.

R : the file/directory has been renamed.
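Snapshots can also be managed from the Java API. A minimal sketch (not part of the original note), assuming the DistributedFileSystem snapshot methods and the ./in directory used above:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.SnapshotDiffReport;

public class SnapshotExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DistributedFileSystem dfs = (DistributedFileSystem)
                FileSystem.get(URI.create("hdfs://master.henry:9000"), conf);
        Path dir = new Path("/user/henry/in");
        dfs.allowSnapshot(dir);            // like: hdfs dfsadmin -allowSnapshot
        dfs.createSnapshot(dir, "s1");     // like: hdfs dfs -createSnapshot
        // ... the directory contents change here ...
        dfs.createSnapshot(dir, "s2");
        // Like: hdfs snapshotDiff /user/henry/in s1 s2
        SnapshotDiffReport diff = dfs.getSnapshotDiffReport(dir, "s1", "s2");
        System.out.println(diff);
        dfs.close();
    }
}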

Block Pool (the set of blocks belonging to one namespace; in a federation each NameNode manages its own block pool)

HDFS Federation

Federation is not HA - the NameNodes are not redundant backups of each other; if one NameNode fails, the portion of the namespace it manages becomes inaccessible (the namespaces fit together like pieces of a puzzle)

Needed for larger clusters where a single NameNode becomes a performance bottleneck

hdfs-site.xml configuration (multiple NameNodes)

Formatting multiple NameNodes

hdfs namenode -format [-clusterId <cluster_id>] - first NameNode; clusterId is optional and is generated automatically if omitted

hdfs namenode -format -clusterId <cluster_id> - second NameNode; the clusterId must be specified so that it joins the same federation as the first NameNode

2.x supports multiple NameNodes to spread the load and maintain performance

Namespace management - client-side mount table (ViewFS)

Adding a new DataNode

Install Hadoop on the new DataNode and copy the configuration from the NameNode

Update the masters and slaves files on all NameNodes and DataNodes

Configure passwordless SSH access

Start the DataNode and NodeManager daemons

Run start-balancer.sh to rebalance data across the DataNodes

Connecting to HDFS from Java: running the URLCat.java sample program (a sketch of the program follows)
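The note does not include the source of URLCat.java; the following is a minimal sketch in the style of the widely used URLCat example: it registers the HDFS URL stream handler so that java.net.URL understands hdfs:// URLs, then copies the file contents to standard output.

import java.io.InputStream;
import java.net.URL;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class URLCat {
    static {
        // Lets java.net.URL handle hdfs:// URLs; may only be set once per JVM.
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
            // e.g. hdfs://master.henry:9000/user/henry/in/readme.txt
            in = new URL(args[0]).openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}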

echo $HADOOP_CLASSPATH - shows the default CLASSPATH set in hadoop-env.sh; copy the new HADOOP_CLASSPATH setting into /etc/profile

Append your .class output path to HADOOP_CLASSPATH and run source /etc/profile

HADOOP_CLASSPATH must include the directory that holds the compiled .class files (/home/henry/myjava/class/); otherwise running the program fails with "Could not find or load main class"

HADOOP_CLASSPATH can be configured in $HADOOP_HOME/etc/hadoop/hadoop-env.sh or directly in /etc/profile

hadoop classpath - shows the classpath currently in effect

cd ~/myjava

javac -cp /home/henry/app/hadoop-2.8.1/share/hadoop/common/hadoop-common-2.8.1.jar URLCat.java -d ./class/

javac -cp $HADOOP_CLASSPATH URLCat.java -d ./class/ (the required jars are already in HADOOP_CLASSPATH, so it can be referenced directly)

When compiling, javac must write its output to the configured class directory (e.g. -d ./class/); otherwise the class will not be found at run time, since that .class directory is what was added to HADOOP_CLASSPATH

1) hadoop URLCat hdfs://master.henry:9000/user/henry/in/readme.txt (requires the .class path to be in HADOOP_CLASSPATH; after compilation it can be run from any directory)

jar -cvf urlcat.jar URLCat.class

2) hadoop jar ./urlcat.jar URLCat hdfs://master.henry:9000/user/henry/in/readme.txt

No need to add the .class path to HADOOP_CLASSPATH

However, it must be run from the directory containing the jar/.class files; otherwise the class cannot be located

Java code written in Eclipse needs the package declaration removed before it is compiled into a .class file (or compile it directly on Linux with javac); then move it to Linux and package it into a jar file

Save the Java files in UTF-8 encoding, or compilation on Linux will fail with garbled-character errors

In Eclipse, open Window --> Preferences, and in the left navigation tree go to General --> Workspace (set the text file encoding there)

To change the encoding of a single project, package, or file to UTF-8, right-click it and use Properties > Resource > Text file encoding

javac -encoding UTF-8 (compile using UTF-8 encoding)

Using Ant

Download and untar Apache Ant

Add ANT_HOME and its bin directory to /etc/profile

ant -version - verify the installation

Copy the mapreduce_cookbook_code to the master node

Enter any of the source directories; each mainly contains a src directory (holding the .java files) and a build.xml configuration file

build.xml - the build configuration (paths of the jars used for compilation, the source directory to compile, where to place the compiled .class files, etc.)

Running the ant command generates a build directory (holding the .class files), as configured in build.xml
