02_note_ Distributed file system HDFS: principles and operation, HDFS API programming; new HDFS features in 2.x: high availability, federation, snapshots
HDFS Basic Features
/home/henry/app/hadoop-2.8.1/tmp/dfs/name/current - on the NameNode
cat ./VERSION
namespaceID (namespace identifier, similar to the cluster identifier)
/home/henry/app/hadoop-2.8.1/tmp/dfs/data - on the DataNode
ls -lR
blk_1073741844xx (data block; the default block size is 128 MB in Hadoop 2.x, 64 MB in 1.x)
/home/henry/app/hadoop-2.8.1/etc/hadoop/hdfs-site.xml
<name>dfs.replication</name> (sets the number of block replicas; every block is replicated)
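A minimal hdfs-site.xml sketch for this property (3 is the usual default; the value here is only an example):
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>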
Rack awareness (replicas are also placed on racks other than the local one, so an entire-rack failure does not lose data, while a replica kept on the local rack preserves network efficiency)
Configured in core-site.xml (a sketch follows below)
<name>topology.script.file.name</name> (specifies the rack-awareness script, e.g. rackaware.py)
<name>topology.script.number.args</name> (maximum number of arguments passed to the topology script per invocation)
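A core-site.xml sketch for rack awareness; the script path and argument count below are examples, not Hadoop defaults:
  <property>
    <name>topology.script.file.name</name>
    <value>/home/henry/app/hadoop-2.8.1/etc/hadoop/rackaware.py</value>
  </property>
  <property>
    <name>topology.script.number.args</name>
    <value>100</value>
  </property>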
Heartbeat mechanism
DataNodes regularly send heartbeats and block reports to the NameNode
If the NameNode stops receiving heartbeats from a DataNode, it marks that DataNode as dead and no longer sends it I/O requests; if the failure drops the replica count of some blocks below the configured threshold, the NameNode re-replicates those blocks at an appropriate time
Safe Mode
Namenode "Safe Mode" when booting
The data block is collected during Safe mode, and when the data block with insufficient number of replicas is detected, the minimum number of replicas is replicated to end the safe mode
Hadoop Dfsadmin-safemode Enter-Force in Safe mode
Hadoop Fs-put and other additions and deletions in Safe mode will be error
Hadoop Dfsadmin-safemode Leave-off Safe mode
Checksum
blk_1073741844xx.meta (when a file is created, each data block gets a CRC checksum stored in a blk_xxx.meta file of the same name; the checksum is verified when the data is read to determine whether the block is corrupted)
Enabling the recycle bin (trash)
Configured in core-site.xml
<name>fs.trash.interval</name>
<value>10080</value> - retention threshold in minutes; files deleted with -rm are moved under .Trash and permanently removed once the threshold expires
A value of 0 disables the trash
To recover a file, -mv it back to the appropriate directory (see the sketch after this list)
hadoop fs -expunge - empty the recycle bin
hadoop fs -rm -r -skipTrash - delete directly, bypassing the trash
By default only deletions made from the Hadoop shell go to the trash; programmatic operations such as WebHDFS and the Java API do not, unless the code explicitly invokes the trash functionality
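A sketch of deleting and then recovering a file through the trash, using this note's example user and file (paths are illustrative):
  hadoop fs -rm /user/henry/in/readme.txt                  # moved under .Trash, not yet deleted
  hadoop fs -ls /user/henry/.Trash/Current/user/henry/in   # trash keeps the original path layout
  hadoop fs -mv /user/henry/.Trash/Current/user/henry/in/readme.txt /user/henry/in/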
Metadata protection
The image file (fsimage) and the transaction log (edits) are the NameNode's core data and can be configured to be written to multiple locations (a configuration sketch follows below)
This enhances safety but reduces NameNode processing speed
The NameNode is still a single point of failure; when it fails you must manually switch over to a second NameNode
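A hdfs-site.xml sketch that writes the NameNode metadata to multiple directories; the second, NFS-style path is only an example:
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/henry/app/hadoop-2.8.1/tmp/dfs/name,/mnt/nfs/dfs/name</value>
  </property>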
Snapshots - snapshot feature, supported from 2.x (a critical requirement)
(http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html)
A snapshot is not a full backup; snapshots are taken per directory, and the directory must first be made snapshottable: hdfs dfsadmin -allowSnapshot ./in (enables snapshots on the ./in directory)
hdfs dfsadmin -allowSnapshot <path> - run for each folder that needs to be backed up; multiple paths can be enabled
hdfs dfsadmin -disallowSnapshot <path>
hdfs dfs -createSnapshot <path> [<snapshotName>]
<snapshotName> is optional; if not set, a default name in the format "'s'yyyyMMdd-HHmmss.SSS" is used, e.g. "s20130412-151029.033"
hdfs dfs -deleteSnapshot <path> <snapshotName>
hdfs dfs -renameSnapshot <path> <oldName> <newName>
hdfs lsSnapshottableDir - list all snapshottable directories the current user has permission to snapshot
hdfs dfs -ls ./in/.snapshot - snapshots are accessed under the directory's hidden .snapshot path
hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot> - get the change records between two snapshots (a workflow sketch follows below)
+ The file/directory has been created.
- The file/directory has been deleted.
M The file/directory has been modified.
R The file/directory has been renamed.
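A typical snapshot workflow, sketched with this note's example directory (the added file name is illustrative):
  hdfs dfsadmin -allowSnapshot /user/henry/in
  hdfs dfs -createSnapshot /user/henry/in s1
  hdfs dfs -put newfile.txt /user/henry/in
  hdfs dfs -createSnapshot /user/henry/in s2
  hdfs snapshotDiff /user/henry/in s1 s2              # shows the + entry for newfile.txt
  hdfs dfs -cat /user/henry/in/.snapshot/s1/readme.txt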
Block pool (the set of blocks belonging to one namespace)
HDFS Federation
Federation is not HA and the NameNodes are not mutually redundant; if one NameNode fails, the data in its namespace becomes inaccessible (think of the namespaces as pieces of a jigsaw puzzle)
Needed for larger clusters where a single NameNode becomes a performance bottleneck
hdfs-site.xml configuration (multiple NameNodes; a sketch follows below)
Formatting multiple NameNodes
hdfs namenode -format [-clusterId <cluster_id>] - first NameNode; clusterId is optional and is generated automatically if omitted
hdfs namenode -format -clusterId <cluster_id> - second NameNode; the clusterId must be specified so it joins the same federation as the first NameNode
2.x supports multiple NameNodes to spread the load and keep performance acceptable
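A hdfs-site.xml sketch for two federated NameNodes; the nameservice IDs and the second hostname are made up for illustration (master.henry is from this note):
  <property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>master.henry:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>master2.henry:9000</value>
  </property>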
Namespace management - client-side mount table
Adding a new DataNode
Install Hadoop on the new DataNode and copy the configuration from the NameNode
Update the masters and slaves files on all NameNodes and DataNodes
Configure passwordless SSH access
Start the DataNode and NodeManager
start-balancer.sh - rebalance blocks across DataNodes (a command sketch follows below)
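A sketch of the commands involved, assuming a Hadoop 2.8.1 layout like this note's ($HADOOP_HOME = /home/henry/app/hadoop-2.8.1):
  # on the new DataNode
  $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
  $HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager
  # on any node: move blocks onto the new DataNode until disk usage is within 5% of the average
  $HADOOP_HOME/sbin/start-balancer.sh -threshold 5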
Connecting to HDFS from Java - run the urlcat.java sample program (a minimal sketch follows below)
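A minimal sketch of such a program, modeled on the widely used URLCat example; the class name URLCat is assumed here, so the note's urlcat.java file and the run commands should use matching names:
  import java.io.InputStream;
  import java.net.URL;

  import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
  import org.apache.hadoop.io.IOUtils;

  // Reads a file from HDFS via an hdfs:// URL and copies it to standard output.
  public class URLCat {

      static {
          // Register Hadoop's URL handler so java.net.URL understands hdfs:// (may only be set once per JVM)
          URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
      }

      public static void main(String[] args) throws Exception {
          InputStream in = null;
          try {
              // args[0] is e.g. hdfs://master.henry:9000/user/henry/in/readme.txt
              in = new URL(args[0]).openStream();
              IOUtils.copyBytes(in, System.out, 4096, false);
          } finally {
              IOUtils.closeStream(in);
          }
      }
  }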
echo $HADOOP_CLASSPATH shows the CLASSPATH set by hadoop-env.sh; copy it and append the new entries to HADOOP_CLASSPATH in /etc/profile
Keep your .class path added to HADOOP_CLASSPATH, then source /etc/profile
HADOOP_CLASSPATH must include the directory holding the .class files (/home/henry/myjava/class/), otherwise running the compiled program fails with "Could not find or load main class"
HADOOP_CLASSPATH can be configured in $HADOOP_HOME/etc/hadoop/hadoop-env.sh or directly in /etc/profile
hadoop classpath - print the current classpath configuration
cd ~/myjava
javac -cp ~/app/hadoop-2.8.1/share/hadoop/common/hadoop-common-2.8.1.jar urlcat.java -d ./class/
javac -cp $HADOOP_CLASSPATH urlcat.java -d ./class/ (the required .jar packages are already configured in HADOOP_CLASSPATH, so it can be used directly)
When compiling with javac, output to the configured class path with -d ./class/, otherwise the class cannot be found at run time, since that is the path configured in HADOOP_CLASSPATH
1) hadoop URLCat hdfs://master.henry:9000/user/henry/in/readme.txt (requires the .class path to be in HADOOP_CLASSPATH; after compilation it can be run from any directory)
jar -cvf urlcat.jar
2) hadoop jar ./urlcat.jar URLCat hdfs://master.henry:9000/user/henry/in/readme.txt
No need to add the .class path to HADOOP_CLASSPATH
However, it must be run from the directory containing the .class/jar files, otherwise they cannot be found
For .java code written in Eclipse, remove the package declaration, compile it into a .class file (or compile it directly on Linux with javac), then move it to Linux and package it into a jar file
Save the Java files with UTF-8 encoding, otherwise compiling on Linux gives garbled-character errors
Menu bar Window --> Preferences opens the Preferences dialog; in the left navigation tree go to General --> Workspace (workspace-wide text file encoding)
To change the Java file encoding to UTF-8 for a single project/package, right-click --> Properties --> Resource --> Text file encoding
javac -encoding UTF-8 (compile using UTF-8 encoding)
Using Ant
Download and untar Apache Ant
Add ANT_HOME and its bin directory to /etc/profile
ant -version - verify the installation
Copy mapreduce_cookbook_code to the master
Enter any of the source directories; each mainly contains a src directory (holding the .java files) and a build.xml configuration file
build.xml - the build configuration (paths of the jars used for compilation, the source directory, where the compiled .class files are placed, etc.)
Running the ant command generates a build directory (holding the .class files), as configured in build.xml - a sketch is shown below
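A minimal build.xml sketch of that layout; the directory names and the Hadoop jar location are assumptions based on this note, not the cookbook's actual file:
  <project name="urlcat" default="compile">
    <property name="src.dir" value="src"/>
    <property name="build.dir" value="build"/>
    <path id="hadoop.classpath">
      <fileset dir="/home/henry/app/hadoop-2.8.1/share/hadoop/common" includes="*.jar"/>
    </path>
    <target name="compile">
      <mkdir dir="${build.dir}"/>
      <javac srcdir="${src.dir}" destdir="${build.dir}"
             classpathref="hadoop.classpath" encoding="UTF-8"
             includeantruntime="false"/>
    </target>
  </project>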
Hadoop 2.8.x distributed storage: HDFS basic features, Java sample connecting to HDFS