An HDFS cluster consists of a NameNode (the master server), which manages the file system namespace and regulates client access to files, plus a number of DataNodes, which manage the storage attached to the nodes they run on. HDFS exposes a file system namespace that lets users store data in files. Internally, a file is split into one or more blocks, and those blocks are stored on a set of DataNodes.
1. Blocks. A hard disk has a block size, which is the smallest unit of data it can read or write, traditionally 512 bytes. A file system built on a single disk also has a block concept: a file system block, typically a few kilobytes in size, is made up of a whole number of disk blocks. All of this is transparent to users of the file system, who simply write files of a certain size to the disk or read them back.
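HDFS takes the same idea to a much larger scale: its block size is configurable per cluster (128 MB by default in Hadoop 2.x). A minimal, illustrative hdfs-site.xml fragment might look like this (the values are examples, not recommendations):

```xml
<!-- illustrative hdfs-site.xml fragment; values are examples only -->
<configuration>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value> <!-- 128 MB block size -->
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value> <!-- default replication factor -->
  </property>
</configuration>
```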
it locally. DataNodes can therefore be pipelined: each one receives data from the previous node and simultaneously forwards it to the next, so the data is copied from one DataNode to the next in a pipelined fashion. Accessibility: HDFS can be accessed from applications in several ways. Users can access it through the Java API, through a C-language wrapper around that API, or browse the files in HDFS through a web browser. Access through the WebDAV protocol is in progress.
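The pipelined copy described above can be sketched with a small pure-Java simulation. Note that this is not Hadoop code; the class and method names below are invented purely for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Pure-Java sketch of pipelined replication: each simulated DataNode stores the
// packet locally and then forwards it to the next node in the chain.
// All names here are illustrative; this is not actual Hadoop code.
public class PipelineSketch {
    static class Node {
        final String name;
        final List<byte[]> storage = new ArrayList<>();
        Node next; // the downstream DataNode in the pipeline, or null at the tail

        Node(String name) { this.name = name; }

        void receive(byte[] packet) {
            storage.add(packet);      // store the packet locally...
            if (next != null) {
                next.receive(packet); // ...and forward it downstream
            }
        }
    }

    public static void main(String[] args) {
        Node dn1 = new Node("dn1"), dn2 = new Node("dn2"), dn3 = new Node("dn3");
        dn1.next = dn2;
        dn2.next = dn3;
        dn1.receive("block-data".getBytes()); // the client writes only to the first node
        System.out.println(dn3.storage.size()); // every node in the chain holds a copy
    }
}
```

The client only ever talks to the first DataNode; replication to the others happens node-to-node along the pipeline.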
Introduction
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems, but the differences are significant: HDFS is highly fault tolerant and intended to be deployed on low-cost hardware, and it provides high-throughput access to application data, making it suitable for applications with large data sets.
Remove a directory named /foodir: bin/hadoop dfs -rmr /foodir
View the contents of the file /foodir/myfile.txt: bin/hadoop dfs -cat /foodir/myfile.txt
The FS shell is targeted at applications and scripts that need to interact with the stored data.
DFSAdmin: the dfsadmin commands are used to administer an HDFS cluster; these commands are used by HDFS administrators.
Put the cluster in safe mode: bin/hadoop dfsadmin -safemode enter
Generate a report on the DataNodes: bin/hadoop dfsadmin -report
Use the command bin/hadoop fs -cat to print the contents of a file on HDFS to the console.
You can also read the data through the HDFS API, as follows:
import java.net.URI;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileCat {
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("Usage: FileCat <target>");
            System.exit(1);
        }
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(args[0]), conf);
        InputStream in = null;
        try {
            in = fs.open(new Path(args[0]));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
You can use the command line bin/hadoop fs -rm (or -rmr for directories) to delete files or folders on HDFS.
You can also use the HDFS API, as follows:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileDelete {
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("Usage: FileDelete <target>");
            System.exit(1);
        }
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(args[0]), conf);
        fs.delete(new Path(args[0]), false);
    }
}
Now let's take a closer look at Hadoop's FileSystem class, which is used to interact with a Hadoop file system. Although we mainly target HDFS here, our code should depend only on the abstract FileSystem class so that it can work with any Hadoop file system. That way we can test against the local file system and use HDFS in deployment, switching between them purely through configuration, with no code changes.
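The idea of programming against the abstraction can be sketched without any Hadoop dependency. The interface and class names below are invented for illustration; SimpleFs plays the role that the abstract FileSystem class plays in Hadoop:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of "code against the abstraction": application logic depends only on
// an interface, and the concrete implementation (local/in-memory for tests,
// HDFS in production) is chosen by configuration. All names are illustrative.
interface SimpleFs {
    byte[] read(String path);
}

// "local" implementation used during testing
class InMemoryFs implements SimpleFs {
    private final Map<String, byte[]> files = new HashMap<>();
    void put(String path, byte[] data) { files.put(path, data); }
    public byte[] read(String path) { return files.get(path); }
}

public class FsAbstractionSketch {
    // application code sees only the SimpleFs abstraction
    static int fileLength(SimpleFs fs, String path) {
        return fs.read(path).length;
    }

    public static void main(String[] args) {
        InMemoryFs fs = new InMemoryFs();
        fs.put("/foodir/myfile.txt", "hello".getBytes());
        System.out.println(fileLength(fs, "/foodir/myfile.txt"));
    }
}
```

In real Hadoop code the same pattern applies: obtain a concrete file system via FileSystem.get(conf), and let the configuration decide whether that is the local file system or HDFS.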
quickly. The core of the federation design was implemented in about four months. Most of the changes are in the DataNode, the configuration, and the tools; the changes to the NameNode itself are minimal, so the NameNode's original robustness is not affected and the scheme remains compatible with earlier versions of HDFS. For horizontal scaling, federation uses multiple independent NameNodes/namespaces. These NameNodes are federated, that is, they are independent of each other and do not need to coordinate with one another.
HDFS stores and manages data through four components: the HDFS client, the NameNode, the DataNodes, and the Secondary NameNode. We introduce the four components separately below.
No | Role | Function description
1 | Client | Splits files: when a file is uploaded to HDFS, the client divides it into blocks and then stores them. Interacts with the NameNode to obtain file location information.
file. Because dfs.replication is essentially a client-side parameter, you can specify a specific replication factor when you create a file; the dfs.replication property is only the default used when you do not specify one. Once a file has been uploaded, its replication factor is fixed: changing dfs.replication afterwards affects neither existing files nor files created with an explicit replication factor, only files subsequently created with the default.
Note: distributed file system HDFS principles and operation, HDFS API programming; new features in HDFS 2.x: high availability, federation, and snapshots.
HDFS basic features: on the NameNode, the metadata directory is /home/henry/app/hadoop-2.8.1/tmp/dfs/name/current; run cat ./VERSION there to see the namespaceID (a namespace identifier, similar to a cluster identifier). /home/henry/app/hadoop-2.8.1/tmp/dfs
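For reference, the contents of a NameNode's current/VERSION file look roughly like the following. All of the values below are made-up examples, not real identifiers:

```properties
# Illustrative contents of a NameNode's current/VERSION file; all values are examples
namespaceID=1933630400
clusterID=CID-1f2a3b4c-5d6e-7f80-91a2-b3c4d5e6f708
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1234567890-127.0.0.1-1500000000000
layoutVersion=-63
```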
Replica Mechanism
1. Replica placement policy
The first replica is placed on the DataNode from which the file is uploaded; if the upload is submitted from outside the cluster, a node whose disk is not too full and whose CPU is not too busy is chosen at random. The second replica is placed on a node in a different rack from the first. The third replica is placed on a different node in the same rack as the second. Any further replicas are placed on randomly chosen nodes.
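As a rough illustration, the rack choices above can be modeled in a few lines of plain Java. This is a toy model, not the actual Hadoop block placement logic; all names are invented:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the default replica placement described above; this is NOT the
// real Hadoop placement policy, just an illustration of the rack choices.
public class ReplicaPlacementSketch {
    // Returns the rack chosen for each of three replicas, given the writer's
    // rack and the other racks in the cluster.
    public static List<String> placeThreeReplicas(String writerRack, List<String> otherRacks) {
        List<String> racks = new ArrayList<>();
        racks.add(writerRack);            // 1st replica: the uploading node's rack
        String remoteRack = otherRacks.get(0);
        racks.add(remoteRack);            // 2nd replica: a different rack
        racks.add(remoteRack);            // 3rd replica: same rack as the 2nd, different node
        return racks;
    }

    public static void main(String[] args) {
        System.out.println(placeThreeReplicas("rackA", Arrays.asList("rackB", "rackC")));
    }
}
```

The pattern (one replica local, two on a single remote rack) limits cross-rack write traffic while still surviving the loss of an entire rack.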
2. Replication factor
1) When a file is created, a replication factor can be specified; if it is not, the default dfs.replication value is used.
1. Scalability. Previously only HDFS storage (the DataNodes) could be scaled horizontally; with federation the NameNode layer can also scale out, reducing the memory and service pressure on any single NameNode.
2. Performance. Multiple NameNodes can increase aggregate read/write throughput.
3. Isolation. Different types of applications can be isolated onto different NameNodes, giving some control over resource allocation.
Federation Configuration:
The federation configuration is backward compatible and allows an existing single-NameNode environment to keep working without any configuration changes.
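A minimal federation fragment of hdfs-site.xml looks like the following. The nameservice IDs and host names are illustrative:

```xml
<!-- illustrative hdfs-site.xml fragment for a two-NameNode federation -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value> <!-- two independent namespaces -->
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>nn-host1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1</name>
    <value>nn-host1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>nn-host2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns2</name>
    <value>nn-host2:50070</value>
  </property>
</configuration>
```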
Let's complete the unfinished part of the previous section and then analyze the internal workings of HDFS file reads and writes.
Enumerating files: the listStatus() method of FileSystem (org.apache.hadoop.fs.FileSystem) lists the contents of a directory:
public FileStatus[] listStatus(Path f) throws FileNotFoundException, IOException;
public FileStatus[] listStatus(Path[] files) throws FileNotFoundException, IOException;
public FileStatus[] listStatus(Path f, PathFilter filter) throws FileNotFoundException, IOException;
Not much to say; here is the code.

package zhouls.bigdata.myWholeHadoop.HDFS.hdfs5;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * @author
 * @function Copying from the local file system to HDFS
 */
public class Copyinglocalfiletohdfs {
    /**
     * @function main() method
     * @param args
     * @throws IOException
     * @throws URISyntaxException
     */
    public static void main(String[] args) throws IOException, URISyntaxException {
        // the HDFS URI and paths below are illustrative
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9000"), conf);
        fs.copyFromLocalFile(new Path("/home/user/data.txt"), new Path("/data/data.txt"));
        fs.close();
    }
}