Book notes: Dong Sicheng's "Hadoop Technology Insider: In-Depth Analysis of Hadoop Common and HDFS Architecture Design and Implementation Principles"
High Fault Tolerance and scalability of HDFS
Lucene is an engine development kit that provides a pure Java high-performance full-text search that can be easily embedded in
1. The purpose of this article
Understand some of the features and concepts of Hadoop's HDFS by walking through the flow a client follows to create a file.
2. Key Concepts
2.1 NameNode (NN):
The core component of HDFS, responsible for namespace management of the distributed file system and for managing the inode table's file mappings. If the backup/recovery/federation mode is not turned on
I. Basic concepts of HDFS
1.1. Data blocks
HDFS (Hadoop Distributed File System) uses 64 MB data blocks by default (the Hadoop 1.x default; Hadoop 2.x and later default to 128 MB).
As in ordinary file systems, HDFS files are divided into block-sized (64 MB) chunks for storage.
In HDFS, if a file is smaller than the size of a data block, it does
can store. It also eliminates concerns about metadata: blocks are just chunks of stored data, so file metadata such as permission information does not need to be stored with the blocks, and another system can manage the metadata separately.
Blocks are also well suited to replication, which provides fault tolerance and availability. Copying each block to a few separate machines (3 by default) ensures that data is not lost when a block, disk, or machine fails. If a
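The block and replica arithmetic described above can be sketched in a few lines of plain Java. This is only an illustration: the class and method names are made up for this example, and it assumes the 64 MB block size and replication factor of 3 mentioned in the text. It also relies on the fact that a file's last block only occupies as much space as the data it actually holds.

```java
class BlockMath {
    static final long MB = 1024L * 1024L;

    /** Number of blocks needed for a file of the given size (ceiling division). */
    static long blockCount(long fileBytes, long blockBytes) {
        if (fileBytes == 0) {
            return 0; // an empty file has no blocks
        }
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    /** Raw bytes used across the cluster: every byte is stored once per replica. */
    static long rawBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long file = 200 * MB;
        long block = 64 * MB;
        System.out.println(blockCount(file, block)); // 4 blocks: 64 + 64 + 64 + 8 MB
        System.out.println(rawBytes(file, 3) / MB);  // 600 MB of raw storage
    }
}
```

Because the last block is only 8 MB, the 200 MB file costs 600 MB of raw storage with three replicas, not 4 x 64 x 3 = 768 MB.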
(getBoolean), int (getInt), long (getLong), float (getFloat), String (get), File (getFile), String array (getStrings, where values are separated by commas).
Merge resources:
Configuration conf = new Configuration();
conf.addResource("core-default.xml");
conf.addResource("core-site.xml");
If a configuration item is not marked as final, a later resource overrides the value from an earlier one. If it is marked final, the override is rejected and a warning is logged.
Property extension: The
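To make the override and final rules concrete, here is a small stand-alone model of that merge behavior. It is not Hadoop's Configuration class; MergedConfigSketch and its methods are hypothetical names for this sketch, and the property keys in main are used only as sample data.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class MergedConfigSketch {
    private final Map<String, String> values = new LinkedHashMap<>();
    private final Set<String> finals = new HashSet<>();

    /** Adds one property from a resource; returns false if a final value blocked it. */
    boolean add(String name, String value, boolean isFinal) {
        if (finals.contains(name)) {
            System.err.println("WARN: attempt to override final parameter: " + name);
            return false;
        }
        values.put(name, value); // a later resource wins over an earlier one
        if (isFinal) {
            finals.add(name);
        }
        return true;
    }

    /** String getter with a default, in the spirit of get(name, default). */
    String get(String name, String defaultValue) {
        return values.getOrDefault(name, defaultValue);
    }

    /** Typed getter, in the spirit of getInt(name, default). */
    int getInt(String name, int defaultValue) {
        String v = values.get(name);
        return v == null ? defaultValue : Integer.parseInt(v.trim());
    }

    public static void main(String[] args) {
        MergedConfigSketch conf = new MergedConfigSketch();
        conf.add("dfs.replication", "3", false); // from a default resource
        conf.add("dfs.replication", "2", true);  // site resource overrides and marks final
        conf.add("dfs.replication", "5", false); // rejected with a warning
        System.out.println(conf.getInt("dfs.replication", 1)); // 2
    }
}
```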
node cluster address, separated by semicolons:
The client failover proxy class, which currently provides only one implementation:
Edit Log Save path:
Fencing Method Configuration:
When QJM is used as the shared storage, split-brain cannot occur on the write path. However, the old NameNode can still accept read requests, so clients may see stale data until the original NameNode attempts to write to the JournalNodes. It is therefore recommended to configure a suitable fencing me
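As a rough illustration of the semicolon-separated JournalNode address list and of why QJM avoids split-brain on writes, the sketch below assembles a qjournal:// URI and computes the write quorum. The host names, the port, and the nameservice id "mycluster" are placeholders chosen for this example, and QjmSketch is a hypothetical helper, not part of Hadoop.

```java
import java.util.Arrays;
import java.util.List;

class QjmSketch {
    /** Builds a shared-edits URI of the form qjournal://host1:port;host2:port/journalId. */
    static String sharedEditsUri(List<String> journalNodes, String journalId) {
        return "qjournal://" + String.join(";", journalNodes) + "/" + journalId;
    }

    /** An edit is committed only once a majority of JournalNodes accept it. */
    static int majority(int journalNodeCount) {
        return journalNodeCount / 2 + 1;
    }

    /** How many JournalNodes can fail while writes still succeed. */
    static int toleratedFailures(int journalNodeCount) {
        return (journalNodeCount - 1) / 2;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("jn1:8485", "jn2:8485", "jn3:8485");
        System.out.println(sharedEditsUri(nodes, "mycluster"));
        // qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster
        System.out.println(majority(nodes.size()));          // 2
        System.out.println(toleratedFailures(nodes.size())); // 1
    }
}
```

Since only one NameNode at a time can gather a majority for its writes, two active writers cannot both commit edits; fencing is still worth configuring because of the stale-read case described above.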
when it wants a property value.
In addition to addResource, there is an addDefaultResource method, typically used when a configuration is initialized. For example, Configuration loads core-default.xml and core-site.xml as default resources, and its subclass HdfsConfiguration loads hdfs-default.xml and hdfs-site.xml as default resources.
The default resources are static, that is, all the configura
Official API link: http://hadoop.apache.org/docs/current/
I. What is HDFS?
HDFS (Hadoop Distributed File System): the general-purpose distributed file system underlying Hadoop, featuring high fault tolerance and high throughput; it is also at the heart of Hadoop.
II. Advantages and disadvantages of Hadoop
Advantages:
This write-up is fairly verbose; if you just want the answer, skip straight to the bold part ....
(PS: Everything written here follows the official 2.5.2 documentation; these are the problems I ran into while working through it)
When you execute a MapReduce job locally and hit a "No such file or directory" error, follow the steps in the official documentation:
1. Format the NameNode
bin/hdfs namenode -format
2. Start the Namenode and Datanod
HDFS:
The configuration is the same as above.
1. The client sends a read request to the NameNode (hereinafter referred to as NN).
2. The NN returns a partial or full block list for the file to the client, and for each block the NN returns the addresses of the nodes holding replicas of that block.
3. The client selects the nearest DN to read the block, closes the connection to the current DN after reading the block's data, and then finds the best DN for the next block.
4. If no files have been read until after
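Steps 2 and 3 above amount to picking, for every block in file order, the replica with the smallest network distance. The sketch below models only that choice in plain Java; the Replica type, the DataNode names, and the distance values are invented for illustration and are not the HDFS client API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReadPlanSketch {
    /** One replica location for a block, as the NameNode might report it. */
    static final class Replica {
        final String dataNode;
        final int distance; // smaller means closer to the client

        Replica(String dataNode, int distance) {
            this.dataNode = dataNode;
            this.distance = distance;
        }
    }

    /** For each block, in file order, choose the DataNode with the smallest distance. */
    static List<String> plan(List<List<Replica>> blocks) {
        List<String> chosen = new ArrayList<>();
        for (List<Replica> replicas : blocks) {
            Replica best = replicas.get(0);
            for (Replica r : replicas) {
                if (r.distance < best.distance) {
                    best = r;
                }
            }
            chosen.add(best.dataNode);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<List<Replica>> blocks = Arrays.asList(
            Arrays.asList(new Replica("dn1", 4), new Replica("dn2", 0), new Replica("dn3", 2)),
            Arrays.asList(new Replica("dn3", 2), new Replica("dn4", 4), new Replica("dn1", 4)));
        System.out.println(plan(blocks)); // [dn2, dn3]
    }
}
```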
access
Applications that require low-latency access to data, in the millisecond range, are not suitable for HDFS. HDFS is optimized for high data throughput, which may come at the expense of latency. Currently, HBase is a better choice for low-latency access.
A large number of small files
The NameNode stores the file system's metadata, so the limit on the number of files is determined by the amount of memory
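A back-of-the-envelope estimate shows why many small files strain the NameNode. The sketch below assumes the commonly quoted rule of thumb of roughly 150 bytes of NameNode memory per file or block object; that figure is an approximation, directories are ignored, and the class name is made up for this example.

```java
class NameNodeMemorySketch {
    /** Rule-of-thumb assumption: about 150 bytes of heap per file or block object. */
    static final long BYTES_PER_OBJECT = 150;

    /** Estimated NameNode memory for `files` files that each occupy `blocksPerFile` blocks. */
    static long estimateBytes(long files, long blocksPerFile) {
        long objects = files + files * blocksPerFile; // one object per file plus one per block
        return objects * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        // One million single-block files: 2,000,000 objects * 150 bytes
        System.out.println(estimateBytes(1_000_000L, 1)); // 300000000
    }
}
```

Under this assumption one million small files cost about 300 MB of NameNode memory regardless of how little data they hold, which is why the file count, not the data volume, becomes the limit.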
 * @throws URISyntaxException
 */
public static FileSystem getFileSystemByUser(String pUser) throws Exception, InterruptedException, URISyntaxException {
    String fileUri = "/home/test/test.txt";
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://192.168.1.109:8020");
    FileSystem fileSystem = FileSystem.get(new URI(fileUri), conf, pUser);
    return fileSystem;
}
}
2. Main class
This class is primarily used for file read and write and
all member variables and methods for the class name), F3 view the definition of the class name.
RPC stands for Remote Procedure Call: it invokes Java objects running in other virtual machines remotely. RPC follows a client/server pattern; using it involves server-side code, client code, and the remote procedure object being invoked.
The operation of HDFS is built on this basis. This article analyzes the operation mechanism of
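The client/server split described here can be illustrated with Java's dynamic proxies: the client holds a stub that implements the shared protocol interface, and every call on the stub is forwarded to the server object. The sketch below keeps both sides in one JVM, so the forwarding is a plain reflective call standing in for the network hop. FileInfoProtocol, Server, and connect are hypothetical names for this example, not Hadoop's RPC classes.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

class RpcSketch {
    /** The protocol shared by client and server. */
    interface FileInfoProtocol {
        long getFileLength(String path);
    }

    /** Server-side object that actually executes the procedure. */
    static class Server implements FileInfoProtocol {
        private final Map<String, Long> lengths = new HashMap<>();

        Server() {
            lengths.put("/demo.txt", 42L); // sample data
        }

        @Override
        public long getFileLength(String path) {
            Long length = lengths.get(path);
            return length == null ? -1L : length;
        }
    }

    /** Builds the client stub; a real RPC layer would serialize the call and send it over a socket. */
    static FileInfoProtocol connect(final Object server) {
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                // Stand-in for the wire: dispatch the same method on the server object.
                return method.invoke(server, args);
            }
        };
        return (FileInfoProtocol) Proxy.newProxyInstance(
            FileInfoProtocol.class.getClassLoader(),
            new Class<?>[] { FileInfoProtocol.class },
            handler);
    }

    public static void main(String[] args) {
        FileInfoProtocol client = connect(new Server());
        System.out.println(client.getFileLength("/demo.txt")); // 42
    }
}
```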
I. Environment setup
1. Hadoop http://my.oschina.net/u/204498/blog/519789
2. sqoop2.x http://my.oschina.net/u/204498/blog/518941
3. mysql
II. Import from MySQL into HDFS
1. Create the MySQL database, table, and test data
Xxxxxxxx$ mysql -uroot -p
Enter password:
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| test               |
+-
write a file
Based on the file size and block configuration, the NameNode returns to the client information about some of the DataNodes it manages
The client divides the file into blocks and writes them sequentially to the DataNodes according to the returned DataNode address information
(2) file read
The client sends a read-file request to the NameNode
The NameNode returns information about the DataNodes that store the file
The client reads the file from those DataNodes
(3) Block replication
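A toy model can tie the write flow and block replication together: the NameNode side decides which DataNodes receive each block, and the client then writes block by block to those targets. The round-robin placement below is purely a simplification for this sketch (real HDFS placement takes racks into account), and WritePlanSketch with its DataNode names is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WritePlanSketch {
    /**
     * For each block of the file, choose `replication` distinct DataNodes.
     * Round-robin is an assumption made for this example, not the HDFS policy.
     */
    static List<List<String>> allocate(long fileBytes, long blockBytes,
                                       List<String> dataNodes, int replication) {
        if (replication > dataNodes.size()) {
            throw new IllegalArgumentException("not enough DataNodes for the replication factor");
        }
        long blocks = (fileBytes + blockBytes - 1) / blockBytes; // ceiling division
        List<List<String>> plan = new ArrayList<>();
        for (int b = 0; b < blocks; b++) {
            List<String> targets = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                targets.add(dataNodes.get((b + r) % dataNodes.size()));
            }
            plan.add(targets);
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("dn1", "dn2", "dn3", "dn4");
        // A 130-byte "file" with 64-byte "blocks" needs 3 blocks, each stored on 3 nodes.
        System.out.println(allocate(130, 64, nodes, 3));
        // [[dn1, dn2, dn3], [dn2, dn3, dn4], [dn3, dn4, dn1]]
    }
}
```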
When reposting, please credit the source: http://blog.csdn.net/lastsweetop/article/details/9001467
All source code is on GitHub: https://github.com/lastsweetop/styhadoop
Reading data using a Hadoop URL
A simpler way to read HDFS data is to open a stream via java.net.URL, but before that, URL's setURLStreamHandlerFactory method must be called with an FsUrlStreamHandlerFactory (the factory takes the parse
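The mechanism behind this approach is java.net.URL's pluggable handler factory. The sketch below demonstrates it with the JDK alone: it registers a factory for a made-up "demo" scheme that serves in-memory content, as a stand-in for what FsUrlStreamHandlerFactory does for hdfs:// URLs. The scheme, the class name, and the returned content are all invented for illustration. Note that setURLStreamHandlerFactory can be called at most once per JVM, which is why the call is guarded.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.nio.charset.StandardCharsets;

class UrlFactorySketch {
    private static boolean installed = false;

    /** Registers the factory once; a second registration in the same JVM would throw an Error. */
    static synchronized void install() {
        if (installed) {
            return;
        }
        URL.setURLStreamHandlerFactory(new URLStreamHandlerFactory() {
            @Override
            public URLStreamHandler createURLStreamHandler(String protocol) {
                if (!"demo".equals(protocol)) {
                    return null; // let the JDK handle http, file, and so on
                }
                return new URLStreamHandler() {
                    @Override
                    protected URLConnection openConnection(final URL u) {
                        return new URLConnection(u) {
                            @Override
                            public void connect() {
                                // nothing to connect to in this in-memory stand-in
                            }

                            @Override
                            public InputStream getInputStream() {
                                String body = "contents of " + u.getPath();
                                return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
                            }
                        };
                    }
                };
            }
        });
        installed = true;
    }

    /** Opens the URL as a stream and returns everything it yields as a string. */
    static String read(String url) {
        install();
        try (InputStream in = new URL(url).openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(read("demo://namenode/user/test.txt"));
    }
}
```

The once-per-JVM restriction is the main drawback of the URL approach: if another library has already installed its own factory, this route is unavailable.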
, so HDFS has a high degree of fault tolerance.
3. High data throughput
HDFS uses a simple "write once, read many" data consistency model: once a file has been created, written, and closed, it generally does not need to be modified. This simple consistency model helps improve throughput.
4. Streaming data access
HDFS has a large scale of data processing,
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:156)
at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:153)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation
standard topology structures. The administrator needs to model the actual network topology as closely as possible.
With these basic ideas, we can proceed. I spent some time reading the DataNode code earlier. As we know, a DataNode registers with the NameNode at startup to establish a superior-subordinate relationship with it; you can think of this as the DataNode paying its respects to the NameNode. We then follow this route to examine how rack awareness works. DatanodeProtocol defines the registration method I
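Rack awareness rests on a notion of distance within the topology tree. A minimal sketch of that idea follows, assuming each node is identified by a path such as /datacenter/rack/node and that distance is the number of hops from each node up to their closest common ancestor. The paths are examples, and TopologySketch is a hypothetical helper rather than Hadoop's own topology class.

```java
class TopologySketch {
    /** Hops from a up to the closest common ancestor, plus hops from there down to b. */
    static int distance(String a, String b) {
        String[] pa = a.substring(1).split("/");
        String[] pb = b.substring(1).split("/");
        int common = 0;
        while (common < pa.length && common < pb.length && pa[common].equals(pb[common])) {
            common++;
        }
        return (pa.length - common) + (pb.length - common);
    }

    public static void main(String[] args) {
        System.out.println(distance("/d1/r1/n1", "/d1/r1/n1")); // 0: same node
        System.out.println(distance("/d1/r1/n1", "/d1/r1/n2")); // 2: same rack
        System.out.println(distance("/d1/r1/n1", "/d1/r2/n3")); // 4: same data center, other rack
        System.out.println(distance("/d1/r1/n1", "/d2/r3/n4")); // 6: different data centers
    }
}
```

The closer the administrator's configured paths match the physical network, the more meaningful these distances are when choosing replicas.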
Reference book: "Hadoop Combat", second edition, chapter 9: HDFS in detail
1. HDFS basic operations
@ Error messages that appear
@[email protected] WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
@[email protected] WARN