Official API link Address: http://hadoop.apache.org/docs/current/First, what is HDFs?HDFS (Hadoop Distributed File System): The universal Distributed File system above Hadoop, with high fault tolerance, high throughput features, and it is also at the heart of Hadoop.Ii. advantages and disadvantages of HadoopAdvantages:
The Filestatus class in Hadoop can be used to view the meta information of files or directories in HDFs, any file or directory can get the corresponding filestatus, and here is a simple demo of the relevant API for this class:
* */package COM.CHARLES.HADOOP.FS;
Import Java.net.URI;
Import Java.sql.Timestamp;
Import org.apache.hadoop.conf.Configuration;
Import Org.apache.hadoop.fs.FileStatus;
1. Start Hadoop. Then Netstat-nltp|grep 50070, if the process is not found, the port modification without configuring the Web interface is hdfs-site,xml with the following configurationIf you use the hostname: port number, go first to check the hostname under/etc/hosts IP, whether configured and your current IP is the same, and then restart Hadoop2. Now in the vi
HDFs block of data
Disk data block is the smallest unit of data read/write for disk, typically 512 bytes,
There are also data blocks in the HDFs, and the default is 64MB. So the large files on the HDFs are divided into many chunk. Files that are small (less than 64MB) on HDFs will not occupy the entire block of space
org.apache.hadoop.fs.filesystem$ Cache.getinternal (filesystem.java:2467) at Org.apache.hadoop.fs.filesystem$cache.get (FileSystem.java:2449) at or G.apache.hadoop.fs.filesystem.get (filesystem.java:367) at Org.apachE.hadoop.fs.filesystem$1.run (filesystem.java:156) at Org.apache.hadoop.fs.filesystem$1.run (FileSystem.java:153) At Java.security.AccessController.doPrivileged (Native method) at Javax.security.auth.Subject.doAs (subject.java:422 ) at Org.apache.hadoop.security.UserGroupInformation
$handler.run (Server.java:1754)At this point, you can see the directory that holds the synchronization files/hadop-cdh-data/jddfs/nn/journalhdfs1 not found, SSH remote connection to the node to see that there is no such directory. Here, basically can be fixed to the problem, there are 2 ways to solve: one is to initialize the directory through the relevant command (I think this method is the correct way to solve the problem), and the second is to directly copy the normal Journalnode files over.
Reference book: "Hadoop Combat" the second edition of the 9th chapter: HDFs Detailed1. HDFs Basic operation@ The bug information that appears@[email protected] WARN util. nativecodeloader:unable to load Native-hadoop library for your platform ... using Builtin-java classes where applicable@[email protected] WARN
Reprint please indicate the source, http://blog.csdn.net/lastsweetop/article/details/9001467
All source code on GitHub, Https://github.com/lastsweetop/styhadoop read data using Hadoop URL read A simpler way to read HDFS data is to open a stream via Java.net.URL, but before that, it's Seturlstreamhandlerfactory method is set to Fsurlstreamhandlerfactory (the factory takes the parse
, soHDFs has a high degree of fault tolerance.3. High data throughput HDFs uses a "one-time write, multiple read" This simple data consistency model, in HDFS , once a file has been created, written, closed, generally do not need to modify, such a simple consistency model, to improve throughput.4. Streaming data access HDFS has a large scale of data processing,
To deploy the logical schema:
HDFS HA Deployment Physical architecture
Attention: Journalnode uses very few resources, even in the actual production environment, but also Journalnode and Datanode deployed on the same machine; in the production environment, it is recommended that the main standby namenode each individual machine. Yarn Deployment Schema:
Personal Experiment Environment deployment diagram:
Ubuntu12 32bit Apache
This article describes the configuration method for using the HDFs Java API.1, first solve the dependence, pomDependency> groupId>Org.apache.hadoopgroupId> Artifactid>Hadoop-clientArtifactid> version>2.7.2version> Scope>ProvidedScope> Dependency>2, configuration files, storage HDFs cluster configuration informati
Data integrity
IO operation process will inevitably occur data loss or dirty data, data transmission of the greater the probability of error. Checksum error is the most commonly used method is to calculate a checksum before transmission, after transmission calculation of a checksum, two checksum if not the same data exist errors, more commonly used error check code is CRC32.
HDFs Data integrity
The checksum is computed when the
What is HDFs?Hadoop Distributed File System (Hadoop distributed filesystem)is a file system that allows files to be shared across multiple hosts on a network,Allows multiple users on multiple machines to share files and storage space.Characteristics:1. Permeability. Let's actually access the file through the network action, from the program and the user's view,It
Configuring the Elasticsearch storage path to HDFs takes two steps, installs the plug-in Elasticsearch-hadoop, and runs in the command window in the case of networking: Plugin-install elasticsearch/ Elasticsearch-hadoop/1.2.0 can be.If there is no network decompression plug-in to plugins, the directory is/hadoop ....In
Hadoop version: 2.6.0This article is from the Official document translation, reproduced please respect the work of the translator, note the following links:Http://www.cnblogs.com/zhangningbo/p/4146296.htmlBackground
In HDFs, the data is usually read by Datanode. However, when a client reads a file to a Datanode request, Datanode reads the file from disk and sends the data to the client via a TCP socke
Access the files on HDFs and write them out to the output station/*** Access the files on HDFs and write them out to the output station *@paramargs*/ Public Static voidMain (string[] args) {Try { //converts the URL of the HDFS format to a system-recognizedUrl.seturlstreamhandlerfactory (Newfsurlstreamhandlerfactory ()); URL URL=NewURL ("
individuallyof course, you can also start each process separately instead of starting it together, and the specific command is " $HADOOP _home/bin/ "directory, as shown in: Execute command start-dfs.sh HDFs, hdfs storage not used MapReduce stop-dfs.sh Exe
HDFs file operation examples, including uploading files to HDFs, downloading files from HDFs, and deleting files on HDFs, refer to the use of
Copy Code code as follows:
Import org.apache.hadoop.conf.Configuration;
Import org.apache.hadoop.fs.*;
Import Java.io.File;Import java.io.IOException;public class
-2.4.1.tar.gz-c/java/decompression hadoopls lib/native/See what files are in the extracted directory CD etc/hadoop/into the profile directory vim hadoop-env.sh Modify Profile environment variable (export java_home=/java/jdk/jdk1.7.0_65) *-site.xml*vim core-site.xml Modify configuration file (go to official website for parameter meaning) ./Hadoop fs-du-s/#查看
The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion;
products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the
content of the page makes you feel confusing, please write us an email, we will handle the problem
within 5 days after receiving your email.
If you find any instances of plagiarism from the community, please send an email to:
info-contact@alibabacloud.com
and provide relevant evidence. A staff member will contact you within 5 working days.