HDFS is one of the common components in big data, and an indispensable framework in the Hadoop ecosystem; anyone getting into Hadoop must have a certain understanding of it. First, we all know that HDFS is a distributed file system in the Hadoop ecosystem.
hadoop fs -mkdir /tmp/input                       # create a new folder on HDFS
hadoop fs -put input1.txt /tmp/input              # upload the local file input1.txt to the /tmp/input directory in HDFS
hadoop fs -get /tmp/input/input1.txt input1.txt   # pull an HDFS file to the local machine
hadoop fs -ls /                                   # list the root directory
From: http://www.2cto.com/database/201303/198460.html — Hadoop HDFS common commands:

hadoop fs                                 # view all commands supported by Hadoop HDFS
hadoop fs -ls                             # list directory and file information
hadoop fs -lsr                            # recursively list directories, subdirectories, and file information
hadoop fs -put test.txt /user/sunlightcs  # copy test.txt from the local file system to the /user/sunlightcs directory of the
About HDFS: the Hadoop Distributed File System, HDFS for short, is a distributed file system. HDFS is highly fault-tolerant and can be deployed on low-cost hardware; it provides high-throughput access to application data and is suitable for applications with large data sets. It has the following characteristics
It took some time to read the source code of HDFS. However, there is already a lot of Hadoop source-code analysis on the Internet, so call this "marginal material": some scattered experiences and ideas.
In short, HDFS is divided into three parts: the NameNode maintains the distribution of data across DataNodes and is also responsible for some scheduling tasks; the Data
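As a toy illustration of the division of labor described above (not actual Hadoop code; all class and node names below are invented for the sketch), the NameNode's core bookkeeping can be thought of as a map from each block ID to the set of DataNodes holding a replica of that block:

```java
import java.util.*;

// Toy sketch: the NameNode tracks which DataNodes hold replicas of each block.
public class ToyNameNode {
    private final Map<Long, Set<String>> blockToDataNodes = new HashMap<>();

    // A DataNode reports that it stores a replica of the given block.
    public void reportReplica(long blockId, String dataNode) {
        blockToDataNodes.computeIfAbsent(blockId, k -> new HashSet<>()).add(dataNode);
    }

    // How many replicas of this block are currently known?
    public int replicaCount(long blockId) {
        return blockToDataNodes.getOrDefault(blockId, Collections.emptySet()).size();
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        nn.reportReplica(1L, "dn1");
        nn.reportReplica(1L, "dn2");
        System.out.println(nn.replicaCount(1L)); // prints 2
    }
}
```

From a map like this, a real NameNode can make its scheduling decisions, e.g. trigger re-replication when a block's replica count falls below the configured factor.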
HDFS system architecture diagram: level-by-level analysis
Hadoop Distributed File System (HDFS): a distributed file system
* Architecture of the distributed application: master node: NameNode (one); slave nodes: DataNode (multiple)
* HDFS service components: NameNode, DataNode, SecondaryNameNode
Distributed File System HDFS: DataNode architecture
1. Overview
DataNode: provides storage services for the actual file data.
Block: the most basic storage unit (also a concept in the Linux operating system). The file content, whose total length is the file size, is divided and numbered in fixed-size pieces, in order, starting from offset 0 of the file; each divided piece is called a block.
Unlike the Linux operating system, however, a file smaller than one block does not occupy the entire block's storage space.
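The fixed-size, in-order splitting described above can be sketched in a few lines of plain Java (a self-contained illustration, not Hadoop's real code; the 128 MB block size is just a common default, used here as an assumption):

```java
// Sketch of HDFS block splitting: fixed-size pieces from offset 0,
// numbered in order; the last block may be smaller than the block size.
public class BlockSplit {
    // Number of blocks a file of the given size occupies (ceiling division).
    public static long numBlocks(long fileSize, long blockSize) {
        if (fileSize == 0) return 0;
        return (fileSize + blockSize - 1) / blockSize;
    }

    // Byte length of block `index` (0-based): full blockSize except possibly the last.
    public static long blockLength(long fileSize, long blockSize, long index) {
        long start = index * blockSize;
        return Math.min(blockSize, fileSize - start);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 300 MB file with 128 MB blocks: blocks of 128, 128, and 44 MB.
        System.out.println(numBlocks(300 * mb, 128 * mb));           // prints 3
        System.out.println(blockLength(300 * mb, 128 * mb, 2) / mb); // prints 44
    }
}
```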
Hadoop was inspired by Google, and was originally designed to address the high cost and slowness of data processing in traditional databases.
The two core projects of Hadoop are HDFS (Hadoop Distributed File System) and MapReduce.
HDFS is used to store data, which is different from
PHP calls the shell to upload local files into Hadoop's HDFS
Originally Thrift was used for uploading, but its upload efficiency was unbearably low, so another method had to be chosen.
Environment:
The PHP runtime environment is Nginx + PHP-FPM.
Because Hadoop has permission control enabled, PHP has no permission to invoke the shell directly for uploading; the command executed by PHP appears to be n
// ...obtain a FileSystem instance object for the specific file system, based on the configuration information
fs = FileSystem.get(new URI("hdfs://hadoopmaster:9000/"), conf, "hadoop");
}

/**
 * Upload a file, written against the lower-level API
 * @throws Exception
 */
@Test
public void upload() throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://hado
Two years of hard study, and one stumble back to square one!!! Getting started with big data is a real headache; the key is Linux, and if you are not fluent in it, it is painful. For the Hadoop configuration, see the blog http://dblab.xmu.edu.cn/blog/install-hadoop/ (authoritative stuff). Next comes reading and writing files under HDFS. Let me talk about the problems I ran into: I kept getting "connection refused" and always thought my Linux account lacked permissions... Later I found that my
Exception description
When the hostname is unknown, running the hadoop namenode -format command on HDFS fails with the following exception:

[shirdrn@localhost bin]$ hadoop namenode -format
11/06/… INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
HDFS
HDFS is a distributed file system with high fault tolerance, suitable for deployment on cheap machines. It has the following features:
1) suitable for storing very large files
2) suitable for streaming data reads, that is, the "write once, read many times" data-processing pattern
3) suitable for deployment on cheap machines
However, HDFS is not suitable for the following scenarios
Spark program
Note: this is not the final solution; you still need to find the root cause.
If the file is important, you need to repair it: check the file status one by one and restore it. Take this file as an example: /user/admin/data/cdn//20170508/ngaahcs-access.log.3k3.201705081700.1494234003128.gz
Run the repair command:
hdfs debug recoverLease -path /user/admin/data/cdn//20170508/ngaahcs-acces
When accessing HDFS through Hadoop's C API, many problems come up in compiling and running, so here is a summary:
System: Ubuntu 11.04, hadoop-0.20.203.0
The official documentation provides the following sample code:
#include "hdfs.h"

int main(int argc, char **argv) {
    hdfsFS fs = hdfsConnect("default", 0);
    const char *writePath = "/tmp/testfile
When testing Hadoop, the dfshealth.jsp management page on the NameNode showed that a DataNode's LastContact parameter often exceeded 3 while the DataNode was running. LC (LastContact) indicates how many seconds ago the DataNode last sent a heartbeat packet to the NameNode; by default, however, a DataNode sends one every 3 seconds. We all know that the NameN
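A minimal sketch of the LastContact idea described above (an invented illustration, not the NameNode's real staleness logic; the 3-second threshold used in the example mirrors the default heartbeat interval):

```java
// Toy sketch: LastContact (LC) = seconds since a DataNode's last heartbeat.
// With a 3-second heartbeat interval, an LC much larger than 3 suggests the
// DataNode is slow or the NameNode is too busy to process its heartbeats.
public class HeartbeatMonitor {
    public static long lastContactSeconds(long nowMillis, long lastHeartbeatMillis) {
        return (nowMillis - lastHeartbeatMillis) / 1000;
    }

    // Flag a DataNode whose last contact exceeds the given threshold in seconds.
    public static boolean isStale(long nowMillis, long lastHeartbeatMillis, long thresholdSeconds) {
        return lastContactSeconds(nowMillis, lastHeartbeatMillis) > thresholdSeconds;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        System.out.println(lastContactSeconds(now, now - 9_000L)); // prints 9
        System.out.println(isStale(now, now - 9_000L, 3));         // prints true
        System.out.println(isStale(now, now - 2_000L, 3));         // prints false
    }
}
```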
This section will not talk much about what Hadoop is or about Hadoop basics, because there is plenty of detailed material on the web; here we focus on HDFS. Most people know that HDFS is Hadoop's underlying storage module, dedicated to storing data, so
it also has a negative impact: when the edits log grows large, NameNode startup becomes very slow. To address this, the SecondaryNameNode provides the ability to merge fsimage and edits. First it copies the data from the NameNode, then performs the merge, and returns the merged result to the NameNode; it also retains a local backup. This not only speeds up NameNode startup but also adds redundancy for the NameNode's data. I/O operations
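The fsimage + edits merge can be modeled in miniature (a toy model with an invented edit format, not Hadoop's real checkpoint code): the fsimage is a snapshot of the namespace, the edits log is the list of operations since that snapshot, and merging simply replays the logged operations onto the snapshot so the NameNode can start from a fresh image instead of a long log:

```java
import java.util.*;

// Toy checkpoint model: each edit is "CREATE <path>" or "DELETE <path>".
public class Checkpoint {
    // Replay the edits log onto a copy of the fsimage, producing a new image.
    public static Set<String> merge(Set<String> fsimage, List<String> edits) {
        Set<String> merged = new TreeSet<>(fsimage);
        for (String edit : edits) {
            String[] parts = edit.split(" ", 2);
            if (parts[0].equals("CREATE")) merged.add(parts[1]);
            else if (parts[0].equals("DELETE")) merged.remove(parts[1]);
        }
        return merged;
    }

    public static void main(String[] args) {
        Set<String> fsimage = new TreeSet<>(Arrays.asList("/a", "/b"));
        List<String> edits = Arrays.asList("CREATE /c", "DELETE /a");
        System.out.println(merge(fsimage, edits)); // prints [/b, /c]
    }
}
```

After a checkpoint like this, the old edits log can be truncated, which is exactly why startup speeds up: the NameNode loads one compact image instead of replaying every historical operation.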