The introduction of the Hadoop platform's core components (the distributed file system HDFS, MapReduce processing, the data warehousing tool Hive, and the distributed database HBase) basically covers all the technical cores of the Hadoop distributed computing platform.
The architecture of HDFS
The entire Hadoop architecture is mainly through
of various data senders in the log system to collect data; at the same time, Flume can do simple processing of the data and write it to various (customizable) data recipients. A typical architecture for Flume: Flume data sources and output modes: Flume provides the ability to collect data from data sources such as console, RPC (Thrift-RPC), text (file), tail (Unix tail), syslog (the syslog log system, with TCP and UDP support), and exec (command execution)
Distributed File System HDFS: DataNode Architecture
1. Overview
DataNode: provides storage services for the actual file data.
Block: the most basic storage unit [analogous to the block concept in the Linux operating system]. A file's content has a length, its size; the file is split into fixed-size pieces, in order, starting from offset 0, and each piece is numbered. Each such piece is called a block.
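As a worked example, here is a minimal sketch of the block-splitting arithmetic. It assumes the Hadoop 2.x default block size of 128 MB (configurable via dfs.blocksize) and a hypothetical 300 MB file:

public class BlockSplit {
    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;   // Hadoop 2.x default (dfs.blocksize)
        long fileSize = 300L * 1024 * 1024;    // a hypothetical 300 MB file
        long fullBlocks = fileSize / blockSize;
        long lastBlock = fileSize % blockSize; // the final block may be smaller
        // Blocks are numbered from offset 0: block 0 covers [0, 128 MB), block 1 the next range, ...
        System.out.println(fullBlocks + " full blocks + 1 block of " + lastBlock + " bytes");
        // Prints: 2 full blocks + 1 block of 46137344 bytes (44 MB)
    }
}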
Here you just need to enter one parameter, the topology name. We use local mode here, so we do not pass a parameter and simply check whether the process runs through:
storm-0.9.0.1/bin/storm jar storm-start-demo-0.0.1-SNAPSHOT.jar com.storm.topology.MyTopology
Looking at the log, it prints the output and inserts the data into the database. Then we check the database: the insert succeeded! Our entire integration is complete here. But there is a problem, and I do not know whether you have noticed it. Since we use Storm for di
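For reference, a minimal sketch of what such a MyTopology entry point could look like against the Storm 0.9.x API. This is an assumption, not the article's actual code: the spout/bolt wiring (TestWordSpout plus a hypothetical PrintBolt) is ours; only the submit logic (one argument = topology name for cluster submission, no argument = local mode) follows the text above.

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class MyTopology {
    // Hypothetical terminal bolt that just prints each word it receives.
    public static class PrintBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            System.out.println("got: " + input.getString(0));
        }
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt: emits nothing downstream
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new TestWordSpout(), 1);
        builder.setBolt("print", new PrintBolt(), 1).shuffleGrouping("words");
        Config conf = new Config();

        if (args.length > 0) {
            // One argument: the topology name -> submit to the real cluster.
            StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
        } else {
            // No argument: run in local mode just to see whether the process goes through.
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("local-test", conf, builder.createTopology());
            Thread.sleep(10000);
            cluster.shutdown();
        }
    }
}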
persisted, and the file is stored in HDFS as shown in Figure 2 (storage diagram of the file in HDFS). HDFS involves interactions among the NameNode, the DataNodes, and clients. A client communicates with the NameNode to acquire or modify file metadata, while the actual I/O operations happen directly with the DataNodes. As shown in Figure 3, there are three important roles in HDFS: NameNode, DataNode, and client, where the client is the
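To make that division of labor concrete, here is a minimal read sketch against the Java FileSystem API (the NameNode URI and file path are assumptions; adjust them to your cluster): open() consults the NameNode for metadata and block locations, and the returned stream then pulls the bytes from the DataNodes.

import java.io.InputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        // Assumed NameNode address and file; adjust for your cluster.
        String uri = "hdfs://localhost:9000/tmp/demo.txt";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        InputStream in = null;
        try {
            // open() asks the NameNode for the block locations (metadata);
            // the stream then reads the actual blocks from the DataNodes.
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}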
"); Fsdataoutputstream outputstream=hdfs.create (DFS); Outputstream.write (Buff,0,buff.length);}}Renaming an HDFs filethroughFilesystem.rename (Path Src,path DST)can be specified for theHdfsfile Rename, whereSrcand theDstare the full path to the file. DeleteHdfson the filethroughFilesystem.delete (Path F,boolean recursive)You can delete the specifiedHdfsfile, whereFfor the full path of the file to be deleted,Recursiveused to determine whether recursiv
For the HDFS architecture, see http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html, or download the tar package and, after extracting it, open hadoop-2.6.0/share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html. The struct
exec is the mode currently used in our system for log capture. Flume data recipients can be console, text (file), DFS (HDFS file), RPC (Thrift-RPC), syslogTCP (the TCP syslog log system), and so on. In our system the data is received by Kafka.
Flume download and documentation: http://flume.apache.org/
Flume installation: $ tar zxvf apache-flume-1.4.0-bin.tar.gz
Flume start command: $ bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties --
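The start command above is cut off; flume-ng also needs the agent name (for example --name a1) matching the name used in the configuration file. For reference, a minimal sketch of a conf/flume-conf.properties with an exec source in the standard Flume 1.x format; the component names a1/r1/c1/k1 and the log path are our own assumptions, and a logger sink stands in for the Kafka receiver mentioned above:

# Name the components of agent a1
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# exec source: run a command and capture its output as events
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app.log
a1.sources.r1.channels = c1

# in-memory channel buffering events between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# logger sink for testing; our system forwards to Kafka instead
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1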
I. HDFS Introduction
1.1 Background
As the amount of data keeps growing, it no longer fits within the storage managed by a single operating system, so it is spread across disks managed by more operating systems; but that is inconvenient to manage and maintain, and there is an urgent need for a system to manage the files on multiple machines. This is the distributed file management system. Put academically, a distributed file system is a system that allows files to be shared across multiple hosts over a network
Distributed File System HDFS: NameNode Architecture
NameNode:
It is the management node of the entire file system.
It maintains the file directory tree of the entire file system [this directory tree is kept in memory to make retrieval faster], along with the metadata of files/directories and the data block list corresponding to each file.
It receives user operation requests.
Hadoop ensures the robustness of the NameNode and i
26. Preliminary Use of the Cluster
Design Ideas of HDFS
- Design idea: divide and conquer. Large files and large batches of files are distributed over a large number of servers, which makes it convenient to analyze massive data with a divide-and-conquer approach.
- Role in big data systems: provides data storage services for various distributed computing frameworks (such as MapReduce, Spark, Tez, ...).
- Key concepts: file splitting, replica storage, metadata.
26.1 Using HDFS
1. Vie
applications (low-latency applications can consider the HBase distributed database or an ES + distributed file system architecture).
2. A brief introduction to common HDFS shell commands
Hadoop has two very important shell commands: hadoop and hdfs. For managing the HDFS file system, the features of the hadoop and hdfs scripts are highly repetitive
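To illustrate that overlap, the following command pairs behave the same when the default file system is HDFS (the paths are hypothetical):

$ hadoop fs -ls /                 # generic entry point, works with any Hadoop file system
$ hdfs dfs -ls /                  # HDFS-specific entry point, same listing
$ hadoop fs -put local.txt /tmp/  # upload a local file
$ hdfs dfs -cat /tmp/local.txt    # print its contents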
"), also add our standard Spark classpath, built using compute-classpath.sh.
Classpath= ' $FWDIR/bin/compute-classpath.sh '
Classdata-path= "$SPARK _qiutest_jar: $CLASSPATH"
# find Java Binary
If [-N "${java_home}"]; Then
Runner= "${java_home}/bin/java"
Else
If [' command-v Java ']; Then
Runner= "Java"
Else
echo "Java_home is not set" >2
Exit 1
Fi
Fi
If ["$SPARK _print_launch_command" = = "1"]; Then
Echo-n "Spark Command:"
echo "$RUNNER"-CP "$CLASSPATH" "$@"
echo "=============================
, but HA (high availability) cannot be provided; the NameNode is still a SPOF (single point of failure). If the NameNode dies, no client can access the files in the HDFS file system: reads, writes, and file listings all fail. Before a new NameNode can provide service again, it must perform three tasks: (i) load the namespace image; (ii) redo the operations in the edit log; (iii) receive the file block information reports from all DataNodes, which takes about 30 minutes
Hadoop Distributed File System (HDFS)
A distributed file system is a file system that allows files to be shared among multiple hosts over a network; it lets multiple users on multiple machines share files and storage space. HDFS is just one of them. It is suited to write-once, read-many scenarios; concurrent write scenarios are not supported, and small files are not a good fit. 2. HDFS
Java Operations on HDFS: Development Environment Setup
We have previously described how to build an HDFS pseudo-distributed environment on Linux, and also introduced some common HDFS commands. But how do you do this at the code level? That is what this section covers.
1. First use IDEA to create a Maven project:
Maven defaults to a warehouse that
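For such a project, a minimal sketch of the dependency you would add to pom.xml; this is an assumption rather than the author's actual pom, and the version 2.6.0 simply matches the Hadoop release referenced earlier in this article (adjust it to your cluster):

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.6.0</version>
</dependency>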
Now let's take a closer look at Hadoop's FileSystem class, which is used to interact with a Hadoop file system. Although we are mainly targeting HDFS here, our code should depend only on the abstract class FileSystem so that it can interact with any Hadoop file system. When we write test code we can test against the local file system, and when we deploy we use HDFS; only the configuration changes, with no need to modify the code.
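A minimal sketch of that idea (the fs.defaultFS values shown are assumptions): the same code runs against the local file system or HDFS, selected purely by configuration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsAgnosticDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // For a local test:  conf.set("fs.defaultFS", "file:///");
        // For deployment:    conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);
        // Everything below is written against the abstract FileSystem,
        // so it works unchanged with either implementation.
        System.out.println(fs.exists(new Path("/tmp")));
    }
}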