Hadoop Fragmented Notes


Check whether the package is available (a piped search): sudo apt-cache search ssh | grep ssh


To install it: sudo apt-get install xxxxx

After installing SSH, generate the key file by running: ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
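These notes stop at key generation; for passwordless logins between nodes the public key normally also has to be added to authorized_keys. A minimal sketch of that extra step (not part of the original note; the ubuntu user and host s2 are just the names used elsewhere in these notes):

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   # passwordless login to this machine
ssh-copy-id ubuntu@s2                             # copy the key to another node, e.g. s2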

Finally, configure the three files core-site.xml, hdfs-site.xml, and mapred-site.xml in the /soft/hadoop/etc/hadoop directory.
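A minimal sketch of the key entry in core-site.xml, assuming the NameNode runs on host s1 at RPC port 8020 (the host and port that appear later in these notes); hdfs-site.xml and mapred-site.xml get their own properties:

[core-site.xml]

<property>

<name>fs.defaultFS</name>

<value>hdfs://s1:8020</value>

</property>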


-----------------------------------------------------


View listening ports: netstat -lnpt or netstat -plut. View all ports: netstat -ano

--------------------------------------------------------------

Put a file into HDFS with hadoop fs -put xxxx /xxxx/xxxxx/xxx (local source first, HDFS destination second).


Put a file onto the cluster: hadoop --config /soft/hadoop/etc/hadoop_cluster fs -put /home/ubuntu/hello.txt /user/ubuntu/data/

Download a file from the cluster: hadoop --config /soft/hadoop/etc/hadoop_cluster fs -get /user/ubuntu/data/hello.txt bb.txt


Check the health of a file: hdfs --config /soft/hadoop/etc/hadoop_cluster fsck /user/ubuntu/data/hello.txt



Remote copy via scp: scp -r /xxx/x


Format the file system: hdfs --config /soft/hadoop/etc/hadoop_cluster namenode -format


touch creates an empty text file.


Log in to another virtual machine from this one with ssh s2. Running ssh s2 ls ~ displays the listing as a column. If you run

ssh s2 ls ~ | xargs it displays the content on a single horizontal line.


View cluster status (recursive listing): hadoop --config /soft/hadoop/etc/hadoop_cluster fs -lsr /

Putting a file on the cluster is hadoop --config /soft/hadoop/etc/hadoop_cluster fs -put xxxxx followed by the destination path.


View processes: ssh s2 jps. ps -af also shows processes. To kill a process: kill -9 followed by the process ID.


su root switches to the root user.


--------------------------------------------------

HDFS concepts: NameNode & DataNode

NameNode: stores the image file (fsimage) and the edit log on local disk, plus data node information but no block locations. Block locations are rebuilt from DataNode reports when the cluster starts.

DataNode: worker node; stores and retrieves blocks and periodically sends its block list to the NameNode.


Under /usr/local/sbin, switch to the root user with su to create the script, and write whatever execution script you need.


Modify the block size; the default is 128 MB.

It is set in [hdfs-site.xml]:

dfs.blocksize = 8m sets the block size to 8 MB (see the sketch below).

Test method: put a file larger than 8 MB and check the block size in the WebUI.
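Following the [hdfs-site.xml] property format used later in these notes, the entry would look roughly like this (Hadoop 2.x accepts the 8m size suffix; 8388608 bytes is the equivalent plain value):

[hdfs-site.xml]

<property>

<name>dfs.blocksize</name>

<value>8m</value>

</property>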


---------------------------------------------------------


Hadoop: a reliable, scalable, distributed computing framework; open-source software.


Four modules: 1. Common - hadoop-common-xxx.jar

2. HDFS

3. MapReduce

4. YARN


Hadoop in fully distributed mode:

1. HDFS ---> NameNode, DataNode, SecondaryNameNode (auxiliary name node)

2. YARN ---> ResourceManager (resource manager), NodeManager (node manager)


---------------------------------------------------

To configure a static IP, go into /etc/network and edit the interfaces file with sudo nano interfaces:


# This file describes the network interfaces available on your system

# and how to activate them. For more information, see interfaces(5).


# The loopback network interface

auto lo

iface lo inet loopback


# The primary network interface

auto eth0

iface eth0 inet static (changed from dhcp to a static IP)

address 192.168.92.148 (the client's IP)

netmask 255.255.255.0

gateway 192.168.92.2 (the NAT gateway address)

dns-nameservers 192.168.92.2


Finally, restart the networking service: sudo /etc/init.d/networking restart


-------------------------------------------------

Client shutdown commands:

1. sudo poweroff

2. sudo shutdown -h 0

3. sudo halt


------------------------------

Configure text mode

Go into /boot/grub and have a look.

Then cd /etc/default and run gedit grub.

Below #GRUB_CMDLINE_LINUX_DEFAULT="quiet", write GRUB_CMDLINE_LINUX_DEFAULT="text"


Under "# Uncomment to disable graphical terminal (grub-pc only):", write

GRUB_TERMINAL=console // uncomment this line


After the change, run sudo update-grub and finally restart with sudo reboot.


-----------------------------------------

Starting the daemons:

hadoop-daemon.sh start namenode // run on the name node server to start the name node

hadoop-daemons.sh start datanode // run it once and it starts the data nodes on all the listed DataNodes

hadoop-daemon.sh start secondarynamenode // start the secondary name node
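A sketch combining these scripts with the --config option used throughout these notes, pointing at the hadoop_cluster configuration directory:

hadoop-daemon.sh --config /soft/hadoop/etc/hadoop_cluster start namenode            # on the name node host
hadoop-daemons.sh --config /soft/hadoop/etc/hadoop_cluster start datanode           # starts data nodes on all hosts in the slaves file
hadoop-daemon.sh --config /soft/hadoop/etc/hadoop_cluster start secondarynamenode   # on the secondary name node host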


-------------------------------------------------------

hdfs getconf looks up node configuration information. For example, hdfs getconf -namenodes shows that the NameNode is running on host s1.
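A few more getconf queries that exist in Hadoop 2.x (the output naturally depends on the cluster configuration):

hdfs getconf -namenodes               # list the NameNode host(s)
hdfs getconf -secondaryNameNodes      # list the secondary NameNode host(s)
hdfs getconf -confKey dfs.blocksize   # print a single configuration value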



-----------------------------------------------------------------

Four modules:

1. Common

hadoop-common-xxx.jar

core-site.xml

core-default.xml

2. HDFS

hdfs-site.xml

hdfs-default.xml

3. MapReduce

mapred-site.xml

mapred-default.xml

4. YARN

yarn-site.xml

yarn-default.xml


----------------------------------

Common ports:

1. NameNode: RPC // 8020, WebUI // 50070

2. DataNode: IPC // 50020, WebUI // 50075

3. 2NN (secondary NameNode): WebUI // 50090

4. HistoryServer: WebUI // 19888

5. ResourceManager: RPC // 8032, WebUI // 8088
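A quick way to check whether these ports are actually listening, reusing the netstat command from earlier (s1 stands for the NameNode host used elsewhere in these notes):

netstat -lnpt | grep 50070                                        # is the NameNode WebUI listening?
curl -s http://s1:50070 > /dev/null && echo "NameNode WebUI up"   # reachable over HTTP from another host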


--------------------------------------

dfs.hosts: determines which nodes are allowed to connect to the NameNode (include list)

dfs.hosts.exclude: determines which nodes are not allowed to connect (exclude list)


dfs.hosts    dfs.hosts.exclude    result

---------------------------------------------

0            0                    cannot connect

0            1                    cannot connect

1            0                    can connect

1            1                    can connect, and will be decommissioned (retired)
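A sketch of how these two properties might be wired up, in the same [hdfs-site.xml] property style used elsewhere in these notes (the include/exclude file paths are assumptions for illustration); after editing the lists, reload them with the refresh command noted at the end of these notes:

[hdfs-site.xml]

<property>

<name>dfs.hosts</name>

<value>/soft/hadoop/etc/hadoop_cluster/dfs.include</value>

</property>

<property>

<name>dfs.hosts.exclude</name>

<value>/soft/hadoop/etc/hadoop_cluster/dfs.exclude</value>

</property>

hdfs dfsadmin -refreshNodes    # re-read the include/exclude lists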



---------------------------------------------

Safe Mode

1. When the NameNode starts, it merges the image and the edit log into a new image and generates a new edit log.

2. During this period the system is in safe mode and clients can only read.

3. Check whether the NameNode is in safe mode:

hdfs dfsadmin -safemode get // view the safe mode state

hdfs dfsadmin -safemode enter // enter safe mode

hdfs dfsadmin -safemode leave // leave safe mode

hdfs dfsadmin -safemode wait // wait until safe mode is exited

4. Manually save the namespace: hdfs dfsadmin -saveNamespace


5. Manually fetch the image file: hdfs dfsadmin -fetchImage


6. Save metadata (written under the Hadoop log directory, e.g. $HADOOP_HOME/logs/): hdfs dfsadmin -metasave xxx.dsds


7. start-balancer.sh: starts the balancer so that data is spread more evenly across the cluster, improving overall performance (the balancer is generally run after adding nodes).

8. hadoop fs -count gives directory statistics.



--------------------------------------------------

Hadoop snapshot: saves the current state of a directory. By default a directory does not allow snapshots to be created; you must first run hdfs dfsadmin -allowSnapshot /user/ubuntu/data, i.e. allow snapshot creation on the path where you want snapshots. Once snapshots are allowed, run hadoop fs -createSnapshot /user/ubuntu/data snap-1 to create one; snap-1 is the name of the snapshot. To view snapshots: hadoop fs -ls -R /user/ubuntu/data/.snapshot/. A directory cannot have snapshots disallowed while it still has snapshots.



1. Create a snapshot: hadoop fs [-createSnapshot <snapshotDir> [<snapshotName>]]


2. Delete a snapshot: hadoop fs [-deleteSnapshot <snapshotDir> <snapshotName>]


3. Rename a snapshot: hadoop fs [-renameSnapshot <snapshotDir> <oldName> <newName>]


4. Allow snapshots on a directory: hdfs dfsadmin [-allowSnapshot <snapshotDir>]


5. Disallow snapshots on a directory: hdfs dfsadmin [-disallowSnapshot <snapshotDir>]
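Putting these together on the /user/ubuntu/data directory used above (snap-1 and snap-2 are just example snapshot names):

hdfs dfsadmin -allowSnapshot /user/ubuntu/data              # allow snapshots on the directory
hadoop fs -createSnapshot /user/ubuntu/data snap-1          # create a snapshot
hadoop fs -ls -R /user/ubuntu/data/.snapshot/               # list snapshots
hadoop fs -renameSnapshot /user/ubuntu/data snap-1 snap-2   # rename it
hadoop fs -deleteSnapshot /user/ubuntu/data snap-2          # delete it
hdfs dfsadmin -disallowSnapshot /user/ubuntu/data           # only possible once no snapshots remain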



------------------------------------------

Trash (Recycle Bin)

1. The default is 0, which means trash is disabled.

2. Set how long deleted files stay in trash in [core-site.xml]: fs.trash.interval = 1 // in minutes

3. Files deleted through shell commands go into the trash.

4. Each user has their own trash directory, namely /user/ubuntu/.Trash

5. Programmatic deletion does not go through the trash; it deletes immediately. You can call the moveToTrash() method; if it returns false, the trash is disabled or the file is already in the trash.


Trash: Hadoop's trash is off by default; the time unit is minutes. It corresponds to the .Trash directory under the current user's folder; a shell rm moves the file into this directory.

[core-site.xml]

<property>

<name>fs.trash.interval</name>

<value>30</value>

</property>


Trash: to recover files, move them out of the .Trash directory: hadoop fs -mv /user/ubuntu/.Trash/xx/x/x data/


Empty the trash: hadoop fs -expunge


Test deleting the trash directory itself: hadoop fs -rm -r /user/ubuntu/.Trash


-----------------------------------

Quotas


1. Directory (name) quota: hdfs dfsadmin -setQuota n /dir // n > 0; n = 1 means the directory must stay empty (it cannot hold any entries)


2. Space quota: hdfs dfsadmin -setSpaceQuota


hadoop fs === hdfs dfs // file system operation commands (equivalent)

hdfs dfsadmin -clrSpaceQuota // clear the space quota

hdfs dfsadmin -clrQuota // clear the directory quota
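A sketch of setting and clearing quotas on a directory, reusing /user/ubuntu/data as the example path (the limits 10 and 1g are arbitrary example values):

hdfs dfsadmin -setQuota 10 /user/ubuntu/data        # at most 10 names (files + directories) under the path
hdfs dfsadmin -setSpaceQuota 1g /user/ubuntu/data   # at most 1 GB of raw space
hadoop fs -count -q /user/ubuntu/data               # show the quotas and current usage
hdfs dfsadmin -clrQuota /user/ubuntu/data           # clear the name quota
hdfs dfsadmin -clrSpaceQuota /user/ubuntu/data      # clear the space quota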


---------------------------------------------------

oiv can view the contents of the image file: -i is the input file, -o is the output file, and -p XML selects the XML processor.

What to do: hdfs oiv -i fsimage_000000000000000054 -o ~/a.xml -p XML


View an edits_xxx edit log file: hdfs oev -i xxx_edit -o xxx.xml -p xml



Is the image file located in /hadoop/dfs/name/current?

cat fsimage_0000000000000054


bg % resumes a suspended program so it runs in the background.
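A small example of the usual job-control sequence (sleep is just a stand-in for any long-running program):

sleep 300    # start something long-running, then press Ctrl+Z to suspend it
bg %1        # resume job 1 in the background
jobs         # list background jobs
fg %1        # bring it back to the foreground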


-----------------------------------------------------------

Refresh nodes: hdfs dfsadmin -refreshNodes


-----------------------------------------












