Hadoop Fragmented Notes


Check whether the package is available (a piped search): sudo apt-cache search ssh | grep ssh


To install it: sudo apt-get install xxxxx

After installing SSH, generate the key file by running: ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
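These notes stop at key generation; for passwordless logins between nodes the public key normally also has to be added to authorized_keys. A minimal sketch of that extra step (not part of the original note; the ubuntu user and host s2 are just the names used elsewhere in these notes):

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   # passwordless login to this machine
ssh-copy-id ubuntu@s2                             # copy the key to another node, e.g. s2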

Finally, configure the three files core-site.xml, hdfs-site.xml, and mapred-site.xml in the /soft/hadoop/etc/hadoop directory.
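A minimal sketch of the key entry in core-site.xml, assuming the NameNode runs on host s1 at RPC port 8020 (the host and port that appear later in these notes); hdfs-site.xml and mapred-site.xml get their own properties:

[core-site.xml]

<property>

<name>fs.defaultFS</name>

<value>hdfs://s1:8020</value>

</property>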


-----------------------------------------------------


View listening ports: netstat -lnpt or netstat -plut. View all ports: netstat -ano

--------------------------------------------------------------

Put a file into HDFS with hadoop fs -put xxxx /xxxx/xxxxx/xxx (local source first, HDFS destination second).


Put a file onto the cluster: hadoop --config /soft/hadoop/etc/hadoop_cluster fs -put /home/ubuntu/hello.txt /user/ubuntu/data/

Download a file from the cluster: hadoop --config /soft/hadoop/etc/hadoop_cluster fs -get /user/ubuntu/data/hello.txt bb.txt


Check the health of a file: hdfs --config /soft/hadoop/etc/hadoop_cluster fsck /user/ubuntu/data/hello.txt



Remote copy via scp: scp -r /xxx/x


Format the file system: hdfs --config /soft/hadoop/etc/hadoop_cluster namenode -format


touch creates an empty text file.


Log in to another virtual machine from this one with ssh s2. Running ssh s2 ls ~ displays the listing as a column. If you run

ssh s2 ls ~ | xargs it displays the content on a single horizontal line.


View cluster status (recursive listing): hadoop --config /soft/hadoop/etc/hadoop_cluster fs -lsr /

Putting a file on the cluster is hadoop --config /soft/hadoop/etc/hadoop_cluster fs -put xxxxx followed by the destination path.


View processes: ssh s2 jps. ps -af also shows processes. To kill a process: kill -9 followed by the process ID.


su root switches to the root user.


--------------------------------------------------

HDFS concepts: NameNode & DataNode

NameNode: stores the image file (fsimage) and the edit log on local disk, plus data node information but no block locations. Block locations are rebuilt from DataNode reports when the cluster starts.

DataNode: worker node; stores and retrieves blocks and periodically sends its block list to the NameNode.


Under /usr/local/sbin, switch to the root user with su to create the script, and write whatever execution script you need.


Modify the block size; the default is 128 MB.

It is set in [hdfs-site.xml]:

dfs.blocksize = 8m sets the block size to 8 MB (see the sketch below).

Test method: put a file larger than 8 MB and check the block size in the WebUI.
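Following the [hdfs-site.xml] property format used later in these notes, the entry would look roughly like this (Hadoop 2.x accepts the 8m size suffix; 8388608 bytes is the equivalent plain value):

[hdfs-site.xml]

<property>

<name>dfs.blocksize</name>

<value>8m</value>

</property>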


---------------------------------------------------------


Hadoop: a reliable, scalable, distributed computing framework; open-source software.


Four modules: 1. Common - hadoop-common-xxx.jar

2. HDFS

3. MapReduce

4. YARN


Hadoop in fully distributed mode:

1. HDFS ---> NameNode, DataNode, SecondaryNameNode (auxiliary name node)

2. YARN ---> ResourceManager (resource manager), NodeManager (node manager)


---------------------------------------------------

To configure a static IP, go into /etc/network and edit the interfaces file with sudo nano interfaces:


# This file describes the network interfaces available on your system

# and how to activate them. For more information, see interfaces(5).


# The loopback network interface

auto lo

iface lo inet loopback


# The primary network interface

auto eth0

iface eth0 inet static (changed from dhcp to a static IP)

address 192.168.92.148 (the client's IP)

netmask 255.255.255.0

gateway 192.168.92.2 (the NAT gateway address)

dns-nameservers 192.168.92.2


Finally, restart the networking service: sudo /etc/init.d/networking restart


-------------------------------------------------

Client shutdown commands:

1. sudo poweroff

2. sudo shutdown -h 0

3. sudo halt


------------------------------

Configure text mode

Go into /boot/grub and have a look.

Then cd /etc/default and run gedit grub.

Below #GRUB_CMDLINE_LINUX_DEFAULT="quiet", write GRUB_CMDLINE_LINUX_DEFAULT="text"


Under "# Uncomment to disable graphical terminal (grub-pc only):", write

GRUB_TERMINAL=console // uncomment this line


After the change, run sudo update-grub and finally restart with sudo reboot.


-----------------------------------------

Starting the daemons:

hadoop-daemon.sh start namenode // run on the name node server to start the name node

hadoop-daemons.sh start datanode // run it once and it starts the data nodes on all the listed DataNodes

hadoop-daemon.sh start secondarynamenode // start the secondary name node
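A sketch combining these scripts with the --config option used throughout these notes, pointing at the hadoop_cluster configuration directory:

hadoop-daemon.sh --config /soft/hadoop/etc/hadoop_cluster start namenode            # on the name node host
hadoop-daemons.sh --config /soft/hadoop/etc/hadoop_cluster start datanode           # starts data nodes on all hosts in the slaves file
hadoop-daemon.sh --config /soft/hadoop/etc/hadoop_cluster start secondarynamenode   # on the secondary name node host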


-------------------------------------------------------

hdfs getconf looks up node configuration information. For example, hdfs getconf -namenodes shows that the NameNode is running on host s1.
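A few more getconf queries that exist in Hadoop 2.x (the output naturally depends on the cluster configuration):

hdfs getconf -namenodes               # list the NameNode host(s)
hdfs getconf -secondaryNameNodes      # list the secondary NameNode host(s)
hdfs getconf -confKey dfs.blocksize   # print a single configuration value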



-----------------------------------------------------------------

Four modules:

1. Common

hadoop-common-xxx.jar

core-site.xml

core-default.xml

2. HDFS

hdfs-site.xml

hdfs-default.xml

3. MapReduce

mapred-site.xml

mapred-default.xml

4. YARN

yarn-site.xml

yarn-default.xml


----------------------------------

Common ports:

1. NameNode: RPC // 8020, WebUI // 50070

2. DataNode: IPC // 50020, WebUI // 50075

3. 2NN (secondary NameNode): WebUI // 50090

4. HistoryServer: WebUI // 19888

5. ResourceManager: RPC // 8032, WebUI // 8088
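A quick way to check whether these ports are actually listening, reusing the netstat command from earlier (s1 stands for the NameNode host used elsewhere in these notes):

netstat -lnpt | grep 50070                                        # is the NameNode WebUI listening?
curl -s http://s1:50070 > /dev/null && echo "NameNode WebUI up"   # reachable over HTTP from another host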


--------------------------------------

dfs.hosts: determines which nodes are allowed to connect to the NameNode (include list)

dfs.hosts.exclude: determines which nodes are not allowed to connect (exclude list)


dfs.hosts    dfs.hosts.exclude    result

---------------------------------------------

0            0                    cannot connect

0            1                    cannot connect

1            0                    can connect

1            1                    can connect, and will be decommissioned (retired)
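A sketch of how these two properties might be wired up, in the same [hdfs-site.xml] property style used elsewhere in these notes (the include/exclude file paths are assumptions for illustration); after editing the lists, reload them with the refresh command noted at the end of these notes:

[hdfs-site.xml]

<property>

<name>dfs.hosts</name>

<value>/soft/hadoop/etc/hadoop_cluster/dfs.include</value>

</property>

<property>

<name>dfs.hosts.exclude</name>

<value>/soft/hadoop/etc/hadoop_cluster/dfs.exclude</value>

</property>

hdfs dfsadmin -refreshNodes    # re-read the include/exclude lists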



---------------------------------------------

Safe Mode

1. When the NameNode starts, it merges the image and the edit log into a new image and generates a new edit log.

2. During this period the system is in safe mode and clients can only read.

3. Check whether the NameNode is in safe mode:

hdfs dfsadmin -safemode get // view the safe mode state

hdfs dfsadmin -safemode enter // enter safe mode

hdfs dfsadmin -safemode leave // leave safe mode

hdfs dfsadmin -safemode wait // wait until safe mode is exited

4. Manually save the namespace: hdfs dfsadmin -saveNamespace


5. Manually fetch the image file: hdfs dfsadmin -fetchImage


6. Save metadata (written under the Hadoop log directory, e.g. $HADOOP_HOME/logs/): hdfs dfsadmin -metasave xxx.dsds


7. start-balancer.sh: starts the balancer so that data is spread more evenly across the cluster, improving overall performance (the balancer is generally run after adding nodes).

8. hadoop fs -count gives directory statistics.



--------------------------------------------------

Hadoop snapshot: saves the current state of a directory. By default a directory does not allow snapshots to be created; you must first run hdfs dfsadmin -allowSnapshot /user/ubuntu/data, i.e. allow snapshot creation on the path where you want snapshots. Once snapshots are allowed, run hadoop fs -createSnapshot /user/ubuntu/data snap-1 to create one; snap-1 is the name of the snapshot. To view snapshots: hadoop fs -ls -R /user/ubuntu/data/.snapshot/. A directory cannot have snapshots disallowed while it still has snapshots.



1. Create a snapshot: hadoop fs [-createSnapshot <snapshotDir> [<snapshotName>]]


2. Delete a snapshot: hadoop fs [-deleteSnapshot <snapshotDir> <snapshotName>]


3. Rename a snapshot: hadoop fs [-renameSnapshot <snapshotDir> <oldName> <newName>]


4. Allow snapshots on a directory: hdfs dfsadmin [-allowSnapshot <snapshotDir>]


5. Disallow snapshots on a directory: hdfs dfsadmin [-disallowSnapshot <snapshotDir>]
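Putting these together on the /user/ubuntu/data directory used above (snap-1 and snap-2 are just example snapshot names):

hdfs dfsadmin -allowSnapshot /user/ubuntu/data              # allow snapshots on the directory
hadoop fs -createSnapshot /user/ubuntu/data snap-1          # create a snapshot
hadoop fs -ls -R /user/ubuntu/data/.snapshot/               # list snapshots
hadoop fs -renameSnapshot /user/ubuntu/data snap-1 snap-2   # rename it
hadoop fs -deleteSnapshot /user/ubuntu/data snap-2          # delete it
hdfs dfsadmin -disallowSnapshot /user/ubuntu/data           # only possible once no snapshots remain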



------------------------------------------

Trash (Recycle Bin)

1. The default is 0, which means trash is disabled.

2. Set how long deleted files stay in trash in [core-site.xml]: fs.trash.interval = 1 // in minutes

3. Files deleted through shell commands go into the trash.

4. Each user has their own trash directory, namely /user/ubuntu/.Trash

5. Programmatic deletion does not go through the trash; it deletes immediately. You can call the moveToTrash() method; if it returns false, the trash is disabled or the file is already in the trash.


Trash: Hadoop's trash is off by default; the time unit is minutes. It corresponds to the .Trash directory under the current user's folder; a shell rm moves the file into this directory.

[core-site.xml]

<property>

<name>fs.trash.interval</name>

<value>30</value>

</property>


Trash: to recover files, move them out of the .Trash directory: hadoop fs -mv /user/ubuntu/.Trash/xx/x/x data/


Empty the trash: hadoop fs -expunge


Test deleting the trash directory itself: hadoop fs -rm -r /user/ubuntu/.Trash


-----------------------------------

Quotas


1. Directory (name) quota: hdfs dfsadmin -setQuota n /dir // n > 0; n = 1 means the directory must stay empty (it cannot hold any entries)


2. Space quota: hdfs dfsadmin -setSpaceQuota


hadoop fs === hdfs dfs // file system operation commands (equivalent)

hdfs dfsadmin -clrSpaceQuota // clear the space quota

hdfs dfsadmin -clrQuota // clear the directory quota
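A sketch of setting and clearing quotas on a directory, reusing /user/ubuntu/data as the example path (the limits 10 and 1g are arbitrary example values):

hdfs dfsadmin -setQuota 10 /user/ubuntu/data        # at most 10 names (files + directories) under the path
hdfs dfsadmin -setSpaceQuota 1g /user/ubuntu/data   # at most 1 GB of raw space
hadoop fs -count -q /user/ubuntu/data               # show the quotas and current usage
hdfs dfsadmin -clrQuota /user/ubuntu/data           # clear the name quota
hdfs dfsadmin -clrSpaceQuota /user/ubuntu/data      # clear the space quota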


---------------------------------------------------

oiv can view the contents of the image file: -i is the input file, -o is the output file, and -p XML selects the XML processor.

What to do: hdfs oiv -i fsimage_000000000000000054 -o ~/a.xml -p XML


View an edits_xxx edit log file: hdfs oev -i xxx_edit -o xxx.xml -p xml



Is the image file located in /hadoop/dfs/name/current?

cat fsimage_0000000000000054


bg % resumes a suspended program so it runs in the background.
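A small example of the usual job-control sequence (sleep is just a stand-in for any long-running program):

sleep 300    # start something long-running, then press Ctrl+Z to suspend it
bg %1        # resume job 1 in the background
jobs         # list background jobs
fg %1        # bring it back to the foreground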


-----------------------------------------------------------

Refresh nodes: hdfs dfsadmin -refreshNodes


-----------------------------------------












