Hadoop + HBase cluster data migration
Data migration and backup are issues every company eventually faces. The official HBase website offers several solutions for data migration; we recommend Hadoop distcp, which is well suited to migrating large volumes of data and to migration between clusters of different versions.
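A minimal sketch of a distcp-based copy. The cluster addresses, ports, and table name below are assumptions, and the example assumes hbase.rootdir=/hbase with the 0.96+ directory layout; the table should be disabled (or the source HBase stopped) before its directory is copied.

# copy one table's HDFS directory from the source cluster to the target cluster
$ hadoop distcp -update -skipcrccheck \
    hdfs://src-cluster:8020/hbase/data/default/mytable \
    hdfs://dst-cluster:8020/hbase/data/default/mytable

# for cross-version migrations, read the source over webhdfs instead of hdfs:
$ hadoop distcp webhdfs://src-cluster:50070/hbase/data/default/mytable \
    hdfs://dst-cluster:8020/hbase/data/default/mytable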
Versions:
Hadoop 2.7.1
HBase 0.98.12
Problems found:
Q: How can the virtual machine get an IP address on the network? A: Right-click the virtual machine, open Network adapter, and set the network connection to Bridged (connected directly to the physical network).
When configuring passwordless SSH login, copying the public key fails. Workaround: add the -i option, i.e. ssh-copy-id -i id_rsa.pub master2. With the -i option, when no value is passed, or if the ~/.ssh/identity.pub file is inaccessible (not present), ssh-copy-id displays the error above.
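A minimal sketch of the workaround; the host name master2 is taken from the text above, and the key path assumes the default RSA key:

$ ssh-keygen -t rsa                          # generate a key pair first if none exists
$ ssh-copy-id -i ~/.ssh/id_rsa.pub master2   # point ssh-copy-id at the public key explicitly
$ ssh master2                                # should now log in without a password prompt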
Original post: blog.csdn.net/yang_best/article/details/41280553. The following sections describe how to configure a Hadoop cluster. Hadoop is configured through two important configuration files under the conf directory: hadoop-default.xml, the read-only default configuration, and hadoop-site.xml, the site-specific configuration.
After the groundwork in the previous posts, I finally deployed Hadoop in a cluster environment today and successfully ran the official example.
The setup is as follows:
Two machines:
Namenode: a netbook with 3 GB of memory, hostname yp-x100e, IP 192.168.101.130.
Datanode: a virtual machine, Ubuntu 14 running in VMware 10 on Windows 7, hostname ph-v370, IP 192.168.101.110.
Ensure that the two machines can ping each other.
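A quick connectivity check; the hostnames and IPs are the ones listed above:

# /etc/hosts on both machines
192.168.101.130  yp-x100e
192.168.101.110  ph-v370

# verify connectivity in both directions
$ ping -c 3 ph-v370     # from yp-x100e
$ ping -c 3 yp-x100e    # from ph-v370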
1. Installation version: Hadoop 2.4.0, built on Ubuntu 12.04 x86_64 with jdk1.7.0_79.
2. References:
1) Installation documentation: http://www.aboutyun.com/thread-7684-1-1.html
2) Official installation documentation: http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/ClusterSetup.html#Installation
3. Main ideas: the basic idea of a fully distributed
Specify the parameter fs.checkpoint.dir;
Copy the files from namesecondary into the fs.checkpoint.dir directory;
Run ./hadoop namenode -importCheckpoint;
That is, start the NameNode with the -importCheckpoint option (this is copied from hadoop-0.20.2/docs/cn/hdfs_user_guide.html#Secondary+NameNode; see that documentation for details). A sketch of the steps follows.
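A minimal sketch of these steps; the paths are hypothetical and the 0.20-style bin/hadoop layout from above is assumed:

# 1) make sure fs.checkpoint.dir points at an empty directory on the new NameNode
# 2) copy the checkpoint from the SecondaryNameNode
$ cp -r /data/namesecondary/* /data/checkpoint/        # hypothetical paths
# 3) import the checkpoint and start the NameNode
$ cd $HADOOP_HOME/bin
$ ./hadoop namenode -importCheckpoint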
3. Hadoop version: 2.5.0
When configuring the Hadoop cluster, running ./start-all.sh under /usr/hadoop/sbin/ on the master host prints a deprecation warning:
[hadoop@master sbin]$ ./start-all.sh
This script is deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting
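As the warning suggests, the two replacement scripts can be run separately; a minimal sketch:

[hadoop@master sbin]$ ./start-dfs.sh     # starts NameNode, SecondaryNameNode and the DataNodes
[hadoop@master sbin]$ ./start-yarn.sh    # starts ResourceManager and the NodeManagers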
Environment setup: jdk1.6, passwordless SSH communication
System: CentOS 6.3
Cluster configuration: Namenode and ResourceManager on a single server, three data nodes
Build User: YARN
Hadoop 2.2 download address: http://www.apache.org/dyn/closer.cgi/hadoop/common/
Step one: upload Hadoop 2.2 and unzip it to /export/
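A minimal sketch of this step; the tarball name and the target host are assumptions, and the yarn user is the build user mentioned above:

$ scp hadoop-2.2.0.tar.gz yarn@namenode-host:/export/    # upload from the workstation
$ cd /export/
$ tar -zxvf hadoop-2.2.0.tar.gz                          # unpack in place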
Uploading and downloading files to and from HDFS are basic cluster operations. The Hadoop guide contains example code for uploading and downloading files, but it gives no clear way to configure the Hadoop client. After lengthy searching and debugging, I worked out how to configure a client against the cluster and verified a working program; a sketch follows.
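One way to set up the client, as a sketch; the paths are assumptions, and the user name huser is borrowed from the client section later on this page:

$ export HADOOP_CONF_DIR=/opt/hadoop-client/conf   # core-site.xml/hdfs-site.xml copied from the cluster
$ bin/hadoop fs -put localfile.txt /user/huser/    # upload a local file to HDFS
$ bin/hadoop fs -get /user/huser/localfile.txt .   # download it back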
Apache Ambari is a web-based open-source project for provisioning, managing, and monitoring Hadoop clusters across their lifecycle. It is also the management tool chosen for the Hortonworks Data Platform. Ambari supports managing the following services:
Apache HBase, Apache HCatalog, Apache Hadoop HDFS, Apache Hive, Apache Hadoop MapReduce, Apache Oozie, Apache Pig, Apache Sqoop, Apache Templeton
// FileInputFormat.addInputPath(job, new Path(otherArgs[0]));    // file input
// FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));  // file output
// if (!job.waitForCompletion(true))                             // wait for the job to complete
//     return;
for (int i = 0; i < otherArgs.length - 1; i++) {
    FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
}
FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Note: the code works either way, with or without the commented-out lines. The commented-out version is used when there is exactly one input path and one output path, while the loop handles multiple input paths.
The contents of the configuration file are:
Run the ":wq" command to save and exit.
Through the above configuration, we have completed the simplest pseudo-distributed configuration.
Next, format the hadoop namenode:
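A sketch of the format command for a Hadoop 2.x layout; older releases use bin/hadoop namenode -format instead:

$ bin/hdfs namenode -format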
Enter "Y" to complete the formatting process:
Next, start Hadoop.
Use the jps command that ships with the JDK to check all of the daemon processes:
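A sketch of what this looks like; the exact daemon list depends on the Hadoop version and on whether HDFS and YARN are both started:

$ jps
# expect to see processes such as NameNode, DataNode, SecondaryNameNode,
# ResourceManager and NodeManager, plus the Jps process itself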
Assume that the cluster is already configured. On the development client (Linux CentOS 6.5):
A. The client CentOS machine has a user with the same name as on the cluster: huser.
B. Edit /etc/hosts with vim and add the NameNode as well as the local machine's IP (see the sketch below).
-------------------------
1. Install the same JDK version as the Hadoop cluster,
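A minimal sketch of steps A and B above; the IP addresses and hostnames are assumptions:

$ sudo useradd huser              # same user name as on the cluster
$ sudo vim /etc/hosts             # add the NameNode and this machine, for example:
192.168.1.200   namenode-host     # hypothetical NameNode address
192.168.1.210   client-host       # hypothetical client address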
Run the JAVA_HOME=/path/to/java command in the console.
2. Store the data to be crawled in HDFS:
$ bin/hadoop fs -put urldir urldir
Note: the first urldir is a local folder holding a URL data file with one URL per line; the second urldir is the storage path on HDFS.
3. Start the Nutch crawl. Execute the following command under the NUTCH_HOME/runtime/deploy directory:
$ bin/nutch crawl urldir -dir crawl -depth 3 -topN 10
After the command executes successfully, the crawl directory
requires a reboot; if you do not want to restart, add this to the code: System.setProperty("hadoop.home.dir", "d:\\soft\\linux\\hadoop-2.4.0");
(3) Exception 3:
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://192.168.1.200:9000/user/output already exists
Workaround: the output folder already exists; either change the output folder or delete it between runs (see the sketch below).
(4) Exception 4:
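One way to clear the old output directory before re-running the job, as a sketch (on older releases the command is hadoop fs -rmr):

$ bin/hadoop fs -rm -r hdfs://192.168.1.200:9000/user/output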
For Maven projects, integration tests are by default run as a phase of the build lifecycle, which is convenient for ordinary projects. For Hadoop (or HBase) projects, however, this is not suitable: the application runs in the cluster environment, while the development environment may be Windows rather than Linux. These factors make it inconvenient to run the mvn command in the local environment.
Cluster disk space has been a little tight recently, and we were constantly worried about running out of space and crashing. Expanding the cluster in the short term is not realistic. After talking with the cluster's users, we found that it stores a lot of useless historical data that can be deleted, so a crontab script can be used to clean it up periodically (a sketch follows).
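A hypothetical cleanup sketch; the path layout, retention period and schedule are all assumptions:

#!/bin/bash
# cleanup.sh - delete the date-partitioned HDFS directory that is 30 days old
OLD=$(date -d "-30 days" +%Y%m%d)
hadoop fs -rm -r -skipTrash /data/history/${OLD}

# example crontab entry (crontab -e): run every night at 01:00
# 0 1 * * * /home/hadoop/cleanup.sh >> /home/hadoop/cleanup.log 2>&1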
so that the DataNodes know the NameNode and the TaskTrackers know the JobTracker. So modify conf/core-site.xml on hadoopdatanode1 and hadoopdatanode2 respectively:
and conf/mapred-site.xml:
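A sketch of the relevant settings for the 1.x-style layout used in this excerpt; the hostname hadoopnamenode comes from the text, while the ports are the usual defaults and are an assumption:

$ cat > conf/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoopnamenode:9000</value>
  </property>
</configuration>
EOF
$ cat > conf/mapred-site.xml <<'EOF'
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoopnamenode:9001</value>
  </property>
</configuration>
EOF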
Format the NameNode: execute on hadoopnamenode:
hadoop namenode -format
Start Hadoop: first, execute the following command on hadoopnamenode to start the NameNode and the other daemons.
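A sketch, assuming the Hadoop 1.x start scripts (the node lists come from conf/masters and conf/slaves):

$ bin/start-all.sh    # starts the NameNode and JobTracker locally and the DataNodes/TaskTrackers on the slave nodes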
The environment for this configuration is Hadoop 1.2.1. Hadoop 2.0 was introduced in 2013; it modified the Hadoop 1.0 line to improve the efficiency of task scheduling, resource allocation, and fault handling in Hadoop clusters. On the basis of Hadoop 1.0, Hadoop 2.0 first changed HDFS: in Hadoop 1.0, HDFS