In its commercial components section, it also provides operational capabilities needed to run Hadoop in an enterprise production environment that the open-source community does not cover, such as zero-downtime rolling upgrades and asynchronous disaster recovery. Hortonworks follows a 100% open-source strategy; its product is named HDP (Hortonworks Data Platform). All software produc
When using Hadoop for big data analysis and processing, you must first configure, deploy, and manage the cluster. This is neither easy nor fun, and it is not work that developers love. This article introduces five tools that can help.
Apache Ambari
Apache Ambari is an open-source project for Hadoop provisioning, monitoring, and lifecycle management. It is also the management tool of choice for
Installation and deployment notes: HBase fully distributed mode installation
Detailed tutorial on setting up a standalone HBase environment
Reference documentation (Hortonworks is abbreviated below as HDP; Cloudera as CDH):
1. Create a system template. I found a CentOS 6.5 template in OpenVZ, but since we try to keep the test environment consistent with production, we should use CentOS 6.3 instead, following the official documentation.
sudo apt-get install openssh-server
2) Generate an RSA key pair
ssh-keygen -t rsa -P ''
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3) Fix home-directory, SSH, and file permissions (sshd rejects key authentication if the home directory is world-writable, so the home directory must not grant write access to others)
chmod o-w ~/
chmod -R 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
chmod 755 /root
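The same permission fixes can be sanity-checked safely. The sketch below applies them in a throwaway directory instead of a real home directory; the public-key line is a fake placeholder, not a real key:

```shell
#!/bin/sh
# Illustration only: apply the SSH permission fixes in a temp directory.
home=$(mktemp -d)
mkdir -p "$home/.ssh"
# Placeholder key line (not a real public key).
echo "ssh-rsa AAAA...fake-key... user@host" >> "$home/.ssh/authorized_keys"

chmod o-w "$home"                          # home must not be world-writable
chmod -R 700 "$home/.ssh"                  # .ssh accessible only by the owner
chmod 600 "$home/.ssh/authorized_keys"     # key file readable only by owner

stat -c '%a' "$home/.ssh"                  # prints 700
stat -c '%a' "$home/.ssh/authorized_keys"  # prints 600
```

If either `stat` prints something else, sshd will silently ignore `authorized_keys`, which is the most common cause of passwordless login "not working".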
4) Edit sshd_config
vi /etc/ssh/sshd_config
In /etc/ssh/sshd_config, change `PermitRootLogin prohibit-password` to `PermitRootLogin yes` (removing or commenting out the line can also work, depending on the build's default; for safety, back the file up before editing).
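Instead of editing the file by hand in vi, the change can be scripted. A minimal sketch, operating on a stand-in copy so nothing system-wide is touched (substitute /etc/ssh/sshd_config only after backing it up):

```shell
#!/bin/sh
# Work on a local stand-in for /etc/ssh/sshd_config.
conf=$(mktemp)
printf 'PermitRootLogin prohibit-password\n' > "$conf"

cp "$conf" "$conf.bak"   # back up before editing
sed -i 's/^PermitRootLogin prohibit-password/PermitRootLogin yes/' "$conf"

grep '^PermitRootLogin' "$conf"   # prints: PermitRootLogin yes
# After editing the real file, reload sshd, e.g.: service ssh restart
```

The backup copy makes it trivial to roll the change back if root login needs to be locked down again later.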
Problem:
In Hadoop 1.2.0, a single disk failure causes the entire DataNode to fail. Most DataNodes in a production environment have more than one disk, so we need a way to keep the DataNode alive when one disk fails, taking only the failed disk offline instead of the whole node.
Solutions and applicable scenarios:
1. Modify the Hadoop source code (beyond the author's ability).
2. Modify the value of dfs.data.dir in hdfs-site.xml to remove the failed disk's mount point, then restart (re
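A sketch of option 2, assuming the DataNode originally listed three mount points and /data2 is the failed disk (these paths are hypothetical):

```xml
<!-- hdfs-site.xml: /data2/hdfs/data (the failed disk) removed from the list -->
<property>
  <name>dfs.data.dir</name>
  <value>/data1/hdfs/data,/data3/hdfs/data</value>
</property>
```

On later Hadoop versions, the `dfs.datanode.failed.volumes.tolerated` property addresses this problem directly, letting a DataNode survive a configurable number of failed volumes without manual reconfiguration.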
0. Upgrade MySQL to 5.6.
1. Stop the services: stop ambari-server and every ambari-agent:
ambari-server stop
ambari-agent stop
2. Back up the Ambari database:
mkdir -p /root/tmp/ambariupgrate
mysqlhotcopy --user=ambari --password=gotop123 ambari /root/tm
Work recently requires looking into HDInsight, so here are some notes. The official documentation is naturally the most authoritative source, so the content below is adapted from: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-introduction/ Hadoop on HDInsight. Everyone working with big data knows Hadoop, so what is the relationship between HDInsight and Hadoop? HDInsight is a Microsoft Azure-based software stack, mainly for data analysis and management, and it uses HDP (Hortonworks
Original address: http://zh.hortonworks.com/hadoop-tutorial/using-commandline-manage-files-hdfs/ In this tutorial we'll walk through some of the basic HDFS commands you'll need to manage files on HDFS. To follow the tutorial you'll need a working HDP cluster; the easiest way to get a Hadoop cluster is to download the Hortonworks Sandbox. Let's get started. Step 1: let's create a directory in HDFS, upload a file, and list the contents. Let's look at the syntax first: hadoop fs -m
Configuration recommendations:
1. In MR1, the mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum properties dictated how many map and reduce slots each TaskTracker had.
These properties no longer exist in YARN. Instead, YARN uses yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores, which control the amount of memory and CPU on each node, both available to maps and reduces alike.
Essentially: YARN has no TaskTrackers, but just gen
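A sketch of the corresponding yarn-site.xml entries, with illustrative values for a node offering 8 GB of RAM and 4 cores to containers (tune these to your hardware):

```xml
<!-- yarn-site.xml: per-node resources available to YARN containers -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>  <!-- 8 GB for containers; leave headroom for the OS -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value>
</property>
```

Unlike MR1's fixed slots, these resources are shared by map and reduce containers alike, so a node can run any mix of the two up to its memory and vcore limits.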
HDP (Hortonworks Data Platform) is a 100% open-source Hadoop distribution from Hortonworks, with YARN at its architectural center. It includes components such as Pig, Hive, Phoenix, HBase, Storm, and Spark, and the latest version, 2.4, integrates Grafana for the monitoring UI. Installation process:
Cluster planning
Package download: (the HDP 2.4 installation package is very large; offline installation is recommended)
HDP installation
Apache Ambari 1.4.4 Installation Guide
Operating System:
CentOS6
Cluster machine list:
hadoop.master.com (192.168.1.204)
hadoop.slave1.com (192.168.1.205)
hadoop.slave2.com (192.168.1.206)
Ambari server Installation node and user
hadoop.master.com, as user root
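Ambari and Hadoop expect every node to resolve the others by FQDN. A minimal sketch of the required name resolution for the machines listed above, written to a temporary file here for illustration (in a real deployment, append these lines to /etc/hosts on every node):

```shell
#!/bin/sh
# Hostname/IP mapping for the three cluster machines above.
# Written to a temp file so nothing on the local machine is modified.
hosts=$(mktemp)
cat >> "$hosts" <<'EOF'
192.168.1.204 hadoop.master.com
192.168.1.205 hadoop.slave1.com
192.168.1.206 hadoop.slave2.com
EOF
cat "$hosts"
```

Getting this mapping identical on all three nodes up front avoids the most common Ambari registration failure, where agents cannot reach the server by its FQDN.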
Preparations
Passwordless SSH login
Run ssh-keygen -t rsa on hadoop.ma
Currently there are three main free Hadoop versions (all from foreign vendors): Apache (the original version; all other distributions are improvements based on it), the Cloudera version (Cloudera's Distribution including Apache Hadoop, abbreviated CDH), and the Hortonworks version (Hortonworks Data Platform, abbreviated HDP).
Hortonworks
First, refer to the offline installation tutorial: http://www.jianshu.com/p/debf0e6a3f3b It says it targets Ubuntu 14.04, but it installs on 16.04 as well. After downloading the packages with Thunder and copying them to the server, follow the tutorial: set up an HTTP server, build a local repository from it, then apt-get install ambari-server. These steps are easy, but during ambari-server setup there are some things you need to
dfsadmin -report shows:
Live datanodes (2):
This indicates that the cluster was established successfully. After a successful start you can visit the web interface at http://192.168.1.151:50070 to view NameNode and DataNode information and browse the files in HDFS online. Start YARN, then watch how tasks run through the web interface at http://192.168.1.151:8088/cluster. To manipulate HDFS from the command line, run:
hadoop fs
This command lists the help for all of the HDFS sub-commands. Basically the syn
Even though developers can't fiddle with NFS internals, they can easily integrate MapR's distribution with HBase, HDFS, and other Apache Hadoop components, as well as move data in and out over NFS should they choose to tap a different Hadoop distribution. This last point is important: it means, according to MapR, that there is no greater risk of vendor lock-in with its Hadoop distribution than with any other. MapR's focus on performance, availability, and API compatibility over open source code also comes th
As mentioned in the previous section, it is hard to get commercial support for the plain Apache Hadoop project, while each provider offers commercial support for its own Hadoop distribution.
Hadoop distribution providers
Currently, in addition to Apache Hadoop, the Hortonworks, Cloudera, and MapR troika dominate the field, and their releases are roughly on par with one another. However, other Hadoop distributions have also appeared during this period, such as EMC's Pivotal HD and IBM's Inf