1.1 Description
CDH official installation and deployment documentation, with step details:
http://www.cloudera.com/content/cloudera/zh-CN/documentation/core/v5-3-x/topics/installation_installation.html
This article covers the first method, automatic installation with Cloudera Manager.
Cloudera Manager 5 requirements and supported versions, in detail:
http://www.cloudera.com/content/cloudera
Use Cloudera Manager to install Hadoop
Hadoop is composed of many different services (such as HDFS, Hive, HBase, Spark, and so on), and these services depend on one another. If you download the original Apache packages directly, you have to download and configure each one separately, which is tedious. As a result, some companies have customized Hadoop, such as Clouder
Cloudera Manager and CDH 5.14.0 Installation Process in CentOS 7
As we all know, configuring Apache Hadoop is cumbersome and fragmented. For this reason, Cloudera provides the Cloudera Manager tool and packages Apache Hadoop, Flume, Spark, Hive, HBase and other big data products into its own CDH distribution, which is then installed with CM. This facilitates cluster construct
interoperability for data processing tools such as Pig, MapReduce, and Hive.
Chukwa: Chukwa is a Hadoop-based monitoring system for large clusters, contributed by Yahoo.
Hadoop Basics - Hadoop in Practice (VI) - Hadoop Management Tools - Cloudera Manager - CDH Introduction
We introduced CDH in the previous article; here we install CDH 5.8 for the study that follows. CDH 5.8 is a relatively recent release built on a Hadoop version later than 2.0, and it already includes many of the Hadoop ecosystem components that we need to learn next.
Environment
Cl
26 Preliminary use of the cluster
Design ideas of HDFS
- Design idea: divide and conquer. Large files and large batches of files are distributed across a large number of servers, so that massive data can be analyzed with a divide-and-conquer approach.
- Role in big data systems: provides data storage services for the various distributed computing frameworks (such as MapReduce, Spark, Tez, ...).
- Key concepts: file splitting, replica storage, metadata.
26.1 HDFS Use
1. Vie
For partitioning, do not use LVM.
Root --> 20 GB
Swap --> 2x system memory
RAM --> 4 GB
Master node: RAID 10, dual Ethernet cards, dual power supplies, etc.
Slave node:
1. RAID is not necessary
2. HDFS partitions: do not use LVM
/etc/fstab --> ext3 defaults,noatime
Mount to /data/N, for N = 0, 1, 2... (one partition per disk)
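The fstab convention above (ext3, defaults,noatime, one partition per disk mounted at /data/N) can be sketched as a small shell helper. This is only an illustration: the device names are hypothetical, and the trailing dump and fsck fields (0 0) are an assumption that the excerpt does not state.

```shell
# Print one /etc/fstab line per data partition, mounted at /data/N
# with ext3 and defaults,noatime as the excerpt recommends.
# Arguments: partition device names (the ones below are hypothetical).
fstab_lines() (
    n=0
    for dev in "$@"; do
        # Fields: device, mount point, fs type, options, dump, fsck pass.
        # The final "0 0" is an assumption, not taken from the excerpt.
        printf '%s /data/%d ext3 defaults,noatime 0 0\n' "$dev" "$n"
        n=$((n + 1))
    done
)

# Example: review the output before adding it to /etc/fstab by hand.
fstab_lines /dev/sdb1 /dev/sdc1
```

The helper only prints text, so nothing on the host changes until the lines are copied into /etc/fstab.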
Cloudera Repository:
http://archive.cloudera.com/cdh5/
http://archive-p
For partitioning, do not use LVM.
Root --> 40 GB
Var --> 100 GB
Swap --> 2x system memory
RAM --> 8 GB
Master node: RAID 10, dual Ethernet cards, dual power supplies, etc.
Slave node:
1. RAID is not necessary
2. HDFS partitions: do not use LVM
/etc/fstab --> ext3 defaults,noatime
Mount to /data/n/dfs/dn, for n = 0, 1, 2... (one partition per disk)
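Mount points have to exist before anything can be mounted on them. A minimal sketch of creating the /data/n/dfs/dn layout described above follows; the `root` argument is a hypothetical addition so that the layout can be tried in a scratch directory first.

```shell
# Create <root>/data/N/dfs/dn for N = 0 .. count-1.
# root  : prefix directory (hypothetical parameter; leave it empty for the real /data)
# count : number of data disks
make_dn_dirs() (
    root=$1
    count=$2
    n=0
    while [ "$n" -lt "$count" ]; do
        mkdir -p "$root/data/$n/dfs/dn" || exit 1
        n=$((n + 1))
    done
)

# Example (scratch directory): make_dn_dirs "$(mktemp -d)" 3
```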
Cloud
yum clean all
sudo yum upgrade cloudera-*
5. Check which RPM packages are installed.
rpm -qa | grep cloudera
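The package check in step 5 can also be scripted. The sketch below reads a package list on standard input and reports which expected packages are missing. The package names in the usage comment are an assumption about a typical Cloudera Manager server host, not something the excerpt lists.

```shell
# Report which expected packages are absent from a package list read on
# standard input (for example, the output of `rpm -qa`).
# Arguments: the package names to look for.
missing_packages() (
    installed=$(cat)
    for name in "$@"; do
        # rpm -qa prints name-version-release, so require the name to be
        # followed by "-<digit>" at the start of a line.
        printf '%s\n' "$installed" | grep -q "^$name-[0-9]" || printf '%s\n' "$name"
    done
)

# Example on a real host (package names are assumed, adjust as needed):
#   rpm -qa | missing_packages cloudera-manager-server cloudera-manager-daemons cloudera-manager-agent
```

An empty result means every listed package was found.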
"If you are using an add-on package for embedded databases and plug-ins, you may also see a cloudera-manager-server-db-2 entry, depending on the software previously installed in the server host." If the Cloudera
http://www.aboutyun.com/thread-9189-1-1.html
1. Related directories
/var/log/cloudera-scm-installer: installation log directory.
/var/log/*: related log files (for the services and CM).
/usr/share/cmf/: program installation directory.
/usr/lib64/cmf/: Agent program code.
/var/lib/cloudera-scm-server-db/data: embedded database directory.
/usr/bin/postgres: embedded database program.
/etc/
the page where the results are checked
When the host inspection shows the warning "Cloudera recommends setting /proc/sys/vm/swappiness to 0. The current setting is 30.", make the following settings:
# vi /etc/sysctl.conf
vm.swappiness = 0
# sysctl -p
When the host inspection warns that transparent huge pages are "enabled", which can cause significant performance issues, make the following settings:
echo never > /sys/kernel/mm/transparent_hugepage/ena
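Both warnings can be checked from the shell before rerunning the host inspection. The following is a sketch only: the function names are hypothetical, and because the transparent huge page path is cut off in the excerpt, `check_thp` takes the file path as an argument instead of assuming it.

```shell
# Print "ok" when the swappiness value stored in the given file is 0,
# otherwise print a warning that includes the current value.
# The file defaults to the real kernel setting; passing another path is a
# hypothetical convenience for trying the check safely.
check_swappiness() (
    file=${1:-/proc/sys/vm/swappiness}
    value=$(cat "$file") || exit 1
    if [ "$value" -eq 0 ]; then
        echo "ok"
    else
        echo "warning: vm.swappiness is $value, recommended 0"
    fi
)

# Print "ok" when the given transparent huge page setting file selects
# "never". The kernel marks the active choice with square brackets,
# for example: always madvise [never]
check_thp() (
    file=$1
    case "$(cat "$file")" in
        *'[never]'*) echo "ok" ;;
        *) echo "warning: transparent huge pages are not disabled" ;;
    esac
)
```

Note that a value written with `echo never > ...` does not survive a reboot unless it is also applied at startup.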
optimization
Modify the /etc/security/limits.conf file to add the following:
* hard nofile 65535
* soft nofile 65535
5. SSH key setup (passwordless login)
This is not required. When Cloudera Manager installs HDFS, Flume, Hive and other applications on each node, it does not rely on SSH for transfer; communication, transfer, and installation files are all handled through the agent. If you want to set up SSH keys, see my other blog pos
Setting up a development environment for operating HDFS from Java
We have previously described how to build a pseudo-distributed HDFS environment on Linux and introduced some common HDFS commands. But how do you do the same thing at the code level? That is what this section covers:
1. First use IDEA to create a Maven project:
Maven defaults to a repository that
This document applies to all versions of Cloudera Manager 5 and describes upgrading with tarballs. The tarballs contain the Cloudera Manager Server and the Cloudera Manager Agent.
https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ag_ug_cm5_tarballs.html#cm_ag_ug_cm5_tarballs
In most scenarios, upgrading Cloudera Man
Use Cloudera QuickStart VM to quickly deploy Hadoop applications without Configuration
Directory:
Download the cloudera-vm image from the CDH website
Use VirtualBox to start a VM
Test and use
System Environment:
Oracle VM VirtualBox, 64-bit host.
1. Download the cloudera-vm image from the CDH website
Select on the website http://www.cloudera.com/content/support
customers from all walks of life.
For Chinese users who are in the initial phase of adopting Hadoop, Cloudera also considers how the technology interfaces with traditional data management technologies, in order to lower the technical threshold further. Ling said that this is an important trend in big data and also a unique advantage of Cloudera. Make Hadoop easier,
Impala is a new query system developed by Cloudera. It provides SQL semantics and can query PB-scale data stored in Hadoop HDFS and HBase. Although the existing Hive system also provides SQL semantics, Hive executes queries on the MapReduce engine, so it remains a batch process and struggles to support interactive queries. In contrast, Impala's biggest feature is its speed. Im
the Hive metadata store, and Hive does not include a MySQL driver by default, so copy one in with the following command:
cp /opt/cm-5.3.4/share/cmf/lib/mysql-connector-java-5.1.25-bin.jar /opt/cloudera/parcels/CDH-5.3.4-1.cdh5.3.4.p0.12/lib/hive/lib/
After that, you should not run into this problem when you continue with the installation.
After a long wait, the installation of the services is complete:
After the installation is complete, you can go to the cluster interf
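The driver copy described in this excerpt can be wrapped in a small guard so that it fails clearly when a path is wrong and does nothing when the jar is already in place. This is a sketch under assumptions: the function name is hypothetical, and the paths in the usage comment are the ones quoted in the excerpt, which will differ for other CM and CDH versions.

```shell
# Copy a JDBC driver jar into a Hive lib directory unless a file with the
# same name is already there.
# $1: path to the driver jar, $2: Hive lib directory
install_jdbc_driver() (
    jar=$1
    lib_dir=$2
    [ -f "$jar" ] || { echo "driver jar not found: $jar" >&2; exit 1; }
    [ -d "$lib_dir" ] || { echo "Hive lib directory not found: $lib_dir" >&2; exit 1; }
    target="$lib_dir/$(basename "$jar")"
    if [ -f "$target" ]; then
        echo "already present: $target"
    else
        cp "$jar" "$lib_dir/" && echo "copied: $target"
    fi
)

# Example with the paths from the excerpt:
#   install_jdbc_driver /opt/cm-5.3.4/share/cmf/lib/mysql-connector-java-5.1.25-bin.jar \
#       /opt/cloudera/parcels/CDH-5.3.4-1.cdh5.3.4.p0.12/lib/hive/lib
```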