Machine Environment: Ubuntu 14.10 64-bit | OpenJDK-7 | Scala-2.10.4
Fleet Overview: Hadoop-2.6.0 | HBase-1.0.0 | Spark-1.2.0 | ZooKeeper-3.4.6 | Hue-3.8.1
About Hue (from the web): Hue is an open-source Apache Hadoop UI system that was first developed as Cloudera Desktop and later contributed by Cloudera to the open-source community; it is built on the Python web framework Django. By using
gained in this practice: knowing the names of the Hadoop source files, you can quickly locate the right file when writing a program; when debugging a program against the Hadoop source, you can step directly into the source code to view and trace the run
Recommendation Index: ★★★★
Recommended reason: reading the source code can help us understand Hadoop better and can help us solve complex problems. 3. Proper use of compression algorithms
The following table of information refers to the
Preface
The content of this article comes from a talk by Hadoop veteran (and chief architect at Cloudera) Doug Cutting on how companies can use open-source software to enhance their business value. The talk covers a lot about companies and open source; this article gives a brief summary (in first-person narration). The original is entirely in English; interested readers can click this link to read it: How
There are many versions of Hadoop; here I chose the CDH version. CDH is Apache Hadoop as repackaged by Cloudera. The specific CDH release is: http://archive-primary.cloudera.com/cdh5/cdh/5/
The version information is as follows:
Hadoop: hadoop-2.3.0-cdh5.1.0
JDK: 1.7.0_79
Maven: apache-maven-3.2.5 (3.3.1 and later require JDK 1.7 or above)
protobuf: protobuf-2.5.0
Ant: 1.7.1
1. Install Maven
Maven can be downloaded from the Maven website (http://maven.ap
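With those tools installed, a typical from-source build of a Hadoop 2.x tarball looks like the sketch below. This is a minimal sketch, not the article's exact procedure; the source directory name is an assumption based on the versions listed above.

```
# Verify the toolchain first -- Hadoop 2.x requires protobuf 2.5.0 exactly
protoc --version   # should print: libprotoc 2.5.0
mvn -version

# Build the distribution tarball from the unpacked source tree
# (directory name is an assumption based on the versions above)
cd hadoop-2.3.0-cdh5.1.0
mvn package -Pdist -DskipTests -Dtar
```

The built tarball then appears under `hadoop-dist/target/`.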
because the node servers in the cluster are assigned IPs automatically through DHCP. In principle the IPs do not change, because a fixed IP address is bound to each MAC address at boot time, unless the MAC address changes. Coincidentally, yesterday morning the cleaning lady ripped out the network cable of a master node server while wiping the table; when I noticed the node was unreachable and re-plugged the cable, the IP had changed. Think of a
follow Unix mode?
Yes, Hadoop closely follows the Unix pattern; like Unix, it also has a "conf" directory.
7. What directory is Hadoop installed in?
Cloudera and Apache use the same directory structure; Hadoop is installed in /usr/lib/hadoop-0.20/.
8. What are the port numbers for NameNode, JobTracker, and TaskTracker?
NameNode, 70; JobTracker, 30; TaskTracker, 60 (these are the distinguishing digits of the default web UI ports: 50070, 50030, and 50060).
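The mapping above can be captured in a tiny helper, useful for scripting health checks against those web UIs (the function name is mine, not from the original article):

```shell
# Print the default web UI port for a Hadoop 1.x daemon
hadoop_ui_port() {
  case "$1" in
    namenode)    echo 50070 ;;   # HDFS NameNode web UI
    jobtracker)  echo 50030 ;;   # MapReduce JobTracker web UI
    tasktracker) echo 50060 ;;   # MapReduce TaskTracker web UI
    *) echo "unknown daemon: $1" >&2; return 1 ;;
  esac
}

hadoop_ui_port namenode   # prints 50070
```

You could then probe a node with something like `curl -s http://host:$(hadoop_ui_port namenode)/`.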
9. What is the core configuration of Hadoop?
The core configuration of Hadoop is done through two XML files: 1. hado
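For reference, such a site configuration file is a plain key/value XML document. The fragment below is a minimal illustrative example in the Hadoop 2.x naming (core-site.xml); the host name and port are placeholders, not values from the original article:

```xml
<?xml version="1.0"?>
<!-- core-site.xml: site-specific overrides of Hadoop's shipped defaults -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- placeholder NameNode address -->
    <value>hdfs://namenode-host:8020</value>
  </property>
</configuration>
```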
recommend that you install Cloudera Manager on a Hadoop cluster; it provides real-time statistics on CPU, disk, and network load. (Cloudera Manager is a component of Cloudera Standard Edition and Enterprise Edition; the Enterprise Edition also supports rolling upgrades.) After Cloudera Manager is installed, the
Sqoop is an open-source tool mainly used for transferring data between Hadoop and traditional databases. The following is an excerpt from the Sqoop user guide.
Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.
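In command form, that import/export round trip looks roughly like the sketch below (connection string, credentials, table, and directory names are all placeholders, not values from the article):

```
# Import a MySQL table into HDFS as text files
sqoop import \
  --connect jdbc:mysql://db-host/mydb \
  --username myuser --password mypass \
  --table employees \
  --target-dir /user/etl/employees

# After processing in MapReduce, export results back to an RDBMS table
sqoop export \
  --connect jdbc:mysql://db-host/mydb \
  --username myuser --password mypass \
  --table employees_summary \
  --export-dir /user/etl/employees_summary
```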
System: CentOS release 6.6 (Final)
Nexus: nexus-2.8.1-bundle.tar.gz, from: https://sonatype-download.global.ssl.fastly.net/nexus/oss/nexus-2.8.1-bundle.tar.gz
Java: java version "1.7.0_80"
Create and enter the directory: mkdir /usr/local/nexus
Extract the files: tar -zxvf nexus-2.8.1-bundle.tar.gz; after decompression two directories appear: nexus-2.8.1-01 and sonatype-work
Enter nexus-2.8.1-01 and start Nexus: bin/nexus start
Startup information shown:
Starting Nexus OSS ...
Started Nexus OSS ...
Add a Nexus 80
1> Remove the UUIDs of the agent nodes
# rm -rf /opt/cm-5.4.7/lib/cloudera-scm-agent/*
2> Empty the master node's cm database
Go into the MySQL database on the master node, then: drop database cm;
3> Remove the agent nodes' NameNode and DataNode data
# rm -rf /opt/dfs/nn/*
# rm -rf /opt/dfs/dn/*
4> Re-initialize the CM database on the primary node
# /opt/cm-5.4.7/share/cmf/schema/scm_prepare_database.sh mysql cm -hlocalhost -uroot -p123456 --scm-host localhost scm
I had previously developed programs against Hadoop 2.2.0 with Maven. After the environment changed to CDH 5.2, errors appeared, and I found the cause was the Maven dependency libraries. I had been using http://mvnrepository.com/ to find Maven dependencies, but such sites only list the generic Maven artifacts, not the CDH-specific ones. Fortunately, Cloudera provides a CDH dependency repository: http://www.cloudera.com/content/cloudera
1. Error description: The cause of this error is that I had previously installed CDH through Cloudera Manager, which added all the services, including, of course, HBase. Then, on reinstalling, the following error occurred: Failed to become active master, org.apache.hadoop.hbase.TableExistsException: hbase:namespace. From the error above we can clearly see that when HBase starts, data from the previously installed HBase version still exists, so th
[Author]: Kwu (and news Big Data)
Deploying SparkR on CDH 5.4 with Spark 1.4.1 combines R with Spark and provides an efficient solution for data analysis, while HDFS in Hadoop provides distributed storage for that analysis. This article describes the steps of an integrated installation:
1. The cluster environment: cdh5.4 + spark1.4.1
Configuring environment variables:
# java
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export JAVA_BIN=$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA
When I came in this morning, I found that the company's Cloudera Manager showed an HDFS warning, like this: The solution: 1. First solve the simple problem: check what threshold the warning is set to, so you can quickly locate where the problem is; sure enough, the JournalNode sync-status hint was the first to be eliminated. 2. Then solve the sync-status problem itself: first find the explanation of the prompt, which is available on the official site, then check the configuration parameters th
First, the environment: there are two clusters, one new and one old; the plan is to turn the old one off once the new one is fully debugged.
New: Cloudera Express 5.6.0, CDH 5.6.0
Old: Cloudera Express 5.0.5, CDH 5.0.5
A problem was found during the new cluster setup: when the following command was used to create an index for an LZO file, the job could not be submitted to the specified queue in the new cluster, and the
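The article's exact command is cut off; for reference, indexing an LZO file with hadoop-lzo while targeting a specific queue typically looks like the sketch below (the jar path, input path, and queue name are placeholders, not the article's values):

```
# Run the distributed LZO indexer as a MapReduce job,
# pinning it to a queue via mapreduce.job.queuename
hadoop jar /path/to/hadoop-lzo.jar \
  com.hadoop.compression.lzo.DistributedLzoIndexer \
  -Dmapreduce.job.queuename=my.queue \
  /user/data/bigfile.lzo
```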
1. Choose the best installation package
For a more convenient and standardized deployment of the Hadoop cluster, we used the Cloudera integration package, because Cloudera has done a lot of optimization work on the Hadoop-related systems and thereby avoided many bugs caused by mismatched component versions. This is also what many senior Hadoop administrators recommend. htt
Error description: since my Hadoop cluster was installed online automatically by Cloudera Manager, its installation paths necessarily follow Cloudera's conventions; see the official Cloudera documentation: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_jdbc_driver_insta
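For context, Cloudera's documented convention for JDBC drivers on CM-managed nodes is to place the connector jar in a shared system path. A sketch of that step, assuming the MySQL connector (the version in the filename is a placeholder):

```
# Put the MySQL JDBC driver where CDH components expect to find it
sudo mkdir -p /usr/share/java/
sudo cp mysql-connector-java-5.1.34-bin.jar \
        /usr/share/java/mysql-connector-java.jar
```

The unversioned symlink-style name `mysql-connector-java.jar` is what the Cloudera docs reference.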
Ebook: Advanced Analytics with Spark (Spark advanced data analysis)
This book is a practical guide to using Spark for large-scale data analysis, written by data scientists at the big-data company Cloudera. The four authors first set Spark in the broad context of data science and big-data analytics, then introduce the basics of data processing with Spark and Scala, then discuss how to use Spark for machine learning, and also introduce several
The content on this page comes from the Internet and does not represent Alibaba Cloud's opinion;
the products and services mentioned on this page have no relationship with Alibaba Cloud. If the
content of the page confuses you, please write us an email; we will handle the problem
within 5 days after receiving your email.
If you find any instances of plagiarism from the community, please send an email to:
info-contact@alibabacloud.com
and provide relevant evidence. A staff member will contact you within 5 working days.