Hadoop
Decompress a .gz file on HDFS to a text file
$ hadoop fs -text /hdfs_path/compressed_file.gz | hadoop fs -put - /tmp/uncompressed_file.txt
Decompress a local .gz file and upload it to HDFS
$ gunzip -c filename.txt.gz | hadoop fs -put - /tmp/filename.txt
Using awk to process CSV files
Hadoop Streaming is a tool that allows users to write MapReduce programs in other languages: users can run map/reduce jobs simply by supplying a mapper and a reducer.
For details, see the official Hadoop Streaming documentation.
1. The following implements wordcount as an example, using C++ to write the map
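The streaming contract described above can be exercised locally before submitting anything to a cluster: pipe input through the mapper, sort, then the reducer, which is exactly what Hadoop Streaming does between phases. A minimal sketch (using awk one-liners instead of the C++ mapper mentioned above; the streaming jar path in the final comment is an assumption, adjust it to your distribution):

```shell
# mapper: emit "word<TAB>1" for every whitespace-separated token
mapper() { tr -s '[:space:]' '\n' | awk 'NF {print $0 "\t1"}'; }

# reducer: input arrives sorted by key, so sum each run of identical words
reducer() {
  awk -F'\t' '
    $1 != prev { if (prev != "") print prev "\t" sum; prev = $1; sum = 0 }
    { sum += $2 }
    END { if (prev != "") print prev "\t" sum }'
}

# local simulation of the map -> shuffle/sort -> reduce pipeline
echo "foo bar foo" | mapper | sort | reducer

# on a real cluster the same scripts would run via the streaming jar,
# e.g. (path is an assumption):
#   hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
#     -input /in -output /out -mapper mapper.sh -reducer reducer.sh
```

Testing the pipeline this way catches most mapper/reducer bugs without touching the cluster at all.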
Today, while using Hive to query the maximum value of some analysis data, a problem appeared. In Hive the symptom was:
Caused by: java.io.FileNotFoundException: http://slave1:50060/tasklog?attemptid=attempt_201501050454_0006_m_00001_1
Then look at the JobTracker log:
2015-01-05 21:43:23,724 INFO org.apache.hadoop.mapred.JobInProgress: job_201501052137_0004: nMaps=1 nReduces=1 max=-1
2015-01-05 21:43:23,724 INFO org.apache.h
First you need to make sure that Hadoop is already installed on your computer. Then you just need to download and configure HBase.
Step 1: Download HBase from http://archive.apache.org/dist/hbase/1.2.6/ and select hbase-1.2.6-bin.tar.gz
Step 2: Extract HBase to the target directory
Step 3: Modify the configuration files (go to the conf folder)
Step 3.1: hbase-env.sh
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_171.jdk/Contents/Home
export HBASE_MANAG
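The excerpt is cut off, but HBase configuration typically continues with conf/hbase-site.xml. A minimal standalone-mode sketch (the data paths are placeholders, not from the original article):

```xml
<!-- conf/hbase-site.xml: minimal standalone configuration (paths are examples) -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///usr/local/hbase/data</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/usr/local/hbase/zookeeper</value>
  </property>
</configuration>
```

In standalone mode HBase writes to the local filesystem and manages its own ZooKeeper; a distributed setup would point hbase.rootdir at HDFS instead.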
Using process_monitor.sh to monitor Hadoop processes from crontab
You can find process_monitor.sh at the following link:
https://github.com/eyjian/mooon/blob/master/common_library/shell/process_monitor.sh
---------------------------------------------------------Script Content--------------------------------------------------------
#!/bin/sh
# https://github.com/eyjian/mooon/blob/ma
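The script body above is cut off. To wire the monitor into cron, an entry along these lines is typical (the paths and the two-argument convention, a process match pattern followed by a restart command, are assumptions; check the script's own header comments):

```
# check every minute; restart the NameNode if its java process is gone
# (paths and argument convention are assumptions, verify against the script)
* * * * * /usr/local/bin/process_monitor.sh "/usr/local/jdk/bin/java -Dproc_namenode" "/usr/local/hadoop/sbin/hadoop-daemon.sh start namenode"
```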
The use of Maven itself will not be rehashed here; there are plenty of tutorials online, and it has not changed much in years. This section only describes how to set up a Hadoop development environment.
1. First create the project
mvn archetype:generate -DgroupId=my.hadoopstudy -DartifactId=hadoopstudy -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
2. Then add the Hadoop dependency hadoop-common,
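Step 2 in practice means editing the generated pom.xml. A sketch of the dependency entry (the version number is an example; match it to your cluster):

```xml
<!-- pom.xml: Hadoop client-side dependency (version is an example) -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.7.3</version>
</dependency>
```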
In recent days I have been evaluating the LZO compression format, and have the following impressions:
Recently I hit a java.library.path setup problem while using LZO. Many posts online say to add the JAVA_LIBRARY_PATH property in the hadoop-env.sh file (adding HADOOP_CLASSPATH in addition is also valid); it is true that the jar packages under the lib directory are not automatically loaded when this hadoop-0.20.205.0 version does
Using the LZO compression algorithm in Hadoop can reduce data size as well as disk read and write time. Not only that: LZO works block by block, so it allows the data to be split into chunks that Hadoop can process in parallel. This makes LZO a very useful compression format on Hadoop. LZO itself is not splittable, so when the data
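Wiring LZO into Hadoop usually comes down to registering the codec classes from the hadoop-lzo library in core-site.xml. A sketch (the class names are those published by the hadoop-lzo project; verify them against your build):

```xml
<!-- core-site.xml: register the LZO codecs from hadoop-lzo (verify class names) -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```

Note that .lzo files become splittable only after running the hadoop-lzo indexer over them; without the index each file is processed by a single mapper.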
Filter nodes inaccessible to Hadoop using Shell scripts
The hp1 cluster we have been using recently keeps losing nodes: the cluster's maintenance staff are not very capable, so a node or two drops out every once in a while. Today we found that HDFS was in safe mode when Hadoop was restarted.
I decided to filter all the inaccessible nodes out of the slaves file, so I wrote a small
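A sketch of such a script (hypothetical, not the author's original; it assumes one hostname per line as in Hadoop's slaves file, and the host_up probe can be swapped for an ssh check if ICMP is blocked):

```shell
# filter_slaves.sh: print only the reachable hosts from a slaves file
# (hypothetical sketch; swap host_up for an ssh-based probe if needed)
host_up() { ping -c 1 -W 2 "$1" >/dev/null 2>&1; }

filter_slaves() {
  while read -r host; do
    [ -z "$host" ] && continue          # skip blank lines
    host_up "$host" && echo "$host"     # keep only reachable nodes
  done < "$1"
}

# usage: filter_slaves $HADOOP_HOME/etc/hadoop/slaves > slaves.reachable
```

Redirecting the output to a new file and swapping it in for the old slaves file keeps the dead nodes out of the next restart.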
When you start the daemon threads with sbin/start-dfs.sh, the following warning appears:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Workaround: download the corresponding version from the URL below (I'm using hadoop-2.5.2):
http://dl.bintray.co
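Besides replacing the native library, the warning often goes away once Hadoop is told where the native libraries live. A common workaround sketch (the install path is an example, not from the original article):

```shell
# point Hadoop at its native libraries (install path is an example)
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR="$HADOOP_HOME/lib/native"
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
```

Putting these lines in hadoop-env.sh (or your shell profile) makes them take effect for every daemon start.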
Connecting to a cluster from Eclipse to view file information reports a connection-refused error on port 9000:
Cannot connect to the Map/Reduce location: hadoop1.0.3
Call to ubuntu/192.168.1.111:9000 failed on connection exception: java.net.ConnectException: Connection refused
1. Common solution: the configuration looks normal, but the connection fails. The fix was to reconfigure the Hadoop location, changing the host in Map/Reduce Master and DFS Master from localhost to the IP address (192.168.
If your primary objective is to query your data in Hadoop to browse, manipulate, and extract it into R, then you probably want to use SQL. You can write the SQL code explicitly to interact with Hadoop, or you can write SQL code implicitly with dplyr. The dplyr package has a generalized backend for data sources that translates your R code into SQL. You can
The installation and configuration of Hadoop is not covered here. The installation of Sqoop is also very simple. After you complete the installation of Sqoop, you can test whether it can connect to MySQL (note: the MySQL JDBC jar must be placed under SQOOP_HOME/lib):
sqoop list-databases --connect jdbc:mysql://192.168.1.109:3306/ --username root --password 19891231
The result is as follows
{
    FSDataInputStream is = mFileSystem.open(new Path("/test/install.log.syslog"));
    IOUtils.copyBytes(is, System.out, 1024);
    is.close();
}

@Before
public void setUp() {
    // Get the Spring context: Spring's dependency injection places the objects
    // declared in beans.xml into the container (similar to a Module in Dagger2,
    // which is likewise responsible for producing objects)
    mContext = new ClassPathXmlApplicationContext("beans.xml");
    // Obtain the FileSystem object declared in beans.xml
    mFileSystem = (FileSystem) mContext.getBean("File
First, the compiled Hadoop libraries usually live in lib. If you do not want to compile them yourself, you can use the precompiled libraries inside lib/native and move them into the lib folder:
cp hadoop-2.6.0/lib/native/* hadoop-2.6.0/lib/
Second, add the system variable:
export HADOOP_COMMON_LIB_NATIVE_DIR=/home/administrator/work/
:49) at com.cmri.bcpdm.v2.filters.counttransform.CountTransform.run(CountTransform.java:223)
At first I couldn't figure out what was going on. After searching online for half a day, I finally found the Sort.java example program inside the Hadoop source package; carefully comparing the new and old versions, I felt the old code needed to be rewritten against the new API. The old API is placed in the org.apache.hadoop.mapred package, and the new API is pl