In Hive, create a table:

create table t_emp(id int, name string, age int, dept_name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

Then build a text data file emp.txt in Linux and import the data by loading the file into the table. Hive does not perform any transformation when loading data into tables; load operations are currently pure copy/move operations that move data files into locations corresponding to Hive tables:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename
...(pubdate='2010-08-22');

load data local inpath '/root/data.am' into table beauty partition (nation="USA");
select nation, avg(size) from beauties group by nation order by avg(size);

Two. UDF

A custom UDF inherits the org.apache.hadoop.hive.ql.exec.UDF class and implements evaluate:

public class AreaUDF extends UDF {
    private static Map ...

Custom function call procedure:
1. Add the jar package (executed in the Hive command line)
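A rough sketch of that call procedure in the Hive command line (the jar path, class name, and function name below are illustrative placeholders, not values from the original post):

-- executed in the Hive command line; jar path and class name are assumptions
add jar /path/to/area-udf.jar;
create temporary function area as 'com.example.hive.AreaUDF';
select area(id) from t_emp;
drop temporary function area;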
Anyone who has recently started learning big data will inevitably run into Hadoop, Hive, and HBase.
Here is a record of my own understanding of these three:
1. Hadoop: it is distributed computing plus a distributed file system; the former is MapReduce, the latter is HDFS. The latter can operate independently, the former can be used
Today, while using Hive to query the maximum value of some analysis data, a problem came up. In Hive the symptom is as follows:

Caused by: java.io.FileNotFoundException: http://slave1:50060/tasklog?attemptid=attempt_201501050454_0006_m_00001_1

Then take a look at the JobTracker log:

2015-01-05 21:43:23,724 INFO org.apache.hadoop.mapred.JobInProgress: job_201501052137_0004: nMaps=1 nReduces=1 max
Management of Hive (III)
Remote service: starting the Hive remote service
Port number: 10000
Start command: hive --service hiveserver
(Note: when you log in to Hive from a JDBC or ODBC program to manipulate data, you must choose the remote-service startup mode, or our program cannot conn
In the previous article we implemented Java+Spark+Hive+Maven integration and exception handling; the test instance was packaged to run in a Linux environment, but when it is run directly on Windows, Hive-related exceptions appear. This article will help you integrate the Hadoop+Spark+Hive developmen
Because a lot of data is on the Hadoop platform, when migrating data from the Hadoop platform into the Hive directory, note that Hive's default field delimiter is \001 (Ctrl-A). For a smooth migration you must specify the data delimiter when creating the table. The syntax is as follows:

create table test (uid string, name string)
row format delimited fields terminated by '<delimiter>';
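A minimal end-to-end sketch, assuming the migrated files are tab-delimited (the table name, columns, and HDFS path are illustrative):

-- create the table with an explicit delimiter matching the source files
create table test (uid string, name string)
row format delimited fields terminated by '\t';
-- LOAD DATA INPATH without LOCAL moves files already sitting in HDFS into the table's directory
load data inpath '/user/hadoop/migrated/test_data' into table test;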
In the example of importing another table's data into a table, we created a new table score1 and inserted data into score1 with a SQL statement. Here we simply list those steps.
Inserting data
insert into table score1 partition (openingtime=201509) values (1, '…'), (2, 'a');
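For the import-from-another-table case mentioned above, a minimal sketch (the source table score and its column names are assumed for illustration):

-- copy rows from an existing table into one partition of score1
insert into table score1 partition (openingtime=201509)
select id, name from score;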
--------------------------------------------------------------------
Here, the content of this chapter is complete.
Sample data file download
GitHub: https://github.com/sinodzh/HadoopExample/t
Hadoop
Decompress a .gz file in HDFS to a text file:
$ hadoop fs -text /hdfs_path/compressed_file.gz | hadoop fs -put - /tmp/uncompressed-file.txt

Decompress a local .gz file and upload it to HDFS:
$ gunzip -c filename.txt.gz | hadoop fs -put - /tmp/filename.txt
To process CSV files with awk, refer to Using awk and friends with
Hive Chinese garbled-character problem. As we all know, we use MySQL to store the Hive metadata. To be able to create tables whose comments contain Chinese, and to solve the garbled-character problem, set the metastore database to latin1 but set the encoding of the columns that store Chinese to utf-8; that is, the Chinese stored for Hive is utf-8. Some of the following are n
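A common way to do this is to run statements like the following directly against the MySQL metastore (a sketch: the database name hive and the column sizes follow the usual Hive schema, so verify them against your Hive version before running):

-- keep the metastore database itself latin1
alter database hive character set latin1;
-- switch only the columns that hold Chinese comments to utf8
alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;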
For reference, see this article: http://www.shareditor.com/blogshow/?blogId=96. Machine learning, data mining, and other large-scale processing are inseparable from various open-source distributed systems: Hadoop for distributed storage and MapReduce computation, Spark for distributed machine learning, Hive as a distributed database, HBase as a distributed KV
Hadoop, HBase, Hive, zookeeper default port description
Component | Daemon   | Port  | Configuration              | Description
HDFS      | DataNode | 50010 | dfs.datanode.address       | DataNode service port, for data transfer
          |          | 50075 | dfs.datanode.http.address  | Port for the HTTP service
          |          | 50475 | dfs.datanode.https.address | Port for the HTTPS service
records.

Note: there is a line in this log:

14/12/05 08:49:46 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1406097234796_0037/

It means you can open that address in a browser to watch the task run; if your task sits there for a long time without finishing, something is wrong, and you can go to this address to see the detailed error log.

View the results:

mysql> SELECT * FROM employee;
+--------+----+-------+
| rowkey | id | name  |
+--------+----+------
To get out of safe mode:
1. Modify dfs.safemode.threshold.pct to a relatively small value (the default is 0.999), or
2. Run hadoop dfsadmin -safemode leave to force the NameNode to leave safe mode.
Http://bbs.hadoopor.com/viewthread.php?tid=61extra=page=1
The user can manipulate safe mode with hadoop dfsadmin -safemode <value>, where <value> is one of:
enter - enter safe mode
leave - force the NameNode to leave safe mode
get   - return whether safe mode is on
wait  - wait un
…    | server.x=[hostname]:nnnnn[:nnnnn] in /etc/zookeeper/conf/zoo.cfg (the first nnnnn)  | Used by followers to connect to the leader; only the leader listens on this port.
3888 | server.x=[hostname]:nnnnn[:nnnnn] in /etc/zookeeper/conf/zoo.cfg (the second nnnnn) | Used for leader election. Required only if electionAlg is 3 (the default).
All port protocols are based on TCP. For every Hadoop daemon that exposes a Web UI (HTTP service), there are URLs like:
/logs - list of log fil
After modifying the attribute value in the hadoop/etc/hadoop/core-site.xml file, the original Hive data can no longer be found. You need to change the LOCATION attribute in the SDS table of the Hive metastore database, replacing the corresponding HDFS parameter value with the new value. After modifying the
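A sketch of that metastore fix, assuming the metastore is in MySQL and the filesystem address changed from hdfs://oldhost:9000 to hdfs://newhost:9000 (both placeholders for your old and new values):

-- update table/partition storage locations in the Hive metastore
update SDS set LOCATION = replace(LOCATION, 'hdfs://oldhost:9000', 'hdfs://newhost:9000');
-- database default locations live in DBS and may need the same treatment
update DBS set DB_LOCATION_URI = replace(DB_LOCATION_URI, 'hdfs://oldhost:9000', 'hdfs://newhost:9000');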
testHiveDriverTable
1  terry
2  alex
3  jimmy
4  mike
5  kate
Running: select count(1) from testHiveDriverTable

In fact, the Java call is very simple: the statements you would type in the hive shell are just executed again through JDBC. The environment in which those statements run is therefore the Hive server machine, so any paths written in them are resolved from the Hive server host's root directory.
Why does data analysis generally use Java instead of the Hadoop, Flume, and Hive APIs to process related services?
Reply content:
Why does data analysis generally use java instead of
mysql> update user set password=password('123456') where user='root';   // set the root user password
mysql> select host,user,password from user where user='root';
mysql> flush privileges;
mysql> exit

If you still cannot connect remotely, turn off the firewall: /etc/rc.d/init.d/iptables stop

To manually install a later version, refer to:
http://www.cnblogs.com/zhoulf/archive/2013/01/25/zhoulf.html
http://www.cnblogs.com/xiongpq/p/3384681.html

CentOS installation of Hive:
cd /usr/local
tar -zxvf hive-0.1