Basic Hive concepts
Hive is a Hadoop-based data warehouse tool. It currently supports simple SQL-style queries and modification operations similar to those of traditional relational databases, and it translates SQL statements directly into MapReduce jobs, so developers do not have to write MapReduce programs themselves, which improves development efficiency.
Example: in a Hive deployment whose metastore is backed by MySQL, Hive's metadata (databases, tables, column attributes, and other catalog information) is stored in the MySQL database, while the table data itself lives in HDFS, by default under /user/hive/warehouse/&lt;database&gt;.db.
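To see where a given table's data lives, you can ask the metastore from the Hive CLI. A minimal sketch (`some_table` is a hypothetical table name):

```sql
-- DESCRIBE FORMATTED prints the table's metadata from the metastore,
-- including a Location line with the table's HDFS directory
-- (table name and path here are hypothetical).
DESCRIBE FORMATTED some_table;
```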
DDL statements
MySQL acts only as the catalog here: it stores the structure of Hive databases and tables, which the statements below define and manage.
Create a table
hive> CREATE TABLE test (id INT, name STRING);
Partitions are introduced because a plain SELECT in Hive usually scans the entire table, which wastes a lot of time; with partitions, a query can read only the partitions it actually needs.
hive> CREATE TABLE test2 (id INT, name STRING) PARTITIONED BY (ds STRING);
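Once the table is partitioned, each load can target a single `ds` value, and a query that filters on `ds` reads only that partition. A sketch for illustration (the file path and date are made up):

```sql
-- Load one day's data into its own partition.
LOAD DATA LOCAL INPATH '/home/hadoop/day1.txt'
  OVERWRITE INTO TABLE test2 PARTITION (ds = '2009-09-09');

-- Filtering on the partition column prunes the scan
-- to the single matching partition.
SELECT * FROM test2 WHERE ds = '2009-09-09';
```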
Browse tables
hive> SHOW TABLES;
SHOW TABLES also accepts a regular-expression pattern, similar to LIKE:
hive> SHOW TABLES '.*t';
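The quoted pattern is matched as a regular expression against table names, so other patterns work the same way (the table names matched here are hypothetical):

```sql
SHOW TABLES 'test.*';  -- tables whose names start with "test"
SHOW TABLES '.*2';     -- tables whose names end with "2"
```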
View a table's structure
hive> DESCRIBE test;   -- or: DESC test;
Modify or delete a table
hive> ALTER TABLE test RENAME TO test3;
hive> ALTER TABLE test3 ADD COLUMNS (new_column type COMMENT 'annotation');
hive> DROP TABLE test3;
DML statements
1. Import Data
hive> LOAD DATA LOCAL INPATH '/home/hadoop/test.txt' OVERWRITE INTO TABLE test;
LOCAL means the file is read from the local filesystem; without it, the path refers to a file on HDFS, which is moved into the table's warehouse directory. OVERWRITE replaces the data already in the table; omit it to append instead.
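These variants can be sketched side by side (the extra paths are hypothetical):

```sql
-- Local file, replacing the table's current contents:
LOAD DATA LOCAL INPATH '/home/hadoop/test.txt' OVERWRITE INTO TABLE test;

-- Local file, appended to the existing contents (no OVERWRITE):
LOAD DATA LOCAL INPATH '/home/hadoop/more.txt' INTO TABLE test;

-- File already on HDFS (no LOCAL): the file is moved into the
-- warehouse directory; here it is appended rather than overwritten.
LOAD DATA INPATH '/tmp/staged/test.txt' INTO TABLE test;
```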
2. Execute a query
hive> SELECT * FROM test2 WHERE test2.ds = '2017-08-26';
3. It is worth noting that select count(*) launches a MapReduce job, unlike the near-instant record counts of a relational database:
hive> SELECT COUNT(*) FROM test2;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=&lt;number&gt;
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=&lt;number&gt;
In order to set a constant number of reducers:
  set mapred.reduce.tasks=&lt;number&gt;
Starting Job = job_1411720827309_0004, Tracking URL = http://master:8031/proxy/application_1411720827309_0004/
Kill Command = /usr/local/cloud/hadoop/bin/hadoop job -kill job_1411720827309_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
Stage-1 map = 0%, reduce = 0%
Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.93 sec
Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.3 sec
MapReduce Total cumulative CPU time: 2 seconds 300 msec
Ended Job = job_1411720827309_0004
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1  Cumulative CPU: 2.3 sec  HDFS Read: 245 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 300 msec
OK
3
Time taken: 27.508 seconds, Fetched: 1 row(s)
Basic Hive execution statements