Hive does not officially support loading data in JSON format; by default it loads delimited text files such as CSV. How can JSON data be parsed without relying on an external jar package? This post focuses on a solution to that problem.
First create the metadata table:
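CREATE EXTERNAL TABLE access_log (content string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://sps1:9090/data/accesslog';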
Then create a view:
CREATE VIEW access_log_view AS
SELECT eventtime, ip, appname, fp, username, target
FROM access_log
LATERAL VIEW json_tuple(content, "eventtime", "ip", "appname", "fp", "username", "target") t1
AS eventtime, ip, appname, fp, username, target;
The view uses json_tuple to extract the named keys from the JSON object stored in the content column, so each field becomes its own column.
However, some log files live in partitioned directories, for example /user/aaa/dt=2013-12-01/ds=01/access.log, and this layout requires a partitioned table.
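To make the directory convention concrete, here is a minimal shell sketch that pulls the key=value partition components out of such a path. The function name partition_specs is hypothetical and only for illustration; it assumes that every path component containing "=" is a partition directory.

```shell
#!/bin/sh
# Print each key=value component of a path, one per line.
# Assumption: any component shaped like key=value is a partition directory.
partition_specs() {
    # Turn "/" into newlines so each path component sits on its own line.
    printf '%s\n' "$1" | tr '/' '\n' | while IFS= read -r part; do
        case "$part" in
            ?*=?*) printf '%s\n' "$part" ;;
        esac
    done
}

partition_specs "/user/aaa/dt=2013-12-01/ds=01/access.log"
```

For the example path this prints dt=2013-12-01 and ds=01, which are the two partition columns the table has to declare.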
To create a partitioned table:
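CREATE EXTERNAL TABLE access_log (content string)
PARTITIONED BY (dt int, ds int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://sps1:9090/data/accesslog4';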
But then a problem came up: queries against the table found no data. What can be done about that?
Hive does not pick up the partition directories on its own, so we need to add each partition manually:
ALTER TABLE access_log ADD PARTITION (dt=?, ds=?);
After this, the data can be queried. Remember that you must add the partition; otherwise Hive will not find the data.
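As a small sketch of this step, the shell function below prints the ALTER TABLE statement for one partition. The function name add_partition_sql is hypothetical, the values are quoted as strings by assumption, and IF NOT EXISTS is added so that running the statement twice does not fail on an already registered partition.

```shell
#!/bin/sh
# Print an ADD PARTITION statement for one (dt, ds) pair.
# Arguments: table name, dt value, ds value.
add_partition_sql() {
    table="$1"
    dt="$2"
    ds="$3"
    # IF NOT EXISTS keeps the statement safe to run more than once.
    printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (dt='%s', ds='%s');\n" \
        "$table" "$dt" "$ds"
}

add_partition_sql access_log 2013-12-01 01
```

The printed statement can then be passed to Hive, for example with hive -e, once it looks right.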
Then create the view:
Same as the CREATE VIEW statement above.
But the number of partitions keeps growing over time, and adding them by hand is not practical, so we need an automated script to do it for us:
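#!/bin/bash
. ~/.bashrc
# Current date and hour, matching the dt=YYYY-MM-DD directory layout
date=$(date +%Y-%m-%d)
hour=$(date +%H)
cmd="ALTER TABLE databasename.tablename ADD PARTITION (dt='$date', ht='$hour');"
# Run the statement through the Hive command line
hive -e "$cmd"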
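If the script misses a run, some hourly partitions will be absent. The following is a minimal sketch, not part of the original script, that prints the statements for every hour of a given day so that the gaps can be filled in afterwards. The function name backfill_day is hypothetical, databasename.tablename is the same placeholder as in the script, and IF NOT EXISTS is an addition that skips partitions which are already registered.

```shell
#!/bin/sh
# Print one ADD PARTITION statement per hour (00 to 23) of the given day.
# Argument: the day in YYYY-MM-DD form.
backfill_day() {
    day="$1"
    hour=0
    while [ "$hour" -le 23 ]; do
        # %02d zero-pads the hour so it matches the output of `date +%H`.
        printf "ALTER TABLE databasename.tablename ADD IF NOT EXISTS PARTITION (dt='%s', ht='%02d');\n" \
            "$day" "$hour"
        hour=$((hour + 1))
    done
}

backfill_day "2013-12-01"
```

The output could be redirected into a file and run with hive -f. For regular operation, the hourly script itself would typically be scheduled, for instance from cron.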
That covers loading JSON data into Hive and working with partitioned tables. If anything is unclear, leave a message below and we can continue the discussion.
Hive loading JSON data solution