An Internet statistics zoning problem using SpagoBI and hive?
1 originally based on the date of the partition and then splicing SQL will be very complex, need to judge in where conditions whether cross-year, cross-month, cross-day and other issues.
2 Hive's Custom function cannot return a string condition after a where condition for data filtering, but can return a Boolean for data filtering.
For example: Custom Function Date_where (StartTime, EndTime), the return format according to the start date and end date is:
Year=2015 and month=08 and day > 1 and Day < 10 stitching into where date_where ("2015-08-01", "2015-08-10") is not supported.
Second, based on the above problem, the partition is established by the date (DT)
Step1 Creating a Database
1 Create tvlog_test database tvlog_test;
STEP2 Creating a data table
1 Creating TVLOG_TCL data tables
CREATE table if not exists TVLOG_TEST.TVLOG_TCL (
ID string,
UserID String,
Channelid String,
ChannelName string,
Region string,
Channelcode String,
IP String,
StartTime String,
Endtime String,
Fromchannel String,
Tochannel String,
Mac String,
DeviceID String,
Dnum string
)
Partitioned by (DT string)
stored as ORC;
2 Creating Epg_wiki_info data tables
CREATE table if not exists tvlog_test.epg_wiki_info (
ID string,
Name String,
StartTime String,
Endtime String,
Wikiscreenshots Array<string>
Wikicover map<string, String>
Wikititle String,
tags array<string>
Wikiid String,
Channelcode String,
ChannelName string,
Timestamp string
)
Partitioned by (DT string)
stored as ORC;
STEP3 enable hive dynamic partition insertion
Set hive.exec.dynamic.partition=true;
Set hive.exec.dynamic.partition.mode=nonstrict;
Set hive.exec.max.dynamic.partitions.pernode=1000;
STEP4 Inserting data (2015-09-01 ~ 2015-09-05)
1 inserting data into the TVLOG_TEST.TVLOG_TCL table
Insert Overwrite table TVLOG_TEST.TVLOG_TCL
Partition (DT)
Select ID, userid, Channelid, ChannelName, Region,
Channelcode, IP, StartTime, Endtime, Fromchannel,
Tochannel, Mac, DeviceID, dnum, DT
From TVLOG.TVLOG_TCL
Where year = + and month = 9 and (day between 1 and 5);
2 inserting data into the Tvlog_test.epg_wiki_info table
Insert Overwrite table Tvlog_test.epg_wiki_info
Partition (DT)
Select ID, name, StartTime, Endtime, Wikiscreenshots, Wikicover,
Wikititle, tags, wikiid, channelcode, channelname, timestamp, DT
From Tvlog.epg_wiki_info
where dt between ' 2015-09-01 ' and ' 2015-09-05 ';
Partitioning a table by date