One. Hive Streaming
In Hive, when a requirement cannot be met with Hive's built-in functions, you can use streaming to implement it. The idea is to write the logic in a language other than HQL, such as Python or shell, and call that program from an HQL statement so that the two work together to produce the result.
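To make the mechanism concrete: by default Hive writes each input row to the script's standard input as one line with columns separated by tabs, and reads the transformed rows back from standard output in the same form. The following is only a minimal sketch of such a script. The function name `upper_second_column` and the two sample rows are hypothetical and are not part of the example in this article.

```shell
#!/bin/bash
# Hypothetical streaming transform: read tab-separated rows on stdin,
# upper-case the second column, write tab-separated rows to stdout.
upper_second_column() {
    awk -F'\t' 'BEGIN { OFS = "\t" } { $2 = toupper($2); print }'
}

# Simulate what Hive does: pipe rows in and read the rows that come back.
printf '1\talice\n2\tbob\n' | upper_second_column
```

Because the script only touches stdin and stdout, it can be checked from the command line like this before it is referenced from an HQL statement.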
Two. Example
1. Format of the log file
<date> <time> W3SVC1 2001:da8:7007:102::244 GET /favicon.ico - <port> - 2001:da8:7007:336:ca:f74b:eede:a024 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.1+(KHTML,+like+Gecko)+Maxthon/4.1.2.4000+Chrome/26.0.1410.43+Safari/537.1 404 0 2
<date> <time> W3SVC1 2001:da8:7007:102::244 GET /index.asp - <port> - 2001:da8:7007:336:ca:f74b:eede:a024 Mozilla/5.0+(Windows+NT+6.1;+WOW64;+Trident/7.0;+rv:11.0;+Maxthon/4.1.2.4000) 302 0 0
<date> <time> W3SVC1 2001:da8:7007:102::244 GET /skin6/index.asp - <port> - 2001:da8:7007:336:ca:f74b:eede:a024 Mozilla/5.0+(Windows+NT+6.1;+WOW64;+Trident/7.0;+rv:11.0;+Maxthon/4.1.2.4000) <status> 0 0
<date> <time> W3SVC1 2001:da8:7007:102::244 GET /skin6/images/head_menu_jt2.gif - <port> - 2001:da8:7007:336:ca:f74b:eede:a024 Mozilla/5.0+(Windows+NT+6.1;+WOW64;+Trident/7.0;+rv:11.0;+Maxthon/4.1.2.4000) <status> 0 0
2. Goal of the processing
Split each log line on spaces, then remove the '%' from the IP address in the 10th field together with the number that follows it (in an IPv6 address this suffix is the zone index).
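The rule can be tried on a single line before it is wired into Hive. The sketch below assumes a hypothetical 14-field record whose 10th field carries a '%' suffix (`fe80::1%11`). The function name `strip_zone` and the sample values are made up for illustration and do not come from the log above.

```shell
#!/bin/bash
# Remove '%' and everything after it from field 10 of a space-separated line.
strip_zone() {
    awk -F' ' '{
        pos = index($10, "%")                        # 0 when field 10 has no "%"
        if (pos > 0) $10 = substr($10, 1, pos - 1)   # keep the part before "%"
        print
    }'
}

# Hypothetical sample record: field 10 is an address with a zone suffix.
line='f1 f2 f3 f4 f5 f6 f7 f8 f9 fe80::1%11 f11 f12 f13 f14'
echo "$line" | strip_zone
```

Lines whose 10th field has no '%' pass through unchanged, which mirrors the two branches of the script used later in this article.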
3. Hive script and shell script content
Hive script file:
ADD FILE /home/hadoop_admin/program/bash/process_exmovielog_ipv6.sh;

FROM (
    FROM exmovielog
    SELECT TRANSFORM (*)
    USING 'sh process_exmovielog_ipv6.sh'
    AS (log_date, ...)
) temp
SELECT *, year(temp.log_date), month(temp.log_date);
process_exmovielog_ipv6.sh script content:
#!/bin/bash
#time: .-4- -
#desc: when do hive sql, process the ipv6
cat $1 | awk -F" " '{
    # get the position of "%" in the 10th field (0 if absent)
    pos = index($10, "%");
    # output: date and time joined by a space, fields 3-7, the IP, then fields 11-14
    if (pos == 0)
        print $1" "$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$10"\t"$11"\t"$12"\t"$13"\t"$14;
    else {
        # keep only the part of the IP before "%"
        ip = substr($10, 1, pos - 1);
        print $1" "$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"ip"\t"$11"\t"$12"\t"$13"\t"$14;
    }
}'
Hive streaming using shell scripts