Two methods to import nginx logs to hive: 1. Create the CREATETABLEapachelog table in hive (ipaddressSTRING, identdSTRING, userSTRING
Two methods to import nginx logs to hive: 1. create table apachelog (ipaddress STRING, identd STRING, user STRING) in hive
Two methods to import nginx logs to hive
1. Create a table in hive
The log format after import is
203.208.60.91--05/May/2011: 01: 18: 47 + 0800 GET/robots.txt HTTP/1.1 404 1238 Mozilla/5.0
This method supports the parse_url (referer, "HOST") function in hive ")
Method 2 Import
Note: This method must be executed before using the query statement after creating a table.
Hive> add jar/home/hjl/hive/lib/hive_contrib.jar;
Or set hive/conf/hive-default.conf to add
Hive. aux. jars. path
File: // usr/local/hadoop/hive/lib/hive-contrib-0.7.0-cdh3u0.jar.
Save Configuration
203.208.60.91--[05/May/2011: 01: 18: 47 + 0800] "GET/robots.txt HTTP/1.1" 404 1238 "-" "Mozilla/5.0 (compatible; googlebot/2.1; + )"
The field type stringfrom deserializer in this method is tested and does not support obtaining domain names through parse_url (referer, "HOST ").
You can use select split (referer, "/") [2] from apilog to get the Domain Name
If the file data is plain text, you can use stored as textfile. If data needs to be compressed, use stored as sequence.
Import log command
Hive> load data local inpath '/home/log/map.gz' overwrite into table log;
Imported logs support formats such as .gz
Example
Number of Statistics rows
Select count (*) from nginxlog;
IP count statistics
Select count (DISTINCT ip) from nginxlog;
Ranking
Select t2.ip, t2.xx from (SELECT ip, COUNT (*) AS xx FROM nginxlog GROUP by ip) t2 sort by t2.xx desc
Hive> SELECT * from apachelog WHERE ipaddress = '192. 211.123.184 ';
Hive> SELECT ipaddress, COUNT (1) AS numrequest FROM apachelog group by ipaddress sort by numrequest desc limit 1;
Hive> set mapred. reduce. tasks = 2;
Hive> SELECT ipaddress, COUNT (1) AS numrequest FROM apachelog group by ipaddress sort by numrequest desc limit 1;
Hive> create table ipsummary (ipaddress STRING, numrequest INT );
Hive> insert overwrite table ipsummary SELECT ipaddress, COUNT (1) FROM apachelog group by ipaddress;
Hive> SELECT ipsummary. ipaddress, ipsummary. numrequest FROM (select max (numrequest) AS themax FROM ipsummary) ipsummarymax JOIN ipsummary ON ipsummarymax. themax = ipsummary. numrequest;
Export hive query results to csv (not tested)
Hive> set hive. io. output. fileformat = CSVTextFile;
Hive> insert overwrite local directory '/tmp/CSVrepos/'select * from S where ...;