We start with a simple requirement:
Analyze the access log of a website, count the number of visits from each IP address, and then use GeoIP information to see how the visits are distributed across countries and regions. Here are a few sample log lines from my site:
121.205.198.92 - - [21/Feb/2014:00:00:07 +0800] "GET /archives/417.html HTTP/1.1" 11465 "http://shiyanjun.cn/archives/417.html/" "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0"
121.205.198.92 - - [21/Feb/2014:00:00:11 +0800] "POST /wp-comments-post.php HTTP/1.1" 302 "http://shiyanjun.cn/archives/417.html/" "Mozilla/5.0 (Windows NT 5.1; rv:23.0) Gecko/20100101 Firefox/23.0"
121.205.198.92 - - [21/Feb/2014:00:00:12 +0800] "GET /archives/417.html/ HTTP/1.1" "http://shiyanjun.cn/archives/417.html/" "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0"
121.205.198.92 - - [21/Feb/2014:00:00:12 +0800] "GET /archives/417.html HTTP/1.1" 11465 "http://shiyanjun.cn/archives/417.html" "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0"
121.205.241.229 - - [21/Feb/2014:00:00:13 +0800] "GET /archives/526.html HTTP/1.1" 12080 "http://shiyanjun.cn/archives/526.html/" "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0"
121.205.241.229 - - [21/Feb/2014:00:00:15 +0800] "POST /wp-comments-post.php HTTP/1.1" 302 "http://shiyanjun.cn/archives/526.html/" "Mozilla/5.0 (Windows NT 5.1; rv:23.0) Gecko/20100101 Firefox/23.0"
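As a minimal sketch of how the IP address could be pulled from the start of such a log line, the plain-Java snippet below uses a regular expression. The class name `IpFieldExtractor` and the method `extractIp` are illustrative assumptions, not part of the application discussed in this article, and the pattern does not check that each octet is at most 255.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IpFieldExtractor {
    // Matches a dotted-quad IPv4 address at the very start of a line.
    // Octet ranges (0-255) are deliberately not validated in this sketch.
    private static final Pattern LEADING_IPV4 =
            Pattern.compile("^(\\d{1,3}(?:\\.\\d{1,3}){3})");

    /** Returns the leading IP address of a log line, or empty if none is found. */
    public static Optional<String> extractIp(String logLine) {
        Matcher m = LEADING_IPV4.matcher(logLine.trim());
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened version of one of the sample lines above.
        String line = "121.205.198.92 - - [21/Feb/2014:00:00:07 +0800] "
                + "\"GET /archives/417.html HTTP/1.1\"";
        System.out.println(extractIp(line).orElse("<no ip>"));
    }
}
```

Anchoring the pattern at the start of the line keeps the extraction independent of how the remaining fields are spaced, which may help when log lines are not perfectly uniform.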
Implementing the Spark Application in Java
The statistical analysis program we implemented does the following: (1) reads the log data file from HDFS; (2) extracts the first field (the IP address) from each line; (3) counts the number of occurrences of each IP address; (4) sorts the IP addresses in descending order by number of occurrences; (5) calls the GeoIP library to look up the country each IP address belongs to; (6) prints the results, one per line, in the format: [country code] IP address frequency.
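Before turning to Spark, the counting, sorting, and output steps can be sketched in plain Java on an in-memory list. This is only an illustration of the logic, not the Spark program itself: the class `IpFrequencySketch`, its method names, and the `countryLookup` function (a stand-in for a real GeoIP call) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class IpFrequencySketch {
    /** Counts how many times each IP address occurs. */
    public static Map<String, Integer> countByIp(List<String> ips) {
        Map<String, Integer> counts = new HashMap<>();
        for (String ip : ips) {
            counts.merge(ip, 1, Integer::sum);
        }
        return counts;
    }

    /** Sorts entries by count, highest first; ties are ordered by IP string. */
    public static List<Map.Entry<String, Integer>> sortByCountDesc(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    /** Builds output lines in the form "[country code] IP address frequency". */
    public static List<String> report(List<String> ips, Function<String, String> countryLookup) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : sortByCountDesc(countByIp(ips))) {
            String country = countryLookup.apply(e.getKey());
            lines.add(String.format("[%s] %s %d", country, e.getKey(), e.getValue()));
        }
        return lines;
    }

    public static void main(String[] args) {
        // IP addresses taken from the six sample log lines.
        List<String> ips = Arrays.asList(
                "121.205.198.92", "121.205.198.92", "121.205.198.92",
                "121.205.198.92", "121.205.241.229", "121.205.241.229");
        // Placeholder lookup: a real program would query a GeoIP database here.
        report(ips, ip -> "??").forEach(System.out::println);
    }
}
```

Passing the country lookup in as a function keeps the sketch free of any particular GeoIP library and makes the counting and sorting logic easy to check on its own.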
The Java code for our statistical analysis application is as follows: