Writing and running Spark applications in Java

Source: Internet
Author: User

We start with a simple requirement:
Analyze the access log of a website and count the number of visits from each IP address, so that GEO information can then be used to obtain the distribution of visitors by country and region. As an example, here are some log lines from my site:

121.205.198.92 - - [21/Feb/2014:00:00:07 +0800] "GET /archives/417.html HTTP/1.1" 11465 "http://shiyanjun.cn/archives/417.html/" "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0"
121.205.198.92 - - [21/Feb/2014:00:00:11 +0800] "POST /wp-comments-post.php HTTP/1.1" 302 "http://shiyanjun.cn/archives/417.html/" "Mozilla/5.0 (Windows NT 5.1; rv:23.0) Gecko/20100101 Firefox/23.0"
121.205.198.92 - - [21/Feb/2014:00:00:12 +0800] "GET /archives/417.html/ HTTP/1.1" "http://shiyanjun.cn/archives/417.html/" "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0"
121.205.198.92 - - [21/Feb/2014:00:00:12 +0800] "GET /archives/417.html HTTP/1.1" 11465 "http://shiyanjun.cn/archives/417.html" "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0"
121.205.241.229 - - [21/Feb/2014:00:00:13 +0800] "GET /archives/526.html HTTP/1.1" 12080 "http://shiyanjun.cn/archives/526.html/" "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0"
121.205.241.229 - - [21/Feb/2014:00:00:15 +0800] "POST /wp-comments-post.php HTTP/1.1" 302 "http://shiyanjun.cn/archives/526.html/" "Mozilla/5.0 (Windows NT 5.1; rv:23.0) Gecko/20100101 Firefox/23.0"
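
Every log entry begins with the client IP address, which is the only field the statistics need. A minimal sketch of pulling it out follows. The class and method names (`LogLineParser`, `extractIp`) are illustrative and not taken from the original article, and the pattern only checks the dotted-quad shape of the address, not that each octet is at most 255.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {

    // Matches a dotted-quad IPv4 address at the very start of the line.
    // Assumption: the log uses IPv4 only; octet ranges are not validated.
    private static final Pattern LEADING_IPV4 =
            Pattern.compile("^(\\d{1,3}(?:\\.\\d{1,3}){3})\\b");

    /** Returns the leading IP address of a log line, or empty if there is none. */
    public static Optional<String> extractIp(String line) {
        Matcher m = LEADING_IPV4.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Abridged from the first sample line above.
        String line = "121.205.198.92 - - [21/Feb/2014:00:00:07 +0800] "
                + "\"GET /archives/417.html HTTP/1.1\"";
        System.out.println(extractIp(line).orElse("<no ip>")); // prints 121.205.198.92
    }
}
```

Returning an `Optional` lets the caller skip malformed lines instead of failing on them, which tends to matter once real log files are involved.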

Implementing the Spark application in Java

The statistical analysis program we implement covers the following function points:

  • Read the log data file from HDFS
  • Extract the first field (the IP address) of each row
  • Count the number of occurrences of each IP address
  • Sort in descending order by the number of occurrences of each IP address
  • Call the GeoIP library to obtain the country each IP address belongs to
  • Print the output, one line per IP address, in the format: [country code] IP address frequency

The code of our statistical analysis application, implemented in Java, is as follows:
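
The sketch below is not the author's listing. It is a plain-Java illustration, without Spark or HDFS, of the counting, sorting, and formatting steps described above, so the logic can be followed without a cluster. In a Spark job the same steps would run on the cluster, for example through pair-RDD operations such as `reduceByKey` and `sortByKey`. The class name `IpAccessStats`, its method names, and the `countryCodeOf` parameter are all assumptions made for illustration; in particular, the country lookup is passed in as a function because the GeoIP call itself is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class IpAccessStats {

    /** Counts how often each IP address (the first space-delimited field) occurs. */
    public static Map<String, Integer> countByIp(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String ip = line.split(" ", 2)[0];
            if (!ip.isEmpty()) {               // skip blank or space-led lines
                counts.merge(ip, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Orders entries by count, highest first; ties are broken by IP address. */
    public static List<Map.Entry<String, Integer>> sortDescending(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    /** Builds output lines in the format: [country code] IP address frequency. */
    public static List<String> report(List<String> lines, Function<String, String> countryCodeOf) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : sortDescending(countByIp(lines))) {
            out.add("[" + countryCodeOf.apply(e.getKey()) + "] " + e.getKey() + " " + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Abridged versions of the six sample log lines: only the leading fields are kept.
        List<String> lines = Arrays.asList(
                "121.205.198.92 - - [21/Feb/2014:00:00:07 +0800]",
                "121.205.198.92 - - [21/Feb/2014:00:00:11 +0800]",
                "121.205.198.92 - - [21/Feb/2014:00:00:12 +0800]",
                "121.205.198.92 - - [21/Feb/2014:00:00:12 +0800]",
                "121.205.241.229 - - [21/Feb/2014:00:00:13 +0800]",
                "121.205.241.229 - - [21/Feb/2014:00:00:15 +0800]");

        // Placeholder resolver: a real run would query a GeoIP database here.
        report(lines, ip -> "??").forEach(System.out::println);
        // prints:
        // [??] 121.205.198.92 4
        // [??] 121.205.241.229 2
    }
}
```

Keeping the country lookup behind a `Function<String, String>` means the counting and sorting can be exercised with a stub, and a GeoIP-backed resolver can be swapped in later without touching the rest.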
