Some time ago, a project leader asked for a real-time view of IP accesses broken down by province. To meet this requirement, Flume/Logstash collects the Nginx logs in real time and produces them to Kafka; Spark then consumes and analyzes the stream and saves the results to Redis/MySQL; finally, the front end displays them in real time on a Baidu ECharts map.
First, there must be a rule table for IP attribution, stored either locally or on shared distributed storage such as HDFS.
A sample of the IP rule table follows:
1.0.1.0|1.0.3.255|16777472|16778239|Asia|China|Fujian|Fuzhou||Telecom|350100|China|CN|119.306239|26.075302
1.0.8.0|1.0.15.255|16779264|16781311|Asia|China|Guangdong|Guangzhou||Telecom|440100|China|CN|113.280637|23.125178
1.0.32.0|1.0.63.255|16785408|16793599|Asia|China|Guangdong|Guangzhou||Telecom|440100|China|CN|113.280637|23.125178
1.1.0.0|1.1.0.255|16842752|16843007|Asia|China|Fujian|Fuzhou||Telecom|350100|China|CN|119.306239|26.075302
1.1.2.0|1.1.7.255|16843264|16844799|Asia|China|Fujian|Fuzhou||Telecom|350100|China|CN|119.306239|26.075302
1.1.8.0|1.1.63.255|16844800|16859135|Asia|China|Guangdong|Guangzhou||Telecom|440100|China|CN|113.280637|23.125178
1.2.0.0|1.2.1.255|16908288|16908799|Asia|China|Fujian|Fuzhou||Telecom|350100|China|CN|119.306239|26.075302
1.2.2.0|1.2.2.255|16908800|16909055|Asia|China|Beijing|Beijing|Haidian|North Dragon Middle NET|110108|China|CN|116.29812|39.95931
1.2.4.0|1.2.4.255|16909312|16909567|Asia|China|Beijing|Beijing||China Internet Information Center|110100|China|CN|116.405285|39.904989
1.2.5.0|1.2.7.255|16909568|16910335|Asia|China|Fujian|Fuzhou||Telecom|350100|China|CN|119.306239|26.075302
1.2.8.0|1.2.8.255|16910336|16910591|Asia|China|Beijing|Beijing||China Internet Information Center|110100|China|CN|116.405285|39.904989
1.2.9.0|1.2.127.255|16910592|16941055|Asia|China|Guangdong|Guangzhou||Telecom|440100|China|CN|113.280637|23.125178
1.3.0.0|1.3.255.255|16973824|17039359|Asia|China|Guangdong|Guangzhou||Telecom|440100|China|CN|113.280637|23.125178
1.4.1.0|1.4.3.255|17039616|17040383|Asia|China|Fujian|Fuzhou||Telecom|350100|China|CN|119.306239|26.075302
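The third and fourth columns are the numeric forms of the start and end IPs in the first two columns; it is this numeric form that makes a binary search over the rules possible. A minimal sketch of the conversion (the helper name ip2Long mirrors the one used in the full program below):

// Convert a dotted IPv4 string to its numeric form by shifting each octet into place
def ip2Long(ip: String): Long =
  ip.split("[.]").foldLeft(0L)((num, part) => (num << 8) | part.toLong)

// 1*2^24 + 0*2^16 + 1*2^8 + 0 = 16777472, matching column 3 of the first rule above
println(ip2Long("1.0.1.0")) // 16777472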
Local mode
import java.sql.{Connection, Date, DriverManager, PreparedStatement}
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Compute the province of each access from its IP.
 * Created by Tianjun on 2017/2/13.
 */
object IpLocation {

  // Convert a dotted IPv4 string to its numeric (Long) form
  def ip2Long(ip: String): Long = {
    val fragments = ip.split("[.]")
    var ipNum = 0L
    for (i <- 0 until fragments.length) {
      ipNum = fragments(i).toLong | ipNum << 8L
    }
    ipNum
  }

  // Binary search over the sorted (startNum, endNum, province) rules; returns -1 if no range matches
  def binarySearch(lines: Array[(String, String, String)], ip: Long): Int = {
    var low = 0
    var high = lines.length - 1
    while (low <= high) {
      val middle = (low + high) / 2
      if (ip >= lines(middle)._1.toLong && ip <= lines(middle)._2.toLong) {
        return middle
      }
      if (ip < lines(middle)._1.toLong) {
        high = middle - 1
      } else {
        low = middle + 1
      }
    }
    -1
  }

  // Write one partition's (province, count) pairs to MySQL
  val data2MySQL = (iterator: Iterator[(String, Int)]) => {
    var conn: Connection = null
    var ps: PreparedStatement = null
    val sql = "INSERT INTO location_info (location, counts, access_date) VALUES (?, ?, ?)"
    try {
      conn = DriverManager.getConnection(
        "jdbc:mysql://localhost:3306/bigdata?useUnicode=true&characterEncoding=utf-8", "root", "123")
      iterator.foreach(line => {
        ps = conn.prepareStatement(sql)
        ps.setString(1, line._1)
        ps.setInt(2, line._2)
        ps.setDate(3, new Date(System.currentTimeMillis()))
        ps.executeUpdate()
      })
    } catch {
      case e: Exception => e.printStackTrace()
    } finally {
      if (ps != null) ps.close()
      if (conn != null) conn.close()
    }
  }

  def main(args: Array[String]) {
    // Only needed to avoid a winutils error on Windows; not required on Linux
    System.setProperty("hadoop.home.dir", "c:\\tianjun\\winutil\\")

    val conf = new SparkConf().setMaster("local").setAppName("IpLocation")
    val sc = new SparkContext(conf)

    // Load the IP attribution rules (could also be read from HDFS or another shared store)
    val ipRulesRdd = sc.textFile("C://ip.txt").map(line => {
      val fields = line.split("\\|")
      val startNum = fields(2)
      val endNum = fields(3)
      val province = fields(6)
      (startNum, endNum, province)
    })

    // Collect all rules to the driver and broadcast them to the executors
    val ipRulesArray = ipRulesRdd.collect()
    val ipRulesBroadcast = sc.broadcast(ipRulesArray)

    // Load the data to be processed; the second "|"-separated field of each log line is the IP
    val ipsRdd = sc.textFile("C://log").map(line => {
      val fields = line.split("\\|")
      fields(1)
    })

    // Look up the matching rule for each IP, then count accesses per province
    val result = ipsRdd.map(ip => {
      val ipNum = ip2Long(ip)
      val index = binarySearch(ipRulesBroadcast.value, ipNum)
      val info = ipRulesBroadcast.value(index) // (ip start num, ip end num, province)
      info
    }).map(t => (t._3, 1)).reduceByKey(_ + _)

    result.foreachPartition(data2MySQL)
    // println(result.collect().toBuffer)

    sc.stop()
  }
}
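The pipeline description also mentions Redis as a sink. A minimal sketch of a Redis variant of the partition writer, assuming the Jedis client is on the classpath and Redis runs locally on the default port (the hash key name ip_province_counts is made up for illustration):

import redis.clients.jedis.Jedis

// Write one partition's (province, count) pairs into a Redis hash
val data2Redis = (iterator: Iterator[(String, Int)]) => {
  val jedis = new Jedis("localhost", 6379)
  try {
    iterator.foreach { case (province, count) =>
      jedis.hincrBy("ip_province_counts", province, count)
    }
  } finally {
    jedis.close()
  }
}

// Used the same way as the MySQL writer: result.foreachPartition(data2Redis)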
As you can see, using Spark's operators for data analysis is very easy.
Examples of connecting Spark to Kafka, databases, and other systems are available on the Spark website and are just as straightforward.
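For the real-time path described at the start, the batch job above can be adapted to Spark Streaming reading from Kafka. A minimal sketch using the spark-streaming-kafka-0-10 integration, assuming a broker on localhost:9092 and a topic named nginx-access (both are placeholder names):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf().setMaster("local[2]").setAppName("IpLocationStreaming")
val ssc = new StreamingContext(conf, Seconds(10))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "iplocation",
  "auto.offset.reset" -> "latest")

// Each Kafka record's value is one Nginx log line pushed by Flume/Logstash
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Array("nginx-access"), kafkaParams))

stream.map(record => record.value().split("\\|")(1)) // extract the IP field
  .foreachRDD(rdd => {
    // the same ip2Long / binarySearch / reduceByKey / foreachPartition logic
    // from the batch job above can be applied to each micro-batch here
    rdd.take(5).foreach(println)
  })

ssc.start()
ssc.awaitTermination()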
Let's take a look at the results of writing to the database in this example:
+----+-----------+--------+---------------------+
| id | location  | counts | access_date         |
+----+-----------+--------+---------------------+
|  7 | Shaanxi   |   1824 | 2017-02-13 00:00:00 |
|  8 | Hebei     |    383 | 2017-02-13 00:00:00 |
|  9 | Yunnan    |    126 | 2017-02-13 00:00:00 |
| 10 | Chongqing |    868 | 2017-02-13 00:00:00 |
| 11 | Beijing   |   1535 | 2017-02-13 00:00:00 |
+----+-----------+--------+---------------------+
For this test, only about 4,700 lines of the Nginx log were captured, a file of roughly 1.9 MB.