Lesson 94: Spark Streaming Implementation of Online Blacklist Filtering in an Ad Billing System


Contents of this lesson:

    • Analysis of the online blacklist filtering implementation

    • Implementing online blacklist filtering with Spark Streaming

An advertising billing system is an essential feature for e-commerce. To prevent malicious ad clicks (suppose merchants A and B advertise at the same time and are competitors; if A uses click bots to maliciously click B's ads, B's advertising budget will soon be exhausted), ad clicks must be filtered against a blacklist.

We can use leftOuterJoin to correlate the click data with the blacklist data and then filter out the records that hit the blacklist.

This article mainly demonstrates the use of the transform function of DStream.
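Before the full streaming program, here is a minimal, self-contained sketch of the core RDD logic (the object name, sample data, and local master are illustrative, not part of the original program): leftOuterJoin keeps every click record and attaches an Option holding the blacklist flag, and the filter keeps only the clicks whose flag is absent or false. transform simply applies this kind of RDD-to-RDD function to every batch of the DStream.

import org.apache.spark.{SparkConf, SparkContext}

// A minimal batch-only sketch of the leftOuterJoin + filter idea (sample data is illustrative)
object LeftOuterJoinSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LeftOuterJoinSketch").setMaster("local[2]"))

    // (name, original "time name" record) -- the same shape the streaming program below builds per batch
    val clicks = sc.parallelize(Seq(("Hadoop", "134343 Hadoop"), ("Java", "3432777 Java")))
    // (name, isBlacklisted)
    val blackList = sc.parallelize(Seq(("Hadoop", true)))

    // leftOuterJoin yields (name, (record, Option[Boolean])); None means the name is not on the blacklist
    val valid = clicks.leftOuterJoin(blackList)
      .filter { case (_, (_, flag)) => !flag.getOrElse(false) } // keep only clicks that did not hit the blacklist
      .map { case (_, (record, _)) => record }                  // recover the original record

    valid.collect().foreach(println) // prints: 3432777 Java
    sc.stop()
  }
}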

Spark Streaming Code Implementation

package com.dt.spark.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
 * Spark online blacklist filtering, developed in Scala.
 *
 * @author DINGLQ
 *
 * Background: in an ad-click billing system, we filter out blacklisted clicks online to
 * protect the advertisers' interests, so that only valid ad clicks are billed. The same
 * idea applies to anti-brush scoring (or traffic) systems, where invalid votes, ratings,
 * or traffic are filtered out.
 *
 * Implementation technique: use the transform API to program directly against RDDs and
 * perform the join operation.
 */
object OnlineBlackListFilter {
  def main(args: Array[String]) {
    /**
     * Step 1: Create the SparkConf object to set the runtime configuration of the Spark
     * program, e.g. use setMaster to set the URL of the master of the Spark cluster the
     * program connects to. If it is set to "local", the Spark program runs locally, which
     * is especially suitable for beginners with very limited machines (e.g. only 1 GB of
     * memory).
     */
    val conf = new SparkConf().setAppName("OnlineBlackListFilter") // set the application name

    // Set the batch interval to 30 seconds
    val ssc = new StreamingContext(conf, Seconds(30))

    /**
     * Blacklist data preparation. In practice the blacklist is usually dynamic, e.g. kept
     * in Redis or a database, and generating it often involves complex business logic that
     * differs from case to case. In Spark Streaming, the full blacklist can be consulted
     * every time a batch is processed.
     */
    // true means blacklisted; to disable an entry temporarily, set the value to false
    val blackList = Array(("Hadoop", true), ("Mathou", true))
    // Turn the array into an RDD
    val blackListRDD = ssc.sparkContext.parallelize(blackList)

    val adsClickStream = ssc.socketTextStream("spark-master", 9999)

    /**
     * Each ad-click record here has the format: time name
     * The map operation turns it into the format: (name, "time name")
     */
    val formattedAdsClickStream = adsClickStream.map(item => (item.split(" ")(1), item))

    val validateAds = formattedAdsClickStream.transform(userClickRDD => {
      /**
       * leftOuterJoin keeps all the content of the left RDD (the user ad clicks) and
       * attaches the corresponding blacklist entry, if any.
       */
      val joinedBlackListRDD = userClickRDD.leftOuterJoin(blackListRDD)

      /**
       * The input element of the filter is a tuple: (name, ("time name", Option[Boolean])).
       * The first element is the name; the second element's second element is the Option
       * produced by leftOuterJoin. If it is present and true, the current ad click is on
       * the blacklist and must be filtered out; otherwise it is a valid click.
       */
      val validClicked = joinedBlackListRDD.filter(joinedItem => {
        if (joinedItem._2._2.getOrElse(false)) {
          false
        } else {
          true
        }
      })

      // Recover the original "time name" record of each valid click
      validClicked.map(validClick => validClick._2._1)
    })

    validateAds.print()

    ssc.start()
    ssc.awaitTermination()
    ssc.stop()
  }
}


Package the program and upload it to the Spark cluster.
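The original post does not show its build configuration. As one possible setup (a sketch only, assuming sbt and Spark 1.6.0; the project name, Scala version, and jar handling are assumptions), a minimal build.sbt could look like the following, after which sbt package produces the jar that is uploaded and referenced as ./spark.jar in the spark-submit command below.

// build.sbt -- a minimal sketch, assuming sbt and Spark 1.6.0; not the author's actual build file
name := "OnlineBlackListFilter"

version := "1.0"

scalaVersion := "2.10.6"  // Spark 1.6.x is built against Scala 2.10 by default

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.6.0" % "provided",  // provided by the cluster at runtime
  "org.apache.spark" %% "spark-streaming" % "1.6.0" % "provided"
)

// Build the jar with:  sbt package
// then upload the resulting jar to the cluster and point spark-submit at it (./spark.jar below).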


On the spark-master node, start nc:

root@spark-master:~# nc -lk 9999


Run the OnlineBlackListFilter program:

root@spark-master:~# /usr/local/spark/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --class com.dt.spark.streaming.OnlineBlackListFilter --master spark://spark-master:7077 ./spark.jar


Enter data on the nc side:

root@spark-master:~# nc -lk 9999
134343 Hadoop
343434 spark
3432777 Java
0983743 Hbase
893434 Mathou


Spark Streaming run results:

16/05/01 09:42:30 INFO scheduler.DAGScheduler: ResultStage 8 (print at OnlineBlackListFilter.scala:63) finished in 0.048 s
16/05/01 09:42:30 INFO scheduler.DAGScheduler: Job 3 finished: print at OnlineBlackListFilter.scala:63, took 0.111805 s
-------------------------------------------
Time: 1462066950000 ms
-------------------------------------------
3432777 Java
343434 spark
0983743 Hbase


From the results, the blacklisted names Hadoop and Mathou have been filtered out.


On the basis of this program, more complex business rules can be added to meet enterprise requirements (see the sketch below).
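For example (purely an illustrative sketch, not part of the original lesson): inside the same transform call we could also drop names that click more often than some threshold within a single batch, the kind of anti-brush rule the code comments mention. This fragment is meant as a drop-in for the body of the transform call above and assumes its userClickRDD and blackListRDD values; maxClicksPerBatch and the "heavy clicker" rule are made up for illustration.

// Hypothetical extension: besides the static blacklist, drop names that click too often in one batch
val maxClicksPerBatch = 10 // hypothetical business rule

// Count clicks per name in this batch and treat names above the threshold like blacklist entries
val heavyClickers = userClickRDD
  .map { case (name, _) => (name, 1) }
  .reduceByKey(_ + _)
  .filter { case (_, count) => count > maxClicksPerBatch }
  .map { case (name, _) => (name, true) }

// Combine the static blacklist with this batch's heavy clickers, then filter as before
val effectiveBlackList = blackListRDD.union(heavyClickers).distinct()

userClickRDD.leftOuterJoin(effectiveBlackList)
  .filter { case (_, (_, flag)) => !flag.getOrElse(false) } // keep only clicks that hit neither list
  .map { case (_, (record, _)) => record }                  // recover the original "time name" record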


Note:

1. DT Big Data Dream Factory WeChat public account: DT_Spark
2. IMF 8 PM big data hands-on YY live channel: 68917580
3. Sina Weibo: http://www.weibo.com/ilovepains


This article is from the "Ding Dong" blog; please be sure to keep this source: http://lqding.blog.51cto.com/9123978/1769290
