The content of this lesson:
An advertising billing system is an essential feature of e-commerce. To prevent malicious ad clicks (suppose merchants A and B advertise at the same time and are competitors; if A uses a click bot to maliciously click B's ads, B's advertising budget will soon be exhausted), ad clicks must be checked against a blacklist.
You can use leftOuterJoin to correlate the click data with the blacklist data, then filter out the records that hit the blacklist.
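To make the leftOuterJoin-then-filter idea concrete before the full Spark program, here is a minimal sketch that emulates the same semantics on plain Scala collections (no Spark dependency; the names and sample data are illustrative, not from the original post):

```scala
object LeftOuterJoinDemo {
  // Ad clicks keyed by name: (name, rawLine), mirroring the DStream's map step
  val clicks: Seq[(String, String)] = Seq(
    ("Hadoop", "134343 Hadoop"),
    ("spark",  "343434 spark"),
    ("Java",   "3432777 Java")
  )

  // Blacklist keyed by name; true means the entry is active
  val blackList: Map[String, Boolean] = Map("Hadoop" -> true)

  // Emulate RDD.leftOuterJoin on collections: every left record is kept,
  // paired with Some(flag) when the key is blacklisted, None otherwise
  def leftOuterJoin[V, W](left: Seq[(String, V)],
                          right: Map[String, W]): Seq[(String, (V, Option[W]))] =
    left.map { case (k, v) => (k, (v, right.get(k))) }

  // Keep only clicks whose blacklist flag is absent or false
  def validClicks: Seq[String] =
    leftOuterJoin(clicks, blackList)
      .filter { case (_, (_, flag)) => !flag.getOrElse(false) }
      .map { case (_, (line, _)) => line }

  def main(args: Array[String]): Unit =
    validClicks.foreach(println) // the "Hadoop" click is dropped
}
```

On real RDDs, leftOuterJoin returns the same `(K, (V, Option[W]))` shape, so the filter predicate carries over unchanged.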
This article mainly introduces the use of the transform function of DStream.
Spark Streaming Code Implementation
package com.dt.spark.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
 * Spark online blacklist filtering, developed in Scala for a cluster.
 * @author DINGLQ
 * Background: in an ad-click billing system we filter out blacklisted clicks
 * online, protecting advertisers' interests so that only valid ad clicks are
 * billed. The same idea applies to anti-fraud scoring (or traffic) systems
 * that filter out invalid votes, ratings, or traffic.
 * Technique: use the transform API to perform join operations directly at the
 * RDD level.
 */
object OnlineBlackListFilter {
  def main(args: Array[String]) {
    /**
     * Step 1: create the SparkConf object to set the runtime configuration of
     * the Spark program. For example, use setMaster to set the URL of the
     * master of the Spark cluster the program connects to; if it is set to
     * "local", the program runs locally, which suits beginners with very
     * limited machines (e.g. only 1 GB of memory).
     */
    val conf = new SparkConf().setAppName("OnlineBlackListFilter") // set the app name

    // set the batch interval to 30 seconds
    val ssc = new StreamingContext(conf, Seconds(30))

    /**
     * Blacklist data preparation. In practice the blacklist is usually
     * dynamic, e.g. kept in Redis or a database, and generating it often
     * involves complex business logic that varies case by case; but Spark
     * Streaming can access the complete blacklist on every batch.
     */
    // true means blacklisted; to disable an entry temporarily, set the value to false
    val blackList = Array(("Hadoop", true), ("Mathou", true))
    // turn the array into an RDD
    val blackListRDD = ssc.sparkContext.parallelize(blackList)

    val adsClickStream = ssc.socketTextStream("spark-master", 9999)

    /**
     * The format of each ad-click record here is: time name
     * The map operation below produces records in (name, "time name") format,
     * i.e. keyed by name.
     */
    val formattedAdsClickStream = adsClickStream.map(item => (item.split(" ")(1), item))

    val validClicked = formattedAdsClickStream.transform(userClickRDD => {
      /**
       * leftOuterJoin keeps all records of the left (user ad-click) RDD and
       * attaches the matching blacklist entry, if any.
       */
      val joinedBlackListRDD = userClickRDD.leftOuterJoin(blackListRDD)

      /**
       * The filter's input element is a tuple: (name, ("time name", Option[Boolean])).
       * The first element is the name; the second part of the second element
       * tells whether the leftOuterJoin found a blacklist entry for it. If it
       * did, the current ad click is blacklisted and must be filtered out;
       * otherwise it is a valid click and is kept.
       */
      joinedBlackListRDD.filter(joinedItem => !joinedItem._2._2.getOrElse(false))
    })

    // extract the original "time name" line from each surviving record
    val validClickedData = validClicked.map(validClick => validClick._2._1)
    validClickedData.print()

    ssc.start()
    ssc.awaitTermination()
    ssc.stop()
  }
}
Package the program and upload it to the Spark cluster.
On the spark-master node, start nc:
# nc -lk 9999
Run the OnlineBlackListFilter program:
# /usr/local/spark/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --class com.dt.spark.streaming.OnlineBlackListFilter --master spark://spark-master:7077 ./spark.jar
Enter data on the nc side:
# nc -lk 9999
134343 Hadoop
343434 spark
3432777 Java
0983743 Hbase
893434 Mathou
Spark Streaming running results:
16/05/01 09:42:30 INFO scheduler.DAGScheduler: ResultStage 8 (print at OnlineBlackListFilter.scala:63) finished in 0.048 s
16/05/01 09:42:30 INFO scheduler.DAGScheduler: Job 3 finished: print at OnlineBlackListFilter.scala:63, took 0.111805 s
-------------------------------------------
Time: 1462066950000 ms
-------------------------------------------
3432777 Java
343434 spark
0983743 Hbase
As the results show, Hadoop and Mathou, the entries set in the blacklist, have been filtered out.
On the basis of this program, more complex business logic rules can be added to meet enterprise needs.
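One natural extension, given that the code comments note the blacklist is usually dynamic (kept in Redis or a database): reload it inside the transform closure so every batch joins against the latest version. The sketch below is illustrative only; loadBlacklist() is a hypothetical helper standing in for whatever store you use.

```scala
// Sketch: refresh the blacklist once per batch inside transform.
// loadBlacklist() is a hypothetical helper that would read Redis or a
// database and return Seq[(String, Boolean)].
val validClicked = formattedAdsClickStream.transform { userClickRDD =>
  val currentBlackList = loadBlacklist()
  val blackListRDD = ssc.sparkContext.parallelize(currentBlackList)
  userClickRDD.leftOuterJoin(blackListRDD)
    .filter { case (_, (_, flag)) => !flag.getOrElse(false) } // drop blacklisted clicks
    .map { case (_, (line, _)) => line }                      // keep the raw "time name" line
}
```

Because transform runs its body on the driver for each batch, the blacklist read happens once per batch interval, not once per record.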
Note:
1. DT Big Data Dream Factory public account: DT_Spark
2. IMF 8 p.m. big data hands-on YY live channel: 68917580
3. Sina Weibo: http://www.weibo.com/ilovepains
This article is from the "Ding Dong" blog; please be sure to keep this source: http://lqding.blog.51cto.com/9123978/1769290
Lesson 94: Spark Streaming Implementation of Online Blacklist Filtering in an Ad Billing System