It is well known that non-human traffic (NHT) is ubiquitous on the Internet and interferes with ad delivery. The open market is the lowest tier of media inventory, sold through real-time bidding (RTB), and that inventory is spread across a huge number of small and medium-sized media, so monitoring is very difficult. There are two main causes. First, many users' computers still run older versions of Windows that are infected with malicious software, which runs in the background for as long as the computer is turned on. Second, on Android phones, a variety of preinstalled or installed applications are everywhere without the user's knowledge, and it is the norm for rogue software to leave back doors, which naturally feeds the black-market industry chain.
Cloud-linked media uses the following techniques to identify cheating:
1. Using cloud-linked's own IP library together with third-party collaboration to distinguish where traffic comes from. Five kinds of sources can currently be identified: 1) Data center (the IP address belongs to an Internet Data Center room). 2) Dedicated line (the IP address is a fixed Internet access line used by a medium or large institution). 3) Backbone node (the IP address belongs to a carrier router node). 4) General broadband (the IP address belongs to a home, a small or medium-sized institution, corporate broadband, etc.). 5) Mobile broadband (the IP address is on a mobile 2G/3G/4G or similar connection). Most traffic from the first three source types is not real, so it is filtered out when participating in RTB bidding.
2. Using machine learning algorithms and data mining to produce an IP fidelity scoring system that can identify data matching normal human patterns. The scores are interpreted as follows: 0~49: IPs in this range differ from normal human behavior, and the RTB bid is lowered or abandoned; 50~99: IPs in this range basically conform to normal human access patterns and can be regarded as normal access sources.
3. Setting frequency caps on ad serving per unique user, which reduces the risk of automated traffic inflation by simulators or programs.
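The three pre-bid checks described so far (source type, IP fidelity score, and per-user frequency cap) can be combined into one filter. The following is only a minimal sketch under stated assumptions: the source-type labels, the cap of 3 impressions, and the names `should_bid` and `record_impression` are hypothetical and do not come from the original system. Only the 50-point score boundary is taken from the text.

```python
from collections import defaultdict

# Source types whose traffic the text describes as mostly not real.
# The string labels are illustrative assumptions.
FILTERED_SOURCES = {"data_center", "dedicated_line", "backbone_node"}

SCORE_THRESHOLD = 50  # from the text: scores 0~49 differ from normal human patterns
FREQUENCY_CAP = 3     # hypothetical cap on impressions per user per window

# user_id -> impressions already served in the current window
impressions_served = defaultdict(int)


def should_bid(user_id, ip_source, ip_score):
    """Return True if a bid request passes all three checks."""
    if ip_source in FILTERED_SOURCES:
        return False
    if ip_score < SCORE_THRESHOLD:
        # The text says the bid is lowered or abandoned; this sketch abandons it.
        return False
    if impressions_served[user_id] >= FREQUENCY_CAP:
        return False
    return True


def record_impression(user_id):
    """Count one served impression against the user's frequency cap."""
    impressions_served[user_id] += 1
```

A real bidder would presumably reset the counters per time window and might scale the bid price by score instead of dropping the request outright.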
4. Using data mining to build its own anti-cheat library, identifying fraudulent apps and IPs from massive logs and filtering them out in the auction. They are identified mainly from the following aspects:
1) Ad click-through anomaly: virtual or malicious clicks, that is, the CLICK/PV ratio is too high or fluctuates heavily.
2) Visitor fingerprint anomaly (browser, operating system, etc.): for example, a large percentage of visits come from the same operating system or browser version under the same conditions, or the fingerprint carries robot/spider or similar identifying information.
3) IP distribution anomaly: log mining shows that a small set of IPs generates a large number of clicks or exposures.
4) Click without a corresponding exposure request: if an ad monitors both exposures and clicks, the exposure should appear before the click, and in the vast majority of cases it should appear in the exposure log for the same period.
5) Access time distribution anomaly (regularity): some ip/mzid values appear in the click or exposure log every minute, or the interval between successive clicks or exposures is too regular.
6) Ad source anomaly: the Referer of a click or exposure marks the page it came from. If a large number of clicks or exposures come from one page, and that page is not the one where the ad is located, the media may have set up hidden pages in other high-traffic places (such as a BBS) to generate exposures and clicks.
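Three of the log checks above (click-through anomaly, click without a corresponding exposure, and overly regular intervals) can be sketched as below. This is an illustration under assumptions, not the actual anti-cheat library: the function names, the thresholds (`max_ctr`, `min_impressions`, `max_stdev`), and the idea that clicks and exposures share a request id are all hypothetical.

```python
from collections import Counter
from statistics import pstdev


def high_ctr_sources(impressions, clicks, max_ctr=0.2, min_impressions=100):
    """Return source keys (IP or app id) whose CLICK/PV ratio exceeds max_ctr.

    impressions, clicks: lists with one source key per log record.
    """
    imp = Counter(impressions)
    clk = Counter(clicks)
    return {
        key
        for key, n in imp.items()
        if n >= min_impressions and clk[key] / n > max_ctr
    }


def orphan_clicks(impression_ids, click_ids):
    """Return click request ids that never appear in the exposure log."""
    seen = set(impression_ids)
    return [c for c in click_ids if c not in seen]


def too_regular(timestamps, min_events=5, max_stdev=1.0):
    """Return True if the gaps between successive events are nearly constant.

    timestamps: event times in seconds for one ip/mzid.
    """
    ts = sorted(timestamps)
    if len(ts) < min_events:
        return False
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return pstdev(gaps) <= max_stdev
```

For example, an ip/mzid that clicks at seconds 0, 60, 120, 180, and 240 has zero variation between gaps and would be flagged by `too_regular`, while irregular human browsing would not.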
A glimpse of cloud-linked anti-fraud technology