With the growing popularity of online advertising, the pay-per-click model (payment for each click) is gradually gaining acceptance. The problem that comes with it is that preventing fraudulent clicks has become urgent, since it bears directly on the long-term viability of this advertising model and on whether it can become a real source of revenue for website owners.
The following describes, from a system point of view, how the Google AdSense system guards against click fraud. I hope it can serve as useful guidance for other online advertising systems trying to prevent fake clicks.
1] Click-through rate (CTR) = number of clicks / number of page views.
CTR is a key indicator of whether fraudulent clicks are taking place; as you can imagine, a CTR above 10% for the ads on a website is a sign of this.
# of clicks / # of views
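As a minimal sketch of the check in [1]: the function names are my own, and the 10% threshold is simply the figure mentioned in the text, not a known AdSense setting.

```python
def click_through_rate(clicks: int, views: int) -> float:
    """Return clicks / views, or 0.0 when there were no page views."""
    return clicks / views if views else 0.0


def is_suspicious_ctr(clicks: int, views: int, threshold: float = 0.10) -> bool:
    """Flag a site whose CTR exceeds the threshold (10% in the text)."""
    return click_through_rate(clicks, views) > threshold
```

A site with 15 clicks on 100 page views would be flagged, while one with 5 clicks on 100 page views would not.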
2] Click coverage per unique IP. If the click coverage for a single IP (its clicks measured against the system's error range for its page views) exceeds 3, that IP is suspected of cheating.
For example, suppose the user at 129.119.200.1 viewed 16 pages and clicked 4 ads, and the overall ad CTR (from [1]) is 5%. Then:
expected clicks = 5% x 16 = 0.8 ≈ 1; standard deviation = sqrt(1) = 1; click coverage = 4/1 = 4. Under a Gaussian distribution, the probability of this is less than 1 in 10,000.
Ratio vs. IP distribution
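The arithmetic in the example can be sketched as follows. I am assuming the error range is the square root of the expected click count (a Poisson-style approximation, which matches the sqrt(1) = 1 step in the example); the function names are hypothetical. Without rounding the expected clicks from 0.8 up to 1, the coverage comes out at about 4.47 rather than 4, which is still above the limit of 3.

```python
import math


def click_coverage(clicks: int, views: int, overall_ctr: float) -> float:
    """Observed clicks divided by the standard deviation of expected clicks."""
    expected = overall_ctr * views          # expected clicks for this IP
    if expected <= 0:
        return math.inf if clicks else 0.0
    sigma = math.sqrt(expected)             # assumption: variance equals the mean
    return clicks / sigma


def gaussian_upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# The example from the text: 16 page views, 4 clicks, overall CTR of 5%.
coverage = click_coverage(clicks=4, views=16, overall_ctr=0.05)
suspicious = coverage > 3                   # the limit of 3 comes from the text
```

For a coverage of 4, `gaussian_upper_tail(4.0)` is roughly 3 in 100,000, which agrees with the "less than 1 in 10,000" figure.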
3] CTR (click coverage) per IP over time
Analyze the click rate as a time series. If there is a significant peak in some time period, assume that there may be fraudulent clicks in it.
Ratio vs. time
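One way to look for such a peak is to compare each time bucket with the buckets just before it. This is only a sketch under my own assumptions: the text does not say how "significant" is measured, so the trailing window of 12 buckets, the 3-sigma limit, and the `min_sigma` floor (which keeps a perfectly flat baseline from flagging tiny changes) are all illustrative choices.

```python
import statistics


def find_ctr_peaks(ctr_series: list[float], window: int = 12,
                   n_sigma: float = 3.0, min_sigma: float = 0.01) -> list[int]:
    """Return indexes whose CTR is far above the trailing-window baseline."""
    peaks = []
    for i in range(window, len(ctr_series)):
        baseline = ctr_series[i - window:i]          # the preceding buckets only
        mean = statistics.fmean(baseline)
        sigma = max(statistics.pstdev(baseline), min_sigma)
        if ctr_series[i] > mean + n_sigma * sigma:
            peaks.append(i)
    return peaks


# Twelve ordinary hours, then a quiet hour, a burst, and a quiet hour again.
hourly_ctr = [0.04, 0.05, 0.06, 0.05] * 3 + [0.05, 0.30, 0.05]
peak_hours = find_ctr_peaks(hourly_ctr)
```

Here only the hour with a CTR of 0.30 (index 13) stands out from its baseline.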
4] Analysis of the time between page load and ad click, plus time series analysis of the interval between consecutive clicks
The time between page load and ad click should follow a Poisson distribution, and so should the interval between two consecutive clicks. If this time is measured in seconds and exceeds 25 seconds, the distribution is essentially Gaussian in shape.
[Time of loading - time of click] distribution vs. Poisson
[Time difference of two clicks] distribution vs. Poisson/Gaussian
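A rough way to compare measured delays with a Poisson distribution is the index of dispersion: for Poisson data the variance equals the mean, so variance / mean should be close to 1. The sketch below assumes delays are recorded in whole seconds; the function names and the acceptance band of 0.5 to 2.0 are my own illustrative choices, and a proper goodness-of-fit test would be more rigorous.

```python
import statistics


def dispersion_index(delays_sec: list[int]) -> float:
    """Variance / mean of the delays; close to 1 for Poisson-like data."""
    mean = statistics.fmean(delays_sec)
    if mean == 0:
        return 0.0
    return statistics.pvariance(delays_sec) / mean


def looks_poisson(delays_sec: list[int], low: float = 0.5, high: float = 2.0) -> bool:
    """True when the dispersion index falls inside the assumed band."""
    return low <= dispersion_index(delays_sec) <= high


human_like = [25, 35, 25, 35, 30, 38, 22, 30]   # mean 30, variance 28.5
scripted = [30] * 20                            # identical delays, variance 0
```

A script that clicks exactly 30 seconds after every page load has a dispersion index of 0 and fails the check, while the first sample, with a variance close to its mean, passes.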
5] Analysis of proxy clicks
Switching IPs between clicks used to be the hardest form of cheating to detect; most of the people who inflated their Alexa rankings probably used proxies to generate fake clicks. Here, however, it is enough to probe the source IP in reverse to find out whether it is a server with proxy functionality.
Reverse check for proxies
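The text does not say how the reverse check is done. One simple possibility, shown here purely as an assumption, is to try connecting back to the source IP on ports that proxy servers commonly listen on. The port list and function names are hypothetical, and an open port is only a hint, not proof, that the machine is a proxy.

```python
import socket

# Assumption: ports often used by proxy software (SOCKS, Squid, generic HTTP proxies).
COMMON_PROXY_PORTS = (1080, 3128, 8080, 8888)


def has_open_port(ip: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def looks_like_proxy(ip: str, ports=COMMON_PROXY_PORTS, timeout: float = 1.0) -> bool:
    """Return True if any of the given ports accepts a connection."""
    return any(has_open_port(ip, port, timeout) for port in ports)
```

A source IP that passes this probe could then be given extra weight in the other checks rather than being blocked outright.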
6] Analysis of http_agent
Time series analysis per http_agent: any peak more than 3 standard deviations above the norm needs to be reviewed.
7] Analysis of http_referral
Time series analysis per referral: any peak more than 3 standard deviations above the norm needs to be reviewed.
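Checks [6] and [7] have the same shape, so one sketch can cover both: count clicks per time bucket for each user agent (or referrer) and flag the buckets that sit more than 3 standard deviations above that key's own mean. The event format and the function name are assumptions for illustration.

```python
import statistics
from collections import Counter


def flag_spikes(events, n_sigma: float = 3.0):
    """events: iterable of (bucket, key) pairs, e.g. (hour, user_agent).

    Returns the sorted (bucket, key) pairs whose click count is more than
    n_sigma standard deviations above that key's mean across all buckets.
    """
    counts = Counter(events)
    buckets = sorted({bucket for bucket, _ in counts})
    keys = {key for _, key in counts}
    flagged = []
    for key in keys:
        series = [counts[(bucket, key)] for bucket in buckets]
        mean = statistics.fmean(series)
        sigma = statistics.pstdev(series)
        if sigma == 0:
            continue                      # perfectly flat series, nothing to review
        for bucket, count in zip(buckets, series):
            if count > mean + n_sigma * sigma:
                flagged.append((bucket, key))
    return sorted(flagged)


# Agent "A" sends 10 clicks per hour except for a burst of 200 in hour 12.
events = [(h, "A") for h in range(24) for _ in range(200 if h == 12 else 10)]
events += [(h, "B") for h in range(24) for _ in range(10)]
spikes = flag_spikes(events)
```

Passing (hour, referrer) pairs instead of (hour, user agent) pairs gives the check for [7].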
8] There is one more quantity that is very useful for the overall picture:
The effective cost per thousand impressions (eCPM) of each unique IP, compared with the average across all users.
This is a more direct way to find the computers that are running spam clicks and to block them.
Overall ratio vs. IP
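A sketch of that comparison, assuming per-IP earnings and impression counts are available. The input layout, the function names, and the factor of 3 are hypothetical; the text only says the per-IP figure is compared with the average.

```python
def ecpm(earnings: float, impressions: int) -> float:
    """Effective cost per thousand impressions."""
    return earnings / impressions * 1000 if impressions else 0.0


def ips_above_average(stats: dict[str, tuple[float, int]],
                      factor: float = 3.0) -> list[str]:
    """stats maps IP -> (earnings, impressions).

    Returns the IPs whose eCPM exceeds `factor` times the overall eCPM.
    """
    total_earnings = sum(e for e, _ in stats.values())
    total_impressions = sum(i for _, i in stats.values())
    average = ecpm(total_earnings, total_impressions)
    return sorted(ip for ip, (e, i) in stats.items()
                  if ecpm(e, i) > factor * average)


stats = {
    "192.0.2.1": (0.5, 1000),
    "192.0.2.2": (0.5, 1000),
    "192.0.2.3": (4.0, 100),     # few impressions, unusually high earnings
}
outliers = ips_above_average(stats)
```

In this sample the overall eCPM is about 2.38, so only the third IP, with an eCPM of 40, is reported.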
Even though I have listed a number of ways to prevent cheating here, don't forget:
However far the defenders advance, the cheaters always push further.
Author: Lu Liang. Original source: http://www.wespoke.com/archives/000795.php