Interactive websites are often plagued by spam. Here I share some of my personal experience in dealing with it.
To deal with spam, you first have to analyze its characteristics:
1. In terms of content, spam usually carries an external link to a fixed domain name, a QQ number, a mobile phone number, or duplicated content;
2. In terms of posting frequency, a single user or IP address may publish many posts within a short period of time;
3. In terms of methods, more sophisticated spammers may use automated posting tools, rotate through different IP addresses, and attach programs that crack the verification code.
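To make the content features above concrete, here is a minimal sketch of extracting them from a post. The regular expressions are assumptions for illustration only: the phone pattern assumes 11-digit mainland China mobile numbers, and the QQ pattern only catches numbers written right after "QQ". The function name `extract_features` is hypothetical.

```python
import hashlib
import re
from urllib.parse import urlparse

# Illustrative patterns only; tune them for your own site and audience.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
# Assumption: 11-digit mainland China mobile numbers starting with 13-19.
PHONE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")
# Assumption: QQ numbers are 5-11 digits written shortly after "QQ".
QQ_RE = re.compile(r"qq\D{0,3}([1-9]\d{4,10})(?!\d)", re.IGNORECASE)


def extract_features(content: str) -> dict:
    """Collect the spam signals from a post: link domains, numbers, a content fingerprint."""
    domains = set()
    for url in URL_RE.findall(content):
        host = urlparse(url).hostname  # hostname is already lower-cased
        if host:
            domains.add(host)
    # Collapse whitespace and case so trivially re-spaced copies hash identically.
    normalized = " ".join(content.lower().split())
    return {
        "domains": domains,
        "phones": set(PHONE_RE.findall(content)),
        "qq": set(QQ_RE.findall(content)),
        "fingerprint": hashlib.sha1(normalized.encode("utf-8")).hexdigest(),
    }


print(extract_features("Cheap stuff at http://www.spam.example/buy, QQ: 12345678"))
```

The fingerprint only catches exact duplicates after whitespace and case normalization; near-duplicate detection would need a fuzzier technique.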
With these characteristics in mind, let's look at how to deal with spam.
1. The most common anti-spam method is the verification code.
Currently, most websites use verification codes to fight spam. A verification code stops some automated posting and raises the cost of manual spamming. Verification codes vary in quality. A good one is hard to crack, or has a very low crack rate: it is hard for machines to recognize but easy for people, usually because the characters are distorted or overlap. Conversely, a poor verification code is easy for machines to recognize but hard for people, for example one that only adds noise without distorting the characters.
For example, Yahoo's verification code is a good one: it is fully distorted, each character is distorted at a different angle, and adjacent characters touch each other.
By contrast, a bad verification code only adds noise and does no other processing to the individual characters, so a cracking program can easily obtain the correct result by analyzing and filtering out the noise.
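To illustrate why noise alone is weak, here is a minimal sketch of one common cleanup step a cracking program might apply: on a binarized image, drop every dark pixel that has no dark neighbors. Single-pixel speckle disappears while connected character strokes survive. The image representation (a list of 0/1 rows) and the function name are assumptions for illustration.

```python
def remove_isolated_pixels(img, min_neighbors=1):
    """Clear dark pixels (1) that have fewer than min_neighbors dark 8-neighbors."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # work on a copy so counts use the original image
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            neighbors = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy or dx) and 0 <= ny < h and 0 <= nx < w and img[ny][nx]:
                        neighbors += 1
            if neighbors < min_neighbors:
                out[y][x] = 0
    return out


# A vertical stroke in column 1 plus one speckle of noise at row 1, column 4.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
for row in remove_isolated_pixels(image):
    print(row)
```

Once the noise is gone, undistorted, non-touching characters can be split apart and matched against templates. Distortion and character overlap are what defeat this kind of pipeline.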
2. Analyze the numbers and Internet link addresses in the post content, and their patterns, to determine whether a post is spam. We can create an HttpModule that inspects every post a user submits. If the submitted content contains an Internet address or a number, record the sender's user ID and IP address. If the user repeatedly submits the same number, or Internet addresses with the same domain name, within a configurable period of time, the user is suspected of posting spam and is added to a watch list. Once the user has published more than 5 (a configurable limit) posts containing suspected advertising, further posts from that user can be rejected and the event logged.
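The monitoring logic described above can be sketched as follows. This is a hypothetical, framework-independent class, not the HttpModule API: in a real site it would be called from whatever request hook inspects submissions. It assumes the caller passes in the domains and numbers already extracted from the post, keys the history by user ID (recording the IP only in the log), and uses an in-memory sliding window per (user, token) pair.

```python
import logging
import time
from collections import defaultdict, deque

log = logging.getLogger("antispam")


class SuspiciousPostMonitor:
    """Hypothetical watch-list logic; all parameter names and defaults are illustrative."""

    def __init__(self, window_seconds=600, repeat_threshold=2, max_suspect_posts=5):
        self.window = window_seconds              # the configurable period
        self.repeat_threshold = repeat_threshold  # repeats of a token that count as suspicious
        self.max_suspect_posts = max_suspect_posts
        self.recent = defaultdict(deque)          # (user, token) -> recent timestamps
        self.suspect_counts = defaultdict(int)
        self.watch_list = set()
        self.blocked = set()

    def check(self, user_id, ip, tokens, now=None):
        """Return True if the post may be published, False if it is rejected."""
        now = time.time() if now is None else now
        if user_id in self.blocked:
            log.warning("rejected post from blocked user %s (ip %s)", user_id, ip)
            return False
        suspicious = False
        for token in tokens:  # domains and numbers found in the post
            times = self.recent[(user_id, token)]
            times.append(now)
            while now - times[0] > self.window:  # drop entries outside the window
                times.popleft()
            if len(times) >= self.repeat_threshold:
                suspicious = True
        if suspicious:
            self.watch_list.add(user_id)
            self.suspect_counts[user_id] += 1
            if self.suspect_counts[user_id] > self.max_suspect_posts:
                self.blocked.add(user_id)
                log.warning("blocking user %s (ip %s)", user_id, ip)
                return False
        return True
```

With the defaults, a user who posts the same domain seven times within ten minutes has the first six posts accepted (the second through sixth land them on the watch list) and the seventh rejected. A production version would persist this state and expire old keys.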
3. For high-quality forums, you can restrict registration so that users must apply to join. Alternatively, review new users' posts by default and let them post directly only after they have accumulated n approved posts.
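The "review until trusted" rule can be sketched with a simple moderation queue. The class and field names are hypothetical, and `trust_threshold` plays the role of n in the text; its default of 10 is an arbitrary assumption.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    approved_posts: int = 0


@dataclass
class Forum:
    trust_threshold: int = 10  # "n" from the text; arbitrary default
    published: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def submit(self, user: User, text: str) -> str:
        """Publish directly for trusted users; queue everyone else for review."""
        if user.approved_posts >= self.trust_threshold:
            self.published.append((user.name, text))
            return "published"
        self.pending.append((user, text))
        return "pending"

    def approve(self, index: int) -> None:
        """A moderator approves a queued post, which counts toward the author's trust."""
        user, text = self.pending.pop(index)
        user.approved_posts += 1
        self.published.append((user.name, text))


forum = Forum(trust_threshold=2)
alice = User("alice")
print(forum.submit(alice, "first post"))  # pending: alice is new
```

Rejected posts simply never reach `approve`, so they do not move a new user toward the threshold.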
4. Maintain a list of forbidden words to block terms that have been confirmed as advertising.
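A forbidden-word filter can be as simple as one compiled regular expression. This sketch is an assumption about how one might build it, not a specific product's behavior. It strips whitespace and a few separator characters before matching so that "c a s i n o" is still caught, at the cost of occasional false positives where normal words run together. For very large word lists, a multi-pattern algorithm such as Aho-Corasick would scale better.

```python
import re

# Separators spammers commonly insert between letters (an illustrative set).
SEPARATORS = re.compile(r"[\s\-_.*]+")


def normalize(text):
    return SEPARATORS.sub("", text)


def build_filter(words):
    """Compile forbidden words into one case-insensitive pattern."""
    if not words:
        return re.compile(r"(?!)")  # a pattern that never matches
    # Longest words first so a longer phrase wins over a word it contains.
    parts = sorted((normalize(w) for w in words), key=len, reverse=True)
    return re.compile("|".join(re.escape(p) for p in parts), re.IGNORECASE)


def find_forbidden(regex, text):
    return regex.findall(normalize(text))


banned = build_filter(["cheap pills", "casino"])
print(find_forbidden(banned, "Visit our C a s i n o now"))
```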
5. Purchase third-party components that analyze and process spam. These are said to be very expensive but effective.
If you have used the above solutions and still haven't stopped spam, I have one more killer technique: use the pass rate of a user's posts to control spam. That is, the share of a user's posts that passed review determines whether their next post is displayed automatically. Suppose posts from users whose pass rate is above 90% are displayed by default. Then, to get one advertisement displayed, a user must first produce roughly nine normal posts, and with that cost spam can basically be contained.
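The pass-rate rule reduces to a ratio and a threshold. In this minimal sketch, the names are hypothetical, and a user with no history is treated as having a 0% pass rate so their posts go to review. That treatment is an assumption; it matches the new-user review idea in point 3.

```python
from dataclasses import dataclass


@dataclass
class PostingHistory:
    approved: int = 0
    rejected: int = 0

    @property
    def pass_rate(self) -> float:
        total = self.approved + self.rejected
        return self.approved / total if total else 0.0  # no history: treat as 0


def show_by_default(history: PostingHistory, threshold: float = 0.9) -> bool:
    """Display the next post without review only if the pass rate exceeds the threshold."""
    return history.pass_rate > threshold


print(show_by_default(PostingHistory(approved=9, rejected=1)))   # exactly 90%: not above
print(show_by_default(PostingHistory(approved=10, rejected=1)))  # about 90.9%: above
```

With a strict "above 90%" rule, a user with k rejected advertisements needs more than 9k approved posts before the next one is shown automatically, which is where the "roughly nine normal posts per advertisement" cost comes from.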
The above is my personal experience. If you have a better method, please share it. I have only described my experience here, not a specific technical implementation, so if any of you have a good implementation, please share it with me as well.