The Real Web 2.0


In 1994, the National Science Foundation lifted its ban on commercial activity on the Internet. At the time, email and Usenet were the main exchange platforms, while simple publishing systems such as Gopher were trying to build a broader user base. The Web had not yet taken hold. That year, a law firm named Canter & Siegel posted the first widely publicized commercial spam message on Usenet: it hired a Perl programmer to write a script that blasted an advertisement for its green-card lottery service to more than 6,000 newsgroups. The firm became famous, and notorious. It went on to offer similar spamming services to others, promoting the benefits of spam and writing an "Internet marketing" book. Since then, few online forums have escaped unwelcome commercial advertising. As the Web has become an important way for people to converse online, the spam problem has grown with a vengeance, because Web 2.0 technology has opened the door for anyone to publish on the Web.

Sometimes the harm done by Web spam is negligible, but more often it is serious. Spammers are rarely content to post just one or two messages; they keep flooding a forum until the spam drowns out the original discussion. Sometimes spam contains pornography or anti-social content, which has an even greater negative impact. Most search engines downgrade pages that carry such messages, or that link to the sites the spam promotes, which means spam undermines a site's search engine ranking. The ultimate consequence is that Web publishers end up wasting large amounts of resources fighting spam instead of spending that time on other tasks.

Web spam comes in several forms, including:

Spam articles and vandalism on wikis

Comment spam on weblogs

Spam posts in forums, issue trackers, and other discussion venues

Referrer spam (spam sites pretend to refer visitors to a target site so that they show up in the target's published referrer logs)

Fake user accounts on social networks

Web spam is very difficult to deal with, but if Web developers ignore spam prevention, they are the ones who suffer. In this article and in the follow-up Part 2, I'll present tips, techniques, and services for dealing with the various kinds of Web spam.

Assessing spammers' behavior

Normal human activity on the network tends to be irregular in both timing and content. Spam is usually generated by a program, as with Canter & Siegel's Perl script posting to more than 6,000 Usenet groups. As a result, you can sometimes exploit the mechanical patterns of these programs to deal with spammers. For example, if your site requires registration, you can use a variety of warning flags to mark an account for further inspection. If an account is registered from a top-level domain that frequently appears in spam, such as .ru, .br, or .biz, it is more likely to belong to a spammer. If an account is flagged, you might hold its posts until you have checked the content, and only then release them.
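The flagging idea above can be sketched as a simple heuristic check at registration time. This is a minimal sketch, not a production filter: the `SUSPECT_TLDS` set (taken from the article's examples) and the `registration_flags` helper are illustrative assumptions.

```python
# Sketch: flag new registrations for manual review using simple
# heuristics, such as the email address's top-level domain.
# The TLD list below is the article's example; a real site would
# tune it from its own spam logs.

SUSPECT_TLDS = {"ru", "br", "biz"}

def registration_flags(email: str) -> list[str]:
    """Return warning flags for a new account; an empty list means no flags."""
    flags = []
    domain = email.rsplit("@", 1)[-1].lower()
    tld = domain.rsplit(".", 1)[-1]
    if tld in SUSPECT_TLDS:
        flags.append(f"suspect-tld:{tld}")
    return flags

# A flagged account's first posts would be held for moderation
# rather than published immediately.
```

Note that flags are only a signal for human review, not an automatic ban: plenty of legitimate users register from these domains.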

Flood control

Wiki, weblog, and forum spammers usually send dozens of requests within seconds, so you can limit the number of requests a user or IP address may send within a given interval. This technique is known as flood control. Also, make sure the limit is enforced across the entire site, not just on a single page.

The behavioral-assessment techniques in the next section of this article can also be applied to abuses other than spam. For example, if you host a Webmail service, spammers may try to create accounts in bulk and then use them to send spam; if you host an online auction site, someone might write a program to manipulate the bidding process. In short, wherever there is a system for registering new users, some will attempt automated registration for ulterior motives. In general, these techniques help Web developers ensure that their services are used by legitimate users.

Workflow control

Many spam messages come from bots, which tend to use a site in distinctive ways that you can exploit. Figure 1 summarizes the typical human and machine workflows for submitting a message or comment to a site.

Figure 1. Typical human and machine workflows

Bot workflows generally skip straight to the POST request, so a large proportion of spam can be blocked simply by detecting this typical bot pattern.

