Anti-spam Technology Analysis

1. Overview

E-mail is one of the most commonly used network applications and has become an important channel for communication. But spam plagues most people: recent surveys show that 93% of respondents are dissatisfied with the amount of junk mail they receive. Even simple spam incidents can turn into serious security problems. The growing volume of spam now causes losses of $9.4 billion a year (figure from a Chinabyte news report), and some articles estimate that spam may cost a company 600 to 1,000 dollars per user.

As the Internet continues to grow, spam is no longer the minor nuisance it used to be; it can fairly be called overwhelming. Initially, spam was mostly unsolicited commercial e-mail. Now pornographic and political spam is increasing, accounting for about 40% of all spam, and the trend is still upward. Spam has also become a new and fast way to spread computer viruses.

About 50% of the world's mail is now spam, yet only a handful of organizations take responsibility for it. Many anti-spam measures have been proposed, but very few have been implemented. Unfortunately, these solutions do not block spam completely, and they also affect normal mail traffic.

1.1 What is junk mail?

To some extent, spam can be defined as any e-mail that the recipient did not want to receive. For example:
* Commercial advertising. Many companies advertise new products, new activities, etc. by e-mail.

* Political rhetoric. A number of e-mails are sent from other countries or by reactionary organizations to peddle their so-called commentary, in the same way as junk commercial advertising.

* Worm mail. More and more viruses are spreading quickly through email, which is a fast and effective way to spread the virus.

* Malicious mail. Threatening or deceptive mail, such as phishing: fake e-mail that imitates a legitimate website and is purely a ruse to trick users out of personal information, account details, or even credit card numbers.

How does an ordinary personal e-mail address become a target for spam? There are many ways: the address was registered on websites, forums, and similar places; a virus found it in a friend's mailbox; or someone enumerated the users of the mail provider. In general, the less an e-mail address is exposed, the less spam it receives, and the shorter the time an address has been in use, the less junk mail it receives. Some helpless users simply abandon their mailbox and switch to a new address.

1.2 Security issues

Spam has a great impact on the Internet and on the vast majority of users. People have to spend time dealing with it, it consumes system resources, and it also brings many security problems.

Spam obviously takes up a large share of network resources. Some poorly secured mail servers are abused as spam relays, which leads to warnings, blocked IP addresses, and similar incidents, and the heavy consumption of network resources slows normal business operations. As anti-spam efforts develop worldwide, blacklist sharing between organizations means that innocent servers are blocked on an even wider scale, which undoubtedly causes serious problems for legitimate users.

Spam is also increasingly combined with hacker attacks, viruses, and the like. For example, once installed, the Sobig worm sets up a proxy that can be used to relay mail. As spam has evolved, the use of malicious code or monitoring software to support it has increased significantly. On December 31, 2003, a Brazilian hacker group sent spam containing malicious JavaScript to millions of users, and those who read it through Hotmail unknowingly leaked their accounts. Another example is the recent IE URL display problem, in which adding "%01" in front of the host name hides the real host address; it appeared in spam within a few weeks of being published.

More and more deceptive virus mail is making many enterprises suffer. Even with a good network protection strategy it is difficult to avoid, and a growing number of security incidents are caused by mail carrying viruses, trojans, or other malicious programs. Phishing fakes are hard for the average user to judge correctly, and the damage they cause is direct.

2. Anti-spam Technology

Existing and proposed anti-spam methods try to reduce the spam problem and address security requirements. Correctly identifying spam also reduces mail-borne viruses, mail attack programs, and similar threats. These solutions take a number of different approaches to stopping spam.

Dr. Neal Krawetz gives a very good classification of anti-spam technologies in Anti-Spam Solutions and Security [ref 1]. Current anti-spam technologies fall into four categories: filters, reverse lookups, challenges, and cryptography. All of them can reduce the spam problem, but each has its limitations. The following sections discuss these technologies and the implementation of some of the main ones.

2.1 Filtering

Filtering is a relatively simple and direct way of handling spam. It is used mainly by receiving systems (an MUA such as Outlook Express, or an MTA such as Sendmail) to identify and process spam. It is also the most widely deployed technique: the anti-spam plug-ins on many mail servers, anti-spam gateways, and client-side anti-spam features all rely on filtering.

2.1.1 Keyword filtering

Keyword filtering typically builds simple or complex lists of words associated with spam and uses them to identify and process spam messages. For example, certain keywords appear in large amounts of spam, such as the subject lines used by some viruses (for instance: Test). This approach is similar to the virus signatures used by anti-virus software. It is a simple form of content filtering, and it depends on building a very large list of keywords.

The flaws of this technique are obvious. Filtering ability is tied directly to the keyword list, a larger list may produce more false positives, and the system spends more resources processing each message. In addition, common evasion techniques such as splitting or recombining words easily bypass the filter.
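As a rough illustration of both the technique and its weakness, here is a minimal Python sketch. The keyword list and the function name `is_spam_by_keyword` are made up for this example; a real filter would use a far larger, maintained list.

```python
import re

# Hypothetical keyword list, for illustration only.
SPAM_KEYWORDS = {"lottery", "free money", "cheap pills"}

def is_spam_by_keyword(message: str, keywords=SPAM_KEYWORDS) -> bool:
    """Return True if any listed keyword occurs in the message (case-insensitive)."""
    # Collapse runs of whitespace so multi-word keywords still match.
    text = re.sub(r"\s+", " ", message.lower())
    return any(keyword in text for keyword in keywords)

# A plain keyword is caught by the list ...
print(is_spam_by_keyword("You won the LOTTERY, claim now"))
# ... but splitting the word with separators slips past it.
print(is_spam_by_keyword("You won the l-o-t-t-e-r-y, claim now"))
```

The second call shows why the list has to keep growing: every spelling variant needs its own entry, which in turn raises the cost per message and the risk of false positives.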

2.1.2 Black and white lists

A blacklist and a whitelist contain, respectively, the IP addresses or e-mail addresses of known spammers and of trusted senders. Many organizations now maintain a *BL (block list): IP addresses (or even whole address ranges) that frequently send spam are collected into a block list, such as the Spamhaus SBL (Spamhaus Block List), which can be shared on a large scale. Many ISPs use block lists from such organizations to refuse spam. A whitelist is the opposite of a blacklist: mail from the trusted e-mail addresses or IP addresses on it is accepted unconditionally.

Many mail receivers, both MUAs and MTAs, now use black and white lists to handle spam. They are more widely used in MTAs, where they effectively reduce the load on the server.

Block lists also have obvious flaws: a list cannot contain every offending IP address (there are simply too many), and spammers can easily send junk from different IP addresses.
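A list check of this kind might look like the following Python sketch. The list contents are documentation-range addresses chosen for illustration, and `check_sender_ip` and `dnsbl_query_name` are hypothetical helper names. The second helper only builds the DNS name that a DNS-based block list is conventionally queried with (reversed IPv4 octets followed by the list's zone); it does not perform the lookup, and the zone used below is a placeholder.

```python
import ipaddress

# Hypothetical local lists; these are documentation ranges, not real spam sources.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
TRUSTED_ADDRESSES = {ipaddress.ip_address("198.51.100.7")}

def check_sender_ip(ip: str) -> str:
    """Classify a connecting IP address as 'accept', 'reject', or 'unknown'."""
    address = ipaddress.ip_address(ip)
    if address in TRUSTED_ADDRESSES:          # the whitelist is consulted first
        return "accept"
    if any(address in network for network in BLOCKED_NETWORKS):
        return "reject"
    return "unknown"                          # left to other filters

def dnsbl_query_name(ipv4: str, zone: str) -> str:
    """Build the name queried in a DNS-based block list: reversed octets + zone."""
    return ".".join(reversed(ipv4.split("."))) + "." + zone

print(check_sender_ip("203.0.113.9"))
print(dnsbl_query_name("192.0.2.1", "bl.example.org"))
```

An address that is on neither list comes back as "unknown", which is exactly the gap described above: a spammer who moves to a fresh IP address is not covered.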

2.1.3 Hash Technology

In hash technology, the mail system computes a hash that describes a message, using inputs such as the message content and the sender as parameters. If two hashes are the same, the message content, sender, and so on are the same. Some ISPs use this approach: if the same hash value keeps recurring, the message is probably being sent in bulk.
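The idea can be sketched in Python as follows. The choice of SHA-256, the whitespace normalisation, the threshold of three copies, and the names `message_fingerprint` and `BulkDetector` are all assumptions of this example rather than details of any particular ISP's system.

```python
import hashlib
from collections import Counter

def message_fingerprint(sender: str, body: str) -> str:
    """Hash the sender and the whitespace-normalised body into one digest."""
    normalised = " ".join(body.lower().split())
    return hashlib.sha256(f"{sender}\n{normalised}".encode("utf-8")).hexdigest()

class BulkDetector:
    """Counts identical fingerprints and flags a message once a threshold is hit."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.seen = Counter()

    def observe(self, sender: str, body: str) -> bool:
        digest = message_fingerprint(sender, body)
        self.seen[digest] += 1
        return self.seen[digest] >= self.threshold

detector = BulkDetector(threshold=3)
for _ in range(3):
    flagged = detector.observe("promo@example.com", "Buy now,  limited offer")
print(flagged)
```

Because the digest changes completely when even one character of the input changes, appending a few random characters to each copy is enough to keep every message under the threshold.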

2.1.4 Rule-based filtering

This kind of filter builds rules from particular characteristics (such as words, phrases, positions, sizes, and attachments). The rules describe spam in the same way that an IDS describes an intrusion event. For the filter to be effective, the administrator has to maintain a large rule base.

2.1.5 Intelligent and probabilistic systems

The Bayesian algorithm is widely used. It learns the frequency and patterns of words and associates them with spam and normal mail in order to classify messages. This is a more complex and more intelligent form of content filtering than keywords. The technology most widely used on clients and servers is described below.

2.1.5.1 Bayesian algorithm

Among filters, score-based filters currently perform best, because it is easy to see how simply cunning spam can evade black and white lists, keyword libraries, or hash filters. The scoring filter is one of the most basic algorithmic filters and the prototype of the Bayesian algorithm. It checks the words or characters in a message: each feature element found in spam (the simplest element is a word, a more complex one is a phrase) adds to the score (a positive score), while each feature element characteristic of normal mail reduces it (a negative score). The message ends up with a total spam score, which determines whether it is treated as spam.
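A scoring filter of this kind can be sketched in a few lines of Python. The feature words, their weights, and the threshold below are invented for illustration; in practice they would come from analysing real spam and normal mail.

```python
import re

# Hypothetical weights: positive for spam-like words, negative for words
# that are typical of normal mail.
FEATURE_SCORES = {
    "free": 2.0,
    "winner": 3.0,
    "click": 1.5,
    "meeting": -2.0,
    "invoice": -1.0,
}
SPAM_THRESHOLD = 3.0  # assumed cut-off

def spam_score(message: str) -> float:
    """Sum the scores of every known feature word found in the message."""
    words = re.findall(r"[a-z']+", message.lower())
    return sum(FEATURE_SCORES.get(word, 0.0) for word in words)

def is_spam(message: str) -> bool:
    return spam_score(message) >= SPAM_THRESHOLD

print(spam_score("Click here, winner! Free prize"))
print(is_spam("Free slot for the meeting"))
```

The hand-picked weights are the weak point of the sketch, just as they are for the technique: nothing in the code says why "winner" should be worth 3.0 rather than 2.0.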

This scoring filter tries to recognize spam automatically, but some problems remain:

* The list of feature elements is derived from spam or normal mail. To identify spam more effectively, the filter therefore needs to learn from hundreds of e-mails, which reduces its efficiency, because the feature elements of normal mail differ from person to person.

* The number of messages used for feature analysis is critical. If spammers adapt to these features, they can make spam look more like normal mail, and the filtering features then have to change.

* The score for each word should be based on a sound evaluation, but it is still arbitrary. For example, the features may not keep up with changes in the wording of spam, nor fit the needs of a particular user.

Bayesian theory is now widely used in the computer industry as a way of describing uncertainty; Google, for example, uses Bayesian theory in its computations. A Bayesian filter calculates the probability that a message is spam from its content. It first has to learn from a large amount of spam and normal mail, so it performs better than an ordinary content filter and makes fewer errors. The Bayesian filter is also a score-based filter, but rather than simply adding up scores it identifies spam on a more fundamental basis. It builds its feature table automatically: it first analyzes a large amount of spam and a large amount of normal mail, and the algorithm works out the probability of each feature appearing in a message.

The source of the Bayesian algorithm's computational features is usually:

• Words in the body of the message

• Message headers (sender, delivery path, etc.)

• Other properties, such as HTML coding (for example, colors)

• Phrases and word groups

• Meta information, such as where special phrases appear

For example, if the word AAA often appears in normal messages but almost never appears in spam, then the probability that AAA marks a message as spam is close to 0, and vice versa.

The steps of the Bayesian algorithm are:

1. Collect a large amount of spam and non-spam, and build a spam set and a non-spam set.

2. Extract independent strings, such as AAA, from the feature sources as token strings, and count how many times each token string occurs (its frequency). Process all messages in the spam set and in the non-spam set separately in this way.

3. Each message set corresponds to a hash table: hashtable_good for the non-spam set and hashtable_bad for the spam set. Each table stores the mapping from token string to frequency.

4. Calculate the probability P of each token string in each hash table: P = (frequency of the token string) / (length of the corresponding hash table).

5. Taking hashtable_good and hashtable_bad together, infer the probability that a new message is spam when a given token string appears in it. The mathematical expression is:

Event A: the message is spam;

t1, t2, ..., tn denote token strings.

P(A|ti) is the probability that a message is spam when the token string ti appears in it. Let

P1(ti) = the value of ti in hashtable_good

P2(ti) = the value of ti in hashtable_bad

Then P(A|ti) = P2(ti) / [P1(ti) + P2(ti)].

6. Create a new hash table, hashtable_probability, to store the mapping from each token string ti to P(A|ti).

7. Using hashtable_probability, estimate the likelihood that a new message is spam.

When a new message arrives, generate its token strings as in step 2 and look up each one in hashtable_probability. Suppose the message yields n token strings t1, t2, ..., tn whose values in hashtable_probability are p1, p2, ..., pn (that is, pi = P(A|ti)). P(A|t1, t2, ..., tn) denotes the probability that the message is spam when the token strings t1, t2, ..., tn all occur in it.

The combined probability is given by:

P(A|t1, t2, ..., tn) = (p1 * p2 * ... * pn) / [p1 * p2 * ... * pn + (1 - p1) * (1 - p2) * ... * (1 - pn)]

When P(A|t1, t2, ..., tn) exceeds a predetermined threshold, the message is judged to be spam.
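The seven steps and the combined formula can be sketched in Python as below. Several details are assumptions of this sketch, not part of the description above: the tokenizer regular expression, reading "length of the hash table" as its number of entries, clamping each per-token probability to the range 0.01 to 0.99 so that the product formula never divides by zero, ignoring tokens the filter has never seen, and returning 0.5 when no token is known.

```python
import re
from collections import Counter
from math import prod

def tokenize(text: str) -> list:
    """Step 2: split a message into token strings (assumed tokenizer)."""
    return re.findall(r"[a-z0-9$']+", text.lower())

def build_table(messages) -> Counter:
    """Steps 2-3: map each token string to its frequency across a message set."""
    table = Counter()
    for message in messages:
        table.update(tokenize(message))
    return table

def token_probabilities(good: Counter, bad: Counter,
                        floor: float = 0.01, ceiling: float = 0.99) -> dict:
    """Steps 4-6: compute P(A|ti) = P2(ti) / (P1(ti) + P2(ti)) for every token."""
    probabilities = {}
    for token in set(good) | set(bad):
        p1 = good[token] / len(good) if good else 0.0   # value in hashtable_good
        p2 = bad[token] / len(bad) if bad else 0.0      # value in hashtable_bad
        # Clamping is an assumption of this sketch; it avoids exact 0 and 1.
        probabilities[token] = min(ceiling, max(floor, p2 / (p1 + p2)))
    return probabilities

def spam_probability(message: str, probabilities: dict) -> float:
    """Step 7: combine the per-token values with the product formula."""
    values = [probabilities[t] for t in set(tokenize(message)) if t in probabilities]
    if not values:
        return 0.5  # nothing known about this message (assumed neutral value)
    spam = prod(values)
    ham = prod(1.0 - p for p in values)
    return spam / (spam + ham)

# Tiny made-up training sets, far smaller than a real filter would need.
spam_set = ["win free money now", "free prize click now"]
good_set = ["project meeting tomorrow", "lunch tomorrow with the project team"]
table = token_probabilities(build_table(good_set), build_table(spam_set))
print(spam_probability("free money", table) > 0.9)
print(spam_probability("project meeting", table) < 0.1)
```

With training sets this small every token is seen in only one of the two tables, so the clamp does all the work; with realistic volumes of mail most tokens land somewhere between the two bounds.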

Each new message that arrives is analyzed by the Bayesian filter, which uses each feature to calculate the probability that the message is spam. Through this continuous analysis the filter keeps updating itself. For example, if a message containing the word AAA is judged to be spam on the basis of its various features, the probability of AAA being a spam feature increases.

In this way the Bayesian filter adapts, either automatically or under the user's manual control, and it fits the usage of an individual user particularly well. Spammers find this hard to adapt to, so the filter is harder to evade, although they can of course disguise a message as ordinary normal mail. Unless spammers can probe an individual's filter, for example by using read receipts to learn which messages the user opens, they cannot adapt to it.

Although the Bayesian filter shares some of the scoring filter's flaws, it is much better optimized. Practice has also shown that Bayesian filters are very effective on both clients and servers; an excellent Bayesian filter can identify more than 99.9% of spam. Most anti-spam products currently in use employ this technology, for example the Bayesian filtering in Foxmail.

2.1.6 Limitations and shortcomings

Many current anti-spam products combine several filtering techniques to be more effective. Filters are graded by their false positives and false negatives. A false negative means that spam slips past the filter. A false positive means that a normal message is judged to be junk mail. A perfect filter would produce neither false negatives nor false positives, but that is an ideal.
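To make the two measures concrete, the following Python sketch computes both rates from a list of labelled results. The function name `filter_error_rates` and the sample data are hypothetical.

```python
def filter_error_rates(results):
    """Return (false_positive_rate, false_negative_rate).

    results is an iterable of (is_actually_spam, flagged_as_spam) pairs.
    """
    results = list(results)
    normal = [flagged for actual, flagged in results if not actual]
    spam = [flagged for actual, flagged in results if actual]
    # False positive: normal mail that the filter flagged as spam.
    false_positive_rate = sum(normal) / len(normal) if normal else 0.0
    # False negative: spam that the filter let through.
    false_negative_rate = sum(not f for f in spam) / len(spam) if spam else 0.0
    return false_positive_rate, false_negative_rate

# Made-up outcomes: two spam messages (one missed), four normal ones (one flagged).
sample = [(True, True), (True, False),
          (False, False), (False, True), (False, False), (False, False)]
print(filter_error_rates(sample))
```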

Anti-spam systems based on filtering typically have the following three limitations:

• They may be bypassed. Spammers and their sending tools are not static, and they adapt to filters quickly. Against a keyword list, for example, they can randomly change the spelling of words (such as writing the Chinese character "强" as the look-alike pair "弓虽", or inserting a separator inside a word). A hash-buster (which makes every message produce a different hash) bypasses hash filters. The Bayesian filters in common use today can be bypassed by inserting random words or sentences. Most filters stay at their most effective for only a few weeks; to keep an anti-spam system useful, its filter rules must be updated constantly, for example daily or weekly.

• False positives. The most troublesome problem is a normal message being judged to be junk mail. For example, a normal message that contains the word "sample" may be classified as junk. Some perfectly normal servers unfortunately end up on a block list because an irresponsible organization blocks an entire network segment, not because they sent spam (this happened to an xfocus server). However, trying to reduce false positives may lead to a serious false-negative problem.

• Filter review. Because of false positives, messages flagged as spam are usually not deleted immediately but placed in a spam folder for later inspection. Unfortunately, this means that users still have to spend time looking through spam, even if they read only the headers.

A more serious problem is that filters are still believed to block spam effectively. In fact they do not: in most cases the spam still exists, still crosses the network, and is still propagated. Unless users are prepared to accept false positives, they still have to browse through the junk mail. Filters help us sort messages into spam and normal mail, but filtering does not stop spam; it merely "processes" it.

Despite the limitations of the filter technology, this is the most widely used anti-spam technology at the moment.

2.2 Verification queries

SMTP was not designed with security in mind. In 1973, computer security was not a concern, and simply having a working mail protocol was an achievement. For example, RFC 524 describes some of the cases in which SMTP is used as a standalone protocol:

"Although people can, or may be able to, design software based on this document, please comment on it as appropriate. Please send suggestions and questions. I am convinced that there are still problems in the protocol, and I hope that readers will point them out when they read this RFC."

Although the SMTP command set has evolved over a long time, people still implement SMTP on the basis of RFC 524 and assume that problems (such as security issues) will be resolved later. As a result, the flaws that originated in RFC 524 still existed in 2004, by which time SMTP had become so widespread that it could not easily be replaced. Spam is one example of abuse of the SMTP protocol: most spam tools can forge headers, falsify the sender, or hide the source.

Spam typically carries a forged sender address; only a small share of spam uses a real one. There are several reasons why spammers forge their messages:

* Because it is illegal. Sending spam is against the law in many countries, and by falsifying the address the sender may avoid being sued.

* Because it is unpopular. Spammers know that spam is unwelcome. Forging the sender address may reduce the backlash.

* Because of ISP restrictions. Most ISPs have terms of service that prohibit spam, and by forging the sender address spammers reduce the likelihood of the ISP cutting off their network access.

Therefore, if something similar to black and white lists could identify more intelligently which mail is forged and which is legitimate, the spam problem would be solved to a large extent. Verification query technology starts from this idea. The following sections also cover some of the major anti-spam technologies advocated and led by Yahoo!, Microsoft, IBM, and others. Strictly speaking these do not all belong under reverse verification queries, but in some respects they are simply more complex verification queries.

2.2.1 Reverse query technology

Spam depends on forgery, so solving the problem of forged mail would prevent a great deal of junk mail. To limit the forging of sender addresses, some systems require the sender's e-mail address to be validated. These systems include:

Reverse Mail Exchange (RMX)

Sender Permitted From (SPF)

Designated Mailers Protocol (DMP)

These technologies are quite similar. DNS is a global Internet service that handles the translation between IP addresses and domain names. In 1986, DNS was extended with the mail exchange (MX) record; when sending mail, a mail server queries the MX record to find the mail server for the recipient's domain.

Like MX records, the reverse query solutions define a reverse MX record (an "RMX" record for RMX, an "SPF" record for SPF, a "DMP" record for DMP) that is used to check whether the domain name claimed in a message and the IP address it came from actually correspond. The underlying reasoning is that forged mail does not really come from an address listed in the reverse MX record, so the forgery can be detected.
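Stripped of the DNS machinery, the check amounts to a lookup like the one in this Python sketch. The dictionary `REVERSE_MX` stands in for the records a domain would publish in DNS, and both it and the function name `sender_is_authorised` are hypothetical.

```python
# Hypothetical stand-in for DNS: each domain lists the IP addresses
# that are allowed to send mail on its behalf.
REVERSE_MX = {
    "example.com": {"192.0.2.10", "192.0.2.11"},
}

def sender_is_authorised(sender_address: str, connecting_ip: str,
                         records=REVERSE_MX) -> bool:
    """Check that the connecting IP is listed for the claimed sender domain."""
    domain = sender_address.rsplit("@", 1)[-1].lower()
    return connecting_ip in records.get(domain, set())

print(sender_is_authorised("alice@example.com", "192.0.2.10"))   # listed server
print(sender_is_authorised("alice@example.com", "203.0.113.5"))  # forged origin
```

A domain with no published record fails the check in this sketch, which is one of the practical problems such schemes run into with domains that have no mail host of their own.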

2.2.2 DKIM Technology

DKIM (DomainKeys Identified Mail) is based on Yahoo's DomainKeys authentication technology and Cisco's Identified Internet Mail.

Yahoo's DomainKeys uses public key cryptography to authenticate e-mail senders. The sending system generates a signature and inserts it into the e-mail header, and the receiving system verifies the signature with a public key published in DNS. Cisco's verification technology also uses cryptography, but it associates the signature with the e-mail message itself. The sending server signs the message and inserts a new header containing the signature and the public key used to generate it. The receiving system verifies that the public key used to sign the message is authorized for that sender address.

DKIM integrates the two verification systems. It authenticates the signature in the same way as DomainKeys, using a public key published in DNS, and it also uses Cisco's header signing technology to ensure consistency.

DKIM gives each domain a mechanism for authenticating both the sender and the integrity of its messages. Once the domain has been validated, it is compared with the sender address in the message to detect forgery. If the address is forged, the message may be spam or spoofed mail and can be discarded. If it is not forged and the domain is known, the domain can build up a good reputation that is tied into the anti-spam policy system, shared among service providers, or even shown directly to the user.

Well-known companies usually need to send a variety of business mail to customers, banks, and so on, so being able to confirm the origin of a message is very important; it protects against phishing attacks.

The DKIM technical standard has now been submitted to the IETF; see the draft document http://www.ietf.org/internet-drafts/draft-delany-domainkeys-base-00.txt

The DomainKeys implementation process

The sending server goes through two steps:

1. Setup. The domain owner generates a public/private key pair used to sign all outgoing messages (multiple key pairs are allowed). The public key is published in DNS, and the private key is kept on the mail server that uses DomainKeys.

2. Signing. When a user sends a message, the mail system automatically uses the stored private key to generate a signature. The signature becomes part of the message header, and the message is then delivered to the receiving server.

The receiving server verifies signed messages in three steps:

1. Preparation. The receiving server extracts the signature and the sending domain (From:) from the message header and obtains the corresponding public key from DNS.

2. Verification. The receiving server uses the public key obtained from DNS to verify the signature generated with the private key. This ensures that the message really was sent by that domain and has not been modified.

3. Delivery. The receiving server applies local policy to reach a final decision. If the domain is validated and the other anti-spam tests do not flag the message, it is delivered to the user's inbox; otherwise it can be discarded, quarantined, and so on.
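The sign-then-verify flow above can be illustrated with a toy Python sketch. It is not the DomainKeys wire format: it uses unpadded textbook RSA over a SHA-256 digest, with a small key built from two well-known Mersenne primes, purely so that the example runs with the standard library alone. The `PUBLIC_KEYS` dictionary stands in for DNS, and every name in the code is hypothetical. Nothing here is secure enough for real use.

```python
import hashlib

# Setup: a toy RSA key pair (far too small and unpadded; illustration only).
P, Q = 2**127 - 1, 2**89 - 1          # two Mersenne primes
N = P * Q                             # public modulus
E = 65537                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))     # private exponent, kept on the sending server

# Stand-in for DNS: the domain publishes its public key.
PUBLIC_KEYS = {"example.com": (N, E)}

def digest(from_header: str, body: str, modulus: int) -> int:
    """Hash the From: address and body, reduced to fit the toy modulus."""
    data = f"From: {from_header}\n\n{body}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus

def sign(from_header: str, body: str) -> int:
    """Signing: the sending server signs with its stored private key."""
    return pow(digest(from_header, body, N), D, N)

def verify(from_header: str, body: str, signature: int, dns=PUBLIC_KEYS) -> bool:
    """Preparation and verification: fetch the domain's key, check the signature."""
    domain = from_header.rsplit("@", 1)[-1].lower()
    key = dns.get(domain)
    if key is None:
        return False                  # no published key for this domain
    modulus, exponent = key
    return pow(signature, exponent, modulus) == digest(from_header, body, modulus)

signature = sign("alice@example.com", "Your statement is ready.")
print(verify("alice@example.com", "Your statement is ready.", signature))
print(verify("alice@example.com", "Send us your password.", signature))
```

The delivery step, where local policy decides what to do with the verification result, is left out; the sketch only shows why a modified body or a different claimed domain no longer matches the signature.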

2.2.3 Sender ID Technology

In 2004 Gates vowed that Microsoft would wipe out spam in the future, and Sender ID was the technology he was counting on, but he recently retracted that prediction. What followed was a dispute over standards. Microsoft wanted the IETF to adopt Sender ID as a standard and received a lot of support, from Cisco, Comcast, IBM, Port25, Sendmail, Symantec, VeriSign, and others, including AOL, which later defected. But Microsoft never won enough support in the open source community, and the IETF finally rejected its proposal.

Sender ID involves two sides: support on the sending side and support on the receiving side. There are three main parts on the sending side: the sender must modify the DNS of the mail server, adding a specific SPF record that declares its identity, such as "v=spf1 ip4:192.0.2.0/24 -all", which indicates SPF version 1 and is valid for the 192.0.2.0/24 network segment; optionally, the sender's MTA supports the SUBMITTER extension in the outgoing mail protocol; and headers such as Resent-Sender, Resent-From, and Sender are added to the message.

On the receiving side, the recipient's mail server must implement Sender ID checking: it examines the PRA or MAIL FROM of each received message, queries the SPF record in the sender's DNS, and verifies the sender's identity.

With Sender ID, the whole process is therefore:

Step 1: the sender writes the message and sends it;

Step 2: the message is transferred to the receiving mail server;

Step 3: the receiving mail server checks the sender's identity using Sender ID (the check is performed through a specific DNS query);

Step 4: if the sender's identity matches the claimed sender address, the message is accepted; otherwise a specific action is taken, such as rejecting the message outright or treating it as spam.
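The check in the third and fourth steps can be sketched in Python for the simplest kind of record, one made up only of ip4: entries and a closing "all" term, such as "v=spf1 ip4:192.0.2.0/24 -all". The function name `evaluate_spf` is hypothetical, the record is passed in as a string rather than fetched from DNS, and every other SPF mechanism is deliberately ignored.

```python
import ipaddress

def evaluate_spf(record: str, connecting_ip: str) -> str:
    """Evaluate a minimal SPF record against the connecting IP address.

    Only ip4: mechanisms and a final 'all' term are understood; anything
    else in the record is skipped by this sketch.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"                                # not an SPF version 1 record
    address = ipaddress.ip_address(connecting_ip)
    for term in terms[1:]:
        term = term.lower()
        if term.startswith("ip4:"):
            if address in ipaddress.ip_network(term[4:], strict=False):
                return "pass"                        # sender is inside an allowed range
        elif term in ("-all", "~all", "?all", "+all", "all"):
            qualifiers = {"-": "fail", "~": "softfail", "?": "neutral"}
            return qualifiers.get(term[0], "pass")
    return "neutral"                                 # nothing matched, no 'all' term

record = "v=spf1 ip4:192.0.2.0/24 -all"
print(evaluate_spf(record, "192.0.2.55"))
print(evaluate_spf(record, "203.0.113.1"))
```

A receiving server would map the result to an action according to local policy, for example accepting on "pass" and rejecting or marking as spam on "fail".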

Sender ID is not really a magic weapon for eradicating spam. It only addresses the question of where a message comes from; it essentially cannot tell whether a message is spam. For example, spammers can register a cheap domain name and send spam from it, and from a technical point of view everything conforms to the specification. Spammers can also relay spam through vulnerabilities in someone else's mail server, which Sender ID cannot solve either.

2.2.4 FairUCE Technology

FairUCE (Fair use of Unsolicited Commercial Email) was developed by IBM. It uses identity management tools built into the network domain to filter and block spam by analyzing e-mail domain names.

FairUCE links an incoming message to the IP address it came from: it tries to establish a relationship between the e-mail address, the e-mail domain, and the sending computer, using SPF or other methods, in order to determine whether the e-mail is legitimate. If a relationship can be found, it checks the recipient's black and white lists, as well as the domain name, to decide how to handle the message, for example by accepting or rejecting it.

FairUCE has another function: it traces spam back to its source and returns the junk mail to that source as a way of striking back at the spammer. This approach has pros and cons. The advantage is that it can degrade the performance of the spam source. The disadvantage is that it can disrupt a properly functioning server that was merely exploited, and it generates a lot of additional junk traffic.

2.2.5 Limitations and shortcomings

These solutions are all usable to some degree, but they have some drawbacks:

* Domains without a host (empty domain names)

The reverse query method requires a message to come from a known, trusted mail server that corresponds to a legitimate IP address (the reverse MX record). However, most domain names do not actually correspond to completely static IP addresses. Individuals and small businesses often want their own domain name but do not have enough IP addresses to meet this requirement. DNS registrars such as GoDaddy offer free mail forwarding to customers who have no host, only a domain name. However, this forwarding service handles incoming messages only; it does not provide a way to send mail.

The reverse query solutions cause some problems for users who have no host, only a domain name:

• No reverse MX record. These users can currently configure their mail client to send mail under their own registered domain name. However, a reverse query for the IP address of the sender's domain will find nothing at all, especially for mobile users, dial-up users, and others whose IP address changes frequently.

• Cannot send mail. One way to solve the problem above is to relay messages through the ISP's server, which does have a reverse MX record. But ISPs now refuse to relay mail whenever the sender's domain name differs from the ISP's own.

In both cases, mail from these users will be intercepted by the reverse query system.

* Legitimate domain names

An identity that can be verified is not necessarily a legitimate one. For example, spammers can register cheap domain names and send spam from them, and from a technical point of view everything conforms to the specification. In addition, many spammers currently relay their spam through legitimate mail systems by exploiting vulnerabilities in other people's mail servers. Verification queries cannot solve these problems.

2.3 Challenges

Spammers use automated mail-sending software to generate millions of messages per day. Challenge techniques slow down the processing of each message and so hinder bulk senders, while normal users who send only a few messages are not noticeably affected. However, challenge techniques have succeeded only while few people used them. If they became more popular, people would probably worry more about whether they interfere with mail delivery than about whether they block spam.

There are two main forms of challenge: challenge-response, and the proposed computational challenges.

2.3.1 Challenge-response

A challenge-response (CR) system keeps a list of approved senders. A message from a new sender is held temporarily rather than delivered immediately, and a message containing a challenge is returned to the sender (the challenge may be a URL to visit or a request to reply). Once the challenge is completed, the new sender is added to the list of approved senders. Spam sent from forged e-mail addresses will never receive the challenge, and a spammer who uses a real e-mail address cannot possibly answer every challenge. However, CR systems still have many limitations:

• CR deadlock. Suppose Alice tells Bill to send a message to her friend Charlie. Bill sends the message, and Charlie's CR system holds it and sends Bill a challenge. But Bill's CR system holds the challenge from Charlie and sends out a challenge of its own. As a result, neither user receives a challenge and neither can reply, and neither has any way of knowing that something went wrong during the exchange. So if both sides use CR systems, they may not be able to communicate at all.

• Automated systems. Mailing lists and automated systems, such as the "Send to a friend ..." feature of some websites, cannot respond to challenges.

• Interpretive challenges. Many CR systems use challenges that require interpretation. These more complex CR systems involve character recognition and pattern matching, but even so they can be automated. For example, the challenge that Yahoo uses when creating new mail accounts is vulnerable to programs that perform simple character analysis. Hushmail's CR system requires the user to find a specified figure in an image with a blue background (analyzing the background, locating the figure, and submitting its coordinates can all be automated).
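The basic CR flow and the deadlock case can be reproduced with a small Python model. The class `ChallengeResponseMailbox` and its methods are hypothetical and model only the bookkeeping (approved list, held messages, inbox), not real mail transport.

```python
class ChallengeResponseMailbox:
    """Minimal model of a CR system: unknown senders are held and challenged."""

    def __init__(self, owner: str):
        self.owner = owner
        self.approved = set()   # list of approved senders
        self.held = {}          # sender -> messages waiting for a completed challenge
        self.inbox = []         # messages actually delivered to the user

    def receive(self, sender: str, message: str):
        """Deliver mail from approved senders; otherwise hold it and return a challenge."""
        if sender in self.approved:
            self.inbox.append((sender, message))
            return None
        self.held.setdefault(sender, []).append(message)
        return f"challenge from {self.owner}"

    def challenge_answered(self, sender: str) -> None:
        """The sender completed the challenge: approve it and release held mail."""
        self.approved.add(sender)
        for message in self.held.pop(sender, []):
            self.inbox.append((sender, message))

# Deadlock: both Bill and Charlie run a CR system and have never corresponded.
bill = ChallengeResponseMailbox("bill")
charlie = ChallengeResponseMailbox("charlie")
challenge = charlie.receive("bill", "Hi Charlie, Alice suggested I write")
counter_challenge = bill.receive("charlie", challenge)  # the challenge itself is held
print(charlie.inbox, bill.inbox)  # neither user has seen anything
```

In the model, the exchange only completes if someone calls `challenge_answered` by hand, which corresponds to one of the two users noticing the problem through some other channel.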

These cases expose two marketing myths about CR systems: (1) that a person has to answer the challenge, and (2) that the challenges are too complex to automate. In reality, most spammers ignore CR systems because those systems do not yet keep them from reaching a large number of recipients, not because the challenges are too hard. Many spammers also use valid mail addresses. Once CR systems start to interfere with spam, those senders will find ways to automate the challenges.
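The CR flow and the deadlock described above can be sketched with a toy model. This is only an illustration under stated assumptions: the `CRMailbox` class, the `Message` type, and the example addresses are hypothetical, and releasing held mail once a challenge is completed is assumed behavior rather than something the article specifies.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    recipient: str
    body: str
    is_challenge: bool = False


@dataclass
class CRMailbox:
    """Toy challenge-response filter for one address (hypothetical model)."""
    owner: str
    approved: set = field(default_factory=set)  # approved senders
    held: list = field(default_factory=list)    # mail waiting on a challenge
    inbox: list = field(default_factory=list)   # mail actually delivered

    def receive(self, msg):
        """Deliver mail from approved senders; otherwise hold it and
        return a challenge addressed to the sender."""
        if msg.sender in self.approved:
            self.inbox.append(msg)
            return None
        self.held.append(msg)
        return Message(self.owner, msg.sender, "please confirm", is_challenge=True)

    def challenge_completed(self, sender):
        """The sender answered: approve them and release their held mail
        (releasing held mail is an assumption of this sketch)."""
        self.approved.add(sender)
        released = [m for m in self.held if m.sender == sender]
        self.held = [m for m in self.held if m.sender != sender]
        self.inbox.extend(released)


# Bill and Charlie both run a CR filter and have never exchanged mail.
bill = CRMailbox("bill@example.org")
charlie = CRMailbox("charlie@example.org")

# Bill writes to Charlie; Charlie's filter holds the mail and challenges Bill.
challenge = charlie.receive(Message(bill.owner, charlie.owner, "Alice suggested I write"))

# Charlie is not on Bill's approved list, so Bill's filter holds the
# challenge itself and answers with a counter-challenge.
counter = bill.receive(challenge)

print(len(charlie.inbox), len(bill.inbox))  # neither user has seen anything
print(counter.is_challenge)                 # a challenge answered by a challenge
```

In this model the deadlock only breaks if one side approves the other out of band, for example by calling `challenge_completed` manually.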

2.3.2, Computational challenges

A number of computational challenge (CC) systems have also been proposed. They aim to increase the "cost" of sending mail. Most CC systems use a complex algorithm to introduce a deliberate delay. For a single user the delay is hard to notice, but for a spammer sending a large volume of mail it adds up to a great deal of time. Hashcash (http://www.cypherspace.org/adam/hashcash/) is one example of a CC system. Even so, CC systems slow down legitimate mail as well, not just spam. Their limitations include:

• Unequal impact. A computational challenge depends on CPU, memory, and network resources. For example, a challenge that takes 10 seconds on a 1 GHz computer may take 20 seconds on a 500 MHz computer.

• Mailing lists. Many mailing lists have thousands or even millions of recipients. A popular list such as Bugtraq could end up being treated like spam. It is not realistic for a mailing list to complete a computational challenge for every recipient. And if legitimate mailing lists are given a way to bypass the challenge, spammers can use the same way to bypass it.

• Bots. Sobig and other spam-related viruses give spammers control over a large number of machines, which lets them spread the "cost" across many systems.

• Legal bots. Spammers send spam because it generates revenue. If a group of them cooperates, they can pool a large number of systems to share the "cost". This is entirely legal and requires no virus.

At present, computational challenges are not widely used, because the technique cannot solve the spam problem and may interfere with normal users.
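The idea behind a computational challenge can be sketched as a small proof-of-work in the spirit of Hashcash: the sender searches for a counter whose hash starts with a given number of zero bits, and the recipient checks it with a single hash. This is a simplified illustration, not the real Hashcash stamp format; the `recipient:counter` layout, the function names, and the 12-bit difficulty are assumptions chosen to keep the example fast.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint_stamp(recipient: str, bits: int) -> str:
    """Sender side: try counters until the hash has `bits` leading zero bits.
    Expected work is about 2**bits hash evaluations."""
    for counter in count():
        stamp = f"{recipient}:{counter}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp


def check_stamp(stamp: str, recipient: str, bits: int) -> bool:
    """Recipient side: one hash is enough to verify the stamp."""
    if not stamp.startswith(recipient + ":"):
        return False
    return leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits


BITS = 12  # toy difficulty (assumption): about 4096 hashes on average
stamp = mint_stamp("charlie@example.org", BITS)
print(check_stamp(stamp, "charlie@example.org", BITS))  # stamp minted for Charlie
print(check_stamp(stamp, "dave@example.org", BITS))     # same stamp reused for Dave
```

Each extra bit doubles the expected minting work, but the search splits trivially across machines, which is why the bot limitations listed above weaken the scheme.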

2.4. Cryptography

A number of schemes have been proposed that use cryptography to verify the sender of a message. In essence, these systems use certificates to provide proof of identity: without a proper certificate, a forged message is easily identified. Some of the cryptographic solutions under study are:

AMTP. http://www.ietf.org/internet-drafts/draft-weinman-amtp-02.txt

MTP. http://www.ietf.org/internet-drafts/draft-danisch-email-mtp-00.txt

S/MIME and PGP/MIME. http://www.imc.org/smime-pgpmime.html

The current mail protocol (SMTP) does not directly support cryptographic authentication. Some of the solutions under study extend SMTP (such as S/MIME, PGP/MIME, and AMTP), while others, such as MTP, are intended to replace the current mail system. Interestingly, MTP's author says: "SMTP is more than 20 years old, but some modern requirements have developed over the past 5-10 years. Many of the extensions affect the syntax and semantics of SMTP, and pure SMTP does not meet these requirements, and it is hard to make progress without changing the SMTP syntax." However, the many extensions to SMTP demonstrate the protocol's flexibility rather than its rigidity, so it is not necessary to create an entirely new mail transport protocol.

When certificates such as X.509 or TLS are used, some kind of certificate authority must be available. If the certificates are stored in DNS, the private keys must be available at the time of authentication. (In other words, if spammers can access these private keys, they can generate valid public keys.) Alternatively, a central certificate authority (CA) could be used, but mail is a distributed system, and no one wants all mail to be controlled by a single CA. Some solutions allow multiple CA systems, where, for example, the X.509 certificate identifies which CA server to use. This extensibility also makes it possible for spammers to run their own private CA servers.

Without a certificate authority, some other method is needed to distribute keys between sender and recipient. With PGP, for example, public keys can be shared in advance. This approach works for isolated or closed groups, but not for large numbers of individuals, especially when new contacts need to be established. In essence, pre-shared keys act as a kind of whitelist filter: only people who already know each other can send mail.
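The whitelist effect of pre-shared keys can be illustrated with a short sketch. It uses a symmetric HMAC from the Python standard library as a stand-in for PGP's public-key signatures, which is an assumption made to keep the example self-contained; the addresses, keys, and function names are hypothetical.

```python
import hashlib
import hmac

# Keys exchanged out of band beforehand (hypothetical addresses and keys).
PRESHARED_KEYS = {
    "alice@example.org": b"alice-and-charlie-secret",
}


def sign(key: bytes, body: str) -> str:
    """Sender side: tag the message body with the shared key."""
    return hmac.new(key, body.encode(), hashlib.sha256).hexdigest()


def accept(sender: str, body: str, tag: str) -> bool:
    """Recipient side: accept only mail whose tag verifies under a key
    that was shared in advance with the claimed sender."""
    key = PRESHARED_KEYS.get(sender)
    if key is None:
        return False  # unknown sender: no key, so nothing can be verified
    return hmac.compare_digest(sign(key, body), tag)


body = "Lunch on Friday?"
known = accept("alice@example.org", body, sign(b"alice-and-charlie-secret", body))
stranger = accept("bill@example.org", body, sign(b"bills-own-key", body))
forged = accept("alice@example.org", body, sign(b"guessed-key", body))
print(known, stranger, forged)
```

A new contact such as Bill is rejected no matter how honest his mail is, which is the whitelist behavior described above.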

Unfortunately, these cryptographic solutions do not stop spam. Suppose, for example, that one of these schemes were widely adopted. None of them can confirm that the e-mail address is authentic, only that the sender holds the correct key for the message. The drawbacks are:

• Misuse of automation tools. If a scheme is deployed widely, every user needs a way to generate certificates or keys (this covers mail servers, mail clients, and whatever else the solution depends on), so the system will probably issue keys in an automated way. Spammers can be counted on to abuse any automated system and send certified spam.

• Availability issues. Availability is also open to debate. For example, what happens if the CA server is unavailable? Is the message held? Returned? Or delivered anyway? Spammers have recently launched denial-of-service attacks against more than half of the blacklist sites and made them inaccessible. Apparently, these spammers want to prevent others from updating their blacklists. A single CA server clearly cannot avoid the same fate.

3, Summary

This article has introduced a number of anti-spam technologies. In practice, many anti-spam products rely not on a single technology but on a combination of several.

The harm caused by spam is now widely recognized, and legal action against spammers is increasingly successful; for example, Scott Richter paid 7 million dollars to Microsoft. Many countries are also legislating against spam to provide legal backing.

Technically, however, as with attack and defense in security generally, this is a back-and-forth contest. Every new anti-spam technique is inevitably met by a corresponding spamming technique, no single technology can solve every problem, and the technology will keep developing.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
