Core Anti-Spam Firewall Technology Analysis

Last Update: 2018-12-03 Source: Internet

Author: User

The origins and technical roots of spam

SMTP itself is a simple mail delivery protocol that lacks the necessary identity authentication. This is the first reason spam has become so widespread. SMTP lets the sender forge almost all of the information that identifies them, such as the sender address and the mail route. Combined with anonymous relaying through open relays and open proxies, this can erase nearly every trace of the real spam sender. Today the vast majority of spam carries a forged sending source, which makes stopping its spread very difficult.
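To make the forgery point concrete, here is a minimal sketch that only builds the client-side command lines of an SMTP session (it does not connect to any server). The helper name `smtp_client_commands`, the addresses, and the fake `Received` entry are hypothetical. Every value the client writes, including the EHLO host name, the `MAIL FROM` envelope sender, the `From` header and even a route (`Received`) header, is chosen freely by the sender; base SMTP (RFC 5321) does not verify any of them.

```python
from email.message import EmailMessage


def smtp_client_commands(helo_name, envelope_from, recipients, message_text):
    """Client-side lines of a minimal SMTP session (RFC 5321); no network I/O."""
    lines = [
        f"EHLO {helo_name}",             # self-declared host name, not verified
        f"MAIL FROM:<{envelope_from}>",  # envelope sender, not verified
    ]
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]
    lines.append("DATA")
    for body_line in message_text.splitlines():
        # Dot-stuffing (RFC 5321, section 4.5.2): escape lines that begin with "."
        lines.append("." + body_line if body_line.startswith(".") else body_line)
    lines.append(".")  # end of message data
    lines.append("QUIT")
    return lines


# Hypothetical forged message: every value below is chosen freely by the sender.
msg = EmailMessage()
msg["Received"] = "from mx.bank.example by relay.bank.example"  # fake route entry
msg["From"] = "service@bank.example"                             # header sender
msg["To"] = "victim@example.org"
msg["Subject"] = "Account notice"
msg.set_content("Please confirm your account.")

cmds = smtp_client_commands("mx.bank.example", "service@bank.example",
                            ["victim@example.org"], msg.as_string())
for line in cmds:
    print(line)
```

Nothing in this exchange proves that the sender controls `bank.example`; a receiving server that relies only on SMTP has to take these values on trust.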

SMTP also lacks the necessary behavioral controls, so it cannot tell normal mail sending apart from spam sending. This is the second reason spam has become so widespread. Spam sending usually has recognizable behavioral characteristics, such as sending an extremely large number of emails within a short period of time, and it often shows distinctive patterns in the mail communication itself.
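The burst behavior described above is commonly detected with a sliding-window counter. The sketch below is a minimal illustration, assuming messages are keyed by client IP address and that timestamps for a given key arrive in non-decreasing order; the class name and the limits are hypothetical, not taken from any particular firewall.

```python
from collections import defaultdict, deque


class SendRateMonitor:
    """Flags a key (e.g. a client IP) that exceeds max_messages within window_seconds."""

    def __init__(self, max_messages, window_seconds):
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps still inside the window

    def record(self, key, timestamp):
        """Record one message at `timestamp` (seconds); return True if the key is bursting."""
        events = self._events[key]
        events.append(timestamp)
        # Drop timestamps that have slid out of the window
        while events and events[0] <= timestamp - self.window_seconds:
            events.popleft()
        return len(events) > self.max_messages


monitor = SendRateMonitor(max_messages=3, window_seconds=60)
flags = [monitor.record("203.0.113.7", t) for t in (0, 1, 2, 3)]
print(flags)  # the fourth message inside 60 seconds is flagged
```

Keying the same counter by recipient address instead of sender IP would flag the opposite pattern: many messages aimed at one mailbox in a short time.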

Anti-spam technology also has inherent limitations: it cannot judge spam with absolute precision, some techniques are too costly, and others cannot be applied because of restrictions in real-world environments. For example, you cannot completely overturn SMTP and switch to a new mail protocol designed to prevent spam from being generated and transmitted. Therefore, technical means alone cannot completely solve the spam problem.

Analysis of anti-spam technology

We take the protection technologies of one anti-spam firewall as an example to analyze which technologies anti-spam firewalls use. This anti-spam firewall uses the following protection technologies:

1. Denial-of-service attack and security protection
2. IP address block list
3. Rate control
4. Dual-layer virus scanning
5. Custom rules
6. Spam fingerprint check
7. Email intent analysis
8. Bayesian intelligent analysis
9. Rule-based scoring system
10. Virus protection for decompressed files

Rate control, virus scanning, and virus protection for decompressed files are treated here as virus-related auxiliary features rather than anti-spam techniques, so we will not discuss them. It is worth noting that a general firewall's defense against DDoS attacks differs from an anti-spam firewall's defense against DoS attacks: the anti-spam firewall defends against the kind of DoS attack in which a large number of spam messages are sent to a single email address within a short period of time.

The core anti-spam technologies are Bayesian intelligent analysis, the spam fingerprint check, the rule-based scoring system, and custom rules; of these, the most important are Bayesian intelligent analysis and spam fingerprint detection. Next, we analyze these two core filtering technologies in turn:

1. Spam fingerprint check

Many people find the idea of a spam fingerprint check mysterious. In fact, a so-called mail fingerprint is simply a combination of certain strings in the mail content, also called a snapshot. It lets the firewall recognize content that has already been confirmed as spam even when it reappears in similar but not identical messages. For example, if spam often troubles you, terms such as "proxy service", "enrollment", and "cash" will be familiar. Don't they make you think of spam as soon as you see them?

These terms are, in effect, spam fingerprints, and the idea is the same as signature recognition in anti-virus technology. The anti-spam firewall matches incoming messages against content already confirmed as spam, including similar but not identical variants, and in this way identifies spam.
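Signature-style matching of this kind can be sketched as below. This is a minimal illustration, not the firewall's actual matcher: the fingerprint library simply reuses the example terms from the text (a real library would be far larger and continuously updated), and the normalization rules, which undo simple disguises such as "C.A.S.H" or extra spaces, are assumptions.

```python
import re

# Hypothetical fingerprint library built from the example terms above;
# a real library would hold many confirmed-spam strings and be updated continuously.
SPAM_FINGERPRINTS = {"proxy service", "enrollment", "cash"}


def normalize(text):
    """Lowercase, drop punctuation used to disguise words ("C.A.S.H"), collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def fingerprint_hits(text, library=SPAM_FINGERPRINTS):
    """Return the fingerprints found in text, matched on whole-word boundaries."""
    padded = f" {normalize(text)} "
    return sorted(fp for fp in library if f" {fp} " in padded)


print(fingerprint_hits("Get C.A.S.H via our proxy   service!"))
print(fingerprint_hits("Fresh cashew nuts"))  # whole-word match: "cashew" is not "cash"
```

Matching on whole words after normalization is the design choice that lets one fingerprint catch several disguised variants without flagging innocent words that merely contain it.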

Of course, the accuracy of the fingerprint check depends on the spam fingerprint library. The anti-spam firewall first assigns a value to each character string that appears in the email; notably, these values are assigned according to classification rules derived from the characteristics of specific kinds of spam. It then uses a statistical method to compute an overall value. The firewall can also decide based on whether the email is similar to other emails it has received many times, because many similar emails are likely to be spam.
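The source does not say how similarity to previously received emails is measured. One well-known technique that could fill this role is word shingling with Jaccard similarity, sketched below as a swapped-in illustration rather than the firewall's own method. The shingle size `k=3`, the similarity threshold, and the class name are hypothetical.

```python
import re


def shingles(text, k=3):
    """Set of k-word shingles (overlapping word sequences) from the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """|A intersect B| / |A union B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class NearDuplicateTracker:
    """Counts how many earlier emails are near-duplicates of a new one."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.seen = []  # shingle sets of earlier emails (a real system would expire these)

    def count_similar(self, text):
        s = shingles(text)
        similar = sum(1 for prev in self.seen if jaccard(s, prev) >= self.threshold)
        self.seen.append(s)
        return similar


tracker = NearDuplicateTracker(threshold=0.5)
first = tracker.count_similar("Cheap loans approved today call now for cash")
second = tracker.count_similar("Cheap loans approved today call now for money")
third = tracker.count_similar("Team meeting moved to Friday afternoon")
```

A rising similar-email count for one piece of content is the "received many times" signal the text describes; it would be combined with, not substituted for, the fingerprint score.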

2. Bayesian intelligent analysis

In my opinion, the name "Bayesian intelligent analysis" owes something to fashion: AI courses at school have left many of us weary of the word "intelligent", and any technology linked to intelligence seems far more advanced. In fact, this intelligent analysis is just an application of statistics, although, objectively speaking, that application really has made anti-spam much smarter. Rather than spend time on Bayes' theorem itself, we will go straight to the Bayesian anti-spam algorithm. The algorithm shows that this "intelligent analysis" actually combines the IP block list, the spam fingerprint check, and statistical rules to analyze spam.

The Bayesian anti-spam algorithm is as follows:

1) Collect a large number of spam and non-spam emails, and build a spam set and a non-spam set.
2) Extract the independent strings in the subject and body of each email, such as abc32 and ￥234, as token strings, and count how many times each token string appears, that is, its word frequency. Process every email in both the spam set and the non-spam set this way.
3) Each email set corresponds to a hash table: hashtable_good corresponds to the non-spam set and hashtable_bad to the spam set. Each table stores the mapping from token string to word frequency.
4) Calculate the probability of a token string in each hash table: P = (word frequency of the token string) / (length of the corresponding hash table).
5) Considering both hashtable_good and hashtable_bad, infer the probability that a new email containing a given token string is spam. Mathematically:
A denotes the event "the email is spam";
$T_1, T_2, \dots, T_N$ denote token strings, and $P(A \mid T_i)$ denotes the probability that an email containing the token string $T_i$ is spam. Let
$$p_1(T_i) = \text{value of } T_i \text{ in hashtable\_good}, \qquad p_2(T_i) = \text{value of } T_i \text{ in hashtable\_bad}$$
$$P(A \mid T_i) = \frac{p_2(T_i)}{p_1(T_i) + p_2(T_i)}$$
6) Create a new hash table, hashtable_probability, that stores the mapping from each token string $T_i$ to $P(A \mid T_i)$.

7) At this point, learning from spam and non-spam is complete. Using hashtable_probability, you can estimate the probability that a new email is spam.
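Steps 1 to 6 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the tokenizer regex, the function names, and the tiny corpora (standing in for the "large number" of emails in step 1) are hypothetical, and "length of the corresponding hash table" in step 4 is interpreted as the total word frequency recorded in that table, since dividing by the number of distinct keys would not yield a probability.

```python
import re
from collections import Counter


def tokenize(text):
    # Step 2: independent strings in subject and body, e.g. "abc32" or "￥234"
    return re.findall(r"[￥$]?\w+", text.lower())


def build_table(mails):
    # Step 3: one hash table per mail set, mapping token string -> word frequency
    table = Counter()
    for mail in mails:
        table.update(tokenize(mail))
    return table


def train(spam_mails, ham_mails):
    hashtable_bad = build_table(spam_mails)
    hashtable_good = build_table(ham_mails)
    # Step 4 (assumption): "length" = total word frequency recorded in the table
    len_bad = sum(hashtable_bad.values()) or 1
    len_good = sum(hashtable_good.values()) or 1
    hashtable_probability = {}
    for token in set(hashtable_good) | set(hashtable_bad):
        p1 = hashtable_good[token] / len_good  # Counter returns 0 for absent tokens
        p2 = hashtable_bad[token] / len_bad
        # Step 5: P(A | Ti) = p2 / (p1 + p2); step 6: store it in hashtable_probability
        hashtable_probability[token] = p2 / (p1 + p2)
    return hashtable_probability


prob = train(["cash now", "cash prize"], ["meeting now"])
# prob["cash"] is 1.0 (spam only), prob["meeting"] is 0.0, prob["now"] is about 0.33
```

Note that a token seen only in spam gets exactly 1.0 and one seen only in non-spam gets 0.0; practical filters usually clamp these extremes before combining probabilities.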

When a new email arrives, generate its token strings as in step 2, then query hashtable_probability for the value of each token string.
Suppose N token strings $T_1, T_2, \dots, T_N$ are obtained from the email, and their values in hashtable_probability are $P_1, P_2, \dots, P_N$. Let $P(A \mid T_1, T_2, \dots, T_N)$ denote the probability that the email is spam given that the token strings $T_1, T_2, \dots, T_N$ all appear in it.
By the compound probability formula,
$$P(A \mid T_1, T_2, \dots, T_N) = \frac{P_1 P_2 \cdots P_N}{P_1 P_2 \cdots P_N + (1-P_1)(1-P_2)\cdots(1-P_N)}$$
When $P(A \mid T_1, T_2, \dots, T_N)$ exceeds a threshold, the email can be classified as spam.
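The classification step can be sketched as follows. It is a minimal sketch, not a reference implementation, and it makes several labeled assumptions: tokens missing from hashtable_probability are skipped (equivalent to a neutral 0.5, which cancels out of the formula), each distinct token is counted once, probabilities are clamped to [0.01, 0.99] so that 0 and 1 from training cannot produce 0/0, and the threshold 0.9 is hypothetical. The product is computed in log space to avoid underflow when N is large; the result is the same formula.

```python
import math
import re


def tokenize(text):
    # Same idea as step 2: independent strings such as "abc32" or "￥234"
    return re.findall(r"[￥$]?\w+", text.lower())


def spam_probability(text, hashtable_probability, clamp=0.01):
    log_spam = 0.0  # log(P1 * P2 * ... * PN)
    log_ham = 0.0   # log((1 - P1) * (1 - P2) * ... * (1 - PN))
    for token in set(tokenize(text)):  # assumption: each distinct token counted once
        p = hashtable_probability.get(token)
        if p is None:
            continue  # unseen token: neutral, a factor of 0.5 would cancel out
        p = min(max(p, clamp), 1 - clamp)
        log_spam += math.log(p)
        log_ham += math.log(1 - p)
    # P = prod(p) / (prod(p) + prod(1 - p)) = 1 / (1 + exp(-d)), d = log_spam - log_ham
    d = log_spam - log_ham
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)  # numerically stable form for negative d
    return e / (1.0 + e)


def is_spam(text, hashtable_probability, threshold=0.9):
    return spam_probability(text, hashtable_probability) > threshold


table = {"cash": 0.9, "meeting": 0.2}  # hypothetical hashtable_probability
print(spam_probability("cash meeting", table))  # 0.18 / (0.18 + 0.08), about 0.69
```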

Relationship between anti-spam firewalls and firewalls

The term "firewall" is used in a broad sense. In practice, a firewall is a protective device that shields an enterprise's internal network resources (such as web servers and file servers) from external security threats by applying different protection levels and measures to those resources. Depending on their protection focus, firewalls can be divided into virus firewalls, DDoS (distributed denial-of-service) firewalls, spam firewalls, and so on.

In short, an anti-spam firewall is a firewall dedicated to fighting spam.

All firewalls work in a similar way: they analyze the data packets entering and leaving the firewall and decide whether to allow or block them. In actual deployment, a dedicated spam firewall can be placed either in front of an ordinary firewall or behind it; in both cases we recommend placing it in a direct logical connection with the mail server.

A) Modify (or add) the MX record outside the firewall so that it points to the anti-spam firewall. If there are two MX records, the one pointing to the anti-spam firewall must have the higher priority (that is, the lower preference value).

B) On the firewall, point the SMTP NAT rule to the anti-spam firewall. In either case, you do not need to change anything on the mail server or in client software (Outlook, Foxmail, etc.).
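For option A, "higher priority" in DNS means a lower MX preference number: sending servers try the lowest value first (RFC 5321, section 5.1). The sketch below parses zone-file style MX lines and returns the order in which a sender would try the hosts. The domain and host names are hypothetical, and the parser only understands the simple one-line form shown.

```python
def parse_mx(line):
    # Zone-file style record: "<owner> IN MX <preference> <exchange>"
    _owner, rr_class, rr_type, preference, exchange = line.split()
    if rr_class != "IN" or rr_type != "MX":
        raise ValueError(f"not an MX record: {line!r}")
    return int(preference), exchange.rstrip(".")


def delivery_order(records):
    # Lower preference value = higher priority, so it is tried first (RFC 5321, section 5.1).
    # Python's sort is stable; RFC 5321 asks senders to randomize among equal preferences.
    return [host for _, host in sorted((parse_mx(r) for r in records), key=lambda r: r[0])]


ZONE = [
    "example.com. IN MX 20 mail.example.com.",      # original mail server
    "example.com. IN MX 10 antispam.example.com.",  # anti-spam firewall, higher priority
]
print(delivery_order(ZONE))  # ['antispam.example.com', 'mail.example.com']
```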

That concludes this look at anti-spam technology and the anti-spam firewall.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.