Http://blog.csdn.net/bripengandre/article/details/2192982
MIME protocol analysis
Chapter 1. MIME Overview
MIME stands for "Multipurpose Internet Mail Extensions". It is a widely used technical specification for email, and its core is defined in RFC 2045-2049 (RFC 1521 and RFC 1522 are obsolete earlier versions).
MIME aims to let mail carry arbitrary binary files without changing the SMTP protocol or RFC 822 (the mail format standard). To do so, it adds a number of measures on top of these protocols, and those measures are the focus of what follows.
Chapter 2. MIME details
2.1. Improvement measures
An email consists of an envelope, a mail header, and a mail body. The envelope plainly carries no binary data, but the header and the body may need to carry arbitrary byte sequences, so these are the two parts that MIME improves. It does so in three ways:
1) It adds some mail header fields that declare the MIME parameters.
2) It defines a number of mail formats, standardizing how multimedia mail is represented.
3) It defines transfer encodings so that any binary file can be transmitted.
It is worth stressing again that none of these measures changes the original SMTP protocol or RFC 822. In fact, they can be regarded as preprocessing applied to a message before it is sent over SMTP.
2.2. Source code of a simple email
To give an intuitive picture of MIME mail, the source of a simple message is shown first. The line number at the start of each line, and the space after it, were added for ease of analysis and are not part of the message. A row of dots ("............") marks a long stretch of encoded text that has been omitted.
1 From: "bhw98" <bhw98@sina.com>
2 Reply-To: bhw98@sina.com
3 To: <bluesky7810@163.com>
4 Subject: Re: help
5 X-Mailer: Foxmail 4.2 [cn]
6 Mime-Version: 1.0
7 Content-Type: multipart/alternative;
8     boundary="====002_Dragon307572345230_===="
9
10
11 This is a multi-part message in MIME format.
12
13 --====002_Dragon307572345230_====
14 Content-Type: text/plain; charset="gb2312"
15 Content-Transfer-Encoding: quoted-printable
16
17 bluesky7810=A3=AC=C4=FA=BA=C3=A3=A1
18
19 =A1=A1=A1=A1=D4=DA=CF=C2=C6=AA=D7=EE=BA=F3=BF=C9=D2=D4=CF=C2=D4=D8=B0=A1=A3=AC=C4=E3
............
30 =A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A12003-04-07
31
32 --====002_Dragon307572345230_====
33 Content-Type: text/html; charset="gb2312"
34 Content-Transfer-Encoding: quoted-printable
35
36 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
37 <HTML><HEAD>
38 <META content=3D"text/html; charset=3Dgb2312" =
39 http-equiv=3DContent-Type>
40 <META content=3D"MSHTML 5.00.2920.0" name=3DGENERATOR>
............
79 </HTML>
80
81 --====002_Dragon307572345230_====--
82
The source of a message can be viewed in a user agent (mail client). In Foxmail, for example, right-click the selected message and choose "original information". What the source means in detail is discussed in the sections that follow.
2.3. Mail header
2.3.1. Mail header fields
The mail header carries important information such as the sender, recipients, subject, date, MIME version, and content type. Each item is called a field and consists of a field name, a ":" and the field content. A field may fit on one line or be long enough to occupy several. The first line of a field must start at the left margin, with no whitespace (spaces or tabs) before the field name. A continuation line must begin with a whitespace character; this whitespace only marks the fold, and when decoding, the line break before it is dropped so that the lines rejoin into a single field. For example, lines 7-8 of the example in section 2.2 form one field.
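The folding rule can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not a full RFC 822 parser: the function name `unfold_headers` is hypothetical, each fold is collapsed into a single space, and repeated fields such as Received overwrite one another because the result is a plain dictionary.

```python
def unfold_headers(raw_header: str) -> dict:
    """Join folded header lines, then split each field into name and value."""
    fields = []
    for line in raw_header.splitlines():
        if line[:1] in (" ", "\t") and fields:
            # Continuation line: it belongs to the previous field.
            # Collapsing the fold to one space is a simplification.
            fields[-1] += " " + line.strip()
        elif line:
            fields.append(line)

    headers = {}
    for field in fields:
        name, _, value = field.partition(":")
        # Field names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return headers


sample = (
    "Mime-Version: 1.0\n"
    "Content-Type: multipart/alternative;\n"
    "\tboundary=\"====002_Dragon307572345230_====\"\n"
)
headers = unfold_headers(sample)
print(headers["content-type"])
```

Here the two physical lines of the Content-Type field are rejoined into one logical field before the name and value are separated.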
Table 1 lists the common fields in the mail header, what they mean, and who adds them.
Field name | Description | Added by
Received | Transfer path | Mail servers along the route
Return-Path | Return address | Destination mail server
Delivered-To | Delivery address | Destination mail server
Reply-To | Reply address | Mail creator
From | Sender address | Mail creator
To | Recipient address | Mail creator
Cc | Carbon copy address | Mail creator
Bcc | Blind carbon copy address | Mail creator
Date | Date and time | Mail creator
Subject | Subject | Mail creator
Message-ID | Message ID | Mail creator
MIME-Version | MIME version | Mail creator
Content-Type | Content type | Mail creator
Content-Transfer-Encoding | Transfer encoding of the content | Mail creator
Table 1. Common fields in the mail header
Non-standard, custom field names all begin with X-, such as X-Mailer and X-MSMail-Priority. They are usually understood only when the same program sends and receives the mail. The meaning of most fields is self-evident, so only the following two are explained here.
2.3.2. Content-Type field
The Content-Type field describes the type of the content being transmitted. Its value has the form "main type/subtype". The main types include text, image, audio, video, application, multipart, and message. Each main type can have many subtypes; text, for example, has the subtypes plain, html, xml, and css. Main types and subtypes that begin with X- are custom types that are not formally registered with IANA, although many of them are well established by convention. For example, application/x-zip-compressed denotes a ZIP file. On Windows, the registry key "HKEY_CLASSES_ROOT\MIME\Database\Content Type" lists most of the known content types other than multipart.
A type may carry parameters. The RFCs contain many supplementary rules on the form of these parameters, and some types take several of them. Table 2 lists the common ones.
Main type | Parameter name | Meaning
text | charset | Character set
image | name | Name
application | name | Name
multipart | boundary | Boundary
Table 2. Common parameters
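As an illustration of how a Content-Type value splits into main type, subtype, and parameters, the following sketch uses Python's standard `email` package. The two header values are taken from the example in section 2.2 (with the boundary written without spaces); the variable names are only for illustration.

```python
from email.message import Message

# A simple text part with a charset parameter.
text_part = Message()
text_part["Content-Type"] = 'text/plain; charset="gb2312"'

main_type = text_part.get_content_maintype()   # the part before "/"
sub_type = text_part.get_content_subtype()     # the part after "/"
charset = text_part.get_param("charset")       # parameter value, quotes removed
print(main_type, sub_type, charset)

# A multipart header with a boundary parameter.
multi = Message()
multi["Content-Type"] = (
    'multipart/alternative; boundary="====002_Dragon307572345230_===="'
)
boundary = multi.get_boundary()
print(multi.get_content_type(), boundary)
```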
2.3.3. Content-Transfer-Encoding field
The Content-Transfer-Encoding field describes how the content that follows it is encoded for transmission.
Content-Transfer-Encoding can take the values base64, quoted-printable, 7bit, 8bit, and binary. 7bit is the default. Mail source was originally designed to consist entirely of printable ASCII characters, so non-ASCII text or data must be encoded into the required form, as in the example above. Base64 and quoted-printable are the most widely used encodings in non-English-speaking countries. The binary value is largely symbolic and has little practical use. For details of Base64 and quoted-printable encoding, see the RFC documents or the companion article "SMTP protocol analysis".
In recent years most mail servers in China have come to support 8bit. Mail that travels only within China can therefore use 8bit directly, especially in the mail header, leaving Chinese characters unencoded. Mail sent abroad should still be encoded with Base64 or quoted-printable.
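To make the two common encodings concrete, here is a small sketch using Python's standard `quopri` and `base64` modules. The sample text "你好" is an assumption chosen for illustration; the quoted-printable line being decoded is line 17 of the example in section 2.2.

```python
import base64
import quopri

# "你好" in GB2312 is the four bytes C4 E3 BA C3, none of which is 7-bit ASCII.
raw = "你好".encode("gb2312")

qp = quopri.encodestring(raw)   # each non-ASCII byte becomes "=XX"
b64 = base64.b64encode(raw)     # every 3 bytes become 4 ASCII characters
print(qp)
print(b64)

# Both encodings are reversible.
assert quopri.decodestring(qp) == raw
assert base64.b64decode(b64) == raw

# Decoding line 17 of the example: quoted-printable first, then the charset.
line = b"bluesky7810=A3=AC=C4=FA=BA=C3=A3=A1"
text = quopri.decodestring(line).decode("gb2312")
print(text)
```

Quoted-printable leaves the ASCII characters readable, which suits mostly-ASCII text, while Base64 is more compact for data that is mostly non-ASCII or binary.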
2.4. Email body
The type of the mail body is given by the Content-Type field in the mail header. Common simple types are text/plain (plain text) and text/html (hypertext).
The multipart type seen in the example source is the essence of MIME mail. The mail body is divided into several segments, and each segment in turn consists of a segment header and a segment body, separated by a blank line. There are three common multipart types: multipart/mixed, multipart/related, and multipart/alternative. Their names make their meaning and purpose easy to guess. The hierarchy among them can be summarized as shown in Figure 1.
...
This shows that to add attachments to a message you must define a multipart/mixed segment; if there are embedded resources you must define at least a multipart/related segment; and if plain text and hypertext coexist you must define at least a multipart/alternative segment. What does "at least" mean? For example, when a message contains only plain text and hypertext, the type in the mail header may still be declared more broadly, as multipart/related or even multipart/mixed.
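The nesting described here can be seen by building a small message with Python's standard `email.message.EmailMessage`. This is a sketch for illustration only; the subject, body text, attachment bytes, and file name are made-up values.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Re: help"

# Start with a plain-text body: the message is text/plain.
msg.set_content("plain text body")

# Adding a hypertext version turns the message into multipart/alternative.
msg.add_alternative("<p>hypertext body</p>", subtype="html")

# Adding an attachment wraps everything in multipart/mixed.
msg.add_attachment(
    b"\x00\x01\x02",
    maintype="application",
    subtype="octet-stream",
    filename="data.bin",
)

# walk() visits the message and all nested segments, depth first.
structure = [part.get_content_type() for part in msg.walk()]
print(structure)
```

The library picks the narrowest multipart type that fits at each step, which matches the "at least" rule: alternative for text plus hypertext, mixed once an attachment is present.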
What all multipart types have in common is a "boundary" parameter in the segment header; the child segments in the segment body are delimited by this string. Every child segment begins with a "--" + boundary line, and the parent segment ends with a "--" + boundary + "--" line. Segments are also separated from one another by blank lines. When the mail body is multipart, its opening part (before the first "--" + boundary line) may contain extra text lines, which act as comments and are ignored when decoding. Some extra text lines can also be included between segments; these are not displayed either. Interested readers can verify this for themselves.
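The delimiting rule can be sketched as a small Python function. This is a simplified illustration and not a complete MIME parser: the name `split_multipart` and the sample body with the boundary "BOUNDARY" are hypothetical, and nested multipart segments are not handled.

```python
def split_multipart(body: str, boundary: str) -> list:
    """Return the raw text of each child segment of a multipart body."""
    delimiter = "--" + boundary
    closing = delimiter + "--"
    segments = []
    current = None                      # None while still in the leading text
    for line in body.splitlines():
        stripped = line.rstrip()
        if stripped == closing:
            if current is not None:
                segments.append("\n".join(current))
            break                       # the parent segment ends here
        if stripped == delimiter:
            if current is not None:
                segments.append("\n".join(current))
            current = []                # a new child segment starts
        elif current is not None:
            current.append(line)        # lines before the first delimiter are skipped
    return segments


body = (
    "This is a multi-part message in MIME format.\n"
    "\n"
    "--BOUNDARY\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--BOUNDARY\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>hello</p>\n"
    "--BOUNDARY--\n"
)
parts = split_multipart(body, "BOUNDARY")
for part in parts:
    # The first blank line separates the segment header from the segment body.
    header, _, content = part.partition("\n\n")
    print(repr(header), repr(content))
```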
Using the boundary delimiters together with the multipart hierarchy in Figure 1, we can analyze the body hierarchy and segment nesting of the example source. Lines 10-12 are extra text lines, and lines 13-82 form the multipart/alternative segment, which contains two child segments: lines 13-30 are the plain-text body and lines 32-79 are the hypertext body.
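The same analysis can be reproduced with Python's standard `email` parser. The message below is a shortened stand-in for the example source, with a hypothetical boundary "BOUNDARY" and a made-up two-character body; the helper `outline` is likewise only illustrative.

```python
import email

raw = (
    'Mime-Version: 1.0\n'
    'Content-Type: multipart/alternative; boundary="BOUNDARY"\n'
    '\n'
    'This is a multi-part message in MIME format.\n'
    '\n'
    '--BOUNDARY\n'
    'Content-Type: text/plain; charset="gb2312"\n'
    'Content-Transfer-Encoding: quoted-printable\n'
    '\n'
    '=C4=E3=BA=C3\n'
    '--BOUNDARY\n'
    'Content-Type: text/html; charset="gb2312"\n'
    'Content-Transfer-Encoding: quoted-printable\n'
    '\n'
    '<P>=C4=E3=BA=C3</P>\n'
    '--BOUNDARY--\n'
)
msg = email.message_from_string(raw)


def outline(part, depth=0):
    """List the content type of a segment and of its children, indented."""
    lines = ["  " * depth + part.get_content_type()]
    if part.is_multipart():
        for child in part.get_payload():
            lines.extend(outline(child, depth + 1))
    return lines


tree = outline(msg)
print("\n".join(tree))

# The extra text before the first boundary line is kept apart from the segments.
print(repr(msg.preamble))
```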
Note that each segment has its own attributes, which are described by the fields in its segment header. Table 3 lists the common segment header fields.
Field name | Description
Content-Type | Segment type
Content-Transfer-Encoding | Transfer encoding of the segment
Content-Disposition | Disposition of the segment
Content-ID | Segment ID
Content-Location | Segment location (path)
Content-Base | Base location of the segment
Table 3. Common fields in the segment header
Each of these fields means the same as the field of the same name in the mail header, except that it applies to a single segment, whereas the mail header field applies to the whole mail body.
Chapter 3. Common Questions
3.1. How to get the MIME mail source code
Well-featured mail clients, such as Microsoft Outlook Express and the Chinese-made Foxmail, can display and save a message's source (its "original information"). In Foxmail, select the message, right-click it, and choose the "original information" menu item to view its source.
Chapter 4. Analysis Solution
MIME is a mail specification, and it is widely used together with other protocols. For example, most of what SMTP and POP3 call mail follows the MIME specification. So although MIME does not appear in the functional requirements table, we must understand it in order to extract useful information from SMTP or POP3 traffic, because the mail extracted from that traffic conforms to MIME.
Once the MIME data has been obtained from SMTP or POP3 traffic, a simple MIME decoding pass is enough to recover the original information that was sent.
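As a rough sketch of such a decoding pass, the following uses Python's standard `email` package on message bytes that are assumed to have already been extracted from the SMTP or POP3 stream. The sample message, its boundary "BOUNDARY", and its Base64 body are made up for illustration.

```python
from email import message_from_bytes, policy

# Assumed input: the raw bytes of one message, already cut out of the
# SMTP DATA section or the POP3 RETR response.
captured = (
    b'Subject: Re: help\r\n'
    b'Mime-Version: 1.0\r\n'
    b'Content-Type: multipart/alternative; boundary="BOUNDARY"\r\n'
    b'\r\n'
    b'--BOUNDARY\r\n'
    b'Content-Type: text/plain; charset="gb2312"\r\n'
    b'Content-Transfer-Encoding: base64\r\n'
    b'\r\n'
    b'xOO6ww==\r\n'
    b'--BOUNDARY\r\n'
    b'Content-Type: text/html; charset="gb2312"\r\n'
    b'Content-Transfer-Encoding: base64\r\n'
    b'\r\n'
    b'PFA+xOO6wzwvUD4=\r\n'
    b'--BOUNDARY--\r\n'
)

msg = message_from_bytes(captured, policy=policy.default)

# Pick the plain-text segment, then undo the transfer encoding and the charset.
plain = msg.get_body(preferencelist=("plain",))
text = plain.get_content()
print(msg["Subject"], text)
```

The library call hides two steps discussed earlier: reversing the Content-Transfer-Encoding, then interpreting the resulting bytes according to the charset parameter.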