MIME protocol analysis

Source: Internet
Author: User
Tags: rfc822, microsoft outlook

http://blog.csdn.net/bripengandre/article/details/2192982


Chapter 1. MIME Overview

MIME stands for "Multipurpose Internet Mail Extensions" (its Chinese name is a literal translation, "multi-purpose Internet mail extension"). It is a widely used technical specification for email. Its core is defined in RFC 2045-2049 (RFC 1521 and RFC 1522 are obsolete earlier versions).

MIME aims to let email carry arbitrary binary files without changing the SMTP protocol or RFC 822 (the mail format standard). To this end, it adds several mechanisms on top of these protocols, and these mechanisms are the focus of what follows.

Chapter 2. MIME Details

2.1. Improvement Measures

An email consists of an envelope, a header, and a body. The envelope clearly contains no binary data, whereas the other two parts may need to carry arbitrary byte sequences, so these two parts are where improvement is needed. MIME targets exactly these two parts:

1) It adds header fields for negotiating MIME parameters.

2) It defines many content formats, standardizing how multimedia messages are represented.

3) It defines transfer encodings, so that any binary file can be transmitted.

It is worth stressing again that all of these measures leave the original SMTP protocol and RFC 822 unchanged. In effect, they can be regarded as pre-processing applied to a message before it is sent over SMTP.
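To make the "pre-processing" view concrete, here is a minimal sketch using Python's standard `email` package (our choice of tool, not something the original article uses). It builds a one-part message with a Chinese body and asks for quoted-printable encoding; the addresses are borrowed from the sample source in Section 2.2, and the two-character body is a hypothetical placeholder. The serialized result is pure 7-bit ASCII, so an unmodified SMTP server can carry it.

```python
from email.message import EmailMessage


def build_simple_message() -> str:
    """Build a one-part MIME message and serialize it as text ready for SMTP DATA."""
    msg = EmailMessage()
    msg["From"] = "bhw98@sina.com"
    msg["To"] = "bluesky7810@163.com"
    msg["Subject"] = "Re: help"
    # Non-ASCII body (Chinese for "hello"), stored as gb2312 bytes and then
    # quoted-printable encoded so that every output character is 7-bit ASCII.
    msg.set_content("您好", charset="gb2312", cte="quoted-printable")
    return msg.as_string()


print(build_simple_message())
```

`set_content` also adds the `MIME-Version: 1.0` header automatically; SMTP itself never sees anything but ordinary text lines.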

2.2. Source code of a simple email

To give an intuitive picture of a MIME message, here first is the source of a simple email. The line numbers, and the space after each, were added for convenience of analysis; "............" marks where a long run of encoded content has been omitted.

1 From: "bhw98" <bhw98@sina.com>
2 Reply-To: bhw98@sina.com
3 To: <bluesky7810@163.com>
4 Subject: Re: help
5 X-Mailer: Foxmail 4.2 [cn]
6 Mime-Version: 1.0
7 Content-Type: multipart/alternative;
8     boundary="====002_Dragon307572345230_===="
9
10
11 This is a multi-part message in MIME format.
12
13 --====002_Dragon307572345230_====
14 Content-Type: text/plain; charset="gb2312"
15 Content-Transfer-Encoding: quoted-printable
16
17 bluesky7810=A3=AC=C4=FA=BA=C3=A3=A1
18
19 =A1=A1=A1=A1=D4=DA=CF=C2=C6=AA=D7=EE=BA=F3=BF=C9=D2=D4=CF=C2=D4=D8=B0=A1=A3=AC=C4=E3
............
30 =A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A12003-04-07
31
32 --====002_Dragon307572345230_====
33 Content-Type: text/html; charset="gb2312"
34 Content-Transfer-Encoding: quoted-printable
35
36 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
37 <HTML><HEAD>
38 <META content=3D"text/html; charset=3Dgb2312" =
39 http-equiv=3DContent-Type>
40 <META content=3D"MSHTML 5.00.2920.0" name=3DGENERATOR>
............
79 </HTML>
80
81 --====002_Dragon307572345230_====--

82

The source of a message can be viewed in the user agent. In Foxmail, for example, right-click the selected message and choose "Original Information". The meaning of each part of the source is discussed below.

2.3. Mail header

2.3.1. Mail header fields

The mail header contains important information such as the sender, recipients, subject, date, MIME version, and content type. Each item is called a field; it consists of the field name, a colon (":"), and the field content. A field may occupy a single line or, if it is long, several lines. The first line of a field must start flush left, that is, with no whitespace (spaces or tabs) at the beginning. A continuation line must start with a whitespace character. That leading whitespace is only folding whitespace, not part of the information itself, and during decoding the line breaks are removed so that the continuation lines are joined back into one field (RFC 5322 calls this "unfolding"). For example, lines 7-8 of the sample source above form a single field.
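Unfolding is easy to sketch in Python, assuming the header block is already available as a string. `parse_header_block` is a hypothetical helper, not a library function; it follows RFC 5322 in removing only the line break (the whitespace that follows it is kept). The example input is lines 6-8 of the sample source, with the boundary written without the stray spaces of the copied listing.

```python
import re

# A line break (CRLF or bare LF) immediately followed by a space or tab marks a
# continuation line. Unfolding removes the line break and keeps the whitespace.
_FOLD = re.compile(r"\r?\n(?=[ \t])")


def parse_header_block(raw: str):
    """Return (name, value) pairs from a raw header block, joining folded lines."""
    fields = []
    for line in _FOLD.sub("", raw).splitlines():
        if not line:
            break  # the first empty line ends the header
        name, sep, value = line.partition(":")
        if sep:  # lines without a colon are simply skipped in this sketch
            fields.append((name.strip(), value.strip()))
    return fields


sample = (
    "Mime-Version: 1.0\r\n"
    "Content-Type: multipart/alternative;\r\n"
    '\tboundary="====002_Dragon307572345230_===="\r\n'
    "\r\n"
    "This is a multi-part message in MIME format.\r\n"
)
for name, value in parse_header_block(sample):
    print(name, "=>", value)
```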

Table 1 lists common header fields, their meanings, and who adds them.

Field name                   Description                   Added by
Received                     Transfer path                 Mail servers at each hop
Return-Path                  Return (bounce) address       Destination mail server
Delivered-To                 Delivery address              Destination mail server
Reply-To                     Reply address                 Message creator
From                         Sender address                Message creator
To                           Recipient address             Message creator
Cc                           Carbon-copy address           Message creator
Bcc                          Blind carbon-copy address     Message creator
Date                         Date and time                 Message creator
Subject                      Subject                       Message creator
Message-ID                   Message ID                    Message creator
MIME-Version                 MIME version                  Message creator
Content-Type                 Content type                  Message creator
Content-Transfer-Encoding    Content transfer encoding     Message creator

Table 1 Common fields in the mail header

Non-standard and custom field names start with "X-", for example X-Mailer and X-MSMail-Priority; such fields are generally understood only when the same program both sends and receives the message. Apart from Content-Type and Content-Transfer-Encoding, the meanings of the fields above are self-evident, so only those two are explained below.

2.3.2. Content-Type field

The Content-Type field describes the type of the transmitted content. Its value has the form "type/subtype". The top-level types are text, image, audio, video, application, multipart, and message. Each top-level type may have several subtypes; text, for example, has plain, html, xml, css, and others. Types and subtypes beginning with "x-" denote custom types. They are not officially registered with IANA, but many have become established by convention; for example, application/x-zip-compressed denotes a ZIP file. On Windows, the registry key "HKEY_CLASSES_ROOT\MIME\Database\Content Type" lists most known content types except multipart.

Each type may carry parameters. The RFCs contain many supplementary rules on parameter syntax, and some types take several parameters; the common ones are shown in Table 2.

Top-level type    Parameter    Description
text              charset      Character set
image             name         Name
application       name         Name
multipart         boundary     Boundary string

Table 2 Common parameters
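The "type/subtype; name=value" syntax can be taken apart with the parameter helpers of Python's `email.message.Message`. The wrapper `parse_content_type` below is a hypothetical convenience function written for this sketch; the values passed to it come from lines 7-8 and 14 of the sample source.

```python
from email.message import Message


def parse_content_type(value: str):
    """Split a Content-Type value into (maintype, subtype, {parameter: value})."""
    holder = Message()  # used here only as a header parser
    holder["Content-Type"] = value
    # get_params() returns [("type/subtype", ""), (name, value), ...] with quotes removed.
    params = holder.get_params()
    return (holder.get_content_maintype(),
            holder.get_content_subtype(),
            {name: val for name, val in params[1:]})


print(parse_content_type('multipart/alternative; boundary="====002_Dragon307572345230_===="'))
print(parse_content_type('text/plain; charset="gb2312"'))
```

Type, subtype, and parameter names are case-insensitive, which is why the library lowercases them; parameter values such as the boundary are returned as written.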

 

2.3.3. Content-Transfer-Encoding field

The Content-Transfer-Encoding field specifies the transfer encoding, that is, how the content that follows is encoded for transmission.

Content-Transfer-Encoding values include base64, quoted-printable, 7bit, 8bit, and binary; 7bit is the default. Mail source was originally designed to consist entirely of printable ASCII characters, so non-ASCII text or data must be encoded in one of these formats, as the sample source above does with quoted-printable. Base64 and quoted-printable are the most widely used encodings in non-English-speaking countries. The binary encoding is largely nominal and has little practical value. For details of base64 and quoted-printable encoding, see the RFCs or the companion article "SMTP protocol analysis".
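Both encodings are available in Python's standard `quopri` and `base64` modules, so a short sketch can decode line 17 of the sample source (written without spaces and with uppercase hex digits, as quoted-printable requires). `decode_qp_text` is a hypothetical helper name; note that decoding happens in two steps, first the transfer encoding and then the charset declared in Content-Type.

```python
import base64
import quopri


def decode_qp_text(encoded: bytes, charset: str) -> str:
    """Undo quoted-printable, then decode the bytes with the declared charset."""
    return quopri.decodestring(encoded).decode(charset)


# Line 17 of the sample source; the part declares charset="gb2312".
greeting = decode_qp_text(b"bluesky7810=A3=AC=C4=FA=BA=C3=A3=A1", "gb2312")
print(greeting)  # "Hello, bluesky7810!" in Chinese

# Quoted-printable leaves printable ASCII readable and escapes other bytes as =XX.
qp = quopri.encodestring("您好".encode("gb2312"))
# Base64 maps every 3 input bytes to 4 ASCII characters, whatever the bytes are.
b64 = base64.b64encode(bytes([0x00, 0xFF, 0x10]))
print(qp, b64)
```

This is why quoted-printable suits mostly-English text, while base64 is the usual choice for binary data and for text that is mostly non-ASCII.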

In recent years most mail servers in China have supported 8bit. Mail that is transmitted only within China can therefore use 8bit and send Chinese characters unencoded, even in the mail header. Mail going abroad should still be encoded with base64 or quoted-printable, to be safe.
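The advice above amounts to a small decision rule. The sketch below is a hypothetical policy, not taken from any RFC or library: it assumes the caller already knows whether the receiving server accepts 8-bit data (in SMTP this is what the 8BITMIME extension advertises), and the 1-in-6 threshold for preferring quoted-printable over base64 is an arbitrary illustrative choice.

```python
def choose_transfer_encoding(data: bytes, peer_accepts_8bit: bool) -> str:
    """Pick a Content-Transfer-Encoding for a body (simplified, hypothetical policy)."""
    # 998 octets is the RFC 5322 limit for one line, excluding CRLF.
    short_lines = all(len(line) <= 998 for line in data.splitlines())
    no_nul = b"\x00" not in data
    if short_lines and no_nul and all(byte < 0x80 for byte in data):
        return "7bit"  # already plain ASCII: send as-is
    if peer_accepts_8bit and short_lines and no_nul:
        return "8bit"  # e.g. Chinese text sent unencoded to a server that allows it
    # Otherwise encode. Mostly-ASCII text stays readable as quoted-printable;
    # mostly non-ASCII data (binary, Chinese text) is shorter in base64.
    high = sum(1 for byte in data if byte >= 0x80)
    return "quoted-printable" if high * 6 < len(data) else "base64"


print(choose_transfer_encoding("您好".encode("gb2312"), peer_accepts_8bit=False))
```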

2.4. Email body

The type of the mail body is given by the Content-Type field in the mail header. Common simple types are text/plain (plain text) and text/html (hypertext).

The multipart type used in the sample source is the essence of MIME mail. The body is divided into several parts, each consisting of a part header and a part body separated by a blank line. There are three common multipart types: multipart/mixed, multipart/related, and multipart/alternative. Their names suggest their purposes: mixed combines independent parts such as attachments, related bundles a document with the resources it embeds, and alternative offers several renderings of the same content. The hierarchical relationships between them are summarized in Figure 1.

...

Thus, a message with attachments must define a multipart/mixed part; a message with embedded resources must define at least a multipart/related part; and a message in which plain text and hypertext coexist must define at least a multipart/alternative part. "At least" means that a more general type is also acceptable: for example, a message containing only plain text and hypertext may declare multipart/related, or even multipart/mixed, in its header.
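These rules can be observed by letting Python's `email.message.EmailMessage` build the structure step by step. In this sketch the image bytes, attachment bytes, file name, and Content-ID are hypothetical. Each `add_*` call converts the existing content into the container type it needs. The result is one common nesting, with multipart/related placed around the HTML version inside multipart/alternative; mail clients also use the arrangement in which multipart/alternative sits inside multipart/related.

```python
from email.message import EmailMessage


def build_message(png_bytes: bytes, attachment: bytes) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = "bhw98@sina.com"
    msg["To"] = "bluesky7810@163.com"
    msg["Subject"] = "Re: help"
    msg.set_content("Hello")  # text/plain only
    # A second rendering of the same content -> multipart/alternative.
    msg.add_alternative('<p>Hello <img src="cid:logo@example"></p>', subtype="html")
    # An embedded resource for the HTML part -> that part becomes multipart/related.
    html_part = msg.get_payload()[1]
    html_part.add_related(png_bytes, "image", "png", cid="<logo@example>")
    # An attachment -> the whole message is wrapped in multipart/mixed.
    msg.add_attachment(attachment, maintype="application",
                       subtype="octet-stream", filename="notes.bin")
    return msg


def outline(part, depth=0):
    """Yield the content types of a part and its children, indented by depth."""
    yield "  " * depth + part.get_content_type()
    for child in part.iter_parts():
        yield from outline(child, depth + 1)


msg = build_message(b"\x89PNG\r\n\x1a\n", b"\x00\x01\x02")
print("\n".join(outline(msg)))
```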

All multipart types share one mechanism: the header of the multipart part (or of the whole message) specifies a "boundary" parameter string, and the child parts in its body are delimited by this string. Each child part begins with a line consisting of "--" + boundary, and the parent part ends with a line consisting of "--" + boundary + "--". Parts are also separated from each other by blank lines. When the mail body is multipart, the beginning of the body (before the first "--" + boundary line) may contain extra text lines; they act as comments and are ignored during decoding. Likewise, text after the closing "--" + boundary + "--" line is ignored and not displayed. Interested readers can verify this with their own mail client.
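The delimiter rules can be followed by hand, without a mail library. `split_multipart` is a hypothetical helper written for this sketch; it assumes the body is already available as text and that the boundary has been read from the Content-Type header. It tolerates trailing spaces or tabs after a delimiter line, which RFC 2046 permits, and compares boundaries case-sensitively.

```python
def split_multipart(body: str, boundary: str):
    """Split a multipart body into (preamble, [part texts], epilogue)."""
    delimiter = "--" + boundary
    closing = delimiter + "--"
    preamble, parts, epilogue = [], [], []
    current = None  # lines of the part being collected; None outside a part
    state = "preamble"
    for line in body.splitlines():
        marker = line.rstrip(" \t")  # ignore padding after a delimiter
        if state != "epilogue" and marker == closing:
            if current is not None:
                parts.append("\n".join(current))
            current, state = None, "epilogue"
        elif state != "epilogue" and marker == delimiter:
            if current is not None:
                parts.append("\n".join(current))
            current, state = [], "part"
        elif state == "preamble":
            preamble.append(line)
        elif state == "part":
            current.append(line)
        else:
            epilogue.append(line)
    if current is not None:  # body ended without a closing delimiter
        parts.append("\n".join(current))
    return "\n".join(preamble), parts, "\n".join(epilogue)


B = "====002_Dragon307572345230_===="
body = (
    "This is a multi-part message in MIME format.\n\n"
    f"--{B}\nContent-Type: text/plain\n\nhello\n\n"
    f"--{B}\nContent-Type: text/html\n\n<p>hello</p>\n\n"
    f"--{B}--\n"
)
preamble, parts, epilogue = split_multipart(body, B)
print(len(parts), "parts")
```

Each returned part still has its own header and body, separated by the first blank line, and can be processed recursively if it is itself multipart.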

Using the boundary delimiters and the multipart hierarchy diagram, we can analyze the body hierarchy and part nesting of the sample source. Lines 10-12 are extra text lines, and lines 13-81 form a multipart/alternative part with two child parts: lines 13-30 are the plain-text part and lines 32-79 the hypertext part.
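The same analysis can be checked mechanically by feeding a condensed copy of the sample to Python's standard parser and printing the part tree. In this sketch only line 17 of the plain-text body is kept, and the one-line HTML body is a hypothetical stand-in for the omitted lines 36-79.

```python
from email import message_from_string, policy

B = "====002_Dragon307572345230_===="
SAMPLE = (
    'From: "bhw98" <bhw98@sina.com>\n'
    "Subject: Re: help\n"
    "Mime-Version: 1.0\n"
    "Content-Type: multipart/alternative;\n"
    f'\tboundary="{B}"\n'
    "\n"
    "\n"
    "This is a multi-part message in MIME format.\n"
    "\n"
    f"--{B}\n"
    'Content-Type: text/plain; charset="gb2312"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "bluesky7810=A3=AC=C4=FA=BA=C3=A3=A1\n"
    "\n"
    f"--{B}\n"
    'Content-Type: text/html; charset="gb2312"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "<HTML><BODY>bluesky7810=A3=AC=C4=FA=BA=C3=A3=A1</BODY></HTML>\n"
    "\n"
    f"--{B}--\n"
)


def outline(part, depth=0):
    """Return the content types of a part and its children, indented by depth."""
    lines = ["  " * depth + part.get_content_type()]
    for child in part.iter_parts():
        lines.extend(outline(child, depth + 1))
    return lines


msg = message_from_string(SAMPLE, policy=policy.default)
print("\n".join(outline(msg)))
print("preamble:", repr(msg.preamble))  # the extra text lines, kept but not displayed
```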

Note that each part of the body has its own attributes, described by the fields in its part header. Table 3 lists the common part-header fields.

Field name                   Description
Content-Type                 Part type
Content-Transfer-Encoding    Transfer encoding of the part
Content-Disposition          Disposition of the part (inline or attachment)
Content-ID                   Part ID
Content-Location             Part location (path)
Content-Base                 Base location of the part

Table 3 Common fields in part headers

Each of these fields has the same meaning as the mail-header field of the same name, except that it applies to one part rather than to the whole message body.
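Part-level fields are read the same way as message-level ones, just on each part. In this sketch the small multipart/mixed message, its boundary "XYZ", and the file name report.bin are hypothetical; `get_content_disposition`, `get_filename`, and `get_payload(decode=True)` are standard methods of Python's `email.message.Message`.

```python
from email import message_from_string, policy

RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    'Content-Type: text/plain; charset="us-ascii"\n'
    "\n"
    "See the attachment.\n"
    "--XYZ\n"
    'Content-Type: application/octet-stream; name="report.bin"\n'
    "Content-Transfer-Encoding: base64\n"
    'Content-Disposition: attachment; filename="report.bin"\n'
    "\n"
    "AP8Q\n"
    "--XYZ--\n"
)


def describe_parts(raw: str):
    """List (type, disposition, filename, decoded bytes) for each leaf part."""
    msg = message_from_string(raw, policy=policy.default)
    rows = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no content of their own
        rows.append((
            part.get_content_type(),         # from the part's Content-Type
            part.get_content_disposition(),  # "inline", "attachment" or None
            part.get_filename(),             # from Content-Disposition or name=
            part.get_payload(decode=True),   # Content-Transfer-Encoding undone
        ))
    return rows


for row in describe_parts(RAW):
    print(row)
```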

Chapter 3. Common Questions

3.1. How to get the source of a MIME message

Some mature mail clients, such as Microsoft Outlook Express and the Chinese-made Foxmail, can display and save the source of a message (its "original information"). In Foxmail, select the message, right-click, and choose the "Original Information" menu item to view its source.

Chapter 4. Analysis Approach

MIME is an email specification that is also widely used by other protocols; for example, most of the mail carried over SMTP and POP3 follows the MIME specification. Therefore, although MIME is not listed in the functional requirements table, we must understand it in order to extract useful information from SMTP or POP3 traffic, because the extracted mail data conforms to MIME.

Once the MIME data has been captured from SMTP or POP3 traffic, a straightforward MIME decoding pass recovers the original content that was sent.
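A minimal sketch of that decoding step in Python, under two assumptions: the captured data is the raw message as transmitted after SMTP DATA or in a POP3 RETR response, already split into lines, and only the plain-text body is wanted. Both protocols "dot-stuff" lines that begin with a period and end the message with a line containing a single ".", so that is undone first. `unstuff` and `plain_text_of` are hypothetical helper names, and the captured lines are illustrative.

```python
from email import message_from_bytes, policy


def unstuff(lines):
    """Undo SMTP/POP3 dot-stuffing and stop at the terminating "." line."""
    out = []
    for line in lines:
        if line == b".":
            break  # end-of-data marker
        if line.startswith(b".."):
            line = line[1:]  # the sender doubled a leading period
        out.append(line)
    return b"\r\n".join(out) + b"\r\n"


def plain_text_of(raw: bytes) -> str:
    """Parse a raw message and return its decoded text/plain body ('' if none)."""
    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    return body.get_content() if body is not None else ""


captured = [
    b'Content-Type: text/plain; charset="gb2312"',
    b"Content-Transfer-Encoding: quoted-printable",
    b"",
    b"bluesky7810=A3=AC=C4=FA=BA=C3=A3=A1",
    b"..signature",  # the body line ".signature", dot-stuffed on the wire
    b".",
]
print(plain_text_of(unstuff(captured)))
```

`get_content` undoes both the transfer encoding and the charset, so the result is ordinary text; for multipart messages `get_body` searches the part tree for the preferred text part.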
