[Network programming]-MIME format details

Source: http://www.cnblogs.com/robinhood/articles/540464.html

Q What is MIME? What is MIME mail?

A MIME stands for "Multipurpose Internet Mail Extensions". It is a widely used technical specification for email, and its basic content is defined in RFC 2045-2049.

Naturally, MIME mail is email that complies with the MIME specification, that is, email encoded according to it.

Before MIME was introduced, mail in the RFC 822 format could carry only basic ASCII text. Including content such as binary files, sound, or animation in an email was very difficult. MIME provides a way to attach multiple encoded files to a single email, making up for the shortcomings of the original message format. In fact, MIME is not limited to mail: HTTP also adopted parts of it, such as the Content-Type header.

The following MIME mail examples give an intuitive impression of the format. Example 1 is the simplest: plain text only, essentially the RFC 822 format. Example 2 is more complex and contains both plain text and hypertext. Example 3 is the most complex and contains plain text, hypertext, embedded resources, and file attachments. The row numbers used in the discussion below count the rows of each example from 1; they are not part of the mail. "............" indicates that a long stretch of encoded content has been omitted.

Example 1

 
Date: Thu, 18 Apr 2002 09:32:45 +0800
From: <bhw98@sina.com>
To: <bhwang@jlonline.com>
Subject: test
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"

This is a simple mail.
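As a hedged illustration, a mail shaped like Example 1 can be produced with the `email` package from Python's standard library. This is only a minimal sketch, not the way the original example was generated; the function name `build_simple_mail` is hypothetical.

```python
from email.message import EmailMessage


def build_simple_mail():
    """Build a plain-text mail resembling Example 1 (illustrative only)."""
    msg = EmailMessage()
    msg["Date"] = "Thu, 18 Apr 2002 09:32:45 +0800"
    msg["From"] = "bhw98@sina.com"
    msg["To"] = "bhwang@jlonline.com"
    msg["Subject"] = "test"
    # set_content() adds Content-Type, Content-Transfer-Encoding and,
    # for a top-level message, MIME-Version: 1.0.
    msg.set_content("This is a simple mail.", charset="iso-8859-1")
    return msg


if __name__ == "__main__":
    print(build_simple_mail().as_string())
```

Printing the message shows the header fields, one blank line, and then the body, which is the same layout as Example 1.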

Example 2

From: "bhw98" <bhw98@sina.com>
Reply-To: bhw98@sina.com
To: <bluesky7810@163.com>
Subject: Re: help
X-Mailer: Foxmail 4.2 [cn]
Mime-Version: 1.0
Content-Type: multipart/alternative;
  boundary="=====002_Dragon307572345230_====="

This is a multi-part message in MIME format.

--=====002_Dragon307572345230_=====
Content-Type: text/plain;
  charset="gb2312"
Content-Transfer-Encoding: quoted-printable

bluesky7810=A3=AC=C4=FA=BA=C3=A3=A1
=A1=A1=A1=A1=D4=DA=CF=C2=C6=AA=D7=EE=BA=F3=BF=C9=D2=D4=CF=C2=D4=D8=B0=A1=A3=AC=C4=E3
............
=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A1=A12003-04-07

--=====002_Dragon307572345230_=====
Content-Type: text/html;
  charset="gb2312"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>

Example 3

Return-Path: <bluesky7810@163.com>
Delivered-To: bhw98@sina.com
Received: (qmail 75513 invoked by alias); 20 May 2002 02:19:53 -0000
Received: from unknown (HELO bluesky) (61.155.118.135)
  by 202.106.187.143 with SMTP; 20 May 2002 02:19:53 -0000
Message-ID: <007f01c3111c$742fec00$03667f@bluesky>
From: "=?gb2312?B?Wlbatrxezowncg==?=" <bluesky7810@163.com>
To: "bhw98" <bhw98@sina.com>
Cc: <bhwang@jlonline.com>
Subject: =?gb2312?B?Ztk1xlbgtk6/2rpm0pi=?=
Date: Sat, 20 May 2002 10:03:36 +0800
MIME-Version: 1.0
Content-Type: multipart/mixed;
  boundary="----=_NextPart_000_007A_01C3115F.80DFC5E0"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2919.6700
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700

This is a multi-part message in MIME format.

------=_NextPart_000_007A_01C3115F.80DFC5E0
Content-Type: multipart/related;
  type="multipart/alternative";
  boundary="----=_NextPart_001_007B_01C3115F.80DFC5E0"

------=_NextPart_001_007B_01C3115F.80DFC5E0
Content-Type: multipart/alternative;
  boundary="----=_NextPart_002_007C_01C3115F.80DFC5E0"

------=_NextPart_002_007C_01C3115F.80DFC5E0
Content-Type: text/plain;
  charset="gb2312"
Content-Transfer-Encoding: quoted-printable

bhw98,=C4=E3=BA=C3!
=D5=E2=CA=C7=CE=D2=D0=B4=B5=C4=B6=E0=B4=AE=BF=DA=CD=A8=D0=C5=B5=C4=B3=CC=D0=F2,=C7=EB=D6=B8=BD=CC!

------=_NextPart_002_007C_01C3115F.80DFC5E0
Content-Type: text/html;
  charset="gb2312"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>

Q When I start studying MIME mail, how do I obtain the mail source?

A Full-featured mail clients, such as Microsoft Outlook Express and Foxmail (popular in China), can display and save the source (raw information) of a mail. In Foxmail, right-click a mail in the mail list and choose "Raw Information" to view it, or choose "File - Export" from the main menu to save it. In Outlook Express, the corresponding operations are "Properties" and "Save As". The saved .eml file can be opened with either of these programs.

Q What is the composition of MIME mail?

A Generally, a MIME message consists of a message header and a message body. Since we are concerned with mail here, the discussion below says "mail" instead of "message". In the examples above, rows 1-6 of Example 1, rows 1-8 of Example 2, and rows 1-18 of Example 3 are the mail headers; rows 8-9 of Example 1, rows 10-82 of Example 2, and everything from row 20 onward in Example 3 are the mail bodies. The header and the body are separated by a blank line: row 7 of Example 1, row 9 of Example 2, and row 19 of Example 3. Blank lines are therefore not allowed inside the mail header. Some mails are not recognized by the mail client, which then displays their raw source, precisely because their first line is blank.

The mail header contains important information such as the sender, recipient, subject, time, MIME version, and content type of the mail. Each piece of information is called a field. A field consists of a field name, a ":", and the field content; it may be short and fit on one row, or long and occupy several rows. The first row of a field must start at the left margin, with no whitespace (space or tab) before the field name. A continuation row must start with a whitespace character; that first whitespace character is not part of the content and must be filtered out when decoding. For example, rows 7-8 of Example 2, and rows 4-5 and 13-14 of Example 3, each form a single field.
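The folding rule can be sketched in a few lines of Python. This is a simplified illustration, not a full RFC 822 parser: the helper name `unfold_headers` is hypothetical, and each continuation row is joined to the previous one with a single space.

```python
def unfold_headers(header_block):
    """Split a raw header block into (name, content) pairs.

    A row that starts with a space or tab continues the previous field;
    its leading whitespace is dropped and one space is used as the joiner.
    """
    fields = []
    for row in header_block.splitlines():
        if not row:
            break  # a blank row ends the header
        if row[0] in " \t" and fields:
            name, content = fields[-1]
            fields[-1] = (name, content + " " + row.strip())
        else:
            name, _, content = row.partition(":")
            fields.append((name.strip(), content.strip()))
    return fields


if __name__ == "__main__":
    sample = 'Content-Type: multipart/alternative;\n\tboundary="abc"\nSubject: test\n'
    for name, content in unfold_headers(sample):
        print(name, "=>", content)
```

The two-row Content-Type field in the sample comes back as one entry, just as rows 7-8 of Example 2 form one field.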

The mail body contains the content of the mail. Its type is indicated by the Content-Type field in the mail header. Common simple types are text/plain (plain text) and text/html (hypertext).

The multipart types seen in Examples 2 and 3 are the essence of MIME mail. The mail body is divided into several segments, and each segment in turn consists of a segment header and a segment body, which are also separated by a blank line. There are three common multipart types: multipart/mixed, multipart/related, and multipart/alternative. Their meaning and purpose are easy to infer from their names. The hierarchical relationship between them can be summarized as follows:

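+---------------------------------- multipart/mixed ----------------------------------+
|                                                                                     |
|  +---------------------- multipart/related ----------------------+                  |
|  |                                                               |                  |
|  |  +----- multipart/alternative ------+  +-------------------+  |  +------------+  |
|  |  |                                  |  | embedded resource |  |  | attachment |  |
|  |  |  +------------+  +------------+  |  +-------------------+  |  +------------+  |
|  |  |  | plain text |  | hypertext  |  |                         |                  |
|  |  |  +------------+  +------------+  |  +-------------------+  |  +------------+  |
|  |  |                                  |  | embedded resource |  |  | attachment |  |
|  |  +----------------------------------+  +-------------------+  |  +------------+  |
|  |                                                               |                  |
|  +---------------------------------------------------------------+                  |
|                                                                                     |
+-------------------------------------------------------------------------------------+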

From the diagram we can see that a mail with attachments must define a multipart/mixed segment; a mail with embedded resources must define at least a multipart/related segment; and a mail in which plain text and hypertext coexist must define at least a multipart/alternative segment. What does "at least" mean? For example, even if a mail contains only plain text and hypertext, the type in its mail header may be declared more broadly, as multipart/related or even multipart/mixed; this is allowed.
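The nesting described here can be reproduced with the legacy `email.mime` classes of Python's standard library. The sketch below is only an illustration under assumptions: the texts, the Content-ID `bg@example`, the file name `demo.bin`, and the placeholder bytes are all made up, and the helper names `build_nested_mail` and `outline` are hypothetical.

```python
from email.mime.application import MIMEApplication
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_nested_mail():
    """mixed > related > alternative, following the hierarchy diagram."""
    alternative = MIMEMultipart("alternative")
    alternative.attach(MIMEText("Hello in plain text.", "plain", "us-ascii"))
    alternative.attach(MIMEText(
        '<html><body background="cid:bg@example">Hello</body></html>',
        "html", "us-ascii"))

    related = MIMEMultipart("related")
    related.attach(alternative)
    image = MIMEImage(b"placeholder image bytes", _subtype="gif")
    image.add_header("Content-ID", "<bg@example>")
    related.attach(image)

    mixed = MIMEMultipart("mixed")
    mixed["Subject"] = "nested example"
    mixed.attach(related)
    attachment = MIMEApplication(b"placeholder file content")
    attachment.add_header("Content-Disposition", "attachment", filename="demo.bin")
    mixed.attach(attachment)
    return mixed


def outline(msg, depth=0):
    """Return the content types of a message tree, indented by depth."""
    rows = ["  " * depth + msg.get_content_type()]
    if msg.is_multipart():
        for part in msg.get_payload():
            rows.extend(outline(part, depth + 1))
    return rows


if __name__ == "__main__":
    print("\n".join(outline(build_nested_mail())))
```

Dropping the attachment and the image would leave only the multipart/alternative segment, which corresponds to the "at least" rule in the text.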

What the multipart types have in common is that the segment header specifies a "boundary" parameter string, and the sub-segments in the segment body are delimited by this string. Every sub-segment starts with a "--" + boundary row, and the parent segment ends with a "--" + boundary + "--" row. Segments are also separated from one another by blank lines. When the mail body is multipart, the beginning of the body (before the first "--" + boundary row) may contain some additional text rows; these act as comments and are ignored when decoding. Additional text rows may also appear between segments, and they are not displayed either. Interested readers can verify this.
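The delimiter rules can be illustrated with a small hand-written splitter. This is a simplified sketch for study purposes (real code would normally use a full MIME parser); the function name `split_multipart` and the sample boundary `B` are hypothetical.

```python
def split_multipart(body, boundary):
    """Return the raw sub-segments of a multipart body.

    Rows before the first "--boundary" row and rows after the closing
    "--boundary--" row are treated as comments and ignored.
    """
    delimiter = "--" + boundary
    closing = delimiter + "--"
    segments, current = [], None
    for row in body.splitlines():
        marker = row.rstrip()
        if marker == closing:
            if current is not None:
                segments.append("\n".join(current))
            break
        if marker == delimiter:
            if current is not None:
                segments.append("\n".join(current))
            current = []  # start collecting the next sub-segment
        elif current is not None:
            current.append(row)
    return segments


if __name__ == "__main__":
    body = (
        "This is a multi-part message in MIME format.\n"
        "--B\nContent-Type: text/plain\n\nhello\n"
        "--B\nContent-Type: text/html\n\n<p>hello</p>\n"
        "--B--\n"
    )
    for segment in split_multipart(body, "B"):
        print(repr(segment))
```

Each returned segment again has a header, a blank row, and a body, so the same function can be applied to a nested multipart segment with its own boundary.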

With the boundary rules and the multipart hierarchy diagram in hand, let us analyze the body hierarchy and segment nesting in Examples 2 and 3.

In Example 2, rows 10-12 are additional text, and rows 13-82 are the multipart/alternative segment, which contains two sub-segments: rows 13-30 are the plain-text body and rows 32-79 are the hypertext body.

In Example 3, rows 20-21 are additional text, and rows 22-3127 are the multipart/mixed segment, which contains three sub-segments: rows 22-171 are the multipart/related segment, and rows 173-1688 and 1690-3125 are two attachments. The multipart/related segment contains two sub-segments: rows 27-61 are the multipart/alternative segment, and rows 63-169 are an embedded resource (an image). The multipart/alternative segment in turn contains two sub-segments: rows 31-38 are the plain-text body and rows 40-59 are the hypertext body.

Example 1 contains only plain text and is really a special case of the multipart hierarchy diagram. If you insist on making something simple complicated, it can also be written in the following form, which is fully in keeping with the spirit of MIME.

Date: Thu, 18 Apr 2002 09:32:45 +0800
From: <bhw98@sina.com>
To: <bhwang@jlonline.com>
Subject: test
Mime-Version: 1.0
Content-Type: multipart/alternative;
  boundary="{[(^_^)]}"

--{[(^_^)]}
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

This is a simple mail.

--{[(^_^)]}--


Q What are the common fields in the mail header and in segment headers?

A Many of the field names used in the mail header come from RFC 822, and MIME adds some more. Common standard field names and their meanings are as follows:

Field name                  Meaning                       Added by
Received                    Transfer path                 Mail servers at each hop
Return-Path                 Return address                Destination mail server
Delivered-To                Delivery address              Destination mail server
Reply-To                    Reply address                 Mail creator
From                        Sender address                Mail creator
To                          Recipient address             Mail creator
Cc                          Carbon-copy address           Mail creator
Bcc                         Blind-carbon-copy address     Mail creator
Date                        Date and time                 Mail creator
Subject                     Subject                       Mail creator
Message-ID                  Message ID                    Mail creator
MIME-Version                MIME version                  Mail creator
Content-Type                Content type                  Mail creator
Content-Transfer-Encoding   Content transfer encoding     Mail creator

Non-standard, custom field names all start with "X-", such as X-Mailer and X-MSMail-Priority. They are usually understood only when the sender and the receiver use the same program.

The following fields commonly appear in a segment header:

Field name                  Meaning
Content-Type                Type of the segment body
Content-Transfer-Encoding   Transfer encoding of the segment body
Content-Disposition         Disposition (arrangement) of the segment body
Content-ID                  ID of the segment body
Content-Location            Location (path) of the segment body
Content-Base                Base location of the segment body

In addition to a value, some fields also carry parameters. The value and the parameters, and the parameters themselves, are separated by ";"; a parameter name and its value are separated by "=". For example, in rows 28-29 of Example 3, the value of the Content-Type field is "multipart/alternative", and the value of its boundary parameter is "----=_NextPart_002_007C_01C3115F.80DFC5E0".
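Python's standard `email.message.Message` class can take such a field apart. The sketch below is only an illustration; the wrapper name `describe_content_type` is hypothetical, and the header value is the one quoted from Example 3.

```python
from email.message import Message


def describe_content_type(header_value):
    """Split a Content-Type field into its value and two common parameters."""
    part = Message()
    part["Content-Type"] = header_value
    return {
        "type": part.get_content_type(),        # value before the first ";"
        "main": part.get_content_maintype(),
        "sub": part.get_content_subtype(),
        "boundary": part.get_param("boundary"),  # None if absent
        "charset": part.get_param("charset"),    # None if absent
    }


if __name__ == "__main__":
    value = ('multipart/alternative; '
             'boundary="----=_NextPart_002_007C_01C3115F.80DFC5E0"')
    print(describe_content_type(value))
```

The surrounding quotes are removed from the parameter value, so the returned boundary can be used directly to build the "--" + boundary delimiter rows.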

Q What forms does Content-Type take, and what parameters does it have?

A Content-Type has the form "primary type/subtype". The primary types are text, image, audio, video, application, multipart, and message, which stand for text, images, audio, video, application data, multi-part content, and messages respectively. Each primary type may have many subtypes; text, for example, has the subtypes plain, html, xml, css, and others. Primary types and subtypes beginning with "x-" are custom types that are not officially registered with IANA, although most of them have become established by convention. For example, application/x-zip-compressed denotes a ZIP file. On Windows, the registry key "HKEY_CLASSES_ROOT\MIME\Database\Content Type" lists most known content types other than multipart.

The RFCs contain many supplementary provisions about parameters, and some types take several parameters. Common ones are:

Primary type   Parameter   Meaning
text           charset     Character set
image          name        Name
application    name        Name
multipart      boundary    Boundary

The available character sets can also be seen under "HKEY_CLASSES_ROOT\MIME\Database\Charset" in the Windows registry.

Q What kinds of Content-Transfer-Encoding are there, and what are their characteristics?

A Content-Transfer-Encoding can be Base64, quoted-printable, 7bit, 8bit, or binary. 7bit is the default. Mail source was originally designed to consist entirely of printable ASCII characters, so non-ASCII text or data must be encoded into that form, as the three examples above show. Base64 and quoted-printable are the encodings most widely used in non-English-speaking countries. binary is largely nominal and has little practical value in mail.

Base64 encodes the input string or data into a string that uses only the 64 characters {'A'-'Z', 'a'-'z', '0'-'9', '+', '/'}, with '=' used for padding. The encoder takes 6 bits at a time from the input stream, uses the 6-bit value (0-63) as an index into the character table, and outputs the corresponding character. Every 3 bytes are thus encoded as 4 characters (3 × 8 → 4 × 6); a final group that yields fewer than 4 characters is padded with '='. In the mail header, the form "=?charset?B?xxxxxxxx?=" indicates that xxxxxxxx is Base64-encoded and that the character set of the original text is charset. For example, "=?gb2312?B?Wlbatrxezowncg==?=" in row 7 of Example 3 is the encoding of a name in simplified Chinese meaning "blue sky". In a segment body the content is encoded directly, with line breaks inserted as appropriate; MIME recommends at most 76 characters per row. For example, rows 1697-3125 of Example 3 are a Base64-encoded ZIP file.
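The table-lookup procedure can be written out by hand. The following is a teaching sketch (in practice Python's standard `base64` module does this); the name `b64_encode` is hypothetical and no line breaking at 76 characters is performed.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_encode(data: bytes) -> str:
    """Encode bytes as Base64 by taking 6 bits at a time."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to 3 bytes into 24 bits, zero-filled on the right.
        value = int.from_bytes(chunk.ljust(3, b"\0"), "big")
        chars = [ALPHABET[(value >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        keep = len(chunk) + 1  # 1 byte -> 2 chars, 2 -> 3, 3 -> 4
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)


if __name__ == "__main__":
    for sample in (b"M", b"Ma", b"Man"):
        # Compare the hand-written encoder with the standard library.
        print(sample, b64_encode(sample), base64.b64encode(sample).decode("ascii"))
```

The three samples show the padding rule: one leftover byte gives two characters plus "==", two leftover bytes give three characters plus "=".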

Quoted-printable processes the input string or data byte by byte. A byte that needs no encoding is output as is; a byte that needs encoding is output as '=' followed by its value written as two hexadecimal characters. In the mail header, the form "=?charset?Q?xxxxxxxx?=" indicates that xxxxxxxx is quoted-printable encoded and that the character set of the original text is charset. In a segment body the content is encoded directly, with line breaks inserted as appropriate, and an extra '=' is output before each inserted line break. For example, the hypertext segment in rows 40-59 of Example 3 is HTML text in quoted-printable encoding. In the first line, "=C7=E7=C0=CA" stands for the original text "晴朗" ("clear and bright"), because the GB2312 code of "晴" is C7E7 and that of "朗" is C0CA. Rows 48, 53, and 57 end with a solitary '=', which marks a soft line break introduced by the encoding rather than one present in the original text.
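Decoding follows directly from these rules. The function below is a minimal sketch that assumes well-formed input (every '=' is followed by two hexadecimal digits or ends the row); the name `qp_decode` is hypothetical, and the standard `quopri` module is shown alongside for comparison.

```python
import quopri


def qp_decode(text: str) -> bytes:
    """Decode quoted-printable text, honoring soft line breaks."""
    out = bytearray()
    rows = text.split("\n")
    for number, row in enumerate(rows):
        row = row.rstrip("\r")
        soft_break = row.endswith("=")  # a lone trailing '=' joins the rows
        if soft_break:
            row = row[:-1]
        i = 0
        while i < len(row):
            if row[i] == "=":
                out.append(int(row[i + 1:i + 3], 16))
                i += 3
            else:
                out.append(ord(row[i]))
                i += 1
        if not soft_break and number < len(rows) - 1:
            out += b"\n"  # a real line break from the original text
    return bytes(out)


if __name__ == "__main__":
    print(qp_decode("=C7=E7=C0=CA").decode("gb2312"))
    print(qp_decode("bhw98,=C4=E3=BA=C3!").decode("gb2312"))
    print(quopri.decodestring(b"=C7=E7=C0=CA"))
```

The decoded bytes are only raw data; turning them into text requires the charset named in the segment header, here gb2312.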

In recent years, most mail servers in China have come to support 8bit transfer. For mail transmitted only within China, 8bit can therefore be used, in particular leaving the Chinese characters in the mail header unprocessed. For mail going abroad, it is still best to encode honestly with Base64 or quoted-printable.

Q What are embedded resources? What forms do they take?

A Embedded resources are another highlight of MIME; they can make mail content lively and colorful. Within the multipart/related framework of a mail, you can define segments holding images, animations, sounds, and even CSS styles and scripts associated with the body. Usually the HTML body refers to an embedded resource through a hyperlink. For example, rows 53-54 of Example 3, which belong to the HTML body, decode to

<BODY background=cid:007901c3111c$72b978a0$03477f@bluesky bgColor=#ffffff>

This means that the image whose Content-ID is 007901c3111c$72b978a0$03477f@bluesky is used as the background ("cid:xxxxxxxx" is also a kind of hyperlink). Rows 63-169 are exactly this embedded resource.
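How a reader might match "cid:" links to embedded resources can be sketched with the standard `email` parser. Everything in the sample mail below (the boundary `XX`, the Content-ID `bg001@example`, the tiny Base64 payload) is made up for illustration, and `resolve_cid_links` is a hypothetical helper name.

```python
import re
from email import message_from_string

RAW = """\
MIME-Version: 1.0
Content-Type: multipart/related; boundary="XX"

--XX
Content-Type: text/html; charset="us-ascii"

<body background="cid:bg001@example">hi</body>
--XX
Content-Type: image/gif
Content-ID: <bg001@example>
Content-Transfer-Encoding: base64

R0lGODlh
--XX--
"""


def resolve_cid_links(msg):
    """Map every cid: reference in the HTML body to the matching segment."""
    by_cid, html = {}, ""
    for part in msg.walk():
        cid = part.get("Content-ID")
        if cid:
            by_cid[cid.strip().strip("<>")] = part
        if part.get_content_type() == "text/html":
            charset = part.get_content_charset() or "us-ascii"
            html = part.get_payload(decode=True).decode(charset)
    refs = re.findall(r"cid:([^\s\"'>]+)", html, flags=re.IGNORECASE)
    return {ref: by_cid.get(ref) for ref in refs}


if __name__ == "__main__":
    for ref, part in resolve_cid_links(message_from_string(RAW)).items():
        print(ref, "->", part.get_content_type() if part else "missing")
```

Note that the Content-ID field wraps the identifier in angle brackets while the cid: link does not, so the brackets are stripped before comparison.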

Besides linking by Content-ID, there is another common form: an ordinary hyperlink matched against Content-Location. For example:

In the HTML body,

............ ?id=486341"> ............

The corresponding embedded resources are

Content-Type: image/gif; name="anti_joyo_dm_book.gif"
Content-Transfer-Encoding: base64
Content-Location: ............

Content-Type: application/octet-stream; name="getimage_small.asp?id=486341"
Content-Transfer-Encoding: base64
Content-Location: http://www.dangdang.com/dd2001/getimage_small.asp?id=486341

............

In addition, the following two rows are equivalent:

Content-Location: http://www.dangdang.com/images/all/anti_joyo_dm_book.gif

and

Content-Location: anti_joyo_dm_book.gif

Q How are attachments and embedded resources used to spread mail viruses?

A That mail attachments may carry viruses is easy to understand. An attachment is, after all, a file; you can guard against it by not opening it casually. Embedded resources, however, are accessed as soon as you view the mail content, and if malicious code is hidden in them, you are hit without knowing it. For example, the core of the source of the Nimda virus, which has spread around the world in the past two years, is as follows:

MIME-Version: 1.0
Content-Type: multipart/related;
  type="multipart/alternative";
  boundary="====_ABC1234567890DEF_===="

--====_ABC1234567890DEF_====
Content-Type: multipart/alternative;
  boundary="====_ABC0987654321DEF_===="

--====_ABC0987654321DEF_====
Content-Type: text/html;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

<HTML>

It embeds an executable file as a resource referenced from a frame in the page, but declares the executable code to be of a waveform audio type. When the mail is viewed, the code runs and the machine is infected. An infected computer then uses its address book to mail the virus to others, which is how the Nimda worm spread so widely.

Throughout history, one virus after another has emerged, and none has lasted. That was true of Nimda, and it will be true of SARS. As the sayings go, "United as one, we are an impregnable fortress" and "SARS will eventually fall, but the spirit of unity will live on". I believe we will soon defeat SARS!

Virus databases are updated only after a new virus has appeared, so do not rely too heavily on anti-virus software. A good habit is to disable the mail preview function, or to set it to preview only the plain-text part. Check the mail source first and open the mail only after confirming that it is free of viruses, especially for hypertext mail from strangers. Never open attachments directly in the mail client.

Q Some spam hides its sender. How can I trace its source?

A From the table of mail header fields we can see that the mail creator controls most of the field content, but fields such as Received are added automatically by the servers along the way and are beyond the sender's reach. Spam is generally sent with bulk-mailing software, and the From field (sender address) in the mail header can be forged at will, even written as the recipient's own address (receiving spam "from yourself" that you never sent: infuriating, isn't it?). Examine the chain of Received fields (the transfer path) to find the real source. Each server adds its Received field at the top of the mail, so the bottom-most Received field contains the SMTP or HTTP server used by the sender and the original external IP address of the sender's gateway. The basic format of a Received field is "from A by B", where A is the sending side and B is the receiving side. For example:
Received: (qmail 45304 invoked from network); 4 May 2003 17:05:47 -0000
Received: from unknown (HELO bjapp9.163.net) (202.108.20.197)
  by 202.106.182.244 with SMTP; 4 May 2003 17:05:47 -0000
Received: from localhost (localhost [127.0.0.1])
  by bjapp9.163.net (Postfix) with SMTP id e1c761d84c631
  for <bhw98@sina.com>; Mon, 5 May 2003 01:07:26 +0800 (CST)
Received: from fanyingxxxx@tom.com (unknown [211.99.162.194])
  by bjapp9.163.net (Coremail) with SMTP id ogeaam1itt7mn1c.1
  for <bhw98@sina.com>; Mon, 05 May 2003 01:07:26 +0800 (CST)

Reading from the bottom up, the transfer path is: 211.99.162.194 → bjapp9.163.net (Coremail, 202.108.20.197?) → bjapp9.163.net (Postfix, 202.108.20.197?) → 202.106.182.244. Here the sender's mailbox fanyingxxxx@tom.com happens to be listed, but in most cases it is not.

In this example, localhost [127.0.0.1] shows that mail proxy software is installed on bjapp9.163.net.
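Reading the Received chain from the bottom up can be automated. The following is a rough sketch using the standard `email` parser and two simple regular expressions; it assumes fields of the "from A ... by B" shape shown above and is not a general Received parser. The names `received_hops`, `HOP`, and `IPV4` are hypothetical.

```python
import re
from email import message_from_string

SAMPLE = """\
Received: (qmail 45304 invoked from network); 4 May 2003 17:05:47 -0000
Received: from unknown (HELO bjapp9.163.net) (202.108.20.197)
  by 202.106.182.244 with SMTP; 4 May 2003 17:05:47 -0000
Received: from localhost (localhost [127.0.0.1])
  by bjapp9.163.net (Postfix) with SMTP id e1c761d84c631
  for <bhw98@sina.com>; Mon, 5 May 2003 01:07:26 +0800 (CST)
Received: from fanyingxxxx@tom.com (unknown [211.99.162.194])
  by bjapp9.163.net (Coremail) with SMTP id ogeaam1itt7mn1c.1
  for <bhw98@sina.com>; Mon, 05 May 2003 01:07:26 +0800 (CST)

"""

IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
HOP = re.compile(r"from\s+(\S+)\s+(.*?)\s*\bby\s+(\S+)")


def received_hops(raw_headers):
    """Return (sender, sender_ip, receiver) tuples, earliest hop first."""
    msg = message_from_string(raw_headers)
    hops = []
    # Servers prepend their field, so the last one in the mail is the oldest.
    for value in reversed(msg.get_all("Received", [])):
        flat = " ".join(value.split())  # undo the folding
        match = HOP.match(flat)
        if not match:
            continue  # e.g. "(qmail ... invoked from network)"
        sender, detail, receiver = match.groups()
        ips = IPV4.findall(detail)
        hops.append((sender, ips[-1] if ips else None, receiver))
    return hops


if __name__ == "__main__":
    for sender, ip, receiver in received_hops(SAMPLE):
        print(f"{sender} [{ip}] -> {receiver}")
```

The first tuple printed is the hop closest to the real sender; the claimed name after "from" can be forged, so the IP address recorded by the receiving server is the more trustworthy part.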
