MIME Encoding (Base64) and Its Significance

Source: Internet
Author: User

I. MIME: Multipurpose Internet Mail Extensions
Imperial College London's online computing dictionary, FOLDOC, describes MIME as a standard for multi-part, multimedia e-mail and WWW hypertext encoding, used to transmit non-text data such as graphics, sound, and fax. MIME is defined in RFC 1341. It uses the mimencode method to convert binary data into base64, a representation built from a subset of ASCII characters.
There is a newsgroup dedicated to MIME on the Internet: comp.mail.mime. The FAQ of this newsgroup can be obtained from the following location:
http://www.cis.ohio-state.edu/hypertext/faq/usenet/mail/mime-faq/mime0/faq.html
Mimencode was first called mmencode. It was proposed as a replacement for uuencode because uuencode uses some characters that do not survive certain e-mail gateways (especially those that convert between ASCII and EBCDIC), and because some software cannot correctly decode all uuencode variants, which makes the mail hard to read. MIME was therefore designed to replace uuencode, but in practice the two coexist.
Before MIME was introduced, RFC 822 messages could carry only basic ASCII text. Sending content such as binary files, sound, or animation by e-mail was very difficult.
MIME provides a way to attach multiple encoded files to a single mail, making up for the shortcomings of the original message format. In fact, MIME is no longer only a mail encoding: it has also become part of the HTTP protocol standard.
II. Introduction to MIME encoding methods
The original motivation for encoding e-mail is that many gateways on the Internet cannot correctly transmit 8-bit characters, such as Chinese characters. The principle is to convert 8-bit content into a 7-bit form for transmission; the receiver then restores it to 8-bit content.
Prior to MIME, the usual message encoding method was uuencode. Because the MIME algorithms are simple and easy to extend, MIME has become the mainstream mail encoding method. It is used not only to transmit 8-bit characters but also binary files, such as images and audio in mail attachments, and many applications have been built on it. In terms of encoding, MIME defines two methods: base64 and QP (quoted-printable).
1. Base64 encoding
Base64 is a common method. Its principle is very simple: every three bytes of data are represented by four bytes. In each of these four bytes only the low-order 6 bits carry data, so the restriction to 7-bit transmission is not a problem. Base64 is usually abbreviated as "B".
Base64 encodes an input string or block of data into a string drawn from the 64 characters {'A'-'Z', 'a'-'z', '0'-'9', '+', '/'}; '=' is used for padding.
The encoding method is to take 6 bits at a time from the input data stream, use their value (0-63) as an index into the character table, and output the corresponding character.
In this way, every 3 bytes are encoded as 4 characters (3 × 8 → 4 × 6); a final group of fewer than 4 characters is padded with '='.
In the mail header, the form "=?charset?B?XXXXXXXX?=" indicates that XXXXXXXX is base64 encoded and that the character set of the original text is charset. In a segment body, the data is encoded directly, with line breaks inserted as appropriate. MIME limits each encoded line to at most 76 characters.
The base64 algorithm is very simple. It places each group of three input bytes into a 24-bit buffer, with the first byte in the high-order position; missing bytes are filled with zeros.
The buffer is then cut into four parts of 6 bits each, high-order part first, and each part is represented by one of the 64 characters. If the final group of input contains only one or two bytes, the output is padded with the equal sign "=". This keeps the zero fill from being mistaken for data when decoding.
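The steps above can be sketched in Python. This is a minimal illustration, not a production encoder: the function name b64_encode_sketch is made up for this example, and it does not wrap its output at 76 characters. The standard library's base64.b64encode does the same job and is what real code would normally call.

```python
import base64

# The 64-character table: 'A'-'Z', 'a'-'z', '0'-'9', '+', '/'.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64_encode_sketch(data: bytes) -> str:
    """Encode bytes as base64 text, three input bytes at a time."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1 or 2 absent bytes in the last group
        # Fill the 24-bit buffer, zero-filling any absent bytes.
        buf = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Cut the buffer into four 6-bit indexes, high-order part first.
        chars = [ALPHABET[(buf >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Positions that hold only fill bits are replaced by '='.
        if missing:
            chars[-missing:] = "=" * missing
        out.append("".join(chars))
    return "".join(out)


print(b64_encode_sketch(b"Man"))  # three bytes: four characters, no padding
print(b64_encode_sketch(b"Ma"))   # two bytes: one '='
print(b64_encode_sketch(b"M"))    # one byte: two '='
print(b64_encode_sketch(b"Ma") == base64.b64encode(b"Ma").decode("ascii"))
```

Using int.from_bytes as the 24-bit buffer keeps the bit slicing to a single shift-and-mask per output character.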
2. QP encoding
The other method is quoted-printable (QP), usually abbreviated as "Q". The principle is to represent an 8-bit character by two hexadecimal digits preceded by "=". A QP-encoded file therefore typically looks like this: =B3=C2=BF=A1=C7=E5=A3=AC=C4=FA=BA=C3=A3=A1.
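As a small illustration, Python's standard quopri module can decode the sample sequence above (written without spaces) back to its 14 original bytes, and shows that printable ASCII passes through unescaped. The "Café menu" string is an assumption added for this example.

```python
import quopri

# The sample from the text: each 8-bit byte is '=' plus two hex digits.
encoded = b"=B3=C2=BF=A1=C7=E5=A3=AC=C4=FA=BA=C3=A3=A1"
raw = quopri.decodestring(encoded)
print(len(raw), raw.hex())

# Printable ASCII is copied as-is; only the 8-bit byte 0xE9 is escaped.
mixed = quopri.encodestring("Caf\u00e9 menu".encode("latin-1"))
print(mixed)

# Encoding and decoding are inverse operations.
print(quopri.decodestring(quopri.encodestring(raw)) == raw)
```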
Quoted-printable encodes the input string or byte range as follows: a character that does not need encoding is output directly; otherwise '=' is output first, followed by the byte value as two hexadecimal digits. In the mail header, the form "=?charset?Q?XXXXXXXX?=" indicates that XXXXXXXX is quoted-printable encoded and that the character set of the original text is charset. In a segment body, the data is encoded directly, with line breaks inserted as appropriate; an extra '=' is output before each inserted line break.
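The two header forms, "=?charset?B?...?=" and "=?charset?Q?...?=", can be tried out with Python's email.header.decode_header. The sketch below builds one word of each kind from the same text and decodes both; the helper name decode_word and the sample text are assumptions made for illustration, and the Q word escapes every byte, which is valid but more verbose than necessary.

```python
import base64
from email.header import decode_header


def decode_word(word: str) -> str:
    """Decode a single encoded word of the form =?charset?B?text?= or =?charset?Q?text?=."""
    # For an encoded word, decode_header returns (bytes, charset) pairs.
    raw, charset = decode_header(word)[0]
    return raw.decode(charset)


text = "您好"
payload = text.encode("utf-8")

# "B" form: the encoded text is base64.
b_word = "=?utf-8?B?" + base64.b64encode(payload).decode("ascii") + "?="
# "Q" form: here every byte is written as '=' plus two hex digits.
q_word = "=?utf-8?Q?" + "".join("=%02X" % byte for byte in payload) + "?="

print(b_word, decode_word(b_word) == text)
print(q_word, decode_word(q_word) == text)
```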

III. MIME header information
Mail header
In the mail header, many field names are inherited from RFC 822, and MIME adds some of its own. Common standard field names and their meanings are as follows:
Field name                  Meaning                                    Added by
Received                    Transmission path                          Mail servers at each hop
Return-Path                 Return address                             Destination mail server
Delivered-To                Delivery address                           Destination mail server
Reply-To                    Reply address                              Mail creator
From                        Sender address                             Mail creator
To                          Recipient address                          Mail creator
Cc                          Carbon-copy address                        Mail creator
Bcc                         Blind carbon-copy address                  Mail creator
Date                        Date and time                              Mail creator
Subject                     Subject                                    Mail creator
Message-ID                  Message ID                                 Mail creator
MIME-Version                MIME version                               Mail creator
Content-Type                Type of the content                        Mail creator
Content-Transfer-Encoding   Transfer encoding method of the content    Mail creator
Non-standard, custom field names all start with X-, such as X-Mailer and X-MSMail-Priority. They are usually understood only when the same program sends and receives the mail.
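To see these fields in practice, the sketch below parses a small hand-written message with Python's standard email package and reads a few fields back. The addresses, subject, and X-Mailer value are invented for the example.

```python
from email import message_from_string

# A hypothetical message; all values are made up for illustration.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Hello\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "Content-Transfer-Encoding: 7bit\n"
    "X-Mailer: ExampleMailer 1.0\n"
    "\n"
    "Body text.\n"
)
msg = message_from_string(raw)

# Field names are matched case-insensitively.
print(msg["subject"])
print(msg["MIME-Version"])
# Custom X- fields are stored like any other field.
print(msg["X-Mailer"])
# A field that is absent yields None.
print(msg["Reply-To"])
```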
Segment header
In the segment header, there are roughly the following fields:
Field name                  Meaning
Content-Type                Type of the segment body
Content-Transfer-Encoding   Transfer encoding method of the segment body
Content-Disposition         Disposition (arrangement) of the segment body
Content-ID                  ID of the segment body
Content-Location            Location (path) of the segment body
Content-Base                Base location of the segment body
In addition to a value, some fields also carry parameters. The value and each parameter are separated by ";", and a parameter name is separated from its value by "=".
1. MIME-Version
Indicates the MIME version used, which is generally 1.0.
For example:
MIME-Version: 1.0
2. Content-Type
Content-Type defines the type of the body; this is the identifier that tells us what kind of file the body holds. For example, text/plain indicates an unformatted text body, text/html indicates an HTML document, and image/gif indicates a GIF image. Content-Type has the form "primary type/subtype". The primary types include text, image, audio, video, application, multipart, and message, which stand for text, image, audio, video, application data, multi-part content, and message respectively. Each primary type may have several subtypes; text, for example, has plain, html, xml, css, and others. Primary types and subtypes that start with x- are custom types. They are not officially registered with IANA, but most of them are established by convention. For example, application/x-zip-compressed is a ZIP file. In Windows, the registry key "HKEY_CLASSES_ROOT\MIME\Database\Content Type" lists most of the known content types other than multipart.
The RFCs contain many supplementary rules about parameters, and some types can carry several of them. The more common ones are:
Primary type   Parameter name   Meaning
text           charset          Character set
image          name             Name
application    name             Name
multipart      boundary         Boundary
multipart      type             Type
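As a brief sketch, Python's email package splits a Content-Type field into its primary type, subtype, and parameters. The two header lines parsed here are hypothetical examples, including the boundary string.

```python
from email import message_from_string

text_part = message_from_string(
    'Content-Type: text/html; charset="gb2312"\n'
    "\n"
    "<p>hello</p>\n"
)
print(text_part.get_content_maintype())  # primary type
print(text_part.get_content_subtype())   # subtype
print(text_part.get_param("charset"))    # parameter value, quotes removed
print(text_part.get_params())            # the value first, then (name, value) pairs

# For multipart types the boundary parameter has its own accessor.
container = message_from_string(
    'Content-Type: multipart/mixed; boundary="example-boundary"\n'
    "\n"
)
print(container.get_boundary())
```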
The compound type commonly used in mail is multipart.
The multipart type indicates that the body is composed of multiple parts. Its subtypes describe the relationship between these parts.

The three subtypes used in mail are:
(1). multipart/alternative: the body consists of two parts, of which one is chosen for display. Its main use is to carry the body in both plain-text and HTML formats. Mail clients that support HTML usually display the HTML body; otherwise the plain text is displayed.
(2). multipart/mixed: the document is a mixture of several parts, which expresses the relationship between the body and its attachments. If the MIME type of a mail is multipart/mixed, the mail generally carries an attachment.
(3). multipart/related: the parts of the document are related to one another. It is generally used to describe the images that belong to an HTML body.

Multipart is the essence of MIME mail. The body is divided into multiple segments, each consisting of a segment header and a segment body, which are separated by a blank line. The hierarchical relationships between the multipart subtypes can be summarized as follows:
+--- multipart/mixed ------------------------------------------------------------------+
|                                                                                      |
| +--- multipart/related ---------------------------------------------+                |
| |                                                                   |                |
| | +--- multipart/alternative ---------------+ +-------------------+ | +------------+ |
| | |                                         | | Embedded resource | | | Attachment | |
| | | +-----------------+ +-----------------+ | +-------------------+ | +------------+ |
| | | | Plain text body | | Hypertext body  | |                       |                |
| | | +-----------------+ +-----------------+ | +-------------------+ | +------------+ |
| | |                                         | | Embedded resource | | | Attachment | |
| | +-----------------------------------------+ +-------------------+ | +------------+ |
| |                                                                   |                |
| +-------------------------------------------------------------------+                |
|                                                                                      |
+--------------------------------------------------------------------------------------+

We can see that to add attachments to a mail you must define a multipart/mixed segment; if there are embedded resources, you must define at least a multipart/related segment; and if plain text and hypertext coexist, you must define at least a multipart/alternative segment. What does "at least" mean? For example, even if a mail contains only a plain text body and a hypertext body, the type in the mail header may still be declared more broadly, as multipart/related or even multipart/mixed.
The common feature of the multipart types is that the segment header specifies a "boundary" parameter string, and the sub-segments in the segment body are delimited by this string. Every sub-segment starts with a "--" + boundary line, and the parent segment ends with a "--" + boundary + "--" line. Segments are also separated from one another by blank lines. When the mail body is multipart, the beginning of the body (before the first "--" + boundary line) may contain some additional text lines; these act as comments and are ignored when decoding. Additional text lines may also follow the closing boundary line; they are not displayed either.
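The boundary rules can be checked on a small hand-written message. In the sketch below the boundary string and both bodies are invented for illustration; Python's email parser splits the body at the "--" + boundary lines and keeps the leading comment text separately as the preamble.

```python
from email import message_from_string

BOUNDARY = "example-boundary-001"  # hypothetical boundary string

raw = (
    "MIME-Version: 1.0\n"
    f'Content-Type: multipart/alternative; boundary="{BOUNDARY}"\n'
    "\n"
    "This is a multi-part message in MIME format.\n"  # comment text before the first boundary
    f"--{BOUNDARY}\n"                                  # start of the first sub-segment
    "Content-Type: text/plain\n"
    "\n"
    "plain body\n"
    f"--{BOUNDARY}\n"                                  # start of the second sub-segment
    "Content-Type: text/html\n"
    "\n"
    "<p>html body</p>\n"
    f"--{BOUNDARY}--\n"                                # end of the parent segment
)

msg = message_from_string(raw)
print(msg.is_multipart())
print(repr(msg.preamble))
for part in msg.get_payload():
    # Each sub-segment has its own header and body.
    print(part.get_content_type(), repr(part.get_payload()))
```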
These composite types can be nested. For example, for a mail that has an attachment and a body in both HTML and plain-text formats, the structure is:
Content-Type: multipart/mixed
    Part 1:
        Content-Type: multipart/alternative
            Plain text body
            HTML body
    Part 2:
        Attachment
End of mail
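A nested structure like this one can be assembled with the classes in Python's email.mime package. This is only a sketch: the body texts, the attachment bytes, the file name data.bin, and the helper outline are all made up for the example.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Part 1: the same body in two formats; the client displays one of them.
body = MIMEMultipart("alternative")
body.attach(MIMEText("Hello in plain text.", "plain"))
body.attach(MIMEText("<p>Hello in <b>HTML</b>.</p>", "html"))

# Part 2: an attachment (hypothetical bytes and file name).
attachment = MIMEApplication(b"\x00\x01\x02\x03", "octet-stream")
attachment.add_header("Content-Disposition", "attachment", filename="data.bin")

# Outer container: body plus attachment.
mail = MIMEMultipart("mixed")
mail.attach(body)
mail.attach(attachment)


def outline(part, depth=0):
    """Return the content types of a message tree, indented by nesting depth."""
    lines = ["  " * depth + part.get_content_type()]
    if part.is_multipart():
        for child in part.get_payload():
            lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(mail)))
```

Calling mail.as_string() on the result shows the generated boundary lines; the library picks a separate boundary string for each multipart level.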
Because a composite type consists of multiple parts, a separator is needed to keep them apart. That separator is the boundary parameter described above: every "Content-Type: multipart/*" declaration carries one, and it marks where each part begins and ends.
For MIME/base64 encoded mail, you can view the source of the message, which generally contains a sentence such as "This is a multi-part message in MIME format." Such mail can be decoded by most mail programs, including Netscape, MS Mail, and Eudora. These programs correctly identify the body of the mail and restore the MIME/base64 encoded parts to the original text or binary files.
3. Content-Transfer-Encoding
It indicates the encoding applied to this part of the document. Only by reading this field can the receiver choose the correct decoding method.
Content-Transfer-Encoding can take the values base64, quoted-printable, 7bit, 8bit, and binary.
7bit is the default. Mail source was originally designed to consist entirely of printable ASCII characters.
Non-ASCII text or data must be encoded into that form.
Base64 and quoted-printable are the most widely used encodings in non-English-speaking countries.
The binary value is largely nominal and has little practical use.
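The effect of this field can be sketched with Python's email package: get_payload(decode=True) reads the declared Content-Transfer-Encoding and undoes it. The two sample parts and the helper name decoded_body are assumptions made for illustration.

```python
from email import message_from_string


def decoded_body(raw_part: str) -> bytes:
    """Parse one part and undo its Content-Transfer-Encoding."""
    part = message_from_string(raw_part)
    # decode=True applies base64 or quoted-printable decoding as declared.
    return part.get_payload(decode=True)


b64_part = (
    "Content-Type: application/octet-stream\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "TWFu\n"
)
qp_part = (
    "Content-Type: text/plain; charset=iso-8859-1\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Caf=E9\n"
)

print(decoded_body(b64_part))
print(decoded_body(qp_part))
```

The result is always bytes; turning a text part into a string is a separate step that uses the charset parameter of Content-Type.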
4. Boundary
The boundary is a string of characters chosen so that it cannot appear in the body. Within the document, "--" followed by the boundary marks the beginning of a part; at the end of the document, "--" + boundary + "--" marks the end. Because composite types can be nested, a mail may contain several different boundaries.

 
