Use PHP to decode POP3 emails (2)


MIME encoding methods


Subject: =?gb2312?B?xOO6w6Oh?=

This is the subject of the email, but because it is encoded we cannot read it directly. The original text is the Chinese greeting for "Hello!". Let's first look at the two MIME encoding methods.

Email encoding was introduced because many gateways on the Internet cannot correctly transmit 8-bit characters, such as Chinese characters. The principle is to convert 8-bit content into a 7-bit form that can be transmitted safely; after receiving the content, the receiver restores it to 8-bit content.

MIME is the abbreviation of "Multipurpose Internet Mail Extensions". Before MIME, mail was encoded with methods such as uuencode. Because the MIME algorithms are simple and easy to extend, MIME has become the mainstream mail encoding method. It is used not only to transmit 8-bit characters but also binary files, such as images and audio in email attachments, and many applications have been built on top of it. MIME defines two encoding methods: Base64 and QP (Quoted-Printable):

Base64 is a general-purpose method, and its principle is very simple: it uses four bytes to represent three bytes of data. Each of the four bytes carries only 6 bits of data, so the limit of 7 transmittable bits is no longer a problem. Base64 is generally abbreviated as "B"; the Subject in this message uses Base64 encoding.
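The 3-bytes-to-4-characters regrouping can be checked by hand. The following is a minimal sketch, assuming the six GB2312 bytes of the Chinese greeting "Hello!" as sample data; the variable names are illustrative only.

```php
<?php
// Sample data (an assumption for illustration): six GB2312 bytes.
$raw = "\xC4\xE3\xBA\xC3\xA3\xA1";

// Built-in encoder: 6 bytes (48 bits) become 8 characters of 6 bits each.
$encoded = base64_encode($raw);
echo $encoded, "\n";
echo strlen($raw), ' bytes -> ', strlen($encoded), " characters\n";

// The same regrouping done by hand for the first three bytes.
$alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
$bits = '';
foreach (str_split(substr($raw, 0, 3)) as $byte) {
    // Write each byte as exactly 8 binary digits.
    $bits .= str_pad(decbin(ord($byte)), 8, '0', STR_PAD_LEFT);
}
// Cut the 24 bits into four 6-bit values and look each one up.
$indexes = array_map('bindec', str_split($bits, 6));
$manual = '';
foreach ($indexes as $index) {
    $manual .= $alphabet[$index];
}
echo $manual, "\n";

// Decoding restores the original 8-bit bytes.
$decoded = base64_decode($encoded, true);
var_dump($decoded === $raw);
```

Because every output character comes from a 64-character ASCII alphabet, the encoded text survives gateways that only pass 7-bit data.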

The other method is Quoted-Printable (QP), abbreviated as "Q". It represents each 8-bit character as two hexadecimal digits preceded by "=". QP-encoded text therefore usually looks like this: =B3=C2=BF=A1=C7=E5=A3=AC=C4=FA=BA=C3=A3=A1.
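A short sketch of the QP rule, using the last six bytes of the sample above. Only PHP built-ins are used; the variable names are illustrative.

```php
<?php
// Each "=XX" triple stands for one byte whose value is the hex number XX.
$qp = '=C4=FA=BA=C3=A3=A1';
$bytes = quoted_printable_decode($qp);
echo strlen($bytes), " bytes: ", bin2hex($bytes), "\n";

// Plain ASCII passes through unchanged; only 8-bit bytes are escaped.
$mixed = quoted_printable_encode("Hi \xC4\xFA");
echo $mixed, "\n";
```

Unlike Base64, QP leaves ASCII text readable, so it suits content that is mostly 7-bit with a few 8-bit characters.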

PHP has two built-in functions that make decoding easy: base64_decode() and quoted_printable_decode(). The former decodes Base64 and the latter decodes QP.

Now let's look again at Subject: =?gb2312?B?xOO6w6Oh?=. The header is not encoded as a whole; only part of it is. The encoded part is enclosed between the two markers =? and ?=. The text after =? names the character set, here GB2312, and the B after the next ? indicates Base64 encoding. With this analysis in mind, let's look at the MIME decoding function. (This function was provided by Sadly, the webmaster of PHPX.COM. I put it into a class and made a few modifications; many thanks to him.)

    function decode_mime($string) {
        $pos = strpos($string, '=?');
        if (!is_int($pos)) {
            return $string; // no encoded word, nothing to decode
        }
        $preceding = substr($string, 0, $pos); // save any preceding text
        $search = substr($string, $pos + 2);   // everything after the opening "=?"
        $d1 = strpos($search, '?');
        if (!is_int($d1)) {
            return $string;
        }
        $charset = substr($string, $pos + 2, $d1); // the character set name
        $search = substr($search, $d1 + 1);        // everything after the character set
        $d2 = strpos($search, '?');
        if (!is_int($d2)) {
            return $string;
        }
        $encoding = substr($search, 0, $d2); // between the two "?": Q or B
        $search = substr($search, $d2 + 1);
        $end = strpos($search, '?=');        // the encoded text runs up to $end
        if (!is_int($end)) {
            return $string;
        }
        $encoded_text = substr($search, 0, $end);
        // + 6 accounts for the six delimiter characters removed above: "=?", "?", "?" and "?="
        $rest = substr($string, strlen($preceding . $charset . $encoding . $encoded_text) + 6);
        switch ($encoding) {
            case 'Q':
            case 'q':
                // $encoded_text = str_replace('_', '%20', $encoded_text);
                // $encoded_text = str_replace('=', '%', $encoded_text);
                // $decoded = urldecode($encoded_text);
                // In Q-encoded header words "_" stands for a space (RFC 2047).
                $decoded = quoted_printable_decode(str_replace('_', ' ', $encoded_text));
                if (strtolower($charset) == 'windows-1251') {
                    // windows-1251 to KOI8-R; convert_cyr_string() was removed in PHP 8.0
                    $decoded = convert_cyr_string($decoded, 'w', 'k');
                }
                break;
            case 'B':
            case 'b':
                $decoded = base64_decode($encoded_text);
                if (strtolower($charset) == 'windows-1251') {
                    $decoded = convert_cyr_string($decoded, 'w', 'k');
                }
                break;
            default:
                // unknown encoding: put the encoded word back unchanged
                $decoded = '=?' . $charset . '?' . $encoding . '?' . $encoded_text . '?=';
                break;
        }
        // decode any further encoded words in the remainder
        return $preceding . $decoded . $this->decode_mime($rest);
    }

This function recursively decodes a string that contains encoded segments, such as a Subject header. Comments have been added to the code, and I believe PHP programmers will have no trouble following it. The actual decoding is done by base64_decode() and quoted_printable_decode(); most of the work lies in analyzing the strings in the mail source, and PHP's string functions make that convenient. The statement return $preceding . $decoded . $this->decode_mime($rest); implements the recursion. Because the function is actually a method of a MIME decoding class to be introduced later, it is called in the form $this->decode_mime($rest).
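For comparison, the same idea can be sketched as a standalone function that needs no class. This is not the author's method: it swaps the recursive strpos() scan for a regular expression, and the name decode_header_value() and the optional $target parameter are assumptions made for illustration. Character set conversion uses iconv(), and whether a given charset such as GB2312 is available depends on the local iconv build.

```php
<?php
// Hypothetical helper: decodes every "=?charset?B|Q?text?=" word in a header value.
function decode_header_value(string $value, ?string $target = null): string
{
    $pattern = '/=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=/';
    $result = preg_replace_callback($pattern, function (array $m) use ($target) {
        [, $charset, $encoding, $text] = $m;
        if (strtoupper($encoding) === 'B') {
            $bytes = base64_decode($text);
        } else {
            // In Q-encoded words "_" stands for a space.
            $bytes = quoted_printable_decode(str_replace('_', ' ', $text));
        }
        if ($target === null) {
            return $bytes; // raw bytes in the original character set
        }
        // Fall back to the raw bytes if this iconv build rejects the charset.
        $converted = @iconv($charset, $target, $bytes);
        return $converted === false ? $bytes : $converted;
    }, $value);
    return $result ?? $value; // preg_replace_callback() returns null on error
}

$raw = decode_header_value('Subject: =?gb2312?B?xOO6w6Oh?=');
echo bin2hex(substr($raw, 9)), "\n"; // hex of the decoded GB2312 bytes

$utf8 = decode_header_value('=?ISO-8859-1?Q?caf=E9_au_lait?=', 'UTF-8');
echo $utf8, "\n";
```

PHP's iconv extension also provides iconv_mime_decode(), which may be worth comparing against when checking results.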

Now let's look at the message body. Here is a brief introduction to some MIME header fields (for details, see the official MIME documentation).

MIME-Version: 1.0

Indicates the MIME version used, which is generally 1.0;

Content-Type: defines the type of the body. This field is how we know what kind of content the body holds. For example, text/plain indicates an unformatted text body, text/html an HTML document, and image/gif an image in GIF format.

The composite types commonly used in email deserve particular attention. The multipart type indicates that the body is composed of multiple parts, and its subtype describes the relationship between those parts. Three subtypes are used in mail:

multipart/alternative: the body consists of two parts, and either one may be chosen. Its main use is to pick one of two bodies for display when both a text version and an HTML version are present. Mail clients that support HTML usually display the HTML body; others display the text body.

multipart/mixed: the parts of the document are mixed, which describes the relationship between the body and its attachments. If the MIME type of an email is multipart/mixed, the email carries an attachment.

multipart/related: the parts of the document are related. It is generally used to describe the images that belong to an HTML body.
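To act on a Content-Type field, the type, subtype and parameters have to be separated first. Below is a minimal sketch; parse_content_type() is a hypothetical helper, and it deliberately ignores corner cases such as a ";" inside a quoted parameter value.

```php
<?php
// Hypothetical helper: splits a Content-Type value into its pieces.
function parse_content_type(string $header): array
{
    $segments = array_map('trim', explode(';', $header));
    $mime = strtolower(array_shift($segments));
    [$type, $subtype] = array_pad(explode('/', $mime, 2), 2, '');

    $params = [];
    foreach ($segments as $segment) {
        if (strpos($segment, '=') === false) {
            continue; // skip empty or malformed segments
        }
        [$name, $val] = explode('=', $segment, 2);
        // Parameter names are case-insensitive; values may be quoted.
        $params[strtolower(trim($name))] = trim($val, " \t\"");
    }
    return ['type' => $type, 'subtype' => $subtype, 'params' => $params];
}

$ct = parse_content_type('multipart/mixed; boundary="----=_Part_1"');
if ($ct['type'] === 'multipart') {
    echo "composite body, subtype {$ct['subtype']}\n";
}
```

A decoder can then branch on the type: a multipart body is split further, while any other type is a leaf part to be decoded directly.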

These composite types can be nested. For example, an email that has an attachment and a body in both HTML and text formats is structured like this:

Content-Type: multipart/mixed

    Part 1:

        Content-Type: multipart/alternative

            Body in text format

            Body in HTML format

    Part 2:

        Attachment

    End of message
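Nesting like this maps naturally onto a tree that is walked recursively. The sketch below uses a hypothetical in-memory array layout (the keys 'type' and 'parts' and the attachment's application/octet-stream type are assumptions, not something the class in this article defines).

```php
<?php
// Hypothetical tree mirroring the structure outlined above.
$message = [
    'type'  => 'multipart/mixed',
    'parts' => [
        [
            'type'  => 'multipart/alternative',
            'parts' => [
                ['type' => 'text/plain'],
                ['type' => 'text/html'],
            ],
        ],
        ['type' => 'application/octet-stream'], // the attachment (assumed type)
    ],
];

// Collect the types of all leaf parts, descending into nested multiparts.
function leaf_types(array $node): array
{
    if (empty($node['parts'])) {
        return [$node['type']];
    }
    $types = [];
    foreach ($node['parts'] as $child) {
        $types = array_merge($types, leaf_types($child));
    }
    return $types;
}

$leaves = leaf_types($message);
echo implode(', ', $leaves), "\n";
```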

Because a composite type consists of multiple parts, a separator is needed to tell the parts apart. This is what boundary="----=_nextpart_000_0007_01c01_6.5b1e9510" in the mail source file shown earlier describes. Every Content-Type: multipart/* field carries such a parameter, which defines the separator between the parts. The separator is a string of unusual characters that cannot appear in the body. Within the document, "--" followed by the boundary marks the beginning of a part; at the end of the document, "--" followed by the boundary and another "--" marks the end. Because composite types can be nested, a single message may contain several boundaries.
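The boundary rule can be sketched as a line scanner. split_multipart() is a hypothetical helper and the sample message is made up for illustration; a nested multipart part would be split again using its own boundary.

```php
<?php
// Hypothetical helper: returns the raw text of each part of a multipart body.
function split_multipart(string $body, string $boundary): array
{
    $open  = '--' . $boundary;        // starts each part
    $close = '--' . $boundary . '--'; // ends the multipart body
    $parts = [];
    $current = null;                  // null while outside any part

    foreach (preg_split('/\r\n|\n|\r/', $body) as $line) {
        $marker = rtrim($line);
        if ($marker === $open || $marker === $close) {
            if ($current !== null) {
                $parts[] = implode("\n", $current);
            }
            if ($marker === $close) {
                $current = null;
                break;                // anything after this is ignored
            }
            $current = [];
            continue;
        }
        if ($current !== null) {
            $current[] = $line;       // text before the first marker is skipped
        }
    }
    if ($current !== null) {          // no closing marker was found
        $parts[] = implode("\n", $current);
    }
    return $parts;
}

$body = "ignored preamble\r\n"
      . "--XYZ\r\nContent-Type: text/plain\r\n\r\nHello\r\n"
      . "--XYZ\r\nContent-Type: text/html\r\n\r\n<b>Hello</b>\r\n"
      . "--XYZ--\r\nignored epilogue";
$parts = split_multipart($body, 'XYZ');
echo count($parts), " parts\n";
```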

There is one more very important MIME header field:

Content-Transfer-Encoding: base64 indicates the encoding method of this part of the document, that is, the Base64 or QP (Quoted-Printable) encoding described above. Only by reading this field can we choose the correct decoding method.

This is only an introduction to MIME. Next, I will present a class for decoding MIME mail and briefly describe it.
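Choosing the decoder from this field might look like the sketch below. decode_part_body() is a hypothetical name, and only the two encodings discussed in this article are handled; any other value is returned untouched.

```php
<?php
// Hypothetical helper: undo the transfer encoding named in the part's header.
function decode_part_body(string $body, string $transferEncoding): string
{
    switch (strtolower(trim($transferEncoding))) {
        case 'base64':
            // Non-strict mode skips the line breaks inside the encoded text.
            $decoded = base64_decode($body);
            return $decoded === false ? $body : $decoded;
        case 'quoted-printable':
            return quoted_printable_decode($body);
        default:
            // 7bit, 8bit, binary or unknown: nothing to undo
            return $body;
    }
}

$fromBase64 = decode_part_body("SGVsbG8s\r\nIFdvcmxk", 'Base64');
$fromQp     = decode_part_body("caf=E9", 'quoted-printable');
echo $fromBase64, "\n";
echo bin2hex($fromQp), "\n";
```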

Author: Chen Junqing
Reprinted: chinacnet
