Implementation of POP3 Message Decoding with PHP (II)

Introduction to MIME Encoding methods
(Author: Chen Junqing October 24, 2000 15:09)

Subject: =?gb2312?B?xOO6w6Oh?=

This is the subject line of a message, but because it is encoded we cannot see what it says. Its original text is "Hello!" (written in Chinese characters). Let's first look at the two MIME encoding methods.

Mail was originally encoded because many gateways on the Internet cannot correctly transmit 8-bit characters, such as Chinese characters. Encoding converts 8-bit content into a 7-bit form so that it can be transmitted correctly; the receiver then restores it to the original 8-bit content.

MIME stands for "Multipurpose Internet Mail Extensions". Before MIME, mail was encoded with methods such as UUENCODE, but because MIME's algorithms are simple and easy to extend, it has become the mainstream encoding for email. It is used not only to transmit 8-bit characters but also binary files, such as the images and audio in email attachments, and many other MIME-based applications have been built on it. For encoding, MIME defines two methods, Base64 and QP (Quoted-Printable):

Base64 is the more common method, and its principle is simple: every three bytes of data are represented by four bytes, and each of those four carries only 6 bits of data (one of 64 printable ASCII characters), so the result passes safely through channels that can only carry 7-bit characters. Base64 is usually abbreviated "B"; the subject of the message above is Base64-encoded.
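The 3-bytes-to-4-characters principle can be sketched in a few lines of PHP. This is only an illustration: base64_sketch() is a hypothetical name, and in practice PHP's built-in base64_encode() does this job. The sample input is assumed to be the GB2312 bytes of the Chinese "Hello!" from the example Subject.

```php
<?php
// Illustrative Base64 encoder (base64_sketch() is a hypothetical name, not a
// PHP function): each group of 3 input bytes (24 bits) becomes 4 output
// characters of 6 bits each, taken from a 64-character printable alphabet.
function base64_sketch(string $data): string
{
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
    $out = '';
    for ($i = 0, $n = strlen($data); $i < $n; $i += 3) {
        $chunk = substr($data, $i, 3);
        $len = strlen($chunk);               // 1, 2 or 3 bytes in this group
        $chunk = str_pad($chunk, 3, "\0");   // zero-fill a short final group
        // Pack the 3 bytes into one 24-bit integer.
        $bits = (ord($chunk[0]) << 16) | (ord($chunk[1]) << 8) | ord($chunk[2]);
        // Cut the 24 bits into four 6-bit indexes into the alphabet.
        $quad = $alphabet[($bits >> 18) & 63] . $alphabet[($bits >> 12) & 63]
              . $alphabet[($bits >> 6) & 63] . $alphabet[$bits & 63];
        // Characters produced only by the zero fill are replaced with "=".
        if ($len < 3) {
            $quad = substr($quad, 0, $len + 1) . str_repeat('=', 3 - $len);
        }
        $out .= $quad;
    }
    return $out;
}

// GB2312 bytes of the Chinese "Hello!" in the example Subject.
$gb2312 = "\xC4\xE3\xBA\xC3\xA3\xA1";
echo base64_sketch($gb2312), "\n";                             // xOO6w6Oh
var_dump(base64_sketch($gb2312) === base64_encode($gb2312));   // bool(true)
```

Six input bytes give exactly eight output characters, which is the 4/3 size growth that Base64 always costs.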

The other method is QP (Quoted-Printable), usually abbreviated "Q". Its principle is to write an 8-bit character as its two-digit hexadecimal value preceded by "=" (printable ASCII characters can be left as they are). That is why QP-encoded text usually looks like this: =B3=C2=BF=A1=C7=E5=A3=AC=C4=FA=BA=C3=A3=A1.
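A simplified QP encoder shows where strings like the one above come from. qp_sketch() is a hypothetical name; it is a sketch, not a full RFC 2045 implementation, because it does not insert soft line breaks and it encodes spaces even though they could be left as-is.

```php
<?php
// Illustrative QP encoder (qp_sketch() is a hypothetical name): printable
// ASCII is copied as-is, while "=", spaces, control characters and all 8-bit
// bytes become "=" plus two uppercase hexadecimal digits.
// Simplified: no soft line breaks for lines longer than 76 characters.
function qp_sketch(string $data): string
{
    $out = '';
    for ($i = 0, $n = strlen($data); $i < $n; $i++) {
        $byte = ord($data[$i]);
        if ($byte >= 33 && $byte <= 126 && $byte !== 61) {  // 61 is "="
            $out .= $data[$i];
        } else {
            $out .= sprintf('=%02X', $byte);                // e.g. 0xB3 -> "=B3"
        }
    }
    return $out;
}

// The 14 bytes behind the QP sample string quoted in the text.
$bytes = "\xB3\xC2\xBF\xA1\xC7\xE5\xA3\xAC\xC4\xFA\xBA\xC3\xA3\xA1";
echo qp_sketch($bytes), "\n";   // =B3=C2=BF=A1=C7=E5=A3=AC=C4=FA=BA=C3=A3=A1
echo qp_sketch('a=b c'), "\n";  // a=3Db=20c
```

For text that is mostly ASCII, QP stays readable and compact; for text that is mostly 8-bit, like the sample, each byte triples in size, which is why Base64 is usually preferred there.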

PHP has two built-in functions that make decoding easy: base64_decode() and quoted_printable_decode(). The former decodes Base64, the latter decodes QP.
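Applied to this article's two sample strings, the built-in decoders return raw bytes in the original character set (GB2312 here), not UTF-8; converting them for display would need a separate charset conversion such as iconv(), which this sketch leaves out.

```php
<?php
// Decode the encoded text of the example Subject (Base64).
$subjectBytes = base64_decode('xOO6w6Oh');
echo bin2hex($subjectBytes), "\n";   // c4e3bac3a3a1

// Decode the QP sample string from the text.
$qpBytes = quoted_printable_decode('=B3=C2=BF=A1=C7=E5=A3=AC=C4=FA=BA=C3=A3=A1');
echo strlen($qpBytes), "\n";         // 14

// With the strict flag, base64_decode() returns false for characters
// outside the Base64 alphabet instead of silently skipping them.
var_dump(base64_decode('xOO6w6O!', true)); // bool(false)
```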

Now let's return to the subject =?gb2312?B?xOO6w6Oh?=. This is not a fully encoded line; only part of it is encoded, and that part is marked by the two delimiters =? and ?=. What follows =? is the character set of the text, here GB2312; after the next ? comes B, meaning the text is Base64-encoded; after the third ? comes the encoded text itself, which ends at ?=. With this analysis, let's look at the MIME decoding function below. (The function was provided by Sadly, the webmaster of phpx.com; I put it into a class and made a few small changes. My thanks to him.)
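For a single encoded word, the field layout just described can be checked with a few string calls. split_encoded_word() is a hypothetical helper that assumes its input is exactly one well-formed encoded word; the full function that follows instead searches for encoded words inside arbitrary header text.

```php
<?php
// Split one encoded word into its three fields. Assumes the input is exactly
// one well-formed encoded word: =?charset?encoding?encoded-text?=
function split_encoded_word(string $word): array
{
    // Drop the leading "=?" and trailing "?=", then split on the two "?".
    $fields = explode('?', substr($word, 2, -2), 3);
    return [
        'charset'  => $fields[0],              // e.g. "gb2312"
        'encoding' => strtoupper($fields[1]),  // "B" (Base64) or "Q" (QP)
        'text'     => $fields[2],              // the encoded text itself
    ];
}

print_r(split_encoded_word('=?gb2312?B?xOO6w6Oh?='));
```

For the example Subject this yields the charset gb2312, the encoding B and the encoded text xOO6w6Oh.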

// Method of the MIME decoding class introduced later; it recurses via $this.
function decode_mime($string)
{
    // Look for the start of an encoded word: =?charset?encoding?encoded-text?=
    $pos = strpos($string, '=?');
    if ($pos === false) {
        return $string;
    }

    $preceding = substr($string, 0, $pos);  // save any preceding text
    $search = substr($string, $pos + 2);    // everything after "=?"

    $d1 = strpos($search, '?');
    if ($d1 === false) {
        return $string;
    }
    $charset = substr($search, 0, $d1);     // character set, e.g. gb2312
    $search = substr($search, $d1 + 1);     // the part after the charset

    $d2 = strpos($search, '?');
    if ($d2 === false) {
        return $string;
    }
    $encoding = substr($search, 0, $d2);    // encoding between the two "?": Q or B
    $search = substr($search, $d2 + 1);

    $end = strpos($search, '?=');           // the encoded text ends at "?="
    if ($end === false) {
        return $string;
    }
    $encoded_text = substr($search, 0, $end);

    // +6 accounts for the removed delimiters "=?", "?", "?" and "?=".
    $rest = substr($string, strlen($preceding . $charset . $encoding . $encoded_text) + 6);

    switch ($encoding) {
        case 'Q':
        case 'q':
            // In headers "_" stands for a space; the rest is quoted-printable.
            $decoded = quoted_printable_decode(str_replace('_', ' ', $encoded_text));
            if (strtolower($charset) == 'windows-1251') {
                // convert_cyr_string() was removed in PHP 8; iconv() is used
                // here to convert windows-1251 to KOI8-R instead.
                $decoded = iconv('windows-1251', 'KOI8-R', $decoded);
            }
            break;
        case 'B':
        case 'b':
            $decoded = base64_decode($encoded_text);
            if (strtolower($charset) == 'windows-1251') {
                $decoded = iconv('windows-1251', 'KOI8-R', $decoded);
            }
            break;
        default:
            // Unknown encoding: put the encoded word back unchanged.
            $decoded = '=?' . $charset . '?' . $encoding . '?' . $encoded_text . '?=';
            break;
    }

    // Decode any further encoded words in the remaining text recursively.
    return $preceding . $decoded . $this->decode_mime($rest);
}

This function decodes the encoded words in a Subject field recursively. Comments have been added to the program, so readers with a little PHP background should be able to follow it. The decoding itself is done by the two built-in functions base64_decode() and quoted_printable_decode(); most of the work is parsing the strings of the message source. PHP's string functions are among the most convenient of any language for this kind of work. The last statement of the function, return $preceding . $decoded . $this->decode_mime($rest);, performs the recursion: the function actually belongs to a MIME decoding class that will be introduced later, which is why it calls itself as $this->decode_mime($rest).
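Outside a class, the same job can be done without $this. The sketch below is a hypothetical standalone variant, decode_mime_words(), that swaps the strpos() parsing and recursion for a regular expression with preg_replace_callback(). Like the version above it does no charset conversion of its own, and neither version removes the whitespace that RFC 2047 says to ignore between two adjacent encoded words.

```php
<?php
// Hypothetical standalone variant of decode_mime(): a regular expression
// finds each encoded word =?charset?encoding?text?= and replaces it with its
// decoded bytes. Words with an encoding other than Q/B are left untouched.
function decode_mime_words(string $header): string
{
    return preg_replace_callback(
        '/=\?([^?]+)\?([QqBb])\?([^?]*)\?=/',
        function (array $m): string {
            if (strtoupper($m[2]) === 'B') {
                return (string) base64_decode($m[3]);  // "B": Base64
            }
            // "Q": "_" means a space, everything else is quoted-printable.
            return quoted_printable_decode(str_replace('_', ' ', $m[3]));
        },
        $header
    );
}

echo decode_mime_words('Re: =?us-ascii?Q?Hello_World?='), "\n"; // Re: Hello World
echo decode_mime_words('=?utf-8?B?SGVsbG8h?='), "\n";            // Hello!
```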

Let's look at the text that follows. These are some of the MIME header fields, which we introduce briefly here (readers who want to learn more should consult the official MIME specifications).

MIME-Version: 1.0

Indicates the version of MIME in use, which is normally 1.0.

Content-Type: defines the type of the body, so that we know what kind of data the body contains, for example:

text/plain: plain, unformatted text;
text/html: an HTML document;
image/gif: a picture in GIF format; and so on.

In particular, this article deals with the composite types commonly used in mail. The multipart type indicates that the body consists of several parts; the subtype that follows it describes how the parts relate to each other. Three subtypes are commonly used in mail:

multipart/alternative: the body consists of alternative parts (usually two), of which one is chosen for display. Its main use is a message that has both a plain-text and an HTML version: a mail client that supports HTML generally displays the HTML body, and one that does not displays the plain-text body;

multipart/mixed: the parts of the message are mixed together, which describes the relationship between the body and its attachments. If the MIME type of a message is multipart/mixed, the message has attachments;

multipart/related: the parts of the message are related to each other; this is generally used for an HTML body and the images it uses.
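To tell a composite type from a simple one, a decoder first has to split the Content-Type value into type, subtype and parameters (such as charset, or the boundary parameter of multipart types). The following is a minimal sketch under simplifying assumptions: parse_content_type() is a hypothetical name, and parameters are split naively on ";", so quoted values that contain ";" are not handled.

```php
<?php
// Minimal sketch (hypothetical helper): split a Content-Type value such as
// 'multipart/mixed; boundary="xyz"' into type, subtype and parameters.
// Assumption: parameters are separated by ";" and no quoted value contains ";".
function parse_content_type(string $value): array
{
    $parts = explode(';', $value);
    $mediaType = strtolower(trim(array_shift($parts)));           // e.g. "multipart/mixed"
    [$type, $subtype] = array_pad(explode('/', $mediaType, 2), 2, '');
    $params = [];
    foreach ($parts as $part) {
        $eq = strpos($part, '=');
        if ($eq === false) {
            continue;                                              // skip malformed parameters
        }
        $name = strtolower(trim(substr($part, 0, $eq)));           // names are case-insensitive
        $params[$name] = trim(trim(substr($part, $eq + 1)), '"');  // strip optional quotes
    }
    return ['type' => $type, 'subtype' => $subtype, 'params' => $params];
}

$ct = parse_content_type('Multipart/Alternative; boundary="----=_nextpart_000_0007_01c03166.5b1e9510"');
echo $ct['type'], '/', $ct['subtype'], "\n"; // multipart/alternative
echo $ct['params']['boundary'], "\n";        // ----=_nextpart_000_0007_01c03166.5b1e9510
```

Only the first "=" in each parameter separates name from value, which matters because boundary strings like the one above often contain "=" themselves.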

These composite types can also be nested. For example, a message that has an attachment and whose body is in both plain-text and HTML format has the following structure:

Content-Type: multipart/mixed
    Part 1: Content-Type: multipart/alternative
        plain-text body
        HTML body
    Part 2: attachment
End of message

Because a composite type consists of several parts, a delimiter is needed to separate them; in the message source shown above this is the parameter boundary="----=_nextpart_000_0007_01c03166.5b1e9510". Every Content-Type: multipart/* header carries such a boundary, a string chosen so that it cannot occur in the body. In the body, "--" followed by the boundary marks the start of each part, and "--" followed by the boundary and another "--" marks the end of the multipart content. Because composite types can be nested, one message may use several different boundaries.
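The boundary rules just described can be turned into a small splitting routine. This is a sketch under stated assumptions: split_multipart() is a hypothetical name, line endings are normalized to LF (acceptable for text, Base64 and QP parts), the preamble and epilogue are discarded, and a nested multipart part comes back as one chunk that must be split again with its own boundary.

```php
<?php
// Minimal sketch (hypothetical helper): return the parts of a multipart body,
// each still containing its own headers, a blank line and its content.
function split_multipart(string $body, string $boundary): array
{
    $body = "\n" . str_replace("\r\n", "\n", $body); // normalize; lets the first delimiter match
    $delimiter = "\n--" . $boundary;                 // the line break belongs to the delimiter
    $end = strpos($body, $delimiter . '--');         // closing delimiter "--boundary--"
    if ($end !== false) {
        $body = substr($body, 0, $end);              // drop the epilogue
    }
    $chunks = explode($delimiter, $body);
    array_shift($chunks);                            // drop the preamble
    $parts = [];
    foreach ($chunks as $chunk) {
        // Skip the rest of the delimiter line and its line break.
        $nl = strpos($chunk, "\n");
        $parts[] = $nl === false ? '' : substr($chunk, $nl + 1);
    }
    return $parts;
}

$body = "This is a multi-part message in MIME format.\n"
      . "--b1\nContent-Type: text/plain\n\nHello\n"
      . "--b1\nContent-Type: text/html\n\n<p>Hello</p>\n"
      . "--b1--\n";
$parts = split_multipart($body, 'b1');
echo count($parts), "\n"; // 2
```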

There is one more very important MIME header field:

Content-Transfer-Encoding: base64

It states how this part of the message is encoded, with Base64 or QP (Quoted-Printable) as described above. Only by reading this field can we decode the part with the correct method.
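Choosing the decoder from this header can be sketched as a simple switch. decode_body() is a hypothetical name; the values 7bit, 8bit and binary mean the body is not encoded, so this sketch returns it unchanged for any value other than base64 and quoted-printable.

```php
<?php
// Minimal sketch (hypothetical helper): decode one part's body according to
// its Content-Transfer-Encoding value (compared case-insensitively).
function decode_body(string $body, string $transferEncoding): string
{
    switch (strtolower(trim($transferEncoding))) {
        case 'base64':
            return base64_decode($body);           // line breaks are skipped in non-strict mode
        case 'quoted-printable':
            return quoted_printable_decode($body); // also removes "=" soft line breaks
        default:
            return $body;                          // 7bit, 8bit, binary: not encoded
    }
}

echo decode_body("SGVsbG8s\r\nIHdvcmxkIQ==", 'base64'), "\n"; // Hello, world!
```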

Space is limited, so the introduction to MIME ends here. Next, I will present a class that decodes MIME messages, with a brief description of it.