MIME decoding class
A class that implements MIME decoding
The class is driven by its decode($head = null, $body = null, $content_num = -1) method. For convenience, both inputs are arrays of strings, one line per element. The POP class from the previous article yields two such arrays: one holds the mail header lines and the other holds the mail body lines. Space is limited, so I will not walk through the code in detail. Its design is similar to the POP class described in the previous article, and the comments explain the rest.
This class makes heavy use of regular expressions. If you are not familiar with them, consult a regular expression reference first.
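To make the input format concrete, here is a minimal sketch of how a raw message could be split into the two line arrays that decode() expects. The helper name split_message and the sample message are assumptions made for illustration. In this series the POP class from the previous article already supplies both arrays, so this only shows their shape.

```php
<?php
// Hypothetical helper (not part of the article's classes): splits a raw
// message into an array of header lines and an array of body lines.
function split_message($raw)
{
    // Accept both CRLF and bare LF line endings.
    $lines = preg_split('/\r\n|\n/', $raw);
    $head = array();
    $body = array();
    $in_body = false;
    foreach ($lines as $line) {
        if (!$in_body && $line === "") {
            $in_body = true; // the first empty line ends the header
            continue;
        }
        if ($in_body) {
            $body[] = $line;
        } else {
            $head[] = $line;
        }
    }
    return array($head, $body);
}

// Sample message invented for this example.
$raw = "From: \"Boss\" <boss@example.com>\r\n"
     . "Subject: Hello\r\n"
     . "\r\n"
     . "First line\r\n"
     . "Second line";
list($head, $body) = split_message($raw);
print_r($head);
print_r($body);
```

The two resulting arrays could then be handed to decode($head, $body) in the same way as the arrays produced by the POP class.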
<?php
class decode_mail
{
    // Decoded mail header information:
    var $from_name; var $to_name; var $mail_time; var $from_mail; var $to_mail;
    var $reply_to; var $cc_to; var $subject;
    var $body;            // decoded body data, an array with one entry per part
    var $body_type;       // body type
    var $body_char_set;   // character set of a single-part body (set in decode_head)
    var $tem_num = 0;
    var $get_content_num = 0;
    var $body_temp = array();
    var $body_code_type;
    var $boundary;
    // The properties above hold temporary state shared between methods. PHP
    // offered no better encapsulation when this was written, so they are declared here.
    var $err_str;         // error message
    var $debug = 0;       // debug flag
    var $month_num = array("Jan" => 1, "Feb" => 2, "Mar" => 3, "Apr" => 4, "May" => 5, "Jun" => 6, "Jul" => 7,
        "Aug" => 8, "Sep" => 9, "Oct" => 10, "Nov" => 11, "Dec" => 12); // maps English month abbreviations to month numbers
    // Main entry point. $head and $body are both arrays of lines. When the body has
    // several parts, $content_num selects the one part whose content is decoded
    // (to save work); the default, -1, decodes every part. Returns true on success.
    function decode($head = null, $body = null, $content_num = -1)
    {
        if (!$head and !$body)
        {
            $this->err_str = "No mail header or body was specified!";
            return false;
        }
        $have_decode = false;
        if (gettype($head) == "array")
        {
            $have_decode = true;
            $this->decode_head($head);
        }
        if (gettype($body) == "array")
        {
            $this->get_content_num = $content_num;
            $this->body_temp = $body;
            $have_decode = true;
            $this->decode_body();
            unset($this->body_temp);
        }
        if (!$have_decode)
        {
            $this->err_str = "Invalid parameters. Usage: decode(head, body); both parameters must be arrays";
            return false;
        }
        return true; // missing from the original listing, although success is documented to return true
    }
    // Decodes the header lines and extracts the meaningful fields.
    // ereg()/eregi()/split() no longer exist in current PHP, so preg_match() and explode() are used.
    function decode_head($head)
    {
        $i = 0;
        $this->from_name = $this->to_name = $this->mail_time = $this->from_mail =
            $this->to_mail = $this->reply_to = $this->cc_to = $this->subject = "";
        $this->body_type = $this->boundary = $this->body_code_type = $this->body_char_set = "";
        while (!empty($head[$i]))
        {
            if (strpos($head[$i], "=?") !== false)
                $head[$i] = $this->decode_mime($head[$i]); // decode encoded words with decode_mime(), described earlier
            $pos = strpos($head[$i], ":");
            $summ = substr($head[$i], 0, $pos);
            $content = substr($head[$i], $pos + 1); // split the header field name from its content
            if ($this->debug) echo $summ . ": ----:" . $content . "<br>\n";
            switch (strtoupper($summ))
            {
                case "FROM": // sender address and name (the name may be missing, leaving only the address)
                    if (($left_tag_pos = strpos($content, "<")) !== false)
                    {
                        $mail_lenth = strrpos($content, ">") - $left_tag_pos - 1;
                        $this->from_name = substr($content, 0, $left_tag_pos);
                        $this->from_mail = substr($content, $left_tag_pos + 1, $mail_lenth);
                        if (trim($this->from_name) == "") $this->from_name = $this->from_mail;
                        else
                            if (preg_match('/["\']([^\'"]+)[\'"]/', $this->from_name, $reg))
                                $this->from_name = $reg[1];
                    }
                    else
                    {
                        $this->from_name = $content;
                        $this->from_mail = $content;
                        // no separate sender address
                    }
                    break;
                case "TO": // recipient address and name (the name may be missing)
                    if (($left_tag_pos = strpos($content, "<")) !== false)
                    {
                        $mail_lenth = strrpos($content, ">") - $left_tag_pos - 1;
                        $this->to_name = substr($content, 0, $left_tag_pos);
                        $this->to_mail = substr($content, $left_tag_pos + 1, $mail_lenth);
                        if (trim($this->to_name) == "") $this->to_name = $this->to_mail;
                        else
                            if (preg_match('/["\']([^\'"]+)[\'"]/', $this->to_name, $reg))
                                $this->to_name = $reg[1];
                    }
                    else
                    {
                        $this->to_name = $content;
                        $this->to_mail = $content;
                        // no separate recipient address
                    }
                    break;
                case "DATE": // sending date, stored as a Unix timestamp for convenience; use date("Y-m-d", $this->mail_time) to format it
                    $content = trim($content);
                    $day = strtok($content, " ");
                    $day = substr($day, 0, strlen($day) - 1); // day of the week without the trailing comma
                    $date = strtok(" ");
                    $month = $this->month_num[strtok(" ")];
                    $year = strtok(" ");
                    $time = strtok(" ");
                    $time = explode(":", $time);
                    $this->mail_time = mktime((int)$time[0], (int)$time[1], (int)$time[2], (int)$month, (int)$date, (int)$year);
                    break;
                case "SUBJECT": // mail subject
                    $this->subject = $content;
                    break;
                case "REPLY-TO": // reply address (may be absent); the listing had "REPLY_TO", which never matches the real header name
                    if (preg_match('/<([^>]+)>/', $content, $reg))
                        $this->reply_to = $reg[1];
                    else $this->reply_to = $content;
                    break;
                case "CONTENT-TYPE": // content type of the whole message
                    preg_match('/([^;]*);/', $content, $reg);
                    $this->body_type = isset($reg[1]) ? trim($reg[1]) : trim($content);
                    // The lookups below scan forward from the current line with $j,
                    // so they cannot run past the end of $head or skip later headers.
                    if (preg_match('/multipart/i', $content)) // multipart: fetch the boundary
                    {
                        for ($j = $i; !empty($head[$j]); $j++)
                            if (preg_match('/boundary="(.*)"/i', $head[$j], $reg))
                            {
                                $this->boundary = $reg[1];
                                break;
                            }
                    }
                    else // ordinary body: fetch the character set and transfer encoding directly
                    {
                        for ($j = $i; !empty($head[$j]); $j++)
                            if (preg_match('/charset=["\'](.*)[\'"]/i', $head[$j], $reg))
                            {
                                $this->body_char_set = $reg[1];
                                break;
                            }
                        for ($j = $i; !empty($head[$j]); $j++)
                            if (preg_match('/Content-Transfer-Encoding:(.*)/i', $head[$j], $reg))
                            {
                                $this->body_code_type = trim($reg[1]);
                                break;
                            }
                    }
                    break;
                case "CC": // carbon copy
                    if (preg_match('/<([^>]+)>/', $content, $reg))
                        $this->cc_to = $reg[1];
                    else
                        $this->cc_to = $content;
                    break;
                default:
                    break;
            } // end switch
            $i++;
        } // end while
        if (trim($this->reply_to) == "") // no reply address given, so replies go to the sender
            $this->reply_to = $this->from_mail;
    } // end of decode_head
    // Decodes the body, relying on the information gathered while decoding the mail header.
    function decode_body()
    {
        $i = 0;
        if (!preg_match('/multipart/i', $this->body_type)) // not a multipart message, so decode it directly
        {
            $tem_body = implode("\r\n", $this->body_temp);
            switch (strtolower($this->body_code_type)) // body_code_type: transfer encoding of the body, taken from the mail header
            {
                case "base64":
                    $tem_body = base64_decode($tem_body);
                    break;
                case "quoted-printable":
                    $tem_body = quoted_printable_decode($tem_body);
                    break;
            }
            $this->tem_num = 0;
            $this->body = array();
            $this->body[$this->tem_num]['content_id'] = "";
            $this->body[$this->tem_num]['type'] = $this->body_type;
            switch (strtolower($this->body_type))
            {
                case "text/html":
                    $this->body[$this->tem_num]['name'] = "hypertext body";
                    break;
                case "text/plain":
                    $this->body[$this->tem_num]['name'] = "text body";
                    break;
                default:
                    $this->body[$this->tem_num]['name'] = "unknown body";
            }
            $this->body[$this->tem_num]['size'] = strlen($tem_body);
            $this->body[$this->tem_num]['content'] = $tem_body;
            unset($tem_body);
        }
        else // multipart message
        {
            $this->body = array();
            $this->tem_num = 0;
            $this->decode_mult($this->body_type, $this->boundary, 0); // call the multipart decoding method
        }
    }
    // Recursively decodes a multipart mail body. The raw lines are read from the
    // body_temp array; the caller passes the multipart type, its boundary, and the
    // index in body_temp at which to start.
    function decode_mult($type, $boundary, $begin_row)
    {
        $i = $begin_row;
        $lines = count($this->body_temp);
        // preg_match() replaces the removed ereg()/eregi(); the boundary is quoted
        // because it may contain regular expression metacharacters.
        $bound_re = '/' . preg_quote($boundary, '/') . '/i';
        $end_re = '/' . preg_quote($boundary . "--", '/') . '/i';
        while ($i < $lines)
        {
            while ($i < $lines and !preg_match($bound_re, $this->body_temp[$i])) // locate the start marker of a part
                $i++;
            if ($i >= $lines or preg_match($end_re, $this->body_temp[$i])) // closing marker of this multipart section, or end of data
            {
                return $i;
            }
            $reg = array();
            while ($i < $lines and $this->body_temp[$i] and !preg_match('/Content-Type:([^;]*);/i', $this->body_temp[$i], $reg))
                $i++;
            $sub_type = isset($reg[1]) ? trim($reg[1]) : ""; // type of this part: multipart, text, ...
            if (preg_match('/multipart/i', $sub_type)) // this part itself contains several parts
            {
                $reg = array();
                while ($i < $lines and $this->body_temp[$i] and !preg_match('/boundary="([^"]*)"/i', $this->body_temp[$i], $reg))
                    $i++;
                $sub_boundary = isset($reg[1]) ? $reg[1] : ""; // boundary of the nested section
                $i++;
                $last_row = $this->decode_mult($sub_type, $sub_boundary, $i);
                $i = $last_row;
            }
            else
            {
                $comm = "";
                // Reset per-part values so nothing leaks over from the previous part.
                $name = $disp = $char_set = $content_id = $code_type = "";
                while ($i < $lines and trim($this->body_temp[$i]) != "")
                {
                    if (strpos($this->body_temp[$i], "=?") !== false)
                        $this->body_temp[$i] = $this->decode_mime($this->body_temp[$i]);
                    if (preg_match('/Content-Transfer-Encoding:(.*)/i', $this->body_temp[$i], $reg))
                        $code_type = strtolower(trim($reg[1])); // transfer encoding
                    $comm .= $this->body_temp[$i] . "\r\n";
                    $i++;
                } // $comm now holds the header block of this part
                if (preg_match('/name=["]([^"]*)["]/i', $comm, $reg))
                    $name = $reg[1];
                if (preg_match('/Content-Disposition:(.*);/i', $comm, $reg))
                    $disp = $reg[1];
                if (preg_match('/charset=["\'](.*)[\'"]/i', $comm, $reg))
                    $char_set = $reg[1];
                if (preg_match('/Content-ID:[ ]*<(.*)>/i', $comm, $reg)) // identifier of an embedded image
                    $content_id = $reg[1];
                $this->body[$this->tem_num]['type'] = $sub_type;
                $this->body[$this->tem_num]['content_id'] = $content_id;
                $this->body[$this->tem_num]['char_set'] = $char_set;
                if ($name)
                    $this->body[$this->tem_num]['name'] = $name;
                else
                    switch (strtolower($sub_type))
                    {
                        case "text/html":
                            $this->body[$this->tem_num]['name'] = "hypertext body";
                            break;
                        case "text/plain":
                            $this->body[$this->tem_num]['name'] = "text body";
                            break;
                        default:
                            $this->body[$this->tem_num]['name'] = "unknown body";
                    }
                // The content of the part starts on the next line.
                if ($this->get_content_num == -1 or $this->get_content_num == $this->tem_num) // is this part wanted? -1 means all parts
                {
                    $content = "";
                    while ($i < $lines and !preg_match($bound_re, $this->body_temp[$i]))
                    {
                        $content .= $this->body_temp[$i] . "\r\n";
                        $i++;
                    }
                    switch ($code_type)
                    {
                        case "base64":
                            $content = base64_decode($content);
                            break;
                        case "quoted-printable":
                            $content = str_replace("\n", "\r\n", quoted_printable_decode($content));
                            break;
                    }
                    $this->body[$this->tem_num]['size'] = strlen($content);
                    $this->body[$this->tem_num]['content'] = $content;
                }
                else
                {
                    // Part not requested: skip ahead to the next boundary.
                    while ($i < $lines and !preg_match($bound_re, $this->body_temp[$i]))
                        $i++;
                }
                $this->tem_num++;
            } // end else
        } // end while
        return $i;
    } // end of decode_mult
    function decode_mime($string)
    {
        // decode_mime() was given in the preceding section and is omitted here.
    }
} // end of class
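The listing leaves decode_mime() empty because the author's version appeared earlier in the series. As a stand-in, the following is a minimal sketch of decoding RFC 2047 encoded words of the form =?charset?B?...?= or =?charset?Q?...?=. The function name decode_mime_sketch is hypothetical and this is not the author's implementation. It also makes no attempt to convert the charset, so the decoded bytes stay in whatever character set the sender used.

```php
<?php
// Hypothetical stand-in for decode_mime(); not the author's implementation.
// Decodes RFC 2047 encoded words and leaves the bytes in their original
// charset (no character set conversion is attempted).
function decode_mime_sketch($string)
{
    return preg_replace_callback(
        '/=\?([^?]+)\?([BbQq])\?([^?]*)\?=/',
        function ($m) {
            if (strtoupper($m[2]) === 'B') {
                return base64_decode($m[3]); // "B" means base64
            }
            // "Q" is a quoted-printable variant in which "_" stands for a space.
            return quoted_printable_decode(str_replace('_', ' ', $m[3]));
        },
        $string
    );
}

echo decode_mime_sketch("Subject: =?ISO-8859-1?Q?Hello_World?=") . "\n";
echo decode_mime_sketch("Subject: =?US-ASCII?B?SGVsbG8=?=") . "\n";
```

Text without encoded words passes through unchanged, which matches the way decode_head() only calls the decoder when a line contains "=?".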
One point deserves special attention: decoding the images used in HTML mail. Sending an HTML body raises the question of how to transfer its images. In an HTML document an image is an <img> tag, and what matters is where its source file lives. Many mail programs use an absolute URL, that is, the HTML body of the mail contains an <img> tag whose src is an absolute URL. When the mail is read, the mail reader (usually through an embedded browser) downloads the images from the Internet automatically. However, if the Internet connection is dropped after the mail has been received, the images cannot be displayed.
A better approach is therefore to put the images inside the mail and send them with it. In MIME encoding, besides the multipart/related MIME header mentioned above, a Content-ID is used to establish the relationship between an image and the HTML body. When an image from the HTML document is encoded, a field such as Content-ID: 122223443556dsdf@ntsever is added to its MIME header, where 122223443556dsdf@ntsever is a unique identifier. The HTML document refers to the image through this identifier (in standard MIME HTML mail, as a cid: URL in the src attribute of the <img> tag). When decoding, you therefore also need to modify these tags in the HTML body so that they point to the actual path of the decoded image. This decoding class does not modify the tags in the HTML body, because different programs handle the decoded images differently. When using the class, HTML bodies that contain images therefore need some additional processing. The images themselves can be saved in temporary files or in a database.
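The extra processing described above can be sketched as follows. This is only an illustration under stated assumptions: the helper name rewrite_cid_images, the sample HTML, and the path /tmp/mail_images/logo.gif are all hypothetical, and the HTML is assumed to reference the image as src="cid:<Content-ID>".

```php
<?php
// Hypothetical helper: replaces cid: references in an HTML body with the
// locations where the decoded images were saved. $cid_to_path maps a
// Content-ID (without angle brackets) to a path or URL chosen by the caller.
function rewrite_cid_images($html, $cid_to_path)
{
    return preg_replace_callback(
        '/(src\s*=\s*["\'])cid:([^"\']+)(["\'])/i',
        function ($m) use ($cid_to_path) {
            // Leave the reference untouched when no saved image is known for it.
            if (!isset($cid_to_path[$m[2]])) {
                return $m[0];
            }
            return $m[1] . $cid_to_path[$m[2]] . $m[3];
        },
        $html
    );
}

// Sample data invented for this example.
$html = '<p>Logo: <img src="cid:122223443556dsdf@ntsever"></p>';
$map  = array('122223443556dsdf@ntsever' => '/tmp/mail_images/logo.gif');
echo rewrite_cid_images($html, $map) . "\n";
```

With the decoding class, such a map could be built by looping over the decoded parts, saving each part that has a non-empty content_id, and recording where it was written.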
We have now covered both receiving mail over POP3 and MIME decoding. The following small program uses the two classes together:
<?php
include("pop3.inc.php");
include("mime.inc.php");
$host = "pop.china.com";
$user = "boss_ch";
$pass = "mypassword";
$rec = new pop3($host, 110, 2);
$decoder = new decode_mail();
if (!$rec->open()) die($rec->err_str);
if (!$rec->login($user, $pass)) die($rec->err_str);
if (!$rec->stat()) die($rec->err_str);
echo "A total of " . $rec->messages . " mails, " . $rec->size . " bytes in total<br>\n";
if ($rec->messages > 0)
{
    if (!$rec->listmail()) die($rec->err_str);
    echo "The mail contents follow:<br>\n";
    for ($i = 1; $i <= count($rec->mail_list); $i++)
    {
        echo "Mail " . $rec->mail_list[$i]['num'] . ", size: " . $rec->mail_list[$i]['size'] . "<br>\n";
        $rec->getmail($rec->mail_list[$i]['num']);
        $decoder->decode($rec->head, $rec->body);
        echo "<br>\nMail header:<br>\n";
        echo $decoder->from_name . " (" . $decoder->from_mail . ") on " . date("Y-m-d H:i:s", $decoder->mail_time) . " sent to " . $decoder->to_name . " (" . $decoder->to_mail . ")";
        echo "\n<br>CC: ";
        if ($decoder->cc_to) echo $decoder->cc_to; else echo "none";
        echo "\n<br>Subject: " . $decoder->subject;
        echo "\n<br>Reply to: " . $decoder->reply_to;
        echo "<br>\nMail body:<br>\n";
        echo "Body type: " . $decoder->body_type;
        echo "<br>\nBody parts:";
        for ($j = 0; $j < count($decoder->body); $j++)
        {
            echo "\n<br>Type: " . $decoder->body[$j]['type'];
            echo "\n<br>Name: " . $decoder->body[$j]['name'];
            echo "\n<br>Size: " . $decoder->body[$j]['size'];
            echo "\n<br>Content_id: " . $decoder->body[$j]['content_id'];
            echo "\n<br>Character set: " . $decoder->body[$j]['char_set'];
            echo "<br>\n";
            echo "Content: " . $decoder->body[$j]['content'];
            echo "<br>\n";
        }
        $rec->dele($i); // delete the message from the server once it has been read
    }
}
$rec->close();
?>
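The program above prints mail_time with date(). decode_head() builds that timestamp by tokenizing the Date header and calling mktime(), which ignores the time zone offset carried in the header. As a hedged alternative, PHP's strtotime() can usually parse this kind of date string, offset included. The helper name parse_mail_date and the sample date are assumptions for illustration.

```php
<?php
// Hypothetical helper: converts a Date header value to a Unix timestamp,
// honouring the time zone offset. strtotime() returns false when it
// cannot parse the string.
function parse_mail_date($value)
{
    return strtotime(trim($value));
}

// Sample header value invented for this example.
$ts = parse_mail_date(" Mon, 15 Jan 2001 10:30:00 +0800");
echo gmdate("Y-m-d H:i:s", $ts) . " UTC\n";
```

Because the offset is taken into account, the timestamp denotes the same instant regardless of the server's own time zone setting.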
Readers who would like the complete source code are welcome to contact me: boss_ch@netease.com
Author: Chen Junqing
Reprinted: chinacnet