Base64 with MIME and UTF-7

Source: Internet
Author: User
Tags printable characters rfc mozilla thunderbird

Base64 is a representation of binary data based on 64 printable characters. Since 2^6 = 64, every 6 bits form a unit that corresponds to one printable character. Three bytes have 24 bits, which correspond to 4 Base64 units, so every 3 bytes are represented by 4 printable characters. Base64 can be used as a transfer encoding for e-mail. The printable characters in Base64 include the letters A-Z and a-z and the digits 0-9, which is 62 characters in total, plus two printable symbols that differ from one system to another. Some other encoding methods, such as Uuencode and later versions of BinHex, use a different set of 64 characters to represent 6 binary digits, but they are not called Base64.

Base64 is often used to represent, transmit, and store binary data in situations where text data is normally processed. Examples include MIME e-mail and storing complex data in XML.

MIME

In MIME-formatted e-mail messages, Base64 can be used to encode a binary byte sequence as text that consists only of ASCII characters. To use it, specify base64 as the content transfer encoding. The characters used are the 26 uppercase and 26 lowercase letters, the 10 digits, a plus sign "+", and a slash "/", for a total of 64 characters, plus an equal sign "=" that is used as a padding suffix.

The full Base64 definition can be found in RFC 1421 and RFC 2045. The encoded data is about 4/3 the size of the original. In e-mail, MIME (RFC 2045) additionally limits encoded lines to 76 characters, so a line break is added after every 76 characters. Counting the line break as one character, the encoded data is approximately 135.1% of the original length (4/3 × 77/76); with a two-character CRLF it is about 136.8%.
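
The size estimate can be checked with a few lines of Python. This is only a sketch: `encoded_length` is a hypothetical helper written for this illustration, and the input size of 5700 bytes is arbitrary. The standard library's `base64.encodebytes` wraps its output after every 76 characters with a single "\n".

```python
import base64
import math

def encoded_length(n_bytes, line_len=76, eol_len=2):
    """Estimate the Base64 size of n_bytes of input, wrapped at line_len characters."""
    chars = 4 * math.ceil(n_bytes / 3)     # every 3 input bytes become 4 characters
    lines = math.ceil(chars / line_len)    # every line ends with a line break
    return chars + lines * eol_len

raw = bytes(5700)                          # 5700 zero bytes, an arbitrary sample
wrapped = base64.encodebytes(raw)          # stdlib output with "\n" line breaks

print(len(wrapped) == encoded_length(len(raw), eol_len=1))
print(round(encoded_length(len(raw), eol_len=1) / len(raw), 3))   # one-character break
print(round(encoded_length(len(raw), eol_len=2) / len(raw), 3))   # two-character CRLF
```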

When converting, three bytes of data are placed successively in a 24-bit buffer, with the first byte occupying the high-order bits. If fewer than 3 bytes of data remain, the remaining bits in the buffer are filled with 0. Then 6 bits are taken out at a time (because 2^6 = 64), and their value is used as an index into the string ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ to select the character that is written to the encoded output. This continues until all of the input data has been converted.

If two input bytes are left over at the end, one "=" is appended to the encoded result; if one input byte is left over, two "=" are appended; if nothing is left over, nothing is appended. This guarantees that the data can be restored correctly.
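
The procedure described above can be sketched in Python as follows. The function name `b64encode_manual` is made up for this illustration; in practice the standard library's `base64.b64encode` does the same job, and it is used here only to compare results.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64encode_manual(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Put up to three bytes into a 24-bit buffer, first byte highest;
        # missing low-order bits are filled with zeros.
        buf = int.from_bytes(chunk, "big") << (8 * (3 - len(chunk)))
        # Take out four 6-bit values, from high to low.
        quad = [ALPHABET[(buf >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A chunk of n bytes gives n + 1 meaningful characters; the rest is "=".
        keep = len(chunk) + 1
        out.append("".join(quad[:keep]) + "=" * (4 - keep))
    return "".join(out)

for sample in (b"Man", b"BC", b"A"):
    print(sample, b64encode_manual(sample),
          b64encode_manual(sample) == base64.b64encode(sample).decode("ascii"))
```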

Example

For example, take this quote from Thomas Hobbes:

Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.

After Base64 encoding, it becomes:

TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=

Encoding of "Man":

Text                   M        a        n
ASCII value            77       97       110
Bits (per byte)        01001101 01100001 01101110
Bits (per 6-bit unit)  010011 010110 000101 101110
Index                  19     22     5      46
Base64 encoding        T      W      F      u

In this example, the Base64 algorithm encodes three characters into four characters.
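
The same regrouping of bits can be reproduced step by step. The snippet below is a small sketch that assumes the standard alphabet with "+" and "/" as the last two symbols.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

text = b"Man"
bits = "".join(f"{byte:08b}" for byte in text)        # 24 bits, first byte first
groups = [bits[i:i + 6] for i in range(0, 24, 6)]     # four 6-bit units
indices = [int(group, 2) for group in groups]         # value of each unit
encoded = "".join(ALPHABET[i] for i in indices)       # look up each character

print(bits)
print(groups)
print(indices)
print(encoded)
```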

Base64 Index Table:

Value Char   Value Char   Value Char   Value Char
0     A      16    Q      32    g      48    w
1     B      17    R      33    h      49    x
2     C      18    S      34    i      50    y
3     D      19    T      35    j      51    z
4     E      20    U      36    k      52    0
5     F      21    V      37    l      53    1
6     G      22    W      38    m      54    2
7     H      23    X      39    n      55    3
8     I      24    Y      40    o      56    4
9     J      25    Z      41    p      57    5
10    K      26    a      42    q      58    6
11    L      27    b      43    r      59    7
12    M      28    c      44    s      60    8
13    N      29    d      45    t      61    9
14    O      30    e      46    u      62    +
15    P      31    f      47    v      63    /
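
The index table does not have to be typed by hand. As a sketch, it can be generated from Python's `string` constants, together with the reverse mapping a decoder needs; the names `ALPHABET` and `DECODE` are chosen for this example only.

```python
import string

# Values 0-25 are A-Z, 26-51 are a-z, 52-61 are 0-9, 62 is "+", 63 is "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
DECODE = {char: value for value, char in enumerate(ALPHABET)}

# Print the table in four columns of 16 rows each.
for row in range(16):
    cells = [f"{row + 16 * col:>2} {ALPHABET[row + 16 * col]}" for col in range(4)]
    print("    ".join(cells))

print([DECODE[char] for char in "TWFu"])
```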

If the number of bytes to encode is not divisible by 3, so that 1 or 2 bytes are left over at the end, they can be handled as follows: first pad the end with zero-valued bytes so that the length is divisible by 3, then apply the Base64 encoding. In the encoded text, the characters that come only from padding are written as one or two "=" signs, which indicate the number of bytes that were added. That is, when one byte is left over, the last 6-bit Base64 unit has four bits set to 0 and two equal signs are appended; when two bytes are left over, the last 6-bit Base64 unit has two bits set to 0 and one equal sign is appended. Refer to the following table:

Text (1 byte)          A
Bits                   01000001
Bits (padded with 0)   010000 010000
Base64 encoding        Q Q = =

Text (2 bytes)         B C
Bits                   01000010 01000011
Bits (padded with 0)   010000 100100 001100
Base64 encoding        Q k M =
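
Going the other way, a decoder uses the number of "=" signs to decide how many bytes of the last group to keep. The following is a minimal sketch that assumes well-formed input whose length is a multiple of 4; `b64decode_manual` is a name invented for this example.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
VALUE = {char: index for index, char in enumerate(ALPHABET)}

def b64decode_manual(text: str) -> bytes:
    out = bytearray()
    for i in range(0, len(text), 4):
        quad = text[i:i + 4]
        pad = quad.count("=")              # 0, 1 or 2 padding characters
        buf = 0
        for char in quad:
            # "=" contributes six zero bits to the 24-bit buffer.
            buf = (buf << 6) | (0 if char == "=" else VALUE[char])
        # Drop one byte per "=" sign from the end of the group.
        out += buf.to_bytes(3, "big")[:3 - pad]
    return bytes(out)

print(b64decode_manual("QQ=="), b64decode_manual("QkM="), b64decode_manual("TWFu"))
```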

UTF-7

UTF-7 uses a modified Base64. Its main purpose is to encode UTF-16 data as a sequence of printable ASCII characters using the Base64 method, so that Unicode data can be transmitted. The main difference from standard Base64 is that the "=" padding is not used, because that character usually requires extensive escaping.

The standard is defined in RFC 2152, "UTF-7: A Mail-Safe Transformation Format of Unicode".
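
Python ships a "utf-7" codec, which makes the relationship easy to observe. In this sketch the sample text (two CJK characters, written as escapes) is arbitrary; the comparison value is built by Base64-encoding the big-endian UTF-16 bytes and stripping the "=" padding.

```python
import base64

text = "\u4f60\u597d"                      # arbitrary non-ASCII sample text

encoded = text.encode("utf-7")             # Python's built-in UTF-7 codec
payload = base64.b64encode(text.encode("utf-16-be")).rstrip(b"=")

print(encoded)
print(payload)
print(payload in encoded)                  # the unpadded Base64 appears inside the UTF-7 form
print(encoded.decode("utf-7") == text)     # round trip
```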

IRCu

In the P10 IRC server-to-server protocol used by software such as IRCu, the client and server numerics and the binary IP addresses of clients and servers are Base64 encoded. A numeric has a fixed length of 3 bytes, so it can be encoded directly into 4 characters without any padding. When an IP address is encoded, some 0 bits are added in front of the address so that it can be encoded as a whole number of 6-bit units. The symbol set used here differs from the MIME one described above: "+" and "/" are replaced by "[" and "]".
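
As an illustration of the zero bits added in front of an address, the sketch below encodes an IPv4 address with the "[" "]" alphabet. It assumes the 32 address bits are left-padded with four zero bits to 36 bits, giving six characters; the helper names `p10_encode` and `p10_encode_ipv4` are hypothetical and not taken from IRCu itself.

```python
import ipaddress
import string

# Same ordering as MIME Base64, but with "[" and "]" as the last two symbols.
P10_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "[]"

def p10_encode(value: int, width: int) -> str:
    """Encode a non-negative integer as `width` 6-bit digits, highest digit first."""
    digits = []
    for _ in range(width):
        digits.append(P10_ALPHABET[value & 0x3F])
        value >>= 6
    return "".join(reversed(digits))

def p10_encode_ipv4(address: str) -> str:
    # Assumption: 32 bits are padded at the front with zeros to 36 bits (6 characters).
    return p10_encode(int(ipaddress.IPv4Address(address)), 6)

print(p10_encode_ipv4("127.0.0.1"))
```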

Application in URLs

Base64 encoding can be used to pass longer identifying information in an HTTP environment. For example, in the Java persistence framework Hibernate, Base64 is used to encode a long unique identifier (typically a 128-bit UUID) as a string that is used as a parameter in HTTP forms and HTTP GET URLs. In other applications it is also often necessary to encode binary data in a form suitable for URLs (including hidden form fields). Base64 encoding is then not only compact but also not directly human-readable: the encoded data cannot be read at a glance.

However, standard Base64 is not suitable for direct use in a URL, because a URL encoder turns the "/" and "+" characters of standard Base64 into the form "%XX", and these "%" signs then need further conversion when the value is stored in a database, because ANSI SQL uses "%" as a wildcard character.
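
The effect of URL encoding on standard Base64 can be seen with `urllib.parse.quote`. The three sample bytes below are chosen only because they produce "+" and "/" in the output.

```python
import base64
from urllib.parse import quote

raw = bytes([0xFB, 0xFF, 0xFE])            # sample bytes that yield "+" and "/"
standard = base64.b64encode(raw).decode("ascii")
escaped = quote(standard, safe="")         # safe="" so that "/" is escaped as well

print(standard)                            # 4 characters
print(escaped)                             # every "+" and "/" became a 3-character "%XX"
```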

To solve this problem, a modified Base64 for URLs can be used. It does not add "=" padding at the end and replaces the "+" and "/" of standard Base64 with "-" and "_" respectively, which removes the need for conversion during URL encoding and decoding and during database storage. This avoids the growth in length that those conversions would cause and keeps the format of object identifiers uniform across databases, forms, and so on.
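
In Python, the "-" and "_" alphabet is available as `base64.urlsafe_b64encode`; stripping and restoring the "=" padding has to be done by the caller. The helpers `to_url_token` and `from_url_token` and the sample UUID are assumptions made for this sketch.

```python
import base64
import uuid

def to_url_token(raw: bytes) -> str:
    # URL-safe alphabet, with the trailing "=" padding removed.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def from_url_token(token: str) -> bytes:
    # Restore the padding so that the length is a multiple of 4 again.
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding)

ident = uuid.UUID("12345678-1234-5678-1234-567812345678")   # sample 128-bit identifier
token = to_url_token(ident.bytes)

print(token, len(token))
print(uuid.UUID(bytes=from_url_token(token)) == ident)
```

A 128-bit UUID is 16 bytes, so the token is 22 characters long, compared with 36 characters for the usual hyphenated hexadecimal form.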

There is also a modified Base64 variant for regular expressions that changes "+" and "/" to "!" and "-", because "+" and "*", as well as the "[" and "]" used by IRCu as described above, may have special meanings in regular expressions.

There are also variants that change "+/" to "_-" or "._" (for use as identifier names in programming languages), to ".-" (for Nmtoken in XML), or even to "_:" (for Name in XML).
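
All of these variants differ from standard Base64 only in the last two symbols, so they can be sketched with the `altchars` parameter of Python's `base64.b64encode` and `base64.b64decode`. Note that this keeps the "=" padding; a variant that drops it would need the padding stripped and restored separately.

```python
import base64

raw = bytes([0xFB, 0xFF, 0xFE])            # sample bytes that map to values 62 and 63

# altchars gives the replacements for "+" and "/", in that order.
for altchars in (b"!-", b"_-", b"._", b".-", b"_:"):
    encoded = base64.b64encode(raw, altchars=altchars)
    decoded = base64.b64decode(encoded, altchars=altchars)
    print(altchars, encoded, decoded == raw)
```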

Other Applications
    • Mozilla Thunderbird and Evolution use Base64 to obscure e-mail passwords.
    • Base64 is also often used as a simple form of "encryption" to protect certain data, because real encryption is often more cumbersome.
    • Spammers use Base64 to evade anti-spam tools, because those tools usually do not decode Base64 messages.
    • In LDIF files, Base64 is used to encode strings.
