This article mainly discusses how to decode URLs that contain Chinese characters. It does not go into the details of URL encoding or UTF-8 encoding themselves; for more on those, please refer to the relevant documentation.
URL encoding: a character is written as the hexadecimal value of its ASCII code, preceded by a "%". For example, the ASCII code of "\" is 92, and 92 in hexadecimal is 5c, so the URL encoding of "\" is %5c. A character that takes several bytes is encoded byte by byte, one %XX group per byte.
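The rule above can be sketched in a few lines of portable C++. This is only an illustration: PercentEncodeByte is a hypothetical helper name, not part of the original program, and it encodes every byte it is given without checking whether that byte actually needs escaping.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: writes one byte as "%" followed by two lowercase
// hexadecimal digits, e.g. 92 ('\') becomes "%5c".
std::string PercentEncodeByte(unsigned char byte)
{
    char buf[4];  // '%', two hex digits, terminating '\0'
    std::snprintf(buf, sizeof(buf), "%%%02x", static_cast<unsigned>(byte));
    return std::string(buf);
}
```

Passing the backslash character yields %5c, matching the example in the text.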
UTF-8 is a variable-length character encoding for Unicode, created by Ken Thompson in 1992. It is now standardized as RFC 3629. The original design allowed sequences of up to 6 bytes, but RFC 3629 limits UTF-8 to 1 to 4 bytes per character. A character in the range U+0800 to U+FFFF, which fits in 2 bytes as a 16-bit value, needs 3 bytes in UTF-8, whereas a character above U+FFFF needs 4 bytes.
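To make the byte counts concrete, here is a minimal sketch of the encoding rules as RFC 3629 defines them, which cap a sequence at four bytes. EncodeUtf8 is a hypothetical helper written for this illustration; it is not used by the program in this article.

```cpp
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8.
// Returns an empty string for surrogates and for values above U+10FFFF.
std::string EncodeUtf8(unsigned long cp)
{
    std::string out;
    if (cp < 0x80) {                       // 1 byte: ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: most Chinese characters
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogate, invalid
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {           // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For U+82F1, a Chinese character, this produces the three bytes e8 8b b1, which is the first %XX group in the sample URL used later in this article.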
All we need to know here is that UTF-8 encodes an English (ASCII) character in a single byte and a common Chinese character in three bytes. As an example, the program below decodes the following URL-encoded string.
URL-encoded string: Mfc%e8%8b%b1%e6%96%87%e6%89%8b%e5%86%8c.chm
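Before any character-set conversion, the %XX groups have to be turned back into raw bytes. The following is a portable sketch of that first step only, assuming standard C++ rather than MFC; PercentDecode and HexValue are hypothetical names introduced here. Malformed escapes are copied through unchanged, which is one possible policy and not necessarily the right one for every application.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: numeric value of one hexadecimal digit.
// The caller must have checked the character with isxdigit first.
static int HexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    return std::tolower(static_cast<unsigned char>(c)) - 'a' + 10;
}

// Hypothetical helper: replaces each valid %XX escape with the byte it names.
// The result is a raw byte string (UTF-8 for the sample above).
std::string PercentDecode(const std::string& in)
{
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size() &&
            std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
            std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
            out += static_cast<char>(HexValue(in[i + 1]) * 16 + HexValue(in[i + 2]));
            i += 2;  // skip the two hex digits just consumed
        } else {
            out += in[i];  // ordinary character or malformed escape
        }
    }
    return out;
}
```

Applied to the sample string, this gives "Mfc", then twelve UTF-8 bytes (four Chinese characters of three bytes each), then ".chm". Those bytes still have to be converted to the local code page before they can be printed on a GB-encoded console.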
The source code below (an improved version) was tested successfully on Windows XP SP2 with VC++ 6.0.
#include <afx.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

void UTF8ToGB(CString& str);

// Decodes the %XX escapes in str in place, then converts the decoded
// UTF-8 bytes to the system ANSI code page.
void ANSIToGB(char* str, int n)
{
    ASSERT(str != NULL);   // ensure that the passed-in parameter is not NULL
    unsigned int byteValue = 0;
    CString szResult, szHead;
    CString szRst;         // run of decoded UTF-8 bytes not yet converted
    CString strTemp;
    char ch, hex[3] = "";  // two hex digits plus the terminating '\0'
    int ix = 0;
    BOOL bIsHaveUTF8 = FALSE;

    szResult = str;
    int ih = szResult.Find("%", 0);
    if (ih < 0)
        return;            // no escapes, nothing to decode
    szHead = szResult.Left(ih);
    ix = ih;

    while ((ch = *(str + ix)) != '\0')
    {
        if (ch == '%' && isxdigit((unsigned char)str[ix + 1])
                      && isxdigit((unsigned char)str[ix + 2]))
        {
            hex[0] = *(str + ix + 1);
            hex[1] = *(str + ix + 2);
            sscanf(hex, "%x", &byteValue);
            szRst += (char)byteValue;
            ix += 3;
            bIsHaveUTF8 = TRUE;
        }
        else
        {
            if (bIsHaveUTF8)
            {
                UTF8ToGB(szRst);
                strTemp += szRst;
                szRst = "";
                bIsHaveUTF8 = FALSE;
            }
            // copy the characters that do not have to be converted
            strTemp += *(str + ix);
            ix++;
        }
    }
    // convert a UTF-8 run that reaches the end of the string
    if (bIsHaveUTF8)
    {
        UTF8ToGB(szRst);
        strTemp += szRst;
    }
    szResult = szHead + strTemp;
    memset(str, 0, n);
    strcpy(str, szResult);
}

// Converts a UTF-8 string to the ANSI code page by way of UTF-16.
void UTF8ToGB(CString& szstr)
{
    WCHAR* strSrc;
    char* szRes;
    int i = MultiByteToWideChar(CP_UTF8, 0, szstr, -1, NULL, 0);
    strSrc = new WCHAR[i + 1];
    MultiByteToWideChar(CP_UTF8, 0, szstr, -1, strSrc, i);
    i = WideCharToMultiByte(CP_ACP, 0, strSrc, -1, NULL, 0, NULL, NULL);
    szRes = new char[i + 1];
    WideCharToMultiByte(CP_ACP, 0, strSrc, -1, szRes, i, NULL, NULL);
    szstr = szRes;
    delete[] strSrc;
    delete[] szRes;
}

int main(int argc, char* argv[])
{
    // char str[] = "%e6%96%b0%e5%bb%ba";
    char str[] = "Mfc%e8%8b%b1%e6%96%87%e6%89%8b%e5%86%8c.chm";
    // Note that the first argument passed to ANSIToGB must not be a constant
    // string, because ANSIToGB also returns its result through that argument.
    // This is only a detail; you can adapt it to your own needs, for example
    // by returning the decoded result through a separate parameter.
    ANSIToGB(str, strlen(str) * sizeof(char));
    printf("Result: %s\n", str);
    return 0;
}
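The MultiByteToWideChar call with the UTF-8 code page does the real work of interpreting the decoded bytes. As a rough, portable illustration of what that step means (not of how the Windows API is implemented), the sketch below turns UTF-8 bytes into Unicode code points. DecodeUtf8 is a hypothetical helper; it stops at the first malformed or truncated sequence and does not reject overlong forms or surrogates, so it should not be used as a validator.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: decodes well-formed UTF-8 into Unicode code points.
std::vector<unsigned long> DecodeUtf8(const std::string& s)
{
    std::vector<unsigned long> cps;
    std::string::size_type i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        unsigned long cp;
        int extra;  // number of continuation bytes that follow the lead byte
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else break;                         // invalid lead byte
        if (i + extra >= s.size()) break;   // sequence is truncated
        bool ok = true;
        for (int k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }  // not 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) break;
        cps.push_back(cp);
        i += extra + 1;
    }
    return cps;
}
```

For the bytes e8 8b b1 e6 96 87 from the sample URL, this yields the two code points U+82F1 and U+6587. For characters in this range, the UTF-16 unit that Windows stores in a WCHAR has the same value as the code point.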
VC++ source program for decoding URLs that contain Chinese and English characters