This article covers the implementation of decoding URLs that contain Chinese characters. It does not go into URL encoding or UTF-8 encoding in detail; for more on encoding and decoding, please refer to the relevant references.
URL encoding: a character is written as the hexadecimal value of its ASCII code, prefixed with "%". For example, the ASCII code of "\" is 92, and 92 in hexadecimal is 5c, so the URL encoding of "\" is %5c.
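As a small illustration of this rule, the following sketch builds the "%XX" form of a single byte in portable C++. The names hexDigit and percentEncodeByte are made up for this example and are not part of the decoder discussed in this article.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps a value 0..15 to its lowercase hex digit.
char hexDigit(unsigned value)
{
    assert(value < 16);
    return "0123456789abcdef"[value];
}

// Hypothetical helper: returns the "%XX" escape for one byte.
std::string percentEncodeByte(unsigned char byte)
{
    std::string out = "%";
    out += hexDigit(byte >> 4);     // high four bits
    out += hexDigit(byte & 0x0F);   // low four bits
    return out;
}
```

For example, percentEncodeByte('\\') gives "%5c", which matches the backslash example.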
UTF-8 is a variable-length character encoding for Unicode, created by Ken Thompson and Rob Pike in 1992. It is now standardized as RFC 3629. The original design encoded a character in 1 to 6 bytes; RFC 3629 restricts UTF-8 to code points up to U+10FFFF, so a character now takes 1 to 4 bytes. A character that fits in 2 bytes as a Unicode code unit may need up to 3 bytes in UTF-8, and a character outside that 2-byte range needs 4 bytes in UTF-8.
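To make the byte counts concrete, here is a minimal sketch of a UTF-8 encoder for one code point, assuming the RFC 3629 range (up to U+10FFFF, at most 4 bytes). The function name encodeUtf8 is hypothetical and is used only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8 (RFC 3629 range).
std::string encodeUtf8(unsigned long cp)
{
    // Surrogates and values above U+10FFFF are not valid code points.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::string out;
    if (cp < 0x80) {                    // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {            // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {          // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                            // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this sketch, the letter "A" (U+0041) comes out as one byte, and the Chinese character U+82F1 comes out as the three bytes E8 8B B1.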
All we need to know here is that UTF-8 encodes an English (ASCII) character in one byte and a commonly used Chinese character in three bytes. The code below decodes the following URL-encoded string.
URL-encoded string: Mfc%e8%8b%b1%e6%96%87%e6%89%8b%e5%86%8c.chm
The source code below was tested with VC++ 6.0 on Windows XP SP2 (improved version).
#include <afx.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

void UTF8ToGB(CString& str);

// Decodes the %XX escapes in str in place. The escaped bytes are treated as
// UTF-8 and converted to the system ANSI code page (GBK on Chinese Windows).
// n is the number of bytes of str to clear before the result is written back.
// Assumes an ANSI (MBCS) build, where CString holds char.
void ANSIToGB(char* str, int n)
{
    ASSERT(str != NULL);        // the passed-in pointer must not be NULL

    CString szRst;              // UTF-8 bytes collected from consecutive %XX escapes
    CString strTemp;            // decoded result
    char hex[3] = "";           // two hex digits plus the terminating NUL
    unsigned int byteValue = 0;
    bool bIsHaveUTF8 = false;
    int ix = 0;
    char ch;

    while ((ch = *(str + ix)) != '\0')
    {
        if (ch == '%' && isxdigit((unsigned char)str[ix + 1])
                      && isxdigit((unsigned char)str[ix + 2]))
        {
            hex[0] = *(str + ix + 1);
            hex[1] = *(str + ix + 2);
            sscanf(hex, "%x", &byteValue);
            szRst += (char)byteValue;
            ix += 3;
            bIsHaveUTF8 = true;
        }
        else
        {
            if (bIsHaveUTF8)
            {
                UTF8ToGB(szRst);
                strTemp += szRst;
                szRst = "";
                bIsHaveUTF8 = false;
            }
            // Copy the characters that do not need to be converted
            strTemp += *(str + ix);
            ix++;
        }
    }
    if (bIsHaveUTF8)            // convert escapes that end the string
    {
        UTF8ToGB(szRst);
        strTemp += szRst;
    }

    memset(str, 0, n);
    strcpy(str, strTemp);
}

// Converts the UTF-8 bytes in szstr to the ANSI code page, going through UTF-16.
void UTF8ToGB(CString& szstr)
{
    // The first call returns the required length in wide characters, including the NUL
    int i = MultiByteToWideChar(CP_UTF8, 0, szstr, -1, NULL, 0);
    WCHAR* strSrc = new WCHAR[i + 1];
    MultiByteToWideChar(CP_UTF8, 0, szstr, -1, strSrc, i);

    i = WideCharToMultiByte(CP_ACP, 0, strSrc, -1, NULL, 0, NULL, NULL);
    char* szRes = new char[i + 1];
    WideCharToMultiByte(CP_ACP, 0, strSrc, -1, szRes, i, NULL, NULL);

    szstr = szRes;
    delete[] strSrc;
    delete[] szRes;
}

int main(int argc, char* argv[])
{
    // Alternative test input: char str[] = "%e6%96%b0%e5%bb%ba";
    char str[] = "Mfc%e8%8b%b1%e6%96%87%e6%89%8b%e5%86%8c.chm";

    // Note that the first parameter passed to ANSIToGB must not be a string
    // constant, because ANSIToGB also returns the result through that parameter.
    // This is only a detail and can be adapted as needed, for example by
    // returning the decoded result through another parameter.
    ANSIToGB(str, (int)(strlen(str) * sizeof(char)));
    printf("Result: %s\n", str);
    return 0;
}
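The percent-decoding step on its own does not depend on MFC or Win32. Below is a minimal sketch in portable C++ that only turns "%XX" escapes back into raw bytes and leaves out the conversion from UTF-8 to GB. The names hexValue and percentDecode are hypothetical and are not functions from the program above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: replaces every valid "%XX" escape with the byte it names.
// A '%' that is not followed by two hex digits is copied unchanged.
std::string percentDecode(const std::string& in)
{
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;                 // skip the two hex digits
                continue;
            }
        }
        out += in[i];
    }
    // Each escape shrinks three characters to one byte, so the result never grows.
    assert(out.size() <= in.size());
    return out;
}
```

Applied to the sample string, this yields 19 bytes: "Mfc", twelve UTF-8 bytes for the four Chinese characters, and ".chm". Those twelve bytes still have to be converted to the local code page before they can be printed on a GB console.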
Chinese and English URL decoding: VC++ source program