This article mainly discusses how to decode URLs that contain Chinese characters. URL encoding and UTF-8 encoding are not described in detail; for more information about encoding and decoding, see the relevant references.
URL encoding: a character is written as the hexadecimal form of its ASCII code, with "%" added in front. For example, the ASCII code of "\" is 92, and 92 in hexadecimal is 5C, so the URL encoding of "\" is %5C.
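The rule above can be sketched in a few lines of standard C++. The function name percentEncodeByte is hypothetical and does not appear in the article's code; it only illustrates the "% plus two hexadecimal digits" form for a single byte.

```cpp
#include <string>

// Hypothetical helper (not part of the article's program): percent-encode
// one byte as "%" followed by two uppercase hexadecimal digits.
std::string percentEncodeByte(unsigned char byte)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out = "%";
    out += digits[byte >> 4];    // high four bits
    out += digits[byte & 0x0F];  // low four bits
    return out;
}
```

Calling percentEncodeByte('\\') yields "%5C", matching the example in the text.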
UTF-8 is a variable-length encoding of Unicode created in 1992 by Ken Thompson, and it is now standardized as RFC 3629. The original design encoded a character in 1 to 6 bytes; RFC 3629 restricts this to 1 to 4 bytes. Code points up to U+007F take 1 byte, those up to U+07FF take 2 bytes, those up to U+FFFF (which covers most Chinese characters) take 3 bytes, and the rest take 4 bytes.
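To make the byte counts concrete, here is a minimal sketch of a UTF-8 encoder that follows the RFC 3629 layout (at most 4 bytes per code point). The name encodeUtf8 is an assumption made for illustration; the article's own program relies on the Windows API instead.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Returns an empty string for surrogates and for values above U+10FFFF.
std::string encodeUtf8(unsigned long cp)
{
    std::string out;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return out;                    // surrogates are not valid code points
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {           // 4 bytes: 11110xxx followed by three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, the Chinese character U+82F1 encodes to the three bytes E8 8B B1, while the letter "A" (U+0041) stays a single byte.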
For this article it is enough to know that UTF-8 uses one byte for an English (ASCII) character and three bytes for a Chinese character. The URL-encoded string below is the one that will be decoded.
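The one-byte versus three-byte distinction can be read directly from the first byte of each character. The following sketch assumes a hypothetical helper named utf8SequenceLength, which is not part of the article's program.

```cpp
// Hypothetical helper: number of bytes in the UTF-8 sequence that starts with
// `lead`, or 0 if `lead` is a continuation byte or not a valid lead byte.
int utf8SequenceLength(unsigned char lead)
{
    if (lead < 0x80) return 1;                   // 0xxxxxxx: ASCII
    if (lead >= 0xC2 && lead <= 0xDF) return 2;  // 110xxxxx
    if (lead >= 0xE0 && lead <= 0xEF) return 3;  // 1110xxxx: most Chinese characters
    if (lead >= 0xF0 && lead <= 0xF4) return 4;  // 11110xxx
    return 0;
}
```

The letter 'M' gives 1, and the byte 0xE8 (the first byte of many Chinese characters) gives 3.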
URL-encoded string: MFC%E8%8B%B1%E6%96%87%E6%89%8B%E5%86%8C.chm
The source code was tested on Windows XP SP2 with VC++ 6.0 (improved code).
#include <afx.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>

void UTF8ToGB(CString& str);

// Decodes the URL-encoded (UTF-8) string in str in place and converts it to the
// system ANSI code page. n is the number of bytes in str that may be overwritten.
void ANSIToGB(char* str, int n)
{
    ASSERT(str != NULL); // the input parameter must not be NULL
    unsigned int byteValue = 0;
    CString szResult, szHead = "";
    CString szRst;
    char ch, hex[3] = ""; // two hexadecimal digits plus the terminating '\0'
    int ix = 0;
    szResult = str;
    int ih = szResult.Find("%", 0);
    if (ih < 0)
        return; // no escape sequence, nothing to decode
    szHead = szResult.Left(ih);
    szResult = "";
    ix = ih;
    CString strTemp;
    bool bIsHaveUTF8 = false;
    while ((ch = *(str + ix)) != '\0')
    {
        // Treat "%" as an escape only when two hexadecimal digits follow it
        if (ch == '%' && isxdigit((unsigned char)str[ix + 1])
                      && isxdigit((unsigned char)str[ix + 2]))
        {
            hex[0] = *(str + ix + 1);
            hex[1] = *(str + ix + 2);
            sscanf(hex, "%x", &byteValue);
            szRst += (char)byteValue; // collect the raw UTF-8 bytes
            ix += 3;
            bIsHaveUTF8 = true;
        }
        else
        {
            if (bIsHaveUTF8)
            {
                // A run of escaped bytes has ended: convert it from UTF-8
                UTF8ToGB(szRst);
                strTemp += szRst;
                szRst = "";
                bIsHaveUTF8 = false;
            }
            // Copy characters that need no decoding
            strTemp += *(str + ix);
            ix++;
        }
    }
    // Convert a pending run when the string ends with an escape sequence
    if (bIsHaveUTF8)
    {
        UTF8ToGB(szRst);
        strTemp += szRst;
    }
    szResult = szHead + strTemp;
    memset(str, 0, n);
    strcpy(str, szResult); // the decoded text is never longer than the input
}

// Converts a UTF-8 byte string to the system ANSI code page
// (GB2312/GBK on Chinese Windows) by way of UTF-16.
void UTF8ToGB(CString& szstr)
{
    WCHAR* strSrc;
    char* szRes;
    int i = MultiByteToWideChar(CP_UTF8, 0, szstr, -1, NULL, 0);
    strSrc = new WCHAR[i + 1];
    MultiByteToWideChar(CP_UTF8, 0, szstr, -1, strSrc, i);
    i = WideCharToMultiByte(CP_ACP, 0, strSrc, -1, NULL, 0, NULL, NULL);
    szRes = new char[i + 1];
    WideCharToMultiByte(CP_ACP, 0, strSrc, -1, szRes, i, NULL, NULL);
    szstr = szRes;
    delete[] strSrc;
    delete[] szRes;
}

int main(int argc, char* argv[])
{
    // str = "%E6%96%B0%E5%BB%BA";
    char str[] = "MFC%E8%8B%B1%E6%96%87%E6%89%8B%E5%86%8C.chm";
    // Note that the first argument passed to ANSIToGB must not be a string literal,
    // because ANSIToGB writes the result back through that parameter.
    // This is only a detail; modify it as appropriate. For example, the decoded result could be returned through another parameter.
    ANSIToGB(str, strlen(str) * sizeof(char));
    printf("Result: %s\n", str);
    return 0;
}
// After converting this function, we found that the result is: MFC English manual. CHM
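The program above depends on MFC's CString and on the Windows conversion APIs. For readers without that environment, the percent-decoding step on its own can be sketched in standard C++. The names hexValue and percentDecode are hypothetical and are not taken from the article; the sketch only turns %XX escapes back into raw bytes, so the result is still UTF-8 and would need a separate conversion step (such as the UTF8ToGB function above) to reach a GB code page.

```cpp
#include <string>

// Hypothetical helper: value of one hexadecimal digit, or -1 if c is not one.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: replace every valid %XX escape with the byte it encodes.
// Malformed escapes (for example a trailing "%" or "%G1") are copied unchanged.
std::string percentDecode(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two digits just consumed
                continue;
            }
        }
        out += in[i];
    }
    return out;
}
```

Unlike the in-place version, this sketch returns a new string, so it can be called on string literals and never writes past the caller's buffer.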