I. BACKGROUND
We use an SVN pre-commit hook to implement a commit lock that controls SVN commits before a release goes live, so that developers and planners cannot commit at random and cause online bugs. The platform has a web interface where you can grant commit permission to specific people and record the files and log message of each commit. This time garbled text appeared: all Chinese shown in the web interface was garbled, including log messages and committed file names.
The garbled text looks like this: /design/x_?229?175?188?229?133?165?230?149?176?230?141?174/04_?233?129?147?229?133?183?230?149?176?230?141?174?232?161?168/?229?149?134?229?186?151?230?149?176?230?141?174?232?161?168.xls
For example, the garbled text on the web page:
14370127901362.png
The corresponding actual Chinese characters are:
14370128636098.png
II. CAUSE
Character encoding problems, as we all know, mostly come down to conversions between ASCII, Unicode, and UTF-8. For background on character encoding, see Ruan Yifeng's article, which explains it simply and thoroughly: http://www.ruanyifeng.com/blog .... Html
Each unit of the garbled text has the form ?229. The ? is presumably a delimiter added by SVN, and the three digits after it are a decimal character code, in fact the decimal value of one byte of the UTF-8 encoding of the original text. We can convert each code to a character with a char-code function; joined together and percent-encoded, these characters give the urlencoded form of the Chinese text, and UrlDecode then recovers the Chinese characters.
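As a quick check of this reasoning, the following minimal sketch takes the first three codes from the garbled path above and shows that they are the decimal values of UTF-8 bytes. The variable names are illustrative and not part of the original platform; the sketch relies only on the standard decodeURIComponent.

```javascript
// Sample: the first three codes of the garbled path shown earlier.
var sample = "?229?175?188";

// Split on "?" and drop the empty piece before the first delimiter.
var codes = sample.split("?").filter(function (part) {
    return part !== "";
});

// Turn each decimal byte value into a two-digit hex percent-escape.
var percentEncoded = codes.map(function (code) {
    var hex = Number(code).toString(16).toUpperCase();
    return "%" + (hex.length < 2 ? "0" + hex : hex);
}).join("");

// decodeURIComponent interprets the escapes as UTF-8 bytes.
var decoded = decodeURIComponent(percentEncoded);

console.log(percentEncoded); // %E5%AF%BC
console.log(decoded);        // 导
```

229, 175, 188 are 0xE5, 0xAF, 0xBC, which is the three-byte UTF-8 encoding of a single Chinese character.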
III. SOLUTION
Knowing the principle above, I solved the problem in JavaScript, so the back-end code does not have to change and the server does not need to be reloaded, which also makes debugging more convenient. If you need to parse the garbled text in another language, such as Python, Java, or PHP, you can find the corresponding functions and substitute them.
3.1 UrlDecode and UrlEncode
For example, in the link http://www.111cn.net/%e5%88%86%e5%b8%83%e5%bc%8f.html the part %e5%88%86%e5%b8%83%e5%bc%8f is Chinese text after urlencode. Applying UrlDecode to it gives the Chinese word "分布式" ("distributed"). There are many blog posts about this, so it is easy to look up.
The JavaScript method is as follows:
function UrlDecode(zipStr) {
    var uzipStr = "";
    for (var i = 0; i < zipStr.length; i++) {
        var chr = zipStr.charAt(i);
        if (chr == "+") {
            // In urlencoded text "+" stands for a space
            uzipStr += " ";
        } else if (chr == "%") {
            var asc = zipStr.substring(i + 1, i + 3);
            if (parseInt("0x" + asc) > 0x7f) {
                // Non-ASCII lead byte: assumes a 3-byte UTF-8 sequence (%XX%XX%XX),
                // which covers common Chinese characters
                uzipStr += decodeURI("%" + asc + zipStr.substring(i + 3, i + 9));
                i += 8;
            } else {
                // Single ASCII byte
                uzipStr += String.fromCharCode(parseInt("0x" + asc));
                i += 2;
            }
        } else {
            uzipStr += chr;
        }
    }
    return uzipStr;
}
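For comparison with the hand-written UrlDecode, the sketch below decodes the same example link with the built-in decodeURI and re-encodes it with encodeURI. This is an alternative and not the method used in this post; note that the built-ins do not turn "+" into a space.

```javascript
// The example link from section 3.1.
var link = "http://www.111cn.net/%e5%88%86%e5%b8%83%e5%bc%8f.html";

// decodeURI reads the percent-escapes as UTF-8 bytes.
var decodedLink = decodeURI(link);

// encodeURI goes the other way and emits uppercase hex digits.
var reencodedLink = encodeURI(decodedLink);

console.log(decodedLink);   // http://www.111cn.net/分布式.html
console.log(reencodedLink); // http://www.111cn.net/%E5%88%86%E5%B8%83%E5%BC%8F.html
```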
3.2 Unicode to UTF-8: EncodeUtf8
I found this part of the code on the Internet. It may have bugs, but I have had no problems using it; other languages should have a simpler way:
function EncodeUtf8(s1) {
    // escape() is deprecated but still available; it yields %XX for
    // characters below 256 and %uXXXX for the rest
    var s = escape(s1);
    var sa = s.split("%");
    var retV = "";
    if (sa[0] != "") {
        retV = sa[0];
    }
    for (var i = 1; i < sa.length; i++) {
        if (sa[i].substring(0, 1) == "u") {
            // %uXXXX: convert the 4 hex digits to 3 UTF-8 bytes and keep
            // any plain text that follows the escape
            retV += Hex2Utf8(Str2Hex(sa[i].substring(1, 5))) + sa[i].substring(5);
        } else {
            retV += "%" + sa[i];
        }
    }
    return retV;
}
// Convert a string of uppercase hex digits to a string of binary digits
function Str2Hex(s) {
    var c = "";
    var n;
    var ss = "0123456789ABCDEF";
    var digS = "";
    for (var i = 0; i < s.length; i++) {
        c = s.charAt(i);
        n = ss.indexOf(c);
        digS += Dec2Dig(n);
    }
    return digS;
}
// Convert a number from 0 to 15 to a 4-character binary string
function Dec2Dig(n1) {
    var s = "";
    var n2 = 0;
    for (var i = 0; i < 4; i++) {
        n2 = Math.pow(2, 3 - i);
        if (n1 >= n2) {
            s += '1';
            n1 = n1 - n2;
        } else {
            s += '0';
        }
    }
    return s;
}
// Convert a 4-character binary string to a number, or -1 on bad input
function Dig2Dec(s) {
    var retV = 0;
    if (s.length == 4) {
        for (var i = 0; i < 4; i++) {
            retV += parseInt(s.charAt(i), 10) * Math.pow(2, 3 - i);
        }
        return retV;
    }
    return -1;
}
// Convert 16 binary digits (one code point) to three percent-escaped
// UTF-8 bytes: 1110xxxx 10xxxxxx 10xxxxxx. Only the 3-byte form is handled.
function Hex2Utf8(s) {
    var retS = "";
    var tempS = "";
    var ss = "";
    if (s.length == 16) {
        tempS = "1110" + s.substring(0, 4);
        tempS += "10" + s.substring(4, 10);
        tempS += "10" + s.substring(10, 16);
        var sss = "0123456789ABCDEF";
        for (var i = 0; i < 3; i++) {
            retS += "%";
            ss = tempS.substring(i * 8, (i + 1) * 8);
            retS += sss.charAt(Dig2Dec(ss.substring(0, 4)));
            retS += sss.charAt(Dig2Dec(ss.substring(4, 8)));
        }
        return retS;
    }
    return "";
}
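The string-based bit manipulation in Hex2Utf8 can also be written with integer bit operations. The sketch below is one possible simpler form, not the code used in this post; the function name codePointToUtf8Escapes is hypothetical, and it only handles the three-byte range U+0800 to U+FFFF, which is where common Chinese characters fall.

```javascript
// Encode one code point in U+0800..U+FFFF as three percent-escaped
// UTF-8 bytes laid out as 1110xxxx 10xxxxxx 10xxxxxx.
// Hypothetical helper for illustration only.
function codePointToUtf8Escapes(codePoint) {
    if (codePoint < 0x0800 || codePoint > 0xFFFF) {
        throw new RangeError("this sketch only handles U+0800..U+FFFF");
    }
    var bytes = [
        0xE0 | (codePoint >> 12),         // top 4 bits
        0x80 | ((codePoint >> 6) & 0x3F), // middle 6 bits
        0x80 | (codePoint & 0x3F)         // low 6 bits
    ];
    // Every byte is at least 0x80, so each prints as two hex digits.
    return bytes.map(function (b) {
        return "%" + b.toString(16).toUpperCase();
    }).join("");
}

var escapes = codePointToUtf8Escapes(0x5BFC);
console.log(escapes); // %E5%AF%BC
```

For a whole string, the built-in encodeURIComponent produces the same escapes for such characters, though unlike escape() it also re-encodes characters in the range 0x80 to 0xFF as two bytes, so it is not a drop-in replacement for EncodeUtf8 in the SVN case.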
3.3 Parsing the SVN garbled text
To parse the SVN-format garbled text, convert each code to its character, join the characters into one string, pass it through EncodeUtf8 and then UrlDecode, and the Chinese characters are recovered:
function Svn_ascii_to_utf8(ori) {
    var s = ori.split('?');
    // Three codes make up one Chinese character; require at least one character
    if (s.length < 3) {
        return ori;
    }
    // Text before the first "?" is literal and is kept as-is
    var ascii = s[0];
    for (var i = 1; i < s.length; i++) {
        var x = s[i];
        if (/^\d{3}/.test(x)) {
            // Three-digit code, possibly followed by literal text
            ascii += String.fromCharCode(parseInt(x.substr(0, 3), 10));
            ascii += x.substr(3);
            // console.log(ascii);
        } else {
            // Not a code: restore the "?" that split() removed
            ascii += '?' + x;
        }
    }
    return UrlDecode(EncodeUtf8(ascii));
}
Then, wherever the web page displays text that may be garbled, call s = Svn_ascii_to_utf8(s): garbled text is decoded, and text that is not garbled stays unchanged.
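On runtimes that provide the standard TextDecoder (modern browsers and Node.js), the same pass over display strings can be sketched without the helper functions above. This is an alternative under stated assumptions, not the code running on the platform: the name decodeSvnEscapes and the sample rows are hypothetical, and the sketch assumes every ?NNN run holds UTF-8 byte values.

```javascript
// Hypothetical helper: decode runs of "?NNN" groups as UTF-8 bytes and
// leave everything else untouched.
function decodeSvnEscapes(text) {
    return text.replace(/(?:\?\d{3})+/g, function (run) {
        var bytes = run.split("?").slice(1).map(Number);
        if (bytes.some(function (b) { return b > 255; })) {
            return run; // not byte values, keep the original text
        }
        try {
            // fatal: true makes invalid UTF-8 throw instead of being replaced
            return new TextDecoder("utf-8", { fatal: true })
                .decode(new Uint8Array(bytes));
        } catch (e) {
            return run; // not valid UTF-8, keep the original text
        }
    });
}

// Hypothetical display strings: one garbled, two that must stay unchanged.
var rows = [
    "/design/x_?229?175?188?229?133?165/readme.txt",
    "/design/plain.txt",
    "what?"
];
var shown = rows.map(decodeSvnEscapes);
console.log(shown);
```

One caveat of this approach: a literal "?" followed by three digits that form a valid ASCII byte value (for example "?100") would also be decoded, so it is only safe where such text cannot occur.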