LaTeX Bookmarks: Fixing Garbled Chinese Characters
Author: playerc
Link: http://www.cnblogs.com/playerc/archive/2013/05/20/latex_utf8_bookmark_c.html
When I recently used pdflatex to write a document, I found that the UTF-8 bookmarks came out as garbled characters. After searching online, I found a useful solution.
Its author uses a Python-based transcoding script (Solution 1):
http://www.thinkemb.com/wordpress?p=260
I wrote a version in C. The principle is explained clearly in Solution 1. When pdflatex produces a PDF with bookmarks or a table of contents, it has to be run twice. The first run generates auxiliary files such as *.out, which stores the bookmarks. The second run uses those files to generate a PDF that contains the bookmarks or table of contents. In other words, the PDF produced by the first run contains no bookmarks or table of contents.
Bookmarks are garbled because the PDF format (up to version 1.7) stores bookmark text in one of two forms: PDFDocEncoding, a single-byte encoding, or UTF-16BE ("Unicode") prefixed with the byte order mark FE FF. When the TeX source is UTF-8 or GBK encoded, the generated *.out file uses the same encoding, and the PDF keeps those raw bytes. The reader therefore decodes each byte as a separate single-byte character, and every non-ASCII character in the bookmarks comes out garbled.
This conversion tool converts the bookmark strings in *.out from UTF-8 to Unicode (UTF-16BE). Following it requires some familiarity with the UTF-8 and Unicode encodings.
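After compiling with gcc or Visual Studio, run:
utf82uni < src.out > dst.out
Here src.out is the file to be converted and dst.out is the converted result. pdflatex reads the bookmarks from the .out file named after the document, so dst.out must replace that file before the second pdflatex run.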
/**
 * utf82uni.c -- translate bookmark names in LaTeX's .out file
 * from UTF-8 to Unicode (UTF-16BE written as \ooo octal escapes).
 *
 * compile: cc -o utf82uni utf82uni.c
 * usage:   utf82uni < src.out > dst.out
 */
#include <stdio.h>

/* UTF-16BE byte order mark FE FF, written as octal escapes */
#define UNICODE_PREFIX "\\376\\377"

/* Print one UTF-16 code unit as two octal escapes, high byte first. */
static void put_unit(unsigned int u)
{
    printf("\\%03o\\%03o", (u >> 8) & 0xFFu, u & 0xFFu);
}

int main(void)
{
    int c;               /* current input byte (read with getchar), or EOF */
    int is_begin = 0;    /* prefix already written for the current title? */
    int count_brace = 0; /* braces seen so far on the current line */

    /* Test getchar()'s return value instead of calling feof() first,
     * which would run the loop body once more after the last byte. */
    while ((c = getchar()) != EOF) {
        int is_brace = (c == '{' || c == '}');

        if (is_brace)
            count_brace++;
        else if (c == '\n')
            count_brace = 0;

        /* Copy everything outside the third brace group (the title) as-is. */
        if (count_brace != 3 || is_brace) {
            putchar(c);
            is_begin = 0;
            continue;
        }

        /* count_brace == 3: translate UTF-8 to UTF-16BE in \ooo format. */
        if (!is_begin) {
            fputs(UNICODE_PREFIX, stdout);
            is_begin = 1;
        }

        if ((c & 0xE0) == 0xC0) {            /* 110xxxxx 10xxxxxx */
            unsigned int cp = (c & 0x1Fu) << 6;
            cp |= getchar() & 0x3Fu;
            put_unit(cp);
        } else if ((c & 0xF0) == 0xE0) {     /* 1110xxxx 10xxxxxx 10xxxxxx */
            unsigned int cp = (c & 0x0Fu) << 12;
            cp |= (getchar() & 0x3Fu) << 6;
            cp |= getchar() & 0x3Fu;
            put_unit(cp);
        } else if ((c & 0xF8) == 0xF0) {     /* 4-byte form: surrogate pair */
            unsigned int cp = (c & 0x07u) << 18;
            cp |= (getchar() & 0x3Fu) << 12;
            cp |= (getchar() & 0x3Fu) << 6;
            cp |= getchar() & 0x3Fu;
            cp -= 0x10000;
            put_unit(0xD800u | (cp >> 10));
            put_unit(0xDC00u | (cp & 0x3FFu));
        } else {                             /* ASCII */
            put_unit(c & 0x7Fu);
        }
    }
    return 0;
}