The PDFlib "textformat" parameter specifies the format of the text strings passed to PDFlib. Valid values are:
bytes: Each byte in the string corresponds to one character. Intended mainly for 8-bit encodings.
utf8: The string is UTF-8 encoded.
ebcdicutf8: The string is EBCDIC-coded UTF-8. Available only on IBM iSeries and zSeries.
utf16: The string is UTF-16 encoded. If the string begins with a Unicode byte order mark (BOM), PDFlib uses the BOM to determine the byte order and removes it from the string. If the string has no BOM, the host's native byte order is assumed: Intel x86 systems are little-endian (BOM bytes FF FE), while SPARC and PowerPC systems are big-endian (BOM bytes FE FF).
utf16be: The string is UTF-16 encoded in big-endian byte order. A BOM receives no special treatment.
utf16le: The string is UTF-16 encoded in little-endian byte order. A BOM receives no special treatment.
auto: Equivalent to "bytes" for 8-bit encodings, and to "utf16" for wide-character addressing (unicode, glyphid, or a UCS2 or UTF16 CMap).
With respect to Unicode, PDFlib's language bindings fall into two groups. Languages that handle Unicode strings natively are called Unicode-capable languages: COM, .NET, Java, REALbasic, and Tcl. Languages that require special handling of Unicode strings are called non-Unicode-capable languages: C, C++, Cobol, Perl, PHP, Python, and RPG.
In non-Unicode-capable languages, the "auto" setting handles most text strings correctly.
The default value of the textformat parameter is "utf16" in Unicode-capable languages and "auto" in non-Unicode-capable languages.
PDFlib also supports the character references commonly used in SGML and HTML. To use them, set the "charref" parameter to "true" and the "textformat" parameter to "bytes":
PDF_set_parameter(p, "charref", "true");
PDF_set_parameter(p, "textformat", "bytes");
Some examples of valid character references:
&#173;    soft hyphen (decimal)
&#xAD;    soft hyphen (hexadecimal)
&shy;     soft hyphen (entity name)
&#x20AC;  euro glyph (hexadecimal)
&#8364;   euro glyph (decimal)
&euro;    euro glyph (entity name)
&lt;      less-than sign
&gt;      greater-than sign
&amp;     ampersand
&Alpha;   Greek capital letter Alpha