Write and read files according to the specified encoding in ASP

Source: Internet
Author: User
Tags: control characters

What is UTF-8?

First, a character encoding table only assigns an integer to each character. There are several ways to represent a string of characters as a string of bytes. The two most obvious are to store Unicode text as sequences of 2-byte or 4-byte units; the formal names of these two methods are UCS-2 and UCS-4, respectively. Unless otherwise specified, the most significant byte comes first (big-endian convention). Converting an ASCII or Latin-1 file to UCS-2 simply means inserting a 0x00 byte before every ASCII byte. Converting to UCS-4 means inserting three 0x00 bytes before every ASCII byte.
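The byte-insertion rule can be illustrated with a minimal Python sketch (Python is used here only for illustration; the helper names `ascii_to_ucs2_be` and `ascii_to_ucs4_be` are hypothetical and not from any library). For ASCII input the result should match Python's big-endian UTF-16 and UTF-32 codecs:

```python
def ascii_to_ucs2_be(data: bytes) -> bytes:
    """Insert one 0x00 byte before every ASCII byte (big-endian UCS-2)."""
    return b"".join(b"\x00" + bytes([b]) for b in data)


def ascii_to_ucs4_be(data: bytes) -> bytes:
    """Insert three 0x00 bytes before every ASCII byte (big-endian UCS-4)."""
    return b"".join(b"\x00\x00\x00" + bytes([b]) for b in data)


ascii_bytes = "Hi".encode("ascii")
ucs2 = ascii_to_ucs2_be(ascii_bytes)
ucs4 = ascii_to_ucs4_be(ascii_bytes)

print(ucs2.hex(" "))  # 00 48 00 69
print(ucs4.hex(" "))  # 00 00 00 48 00 00 00 69

# For pure ASCII text this is the same as the big-endian UTF-16/UTF-32 codecs.
assert ucs2 == "Hi".encode("utf-16-be")
assert ucs4 == "Hi".encode("utf-32-be")
```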
Using UCS-2 (or UCS-4) under UNIX can cause very serious problems. Strings in these encodings can contain bytes such as '\0' or '/' as parts of wider characters, and these bytes have special meanings in file names and in other C library function parameters. In addition, most UNIX tools that expect ASCII files cannot read 16-bit characters without major modifications. For these reasons, UCS-2 is not a suitable external encoding of Unicode in file names, text files, environment variables, and similar places.
The UTF-8 encoding defined in ISO 10646-1 Annex R and RFC 2279 (since replaced by RFC 3629) does not have these problems. It is clearly the way to use Unicode under Unix-style operating systems.
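The difference can be checked with a short Python sketch. The character U+2F00 is picked only because its code point happens to consist of the bytes 0x2F ('/') and 0x00 (NUL); the helper `special_bytes` is a hypothetical name used for this illustration:

```python
SPECIAL = {0x00: "NUL", 0x2F: "SLASH"}


def special_bytes(text, encoding):
    """Return the names of NUL and '/' bytes found in the encoded text."""
    return [SPECIAL[b] for b in text.encode(encoding) if b in SPECIAL]


ch = "\u2f00"  # KANGXI RADICAL ONE, code point 0x2F00

# Big-endian UCS-2/UTF-16 stores it as the two bytes 2F 00.
print(special_bytes(ch, "utf-16-be"))  # ['SLASH', 'NUL']

# UTF-8 stores it as E2 BC 80: no byte collides with ASCII.
print(special_bytes(ch, "utf-8"))      # []

# In UTF-8 a 0x2F byte only ever appears when the text really contains '/'.
print(special_bytes("a/b", "utf-8"))   # ['SLASH']
```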
Having a different code page for each language increases the complexity of software that must support several languages. For this reason a worldwide standard called Unicode was created. Unicode provides a unique value for each character, regardless of platform, software, or language. In other words, every character used in the world is listed and each one is given a unique, specific numeric value.
The original goal of Unicode was to use a 16-bit encoding to provide a mapping for over 65,000 characters. However, this is not enough: it cannot cover all historical scripts, and it does not solve the implementation headaches, especially in network-based applications. Existing software has to be reworked substantially to handle 16-bit data.
Unicode is therefore used through three encoding forms, together with some reserved code points: UTF-8, UTF-16, and UTF-32. As the name suggests, UTF-8 encodes text in 8-bit units and represents each character with one or more bytes. The biggest benefit of this approach is that UTF-8 keeps ASCII as a subset. For example, "A" is encoded as 0x41 in both UTF-8 and ASCII.
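A small Python check of the ASCII-compatibility claim (a sketch for illustration only; the sample strings are arbitrary):

```python
text = "A"
assert text.encode("ascii") == b"\x41"
assert text.encode("utf-8") == b"\x41"

# Any pure-ASCII byte string is already valid UTF-8 and decodes to the same text.
ascii_file = b"Hello, ASP"
assert ascii_file.decode("utf-8") == ascii_file.decode("ascii")

# A character outside ASCII takes more than one byte, and none of those
# bytes falls in the ASCII range 0x00-0x7F.
encoded = "é".encode("utf-8")
print(encoded.hex(" "))  # c3 a9
assert all(b >= 0x80 for b in encoded)
```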
UTF-8 is not a character set in itself but a storage and transfer format. As described above, each Unicode/UCS character is natively stored in 2 or 4 bytes. Take a look at the following comparison:
Take the English sentence "I am Chinese" (12 characters) as an example:
Stored as ANSI: 12 bytes
Stored as Unicode/UCS-2: 24 bytes + 2 bytes (header)
Stored as UCS-4: 48 bytes + 4 bytes (header)
Take the Chinese sentence "我是中国人" (5 characters) as an example:
Stored as ANSI: 10 bytes
Stored as Unicode/UCS-2: 10 bytes + 2 bytes (header)
Stored as UCS-4: 20 bytes + 4 bytes (header)
It can be seen that storing text in the raw Unicode/UCS form is very wasteful and is ill-suited to transmission over the Internet (Chinese text fares a little better ^_^).
For this reason a more compact form of Unicode/UCS, UTF-8, appeared. To quote the first sentence from the official website: "UTF-8 stands for Unicode Transformation Format-8. It is an octet (8-bit) lossless encoding of Unicode characters." Because UTF also applies to the encoding of UCS, it is also known as "UCS Transformation Format (UTF)".
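The byte counts in the comparison can be reproduced with a short Python sketch. Two things here are assumptions rather than statements from the text: "ANSI" is taken to mean the Windows code pages cp1252 (Western) and GBK (Simplified Chinese), and the second sample is taken to be the five-character Chinese sentence 我是中国人, which is what the listed byte counts imply. The UTF-8 sizes are added for comparison:

```python
import codecs


def sizes(text, ansi_codec):
    """Byte length of text in several encodings (without any header/BOM)."""
    return {
        "ansi": len(text.encode(ansi_codec)),
        "ucs2": len(text.encode("utf-16-be")),
        "ucs4": len(text.encode("utf-32-be")),
        "utf8": len(text.encode("utf-8")),
    }


english = sizes("I am Chinese", "cp1252")
chinese = sizes("我是中国人", "gbk")

print(english)  # {'ansi': 12, 'ucs2': 24, 'ucs4': 48, 'utf8': 12}
print(chinese)  # {'ansi': 10, 'ucs2': 10, 'ucs4': 20, 'utf8': 15}

# The "header" is the byte order mark: 2 bytes for UCS-2, 4 bytes for UCS-4.
print(len(codecs.BOM_UTF16_BE), len(codecs.BOM_UTF32_BE))  # 2 4
```

For English text UTF-8 is as compact as ANSI, while for Chinese text it is larger than UCS-2, which matches the remark that Chinese fares a little better in the raw form.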

UTF-8 encodes text with 8 bits (1 byte) as its basic unit. There are also forms with 16-bit and 32-bit units, called UTF-16 and UTF-32 respectively, but they are currently used less. UTF-8 is widely used for file storage and network transmission.

---------------------------------------------------------------------------------
Handling UTF-8 in ASP:
Two ways to generate XML:
Generating XML with FSO
<% ' Generates XML, but FSO writes ASCII (ANSI) text and does not support UTF-8, so non-ASCII output is garbled!
Dim xpath
sql = "select * from temp"
Set rs = conn.Execute(sql)   ' conn is assumed to be an open ADODB.Connection
xpath = "data.xml"
Set fso = Server.CreateObject("Scripting.FileSystemObject")
Set fout = fso.CreateTextFile(Server.MapPath(xpath))
fout.WriteLine rs("Temp")
fout.Close
%>

Generating XML with ADODB.Stream
<% ' Generate XML with ADODB.Stream, which supports UTF-8
Dim str
sql = "select * from temp"
Set rs = conn.Execute(sql)   ' conn is assumed to be an open ADODB.Connection
str = rs("Temp")

Set objStream = Server.CreateObject("ADODB.Stream")
With objStream
    .Open
    .Charset = "UTF-8"
    .Position = .Size
    .WriteText str                                ' WriteText is a method, not a property
    .SaveToFile Server.MapPath("Kevin.xml"), 2    ' 2 = overwrite an existing file
    .Close
End With
Set objStream = Nothing

rs.Close
Set rs = Nothing
%>
----------------------------------------------------------------------------

Working with UTF-8 files in ASP
Note: this is ASP, not ASP.NET.
Because ASP is an old technology, some of its features support UTF-8 very poorly.
For example, if you want to generate a file in UTF-8 format, you cannot use the commonly used Scripting.FileSystemObject object.

Scripting.FileSystemObject:
FileSystemObject.CreateTextFile(filename[, overwrite[, unicode]])

The unicode argument is described as follows:

Optional. Boolean value that indicates whether the file is created as a Unicode or an ASCII file. The value is True if the file is created as a Unicode file and False if it is created as an ASCII file. If omitted, an ASCII file is created.

Neither option produces UTF-8, so this function cannot be used to create UTF-8 files.
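To see why the Unicode option does not help, the following Python sketch compares the two byte layouts. It assumes that a "Unicode" text file on Windows means UTF-16 little-endian preceded by a byte order mark, which is how such files are commonly described; the sample character is arbitrary:

```python
import codecs

text = "é"

# Assumed layout of a Windows "Unicode" text file: a UTF-16 little-endian
# byte order mark followed by the UTF-16 LE code units.
fso_unicode = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Layout of a UTF-8 file holding the same text.
utf8 = text.encode("utf-8")

print(fso_unicode.hex(" "))  # ff fe e9 00
print(utf8.hex(" "))         # c3 a9
```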
In this case we can use the ADODB.Stream object instead. The usage is as follows:

Set objStream = Server.CreateObject("ADODB.Stream")
With objStream
    .Open
    .Charset = "UTF-8"
    .Position = .Size
    .WriteText str                                  ' str holds the text to be written
    .SaveToFile Server.MapPath("/sitemap.xml"), 2   ' 2 = overwrite an existing file
    .Close
End With
Set objStream = Nothing
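For comparison, the same idea of writing and reading a file in an explicitly named encoding can be sketched outside ASP. The Python below is only an analogue, not the ASP API; `save_text`, `load_text`, and the temporary file location are hypothetical names chosen for this illustration:

```python
import os
import tempfile


def save_text(path, text, charset="utf-8"):
    """Write text to path in the given charset, replacing any existing file."""
    with open(path, "w", encoding=charset, newline="") as f:
        f.write(text)


def load_text(path, charset="utf-8"):
    """Read the whole file at path, decoding it with the given charset."""
    with open(path, "r", encoding=charset, newline="") as f:
        return f.read()


path = os.path.join(tempfile.mkdtemp(), "sitemap.xml")  # hypothetical location
xml = '<?xml version="1.0" encoding="UTF-8"?><name>中文</name>'

save_text(path, xml, "utf-8")
assert load_text(path, "utf-8") == xml

# The bytes on disk are exactly the UTF-8 encoding of the text.
with open(path, "rb") as f:
    raw = f.read()
assert raw == xml.encode("utf-8")
```

ADODB.Stream is commonly reported to put a UTF-8 byte order mark (EF BB BF) at the start of the saved file; in this sketch, passing `"utf-8-sig"` as the charset would produce and consume such a mark.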

Appendix:
Introduction to ASCII, Unicode, and UTF-8:
ASCII is a character set that includes the uppercase and lowercase English letters, digits, and control characters. Each character is represented in one byte, with values from 0 to 127.
Because ASCII covers very few characters, each country or region defined its own character set on top of it. For example, GB2312, which is widely used in China, adds encodings for Chinese characters and represents each of them in two bytes.
These character sets are incompatible with each other. The same number may stand for different characters, which makes information exchange troublesome.
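A two-byte example of this incompatibility, as a Python sketch (the byte values are chosen for illustration): the same bytes are one Chinese character under GB2312 and two accented Latin letters under Latin-1.

```python
raw = b"\xd6\xd0"

print(raw.decode("gb2312"))   # 中
print(raw.decode("latin-1"))  # ÖÐ
```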
Unicode is a character set that maps every character in the world to a unique number (code point); for example, the number 0x0041 corresponds to the letter A. Unicode is still being developed, and more characters continue to be added.
An encoding method is also needed to store the characters that Unicode describes. UCS-2 is one such method: it uses two bytes to represent each Unicode character. UTF-8 is another encoding of the Unicode character set. It is variable-length, using up to 4 bytes per character (the original definition allowed up to 6). Characters with code points below 128 are represented in one byte, with the same values as in ASCII, so UTF-8 is highly compatible: English text encoded in ASCII can be processed as UTF-8 without modification. It is widely used.
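The variable length of UTF-8 can be seen by encoding one character from each size class. This is a Python sketch with arbitrarily chosen sample characters; under the current definition of UTF-8 (RFC 3629) no character needs more than four bytes:

```python
samples = ["A", "é", "中", "😀"]

for ch in samples:
    print(f"U+{ord(ch):04X} -> {len(ch.encode('utf-8'))} byte(s)")
# U+0041 -> 1 byte(s)
# U+00E9 -> 2 byte(s)
# U+4E2D -> 3 byte(s)
# U+1F600 -> 4 byte(s)

utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
```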

FAQ about UTF-8 and Unicode
Http://www.linuxforum.net/books/UTF-8-Unicode.html

ADODB.Stream component Charset property values
Http://www.5iya.com/blog/post/adodb_stream_charset_value.asp

Use ADODB.Stream instead of FSO to read text files
Http://www.99net.net/study/page/1025101521.htm

-----------------------------------------------------------------------------
Solutions for making UTF-8 coexist with other encodings

After many days of study and many different attempts, I found what I consider the best method.
First, the basics:
<%@ CodePage=936 %> Simplified Chinese
<%@ CodePage=950 %> Traditional Chinese
<%@ CodePage=65001 %> UTF-8

CodePage specifies the encoding that IIS uses to read incoming strings (such as form submissions and values passed in the address bar).
Garbled characters appear because the modules being integrated into one website use different encodings.
My blog, for example, ran into this problem during integration because the blog uses UTF-8.
Many users have asked about this recently, and I have tried many methods.
The most convenient method is as follows:
Do not convert the encoding of the pages in any module: UTF-8 pages stay UTF-8 and GB2312 pages stay GB2312.
At the top of each UTF-8 page, add
<%@ Language="VBScript" CodePage="65001" %>
<% Session.CodePage = 65001 %>
At the top of each GB2312 page, add
<%@ Language="VBScript" CodePage="936" %>
<% Session.CodePage = 936 %>
Other encodings are handled in the same way.
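Why a mismatched code page garbles submitted text can be shown with a small Python sketch. The mapping from ASP code page numbers to Python codec names and the helper `read_submitted` are assumptions made for this illustration, not part of ASP or IIS:

```python
# Assumed correspondence between ASP CodePage values and Python codecs.
CODEPAGE_TO_CODEC = {936: "cp936", 950: "cp950", 65001: "utf-8"}


def read_submitted(raw, codepage):
    """Decode submitted bytes with the codec matching an ASP CodePage value."""
    return raw.decode(CODEPAGE_TO_CODEC[codepage], errors="replace")


text = "中文"
sent_by_utf8_page = text.encode("utf-8")  # bytes a UTF-8 page would submit

right = read_submitted(sent_by_utf8_page, 65001)
wrong = read_submitted(sent_by_utf8_page, 936)

print(right == text)  # True
print(wrong == text)  # False: UTF-8 bytes read as code page 936 are garbled
```

This is the reason each module keeps its own code page in the approach above: the bytes are only read correctly when the declared code page matches the encoding in which they were produced.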

The following Visual Basic functions read and write a file in a specified encoding with ADODB.Stream. They rely on the module-level variables ado_stream, coding, filebody, and errinfo.

' Open the file
Public Function OpenFile(filename As String) As Boolean
    ' filename: name of the file to be opened

    If ado_stream Is Nothing Then Exit Function

    Err.Number = 0
    On Error GoTo ferr

    With ado_stream
        .Type = 1          ' binary, so the file is loaded byte for byte
        .Mode = 3          ' read/write
        .Open
        .LoadFromFile filename
        .Position = 0
        .Type = 2          ' switch to text before decoding
        .Charset = IIf(coding = "", "gb2312", coding)
        filebody = .ReadText
        .Close
    End With
    OpenFile = True

    Exit Function
ferr:
    errinfo = filename & " failed to open! Error message: " & Err.Description
    Debug.Print errinfo
    Err.Number = 0
End Function

' Save the file
Public Function SaveFile(filename As String, strfilebody As String) As Boolean
    ' filename: path and name of the file to be saved
    ' strfilebody: the content of the file to be saved

    If ado_stream Is Nothing Then Exit Function

    Err.Number = 0
    On Error GoTo ferr

    With ado_stream
        .Type = 2          ' text
        .Mode = 3          ' read/write
        .Charset = IIf(coding = "", "gb2312", coding)
        .Open
        .WriteText strfilebody
        .SaveToFile filename, 2    ' 2 = overwrite an existing file
        .Close                     ' close so the stream can be opened again later
    End With

    SaveFile = True
    Exit Function
ferr:
    errinfo = filename & " failed to save! Error message: " & Err.Description
    Debug.Print errinfo
    Err.Number = 0
End Function
