Removing the BOM from ADODB.Stream UTF-8 output

Source: Internet
Author: User

With ADODB.Stream you can easily read and write binary and text streams, read and write files, and specify the character set encoding for text streams.
However, when ADODB.Stream outputs a UTF-8 encoded text stream, it adds a BOM at the front of the stream.
What is a BOM? It is the byte order mark; in UTF-8 it is a signature that occupies the three bytes EF BB BF at the beginning of the file or stream.
What the BOM is useful for is not discussed here. This article discusses how to remove these three BOM bytes from an ADODB.Stream (referred to below simply as "the stream"), because in many cases we do not want the output file to begin with them.
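
As a small illustration of what those three bytes look like in practice, the following sketch checks whether a sequence of byte values starts with the UTF-8 BOM. The function name hasUtf8Bom and the plain-array input are assumptions made for this example; they are not part of ADODB.Stream.

```javascript
// Hypothetical helper (not part of ADODB.Stream): reports whether an
// array of byte values (0-255) begins with the UTF-8 BOM EF BB BF.
function hasUtf8Bom(bytes) {
    return bytes.length >= 3 &&
        bytes[0] === 0xEF &&
        bytes[1] === 0xBB &&
        bytes[2] === 0xBF;
}

// 0x41 is the letter "A"
var withBom = [0xEF, 0xBB, 0xBF, 0x41];
var withoutBom = [0x41];
```

Calling hasUtf8Bom(withBom) reports that the BOM is present, while hasUtf8Bom(withoutBom) reports that it is not.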

The simplest method is as follows.
We know that the BOM occupies 3 bytes, so we can write the UTF-8 string to the stream, switch the stream to the binary type, skip the first 3 bytes, and copy the remaining bytes to a new stream with the CopyTo() method.
However, this has a potentially serious performance problem: if the text stream is large, the data exists in two streams at once, so processing it may consume roughly double the resources.
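
The copy approach described above might be sketched as follows. The function name copyWithoutBom is an assumption made for illustration. The function uses only the Position, Type and CopyTo members of ADODB.Stream, and it expects both streams to be open already, with the target stream empty and of binary type.

```javascript
// Sketch only: copies everything after the 3-byte BOM from `source`
// into `target`. Both arguments are assumed to be open ADODB.Stream
// objects; `target` is assumed to be empty and of binary type (Type = 1).
function copyWithoutBom(source, target) {
    source.Position = 0;   // Type may only be changed at position 0
    source.Type = 1;       // adTypeBinary
    source.Position = 3;   // skip the BOM bytes
    source.CopyTo(target); // copy the rest of the stream
    return target;
}

// Possible usage under Windows Script Host or in an HTA
// (`source` and `filename` are placeholders for this example):
//   var target = new ActiveXObject('ADODB.Stream');
//   target.Mode = 3;
//   target.Type = 1;
//   target.Open();
//   copyWithoutBom(source, target);
//   target.SaveToFile(filename, 2); // 2 = adSaveCreateOverWrite
```

Both streams hold the data until the source is closed, which is the doubled resource use mentioned above.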

Let us optimize the above solution.
Testing showed that the stream automatically adds three bytes at its beginning when the WriteText() method is called for the first time in UTF-8 encoded text mode.
More precisely, when the stream is in the following state (JS code):
stream.Position = 0; // stream position at the beginning
stream.Type = 2; // adTypeText
stream.Charset = 'utf-8';
calling the stream.WriteText() method makes the stream automatically insert a three-byte BOM at its beginning.

However, when stream.Position is not 0, calling the WriteText() method no longer inserts a BOM. This can be used to avoid the automatic BOM.
Suppose we need to write 10 UTF-8 characters to the stream. We first write only the first character.
Then we switch the stream to the binary type (adTypeBinary), skip the three BOM bytes at the beginning of the stream, and read the remaining bytes (these should contain only the data of that one character and nothing else).
We return to the beginning of the stream and write the bytes just read back into it, then immediately call SetEOS() to make the current position the end of the stream.
Next, the stream is switched back to the text type (adTypeText) and its current position is moved to the end of the stream.
When the remaining 9 characters are written now, the stream appends their encoded bytes directly to the end instead of inserting a BOM.
To write more text to the stream afterwards, simply call the WriteText() method again.

Testing shows that a UTF-8 string written this way can be read back normally with the ReadText() method, but stream.Size is 3 smaller than for a stream written directly as UTF-8 in the usual way; the three "redundant" BOM bytes are evidently gone.
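
One way to check this size difference is to compute how many bytes the text occupies in UTF-8 and compare that with stream.Size. The sketch below is a helper written for this example rather than part of ADODB.Stream: utf8ByteLength is a hypothetical name, and the function relies on encodeURIComponent emitting one %XX escape per escaped UTF-8 byte.

```javascript
// Hypothetical helper: number of bytes `text` occupies when encoded as UTF-8.
// encodeURIComponent leaves some ASCII characters as they are (1 byte each)
// and turns every other UTF-8 byte into a %XX escape, so collapsing each
// escape to a single character yields the byte count.
function utf8ByteLength(text) {
    return encodeURIComponent(text).replace(/%[0-9A-Fa-f]{2}/g, 'x').length;
}

var BOM_LENGTH = 3; // EF BB BF

var text = 'abc\u4E2D'; // three ASCII letters plus one CJK character
var sizeWithoutBom = utf8ByteLength(text);     // size expected when no BOM is present
var sizeWithBom = sizeWithoutBom + BOM_LENGTH; // size expected when a BOM is present
```

If stream.Size equals sizeWithoutBom after writing the same text, the stream holds no BOM; if it equals sizeWithBom, the BOM is still there.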

The problem is not over yet.
At this point you may want to call the stream.SaveToFile() method directly to save the stream to a file, and you may find that the saved file sometimes still contains a BOM. Has the method above failed?
No. A key step is missing: before calling the SaveToFile() method, you need to change the stream to the binary type.
It turns out that ADODB.Stream is being clever: when it outputs a UTF-8 text stream and finds that the BOM is missing at the beginning, it adds one again.
If you change the stream type to binary, however, you bypass this BOM check in ADODB.Stream during output.

Test code (save it as an .hta file to test):

Test ADODB.Stream no BOM.hta

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
<meta name="author" content="caikanxp"/>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<script type="text/javascript"><!--
function getStreamWithBom(str, another) {
    var stream = new ActiveXObject('ADODB.Stream');
    stream.Mode = 3;
    stream.Open();
    stream.Type = 2;
    stream.Charset = 'utf-8';
    stream.WriteText(str);
    another && stream.WriteText(another);
    return stream;
}
function getStreamWithoutBom(str, another) {
    var stream = new ActiveXObject('ADODB.Stream');
    stream.Mode = 3;
    stream.Open();
    writeUtf8WithoutBom(stream, str);
    another && stream.WriteText(another);
    return stream;
}
function writeUtf8WithoutBom(stream, text) {
    stream.Position = 0; // reset stream position before changing type
    stream.Type = 2; // adTypeText
    stream.Charset = 'utf-8';
    // a BOM (3 bytes) will be automatically added to the beginning of the stream
    stream.WriteText(text.substr(0, 1)); // write only the first char
    stream.SetEOS();
    stream.Position = 0;
    stream.Type = 1; // adTypeBinary
    stream.Position = 3; // skip BOM bytes
    var bs = stream.Read(); // read the bytes of the first char
    stream.Position = 0;
    stream.Write(bs); // overwrite the BOM with the bytes of the first char
    stream.SetEOS();
    stream.Position = 0;
    stream.Type = 2; // adTypeText
    stream.Position = stream.Size; // the remaining text will be appended to the end of the stream
    stream.WriteText(text.substr(1)); // no BOM will be added to the beginning of the stream now
}
function output(stream, title) {
    var filename, message;
    stream.Position = 0;
    filename = 'C:\\' + title + ' (saved as text type).txt';
    message = ['Text content:', stream.ReadText(), 'Stream size:', stream.Size, 'Save to file?', filename];
    confirm(message.join('\n'), filename) && stream.SaveToFile(filename, 2);
    stream.Position = 0;
    filename = 'C:\\' + title + ' (saved as binary type).txt';
    message = ['Text content:', stream.ReadText(), 'Stream size:', stream.Size, 'Save to file?', filename];
    stream.Position = 0;
    stream.Type = 1; // change type to binary before saving
    confirm(message.join('\n'), filename) && stream.SaveToFile(filename, 2);
}
function test() {
    var text = 'multibyte string';
    text = prompt('input some text:', text) || text;
    var another = prompt('input another text:', 'Some text additional...');
    var stream1 = getStreamWithBom(text, another);
    var stream2 = getStreamWithoutBom(text, another);
    output(stream1, 'UTF-8 with BOM');
    output(stream2, 'UTF-8 without BOM');
}
test();
//--></script>
</head>
<body onload="window.close()">
</body>
</html>

A simple wrapper class (JS version):

Utf8NoBomStream.hta

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
<meta name="author" content="caikanxp"/>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<script type="text/javascript"><!--
/**
 * A wrapper class dedicated to outputting UTF-8 without a BOM
 */
function Utf8NoBomStream() {
    var stream = new ActiveXObject('ADODB.Stream');
    stream.Mode = 3;
    stream.Open();
    stream.Charset = 'utf-8';
    this.stream = stream;
}
/**
 * Proxies the WriteText() method of ADODB.Stream. The BOM is removed automatically on the first call
 */
Utf8NoBomStream.prototype.writeText = function (text, option) {
    option = option || 0;
    var stream = this.stream;
    // the stream position is not at the beginning (data already exists): write directly
    if (stream.Position != 0) {
        stream.WriteText(text, option);
        return;
    }
    // the stream position is at the beginning: the BOM must be removed after writing
    // write the first character
    stream.WriteText(text.charAt(0));
    stream.SetEOS();
    // read the bytes of the first character in binary mode
    stream.Position = 0;
    stream.Type = 1;
    stream.Position = 3;
    var bs = stream.Read();
    // move the bytes of the first character to the start of the stream, overwriting the BOM
    stream.Position = 0;
    stream.Write(bs);
    stream.SetEOS();
    // change the stream back to text mode and write the remaining characters
    stream.Position = 0;
    stream.Type = 2;
    stream.Position = stream.Size;
    stream.WriteText(text.substr(1), option);
};
/**
 * Proxies the SaveToFile() method of ADODB.Stream.
 * Before the output, the stream type is changed to binary to avoid outputting the BOM again
 */
Utf8NoBomStream.prototype.saveToFile = function (filename, option) {
    option = option || 1;
    var stream = this.stream;
    stream.Position = 0;
    stream.Type = 1;
    stream.SaveToFile(filename, option);
    stream.Type = 2;
};
function test() {
    var text = 'multibyte string';
    text = prompt('input some text:', text) || text;
    var another = prompt('input another text:', 'Some text additional...');
    var stream = new Utf8NoBomStream();
    stream.writeText(text);
    another && stream.writeText(another);
    var filename = 'C:\\utf8nobomstream.txt';
    filename = prompt('Warning: click OK to write the file!\nFile name:', filename);
    filename && stream.saveToFile(filename, 2);
}
test();
//--></script>
</head>
<body onload="window.close()">
</body>
</html>
