SQL Server Data migration to PostgreSQL explanation of errors and solutions _postgresql

Source: Internet
Author: User
Tags control characters postgresql printable characters

Problem Recurrence:

1, PG Client:

postgres=# CREATE TABLE Text_test (ID int,info text);
CREATE TABLE
postgres=# insert INTO text_test values (1,e ' \0x00 ');
Error:invalid byte sequence for encoding "UTF8": 0x00

2. SQL Server generates data

CREATE TABLE Test_varchar (ID int,name varchar ());
INSERT into Test_varchar values (1, ' name ' + char (0));
INSERT into Test_varchar values (1, ' name ' + ');

You can then get the data through a Java program and insert it into PG, which also gets the error message:

Invalid byte sequence for encoding "UTF8": 0x00

First of all, we think that when the gb2312 is converted to UTF8, there is an error that cannot be converted. The UTF8 is variable length, 1-6 bytes. His coding rules are as follows:

" td>111110xx
Bits Last code point Byte 1 Byte 2
Byte 3
Byte 4
Byte 5
Byte 6
7 u+007f 0xxxxxxx




11 u+07f F
110xxxxx 10xxxxxx



16 U+ffff
1110xxxx 10xxxxxx
10xxxxxx



21st U+1fffff
11110xxx 10xxxxxx
10xxxxxx
10xxxxxx


u+3ffffff
10xxxxxx
10xxxxxx
10xxxxxx


31 U+7fffffff
1111110x 10xxxxxx
10xxxxxx
10xxxxxx
10xxxxxx
10xxxxxx

And 0x00 is in line with UTF8 rules. This makes us very surprised. And then we found that there were two points that confirmed the problem:
1,

PostgreSQL doesn ' t support storing NULL (\0x00) characters in text fields (this is obviously different from the database N Ull value, which is fully supported).

If you are need to store the NULL character, your must use a Bytea Field-which should store anything you want, but won ' t supp ORT text operations on it.

Given that PostgreSQL doesn ' t support it in text values, there ' no good way to get it to remove it. You are could import your data into Bytea and later convert it to the text using a special function (in Perl or something, maybe?) , but it ' s likely going to is easier to do this in preprocessing for you load it.

source:http://stackoverflow.com/questions/1347646/ postgres-error-on-insert-error-invalid-byte-sequence-for-encoding-utf8-0x0

2,

terminating character

indicated by

Tab

\ t

This is the default field terminator.

NewLine character

\ n

This is the default row terminator.

Carriage Return/line Feed

\ r

Backslash1

\\

Null Terminator (nonvisible Terminator)2

The

Any printable character (control characters are not printable, except null, tab, newline, and carriage return)

(*, A, T, L, and so on)

String of up-printable characters, including some or all of the terminators listed

(**\t**, end,!!!!!!!!!!, \t-\n, and "on")

Source:http://msdn.microsoft.com/en-us/library/ms191485.aspx

As a result, we are sure that the PG-NULL processing is not the same as SQL Server processing, so there is an error here.

The PG-specific code that causes the problem is as follows (Src/backend/utils/mb/wchar.c's Pg_verify_mbstr_len):

if (!is_highbit_set (*MBSTR))
    {
      if (*mbstr!= ' ")
      {
        mb_len++;
        mbstr++;
        len--;
        Continue;
      }
      if (NOERROR)
        return-1;
      Report_invalid_encoding (encoding, MBSTR, Len);
    

#define IS_HIGHBIT_SET (CH)   (unsigned char) (CH) & Highbit)
#define Highbit         (0x80)

The Report_invalid_encoding function is to return the error message, which is

Invalid byte sequence for encoding "UTF8": 0x00
And the real cause of this problem is:
!is_highbit_set (*MBSTR) when *mbstr for 0x00, and then to judge whether *mbstr is a, when for the, directly into the function report_invalid_encoding error.

So this problem occurs because PG and SQL Server treat null differently.

Processing scheme:

1, the SQL Server source data to modify the method,

Update:this seems to work:
 
Select * from TABLE
where UNICODE (SUBSTRING (Naughtyfield, LEN (Naughtyfield), 1)) = 0
   so:
 
Update TABLE
SET Naughtyfield = SUBSTRING (Naughtyfield, 1, LEN (Naughtyfield)-1)
where UNICODE ( SUBSTRING (Naughtyfield, LEN (Naughtyfield), 1) = 0
source:http://stackoverflow.com/questions/3533320/ Sql-server-remove-end-string-character-0-from-data

2, the application to modify, to obtain SQL Server data, the data conversion, and the first method of the same.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.