SQL Server to PostgreSQL data migration: error explanation and solution

Source: Internet
Author: User

Problem reproduction:

1. In the PostgreSQL client (psql):

postgres=# create table text_test (id int,info text);
CREATE TABLE
postgres=# insert into text_test values (1,E'\0x00');
ERROR: invalid byte sequence for encoding "UTF8": 0x00

2. Generate the data in SQL Server:

create table test_varchar(id int,name varchar(20));
insert into test_varchar values (1, 'name' + char(0));
insert into test_varchar values (1, 'name' + '');

Then read the data with a Java program and insert it into PostgreSQL. The same error message is displayed:

invalid byte sequence for encoding "UTF8": 0x00

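At first we suspected that a character could not be converted from GB2312 to UTF8. Looking into it, we found that UTF8 is a variable-length encoding that, in its original definition, uses 1 to 6 bytes per character (the current standard, RFC 3629, limits it to 4 bytes). The encoding rules are as follows: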


Bits  Last code point  Byte 1    Byte 2    Byte 3    Byte 4    Byte 5    Byte 6
7     U+007F           0xxxxxxx
11    U+07FF           110xxxxx  10xxxxxx
16    U+FFFF           1110xxxx  10xxxxxx  10xxxxxx
21    U+1FFFFF         11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
26    U+3FFFFFF        111110xx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx
31    U+7FFFFFFF       1111110x  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx
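
As an illustration of the table, here is a minimal Python sketch of an encoder that follows these rules. Python is used only for illustration (the migration program in this article was written in Java), and the function name utf8_encode is hypothetical. The sketch follows the original 1-to-6-byte layout and does not reject surrogates or values above U+10FFFF, which a standards-compliant encoder would.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point with the 1-to-6-byte rules from the table.

    Hypothetical helper for illustration only; it is not a full,
    standards-compliant UTF-8 encoder.
    """
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    if code_point <= 0x7F:
        return bytes([code_point])  # 0xxxxxxx: one byte, 7 bits

    # (last code point, lead-byte prefix, number of continuation bytes)
    rows = [
        (0x7FF, 0xC0, 1),       # 110xxxxx
        (0xFFFF, 0xE0, 2),      # 1110xxxx
        (0x1FFFFF, 0xF0, 3),    # 11110xxx
        (0x3FFFFFF, 0xF8, 4),   # 111110xx
        (0x7FFFFFFF, 0xFC, 5),  # 1111110x
    ]
    for last, prefix, n_cont in rows:
        if code_point <= last:
            # The lead byte carries the highest bits of the code point.
            lead = prefix | (code_point >> (6 * n_cont))
            # Each continuation byte is 10xxxxxx and carries 6 bits.
            tail = [
                0x80 | ((code_point >> (6 * i)) & 0x3F)
                for i in range(n_cont - 1, -1, -1)
            ]
            return bytes([lead] + tail)
    raise ValueError("code point needs more than 31 bits")
```

Under these rules, utf8_encode(0x00) yields the single byte 0x00, so the zero byte is a well-formed one-byte sequence as far as the encoding itself is concerned.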


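Yet 0x00 complies with the UTF8 rules: it is a valid one-byte sequence. This surprised us. We then found two pieces of information that confirmed the cause:

1. PostgreSQL does not accept the NUL character in text fields. Quoting a Stack Overflow answer: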

PostgreSQL doesn't support storing NULL (\0x00) characters in text fields (this is obviously different from the database NULL value, which is fully supported).

If you need to store the NULL character, you must use a bytea field - which should store anything you want, but won't support text operations on it.

Given that PostgreSQL doesn't support it in text values, there's no good way to get it to remove it. You could import your data into bytea and later convert it to text using a special function (in perl or something, maybe?), but it's likely going to be easier to do that in preprocessing before you load it.

Source: http://stackoverflow.com/questions/1347646/postgres-error-on-insert-error-invalid-byte-sequence-for-encoding-utf8-0x0

2. The SQL Server documentation on field and row terminators lists the null terminator (\0) among the supported characters:

Terminating character                          Indicated by
Tab                                            \t (this is the default field terminator)
Newline character                              \n (this is the default row terminator)
Carriage return/line feed                      \r
Backslash                                      \\
Null terminator (nonvisible terminator)        \0
Any printable character (control characters
  are not printable, except null, tab,
  newline, and carriage return)                (*, A, t, l, and so on)
String of up to 10 printable characters,
  including some or all of the terminators
  listed earlier                               (**\t**, end, !!!!!!!!!!, \t-\n, and so on)


Source: http://msdn.microsoft.com/en-us/library/ms191485.aspx

We therefore concluded that PostgreSQL handles the NUL character (\0) differently from SQL Server, and that this difference causes the error.

The PostgreSQL code that raises the error is the following (src/backend/utils/mb/wchar.c, function pg_verify_mbstr_len):

if (!IS_HIGHBIT_SET(*mbstr))
{
    /* single-byte (ASCII) character: accept anything except NUL */
    if (*mbstr != '\0')
    {
        mb_len++;
        mbstr++;
        len--;
        continue;
    }
    /* a zero byte is rejected here */
    if (noError)
        return -1;
    report_invalid_encoding(encoding, mbstr, len);
}

#define IS_HIGHBIT_SET(ch)   ((unsigned char)(ch) & HIGHBIT)
#define HIGHBIT         (0x80)

 

The report_invalid_encoding function reports the error message, that is:

invalid byte sequence for encoding "UTF8": 0x00

The condition that really triggers the problem is !IS_HIGHBIT_SET(*mbstr): when the high bit of *mbstr is clear (a single-byte character of the form 0xxxxxxx), the code checks whether *mbstr is \0. If it is, and noError is not set, execution goes straight to report_invalid_encoding, which reports the error.

In short, the problem arises because PostgreSQL and SQL Server treat the NUL character differently.
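
The check described above can be modeled with a short Python sketch. This is a simplified model, not PostgreSQL's implementation: the function names are hypothetical, only the single-byte branch mirrors the quoted C code, and validation of multi-byte sequences is delegated to Python's own decoder as an assumption made for brevity.

```python
HIGHBIT = 0x80  # same value as the HIGHBIT macro quoted above


def _sequence_width(lead: int) -> int:
    """Width in bytes of a multi-byte UTF-8 sequence, judged from its lead byte."""
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    raise ValueError(f"invalid lead byte 0x{lead:02x}")


def count_chars_rejecting_nul(data: bytes) -> int:
    """Count characters in UTF-8 data, raising ValueError on a 0x00 byte."""
    mb_len = 0
    pos = 0
    while pos < len(data):
        byte = data[pos]
        if not (byte & HIGHBIT):
            # Single-byte character: anything except NUL is accepted.
            if byte != 0:
                mb_len += 1
                pos += 1
                continue
            raise ValueError('invalid byte sequence for encoding "UTF8": 0x00')
        # Multi-byte character: let Python's decoder validate the sequence
        # (assumption of this sketch; it raises UnicodeDecodeError if malformed).
        width = _sequence_width(byte)
        data[pos:pos + width].decode("utf-8")
        mb_len += 1
        pos += width
    return mb_len
```

Calling count_chars_rejecting_nul(b"name") returns 4, while count_chars_rejecting_nul(b"name\x00") raises ValueError, which matches the behavior seen when the SQL Server value 'name' + char(0) is inserted.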

Solution:

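1. Modify the SQL Server source data. The following query finds the rows whose field ends with a NUL character (TABLE and naughtyField stand for your own table and column names):

SELECT * FROM TABLE
WHERE UNICODE(SUBSTRING(naughtyField, LEN(naughtyField), 1)) = 0

This statement then removes the trailing character:

UPDATE TABLE
SET naughtyField = SUBSTRING(naughtyField, 1, LEN(naughtyField) - 1)
WHERE UNICODE(SUBSTRING(naughtyField, LEN(naughtyField), 1)) = 0

Note that both statements only handle a NUL in the last position of the string.

Source: http://stackoverflow.com/questions/3533320/sql-server-remove-end-string-character-0-from-data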

2. Modify the application: when it reads the data from SQL Server, strip the NUL characters before inserting the data into PostgreSQL. The effect is the same as the first method.
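
The application-side conversion might look like the following Python sketch. It is only an illustration: the program in this article was written in Java, the names strip_nul and clean_row are hypothetical, and no database driver calls are shown. Unlike the SQL statement in the first method, it removes NUL characters at any position, not only at the end.

```python
def strip_nul(value):
    """Remove NUL characters from string values; leave other types unchanged."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    return value


def clean_row(row):
    """Return a copy of a fetched row with NUL characters stripped from its strings."""
    return tuple(strip_nul(value) for value in row)
```

For example, clean_row((1, "name\x00")) returns (1, "name"), which PostgreSQL accepts in a text column.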


How to export PostgreSQL data to SQL Server

PostgreSQL can export a table to a CSV file with the COPY command.
Importing that file into SQL Server is then straightforward.

SQL Server 2012 attach database error: solution

The source database file comes from a non-release version and cannot be attached.
Try backup and restore instead.
