SQL Server data migration to PostgreSQL: error explanation and solution
Problem reproduction:
1. PG client:
postgres=# create table text_test (id int,info text);
CREATE TABLE
postgres=# insert into text_test values (1,E'\0x00');
ERROR: invalid byte sequence for encoding "UTF8": 0x00
2. Generate the data in SQL Server:
create table test_varchar(id int,name varchar(20));
insert into test_varchar values (1, 'name' + char(0));
insert into test_varchar values (1, 'name' + '');
Then read the data with a Java program and insert it into PG. The same error message is displayed:
invalid byte sequence for encoding "UTF8": 0x00
At first we suspected that a character could not be converted from GB2312 to UTF8. Checking the encoding showed that UTF8 is variable-length, using 1 to 6 bytes per character in its original definition (RFC 3629 later limited it to 4 bytes). The encoding rules are as follows:
Bits | Last code point | Byte 1 | Byte 2 | Byte 3 | Byte 4 | Byte 5 | Byte 6
7 | U+007F | 0xxxxxxx | | | | |
11 | U+07FF | 110xxxxx | 10xxxxxx | | | |
16 | U+FFFF | 1110xxxx | 10xxxxxx | 10xxxxxx | | |
21 | U+1FFFFF | 11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx | |
26 | U+3FFFFFF | 111110xx | 10xxxxxx | 10xxxxxx | 10xxxxxx | 10xxxxxx |
31 | U+7FFFFFFF | 1111110x | 10xxxxxx | 10xxxxxx | 10xxxxxx | 10xxxxxx | 10xxxxxx
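As a quick check of the table, the following Python sketch encodes a few sample code points and prints the bit pattern of each byte. Python is our choice for illustration and is not part of the original Java program; the sample characters and the helper name utf8_bits are arbitrary. Python's encoder follows the current UTF-8 definition, which stops at 4 bytes, so the 5- and 6-byte rows cannot be reproduced this way.

```python
# Sample code points chosen for illustration (an assumption, not from the
# original article): one for each row that Python's encoder can produce.
samples = {
    "NUL (U+0000)": "\u0000",
    "A (U+0041)": "A",
    "e-acute (U+00E9)": "\u00e9",
    "CJK (U+4E2D)": "\u4e2d",
    "emoji (U+1F600)": "\U0001F600",
}


def utf8_bits(ch: str) -> str:
    """Return the UTF-8 bytes of ch as space-separated 8-bit binary strings."""
    return " ".join(f"{b:08b}" for b in ch.encode("utf-8"))


for label, ch in samples.items():
    encoded = ch.encode("utf-8")
    print(f"{label}: {len(encoded)} byte(s): {utf8_bits(ch)}")
```

The first line of output shows that U+0000 encodes to the single byte 00000000, which matches the 0xxxxxxx pattern in the first row.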
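Yet 0x00 itself complies with the UTF8 rules (it is a single byte of the form 0xxxxxxx), which surprised us. We then found two facts that confirmed the problem:
1. PostgreSQL cannot store the NUL character in a text field. Quoting Stack Overflow: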
PostgreSQL doesn't support storing NULL (\0x00) characters in text fields (this is obviously different from the database NULL value, which is fully supported).
If you need to store the NULL character, you must use a bytea field - which should store anything you want, but won't support text operations on it.
Given that PostgreSQL doesn't support it in text values, there's no good way to get it to remove it. You could import your data into bytea and later convert it to text using a special function (in perl or something, maybe?), but it's likely going to be easier to do that in preprocessing before you load it.
Source: http://stackoverflow.com/questions/1347646/postgres-error-on-insert-error-invalid-byte-sequence-for-encoding-utf8-0x0
2. SQL Server supports the following characters as field and row terminators, including the null terminator (\0):
Terminating character | Indicated by
Tab | \t (this is the default field terminator)
Newline character | \n (this is the default row terminator)
Carriage return/line feed | \r
Backslash | \\
Null terminator (nonvisible terminator) | \0
Any printable character (control characters are not printable, except null, tab, newline, and carriage return) | (*, A, t, l, and so on)
String of up to 10 printable characters, including some or all of the terminators listed earlier | (**\t**, end, !!!!!!!!!!, \t-\n, and so on)
Source: http://msdn.microsoft.com/en-us/library/ms191485.aspx
We therefore concluded that PG handles the NUL character differently from SQL Server, which is why the error occurs here.
The specific PG code that causes this problem is as follows (src/backend/utils/mb/wchar.c, function pg_verify_mbstr_len):
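/* Excerpt from the loop over the input string */
if (!IS_HIGHBIT_SET(*mbstr))
{
    /* High bit clear: a single-byte (ASCII) character */
    if (*mbstr != '\0')
    {
        mb_len++;
        mbstr++;
        len--;
        continue;
    }
    /* The byte is \0: reject it */
    if (noError)
        return -1;
    report_invalid_encoding(encoding, mbstr, len);
}

#define IS_HIGHBIT_SET(ch) ((unsigned char)(ch) & HIGHBIT)
#define HIGHBIT (0x80)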
The report_invalid_encoding function reports the error message, that is:
invalid byte sequence for encoding "UTF8": 0x00
What really causes this problem is the branch !IS_HIGHBIT_SET(*mbstr): when the high bit of *mbstr is clear (a single-byte character of the form 0xxxxxxx), the code checks whether *mbstr is \0. If it is, it goes straight to report_invalid_encoding and reports the error.
The root cause is that PG and SQL Server handle the NUL character differently.
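To make the behaviour concrete, here is a rough Python analogue of that check. It is a sketch, not a port of the PostgreSQL code: the function name verify_text and the no_error parameter are ours, and Python's own UTF-8 decoder stands in for the multi-byte validation that the excerpt does not show.

```python
def verify_text(data: bytes, no_error: bool = False) -> int:
    """Return the number of characters in data.

    Mirrors the quoted branch: a NUL byte is rejected even though it is
    valid UTF-8. With no_error=True the function returns -1 instead of
    raising, like the noError flag in the C excerpt.
    """
    try:
        text = data.decode("utf-8")  # stand-in for the multi-byte checks
    except UnicodeDecodeError:
        if no_error:
            return -1
        raise ValueError('invalid byte sequence for encoding "UTF8"')

    # In valid UTF-8 the byte 0x00 can only be the character U+0000,
    # so a plain byte search is enough to detect it.
    if b"\x00" in data:
        if no_error:
            return -1
        raise ValueError('invalid byte sequence for encoding "UTF8": 0x00')

    return len(text)


print(verify_text(b"name"))                      # 4
print(verify_text(b"name\x00", no_error=True))   # -1
```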
Solution:
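1. Modify the SQL Server source data. The following query, taken from Stack Overflow, finds the rows whose last character is char(0) (TABLE and naughtyField are placeholders):
SELECT * FROM TABLE
WHERE UNICODE(SUBSTRING(naughtyField, LEN(naughtyField), 1)) = 0
This statement then removes that trailing character:
UPDATE TABLE
SET naughtyField = SUBSTRING(naughtyField, 1, LEN(naughtyField) - 1)
WHERE UNICODE(SUBSTRING(naughtyField, LEN(naughtyField), 1)) = 0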
Source: http://stackoverflow.com/questions/3533320/sql-server-remove-end-string-character-0-from-data
2. Modify the application: when reading the data from SQL Server, strip the char(0) characters before inserting into PG. The effect is the same as the first method.
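A minimal sketch of this second approach, written in Python for brevity even though the original program was in Java. The helper names strip_nul and clean_row are hypothetical, and the rows are assumed to arrive as tuples from whatever SQL Server driver is in use. Unlike the UPDATE statement in method 1, it removes char(0) wherever it occurs in the string, not only at the end.

```python
from typing import Any, Iterable, Tuple


def strip_nul(value: Any) -> Any:
    """Remove NUL characters from str values; return other types unchanged."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    return value


def clean_row(row: Iterable[Any]) -> Tuple[Any, ...]:
    """Apply strip_nul to every column of one row."""
    return tuple(strip_nul(v) for v in row)


# Rows shaped like the test_varchar table from the reproduction (assumed
# to have been fetched from SQL Server already).
rows = [(1, "name\x00"), (1, "name")]
cleaned = [clean_row(r) for r in rows]
print(cleaned)
```

The cleaned tuples can then be passed to the PG insert as before. Binary columns are left untouched on purpose, since bytea can store the zero byte.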