MySQL LOAD DATA INFILE

Source: Internet
Author: User

The LOAD DATA INFILE statement reads rows into a table from a text file at high speed. If you specify the LOCAL keyword, the file is read from the client host; if LOCAL is not specified, the file must be on the server. (LOCAL is available in MySQL 3.22.6 and later.)

For security reasons, when reading text files located on the server, the files must either reside in the database directory or be readable by everyone. In addition, to use LOAD DATA INFILE on server files, you must have the FILE privilege on the server host.

My first attempt: I wrote INSERT statements to a file and then tried to import that file with LOAD DATA INFILE. The generated .sql file contained something like insert into table_name ('id', 'name', 'content') values (1, "A", "ABC"), (2, "B", "ABC"), (3, "C", "ABC");. Then I executed load data infile 'file_name.sql' into table table_name; and found that it always failed. I had no idea what was going on, so I went back and read the MySQL reference manual carefully:

If you specify a FIELDS clause, each of its subclauses (TERMINATED BY, [OPTIONALLY] ENCLOSED BY, and ESCAPED BY) is optional, except that you must specify at least one of them.
If you do not specify a FIELDS clause, the defaults are the same as if you had written:
fields terminated by '\t' enclosed by '' escaped by '\\'
If you do not specify a LINES clause, the default is the same as if you had written:
lines terminated by '\n'
In other words, the defaults cause LOAD DATA INFILE to behave as follows:
• look for line boundaries at newlines
• break lines into fields at tabs
• do not expect fields to be enclosed within any quoting characters
• interpret characters preceded by '\' as escape sequences that become literal characters of the field value
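The default behaviour just listed can be sketched in a few lines of Python (a simplified illustration, not the server's actual parser; the escape table below is an assumption covering only the common sequences, and \N/NULL handling is omitted):

```python
# Simplified sketch of LOAD DATA INFILE's default input handling:
# rows end at '\n', fields split at tabs, no quoting expected,
# and '\'-prefixed characters are unescaped.
ESCAPES = {"t": "\t", "n": "\n", "0": "\0", "\\": "\\"}

def parse_default(text):
    rows = []
    for line in text.split("\n"):
        if not line:
            continue  # ignore the empty piece after a trailing newline
        fields, field, escaped = [], [], False
        for ch in line:
            if escaped:
                field.append(ESCAPES.get(ch, ch))
                escaped = False
            elif ch == "\\":
                escaped = True                 # next character is escaped
            elif ch == "\t":
                fields.append("".join(field))  # tab terminates the field
                field = []
            else:
                field.append(ch)
        fields.append("".join(field))
        rows.append(fields)
    return rows
```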

Then I realized that my file was not written according to the LOAD DATA INFILE defaults. Under the defaults, fields are separated by tabs and are not enclosed in any quotation marks (ENCLOSED BY is empty), whereas I had separated them with commas and enclosed them in double quotation marks. No wonder an error occurred.
Therefore, the statement was rewritten (issued via mysql_query()) as: LOAD DATA LOCAL INFILE 'file_name.sql' INTO TABLE table_name FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\n';
The statement executed successfully and inserted many records into the table, but then I noticed another problem: the inserted data was a mess, the fields and values did not match what I wanted, and in many rows the field names themselves had been inserted as data. So I went back to the LOAD DATA INFILE documentation once more and finally figured it out: file_name.sql only needs to contain the data in the appropriate format, not an entire SQL statement. How silly of me! ^_^
So the file content was changed to:
1, "A", "ABC"
2, "B", "ABC"
3, "C", "ABC"
and the content must strictly match the number and order of the fields in the data table.
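Assuming the data file is generated by a script rather than by hand, Python's csv module can produce exactly this comma-separated, double-quoted layout (the table and file names below are just the article's examples):

```python
import csv
import io

# Build the data file in the format the article settles on:
# comma-separated fields, strings in double quotes, one row per line,
# in the same column order as the target table (id, name, content).
rows = [(1, "A", "ABC"), (2, "B", "ABC"), (3, "C", "ABC")]

buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
for row in rows:
    writer.writerow(row)

# buf.getvalue() would then be written to file_name.sql and loaded with:
#   LOAD DATA LOCAL INFILE 'file_name.sql' INTO TABLE table_name
#   FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
```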
On the local server (a Windows host) the operation succeeded. OK!
But after uploading the program to the network server (a Linux host) and running it, I got:
can't get stat of ...... (Errcode: 13)
At first I thought it was a file-permission or MySQL root-user privilege problem. Then I reconsidered: the MySQL root user is a superuser and certainly has the privilege, so the problem had to lie with the permissions on the .sql file itself. I changed its mode to 777, but it still failed.
Searching the internet, I found the advice to put the file under /var/lib/mysql, and indeed that worked; but it was impossible for me to have the PHP page generate the .sql file under /var/lib/mysql. After much effort I finally found the solution online:

Using LOCAL is somewhat slower than letting the server access the files directly, because the file contents must be transmitted from the client host to the server host. On the other hand, you do not need the FILE privilege to load local files.
You can also use the mysqlimport utility to load data files; it operates by sending a LOAD DATA INFILE statement to the server. The --local option makes mysqlimport read the file from the client host. If the client and server support the compressed protocol, you can specify --compress for better performance over slow networks.

In fact, the fix is very simple: just write LOAD DATA LOCAL INFILE instead of LOAD DATA INFILE.

The following is a supplement:

LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name.txt'
    [REPLACE | IGNORE]
    INTO TABLE tbl_name
    [FIELDS
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]
    [IGNORE number LINES]
    [(col_name_or_user_var, ...)]
    [SET col_name = expr, ...]

The LOAD DATA INFILE statement reads rows from a text file into a table at high speed. The file name must be given as a literal string.

For more information about the relative efficiency of INSERT and LOAD DATA INFILE, and how to speed up LOAD DATA INFILE, see the section on speeding up INSERT statements in the administrator manual.

The character set indicated by the character_set_database system variable is used to interpret the information in the file. SET NAMES and the character_set_client setting do not affect interpretation of the input.

You can also use the import utility to load data files; it operates by sending a LOAD DATA INFILE command to the server. The --local option makes the utility read the data file from the client host. If the client and server support the compressed protocol, you can specify the --compress option for better performance over slow networks. See the data import utility section in the administrator manual.
If you specify the LOW_PRIORITY keyword, execution of the LOAD DATA statement is delayed until no other clients are reading from the table.

If a gssys table satisfies the condition for concurrent inserts (that is, it contains free blocks in the middle) and you specify CONCURRENT, other threads can retrieve data from the table while LOAD DATA is executing. Using this option affects the performance of LOAD DATA slightly, even if no other thread is using the table at the same time.

If the LOCAL keyword is specified, it is interpreted with respect to the client end of the connection:
• If LOCAL is specified, the file is read by the client program on the client host and sent to the server. You can give the full path of the file to specify its exact location. If a relative path is given, the name is interpreted relative to the directory in which the client program was started.
• If LOCAL is not specified, the file must be located on the server host and is read directly by the server.
When the file is located on the server host, the server uses the following rules:
• If an absolute path name is given, the server uses it as is.
• If a relative path name with one or more leading components is given, the server searches for the file relative to its data directory.
• If a file name with no leading components is given, the server looks for the file in the database directory of the current database.
Note:

These rules mean that a file named './gsfile.txt' is read relative to the server's data directory, whereas a file named 'gsfile.txt' is read from the database directory of the current database. For example, the following LOAD DATA statement reads the file 'data.txt' from the database directory of db1, because db1 is the current database; it does so even though the statement explicitly loads the file into a table in the db2 database:

Sqlcli> use db1;
Sqlcli> load data infile 'data.txt' into table db2.gs_table;

Note:

When specifying a Windows path name, use forward slashes rather than backslashes. To use backslashes, you must double them.
For security reasons, when reading text files located on the server, the files must either reside in the database directory or be readable by all users. That is, to execute LOAD DATA INFILE against a file on the server, you must have the FILE privilege.
Refer to the privileges provided by GBASE in the administrator manual.

Using LOCAL is slightly slower than letting the server access the files directly, because the file contents must be transmitted over the connection from the client to the server. On the other hand, you do not need the FILE privilege to load local files.

LOCAL works only if both the server and the client are configured to allow it. For example, if the server was started with --local-infile=0, LOCAL does not work. See the section on LOAD DATA LOCAL security issues in the administrator manual.

The REPLACE and IGNORE keywords control the handling of input rows that duplicate existing rows on unique key values.

If you specify REPLACE, input rows replace existing rows (that is, rows that have the same value for a primary or unique index as an input row). See the REPLACE syntax.

If you specify IGNORE, input rows that duplicate an existing row on a unique key value are skipped. If you specify neither option, the behavior depends on whether the LOCAL keyword was given. Without LOCAL, an error occurs when a duplicate key value is found, and the rest of the text file is ignored. With LOCAL, the default behavior is the same as IGNORE; this is because the server has no way to stop transmission of the file in the middle of the operation.

If you want to ignore foreign key constraints during the load operation, you can execute SET FOREIGN_KEY_CHECKS = 0 before running LOAD DATA.
If you use LOAD DATA INFILE on an empty gssys table, all non-unique indexes are created in a batch (as for REPAIR). This normally makes LOAD DATA INFILE much faster when you have many indexes, although there are extreme cases. You can also speed up index creation explicitly: use ALTER TABLE ... DISABLE KEYS before loading the file and ALTER TABLE ... ENABLE KEYS afterwards to rebuild the indexes. See the section on speeding up INSERT statements in the administrator manual.

LOAD DATA INFILE is the complement of SELECT ... INTO OUTFILE. (See the SELECT syntax.) Use SELECT ... INTO OUTFILE to write data from a database into a file; use LOAD DATA INFILE to read the file back into the database. The syntax of the FIELDS and LINES clauses is the same for both statements. Both clauses are optional, but if both are specified, the FIELDS clause must precede the LINES clause.
If you specify a FIELDS clause, each of its subclauses (TERMINATED BY, [OPTIONALLY] ENCLOSED BY, and ESCAPED BY) is also optional, except that you must specify at least one of them.
If you do not specify a FIELDS clause, the defaults are the same as if you had written:
Fields terminated by '\t' enclosed by '' escaped by '\\'
If you do not specify a LINES clause, the defaults are the same as if you had written:
Lines terminated by '\n' starting by ''
In other words, the defaults cause LOAD DATA INFILE to act as follows when reading input:
• Look for line boundaries at newlines.
• Do not skip any line prefix.
• Break lines into fields at tabs.
• Do not expect fields to be enclosed within any quoting characters.
• Interpret occurrences of tab, newline, or '\' preceded by '\' as literal characters that are part of field values.
Conversely, the defaults cause SELECT ... INTO OUTFILE to act as follows when writing output:
• Write tabs between fields.
• Do not enclose fields within any quoting characters.
• Use '\' to escape instances of tab, newline, or '\' that occur within field values.
• Write newlines at the ends of lines.
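As a rough sketch of those default output rules (a simplification under the assumption that NULL/\N handling can be ignored):

```python
# Sketch of SELECT ... INTO OUTFILE's default output format:
# a tab between fields, no quoting, '\' written before any tab,
# newline, or backslash inside a value, and '\n' ending each row.
def write_default(rows):
    out = []
    for row in rows:
        escaped = [
            str(v).replace("\\", "\\\\")   # escape backslash first
                  .replace("\t", "\\\t")   # then tab
                  .replace("\n", "\\\n")   # then newline
            for v in row
        ]
        out.append("\t".join(escaped) + "\n")
    return "".join(out)
```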
Note:

To write FIELDS ESCAPED BY '\\', you must specify two backslashes; the value is read as a single backslash.
Note:

For a Windows text file you may have to use LINES TERMINATED BY '\r\n' to read the file, because Windows uses two characters as the line terminator. Some programs, such as WordPad, may use \r as the line terminator when writing files; to read such files, use LINES TERMINATED BY '\r'.
If all of the lines you want to read have a common prefix that you want to ignore, you can use LINES STARTING BY 'prefix_string' to skip the prefix (and anything before it). If a line does not include the prefix, the entire line is skipped. Note that prefix_string can occur in the middle of a line!
For example:
Sqlcli> load data infile '/tmp/test.txt'
    -> into table test lines starting by "XXX";

Use it to read files containing the following content:

XXX "row", 1
Something XXX "row", 2
Then the data retrieved is ("row", 1) and ("row", 2).
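The LINES STARTING BY behaviour just shown can be mimicked like this (a sketch, not the server's implementation):

```python
# Keep only the part of each line after the first occurrence of the
# prefix; lines that do not contain the prefix are skipped entirely.
def starting_by(lines, prefix):
    kept = []
    for line in lines:
        pos = line.find(prefix)
        if pos == -1:
            continue                       # no prefix: skip the whole line
        kept.append(line[pos + len(prefix):])
    return kept
```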

The IGNORE number LINES option can be used to ignore lines at the start of the file. For example, you can use IGNORE 1 LINES to skip an initial header line containing column names:
Sqlcli> load data infile '/tmp/test.txt'
    -> into table test ignore 1 lines;
When you use SELECT ... INTO OUTFILE together with LOAD DATA INFILE to write data from a database into a file and then read it back in, the FIELDS and LINES handling options of the two statements must match. Otherwise, LOAD DATA INFILE will not interpret the file contents correctly. Suppose you use SELECT ... INTO OUTFILE to write a file with comma-separated fields:
Sqlcli> select * into outfile 'data.txt'
    -> fields terminated by ','
    -> from table2;
To read the comma-delimited file back in, the correct statement is:
Sqlcli> load data infile 'data.txt' into table table2
    -> fields terminated by ',';
If instead you attempted to read the file with the statement shown below, it would not work, because it instructs LOAD DATA INFILE to look for tabs between fields:
Sqlcli> load data infile 'data.txt' into table table2
    -> fields terminated by '\t';
The likely result is that each input line would be interpreted as a single field.
LOAD DATA INFILE can also be used to read files obtained from external sources. For example, a file in dBASE format has fields separated by commas and enclosed in double quotes. If lines in the file are terminated by newlines, the following statement illustrates the field and line handling options you would use to load the file:
Sqlcli> load data infile 'data.txt' into table tbl_name
    -> fields terminated by ',' enclosed by '"'
    -> lines terminated by '\n';

Any of the field or line handling options can specify an empty string (''). If not empty, the FIELDS [OPTIONALLY] ENCLOSED BY and FIELDS ESCAPED BY values must be a single character. The FIELDS TERMINATED BY and LINES TERMINATED BY values can be more than one character. For example, to write lines terminated by carriage-return/newline pairs, or to read a file containing such lines, specify a LINES TERMINATED BY '\r\n' clause.
FIELDS [OPTIONALLY] ENCLOSED BY controls quoting of fields. For output (SELECT ... INTO OUTFILE), if you omit the word OPTIONALLY, all fields are enclosed by the ENCLOSED BY character. An example of such output (using a comma as the field delimiter) is shown here:
"1","a string","100.20"
"2","a string containing a , comma","102.20"
"3","a string containing a \" quote","102.20"
"4","a string containing a \", quote and comma","102.20"

If you specify OPTIONALLY, the ENCLOSED BY character is used only to enclose values from columns that have a string data type (such as CHAR, BINARY, TEXT, or ENUM):
1,"a string",100.20
2,"a string containing a , comma",102.20
3,"a string containing a \" quote",102.20
4,"a string containing a \", quote and comma",102.20
Note:

Occurrences of the ENCLOSED BY character within a field value are escaped by prefixing them with the ESCAPED BY character. Also note that if you specify an empty ESCAPED BY value, it is possible to generate output that cannot be read properly by LOAD DATA INFILE. For example, the output just shown above appears as shown below if the escape character is empty. Observe that the second field in the fourth line contains a comma following the quote, which (erroneously) appears to terminate the field:

1,"a string",100.20
2,"a string containing a , comma",102.20
3,"a string containing a " quote",102.20
4,"a string containing a ", quote and comma",102.20
For input, the ENCLOSED BY character, if present, is stripped from the ends of field values. (This is true regardless of whether OPTIONALLY is specified; OPTIONALLY has no effect on input interpretation.) Occurrences of the ENCLOSED BY character preceded by the ESCAPED BY character are interpreted as part of the current field value.

If the field begins with the ENCLOSED BY character, instances of that character are recognized as terminating a field value only if they are followed by the field or line TERMINATED BY sequence. To avoid ambiguity, occurrences of the ENCLOSED BY character within a field value can be doubled, and they are interpreted as a single instance of the character. For example, if ENCLOSED BY '"' is specified, quotes are handled as shown here:
"The ""big"" boss"  -> The "big" boss
The "big" boss      -> The "big" boss
The ""big"" boss    -> The ""big"" boss
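A minimal sketch of that doubling rule (assuming the field has already been split out, and ignoring the terminator lookahead described above):

```python
# An ENCLOSED BY '"' field: strip the outer quotes only when the field
# begins and ends with one, and collapse doubled quotes inside an
# enclosed field to a single quote. Unenclosed fields are untouched.
def read_enclosed(field, quote='"'):
    if len(field) >= 2 and field.startswith(quote) and field.endswith(quote):
        return field[1:-1].replace(quote * 2, quote)
    return field
```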

FIELDS ESCAPED BY controls how to read or write special characters. If the FIELDS ESCAPED BY character is not empty, it is used as a prefix for the following characters on output:

• The FIELDS ESCAPED BY character.
• The FIELDS [OPTIONALLY] ENCLOSED BY character.
• The first character of the FIELDS TERMINATED BY and LINES TERMINATED BY values.
• ASCII 0 (what is actually written following the escape character is ASCII '0', not a zero-valued byte).
If the FIELDS ESCAPED BY character is empty, no characters are escaped and NULL is output as NULL, not \N. It is probably not a good idea to specify an empty escape character, particularly if field values in your data contain any of the characters in the list just given.
For input, if the FIELDS ESCAPED BY character is not empty, occurrences of that character are stripped and the following character is taken literally as part of the field value. The exceptions are an escaped '0' or 'N' (for example, \0 or \N if the escape character is '\'); these sequences are interpreted as ASCII NUL (a zero-valued byte) and NULL, respectively. The rules for NULL handling are described later in this section.
For more information about '\' escape syntax, see the section on literal values.
In certain cases, field and line handling options interact:
• If LINES TERMINATED BY is an empty string and FIELDS TERMINATED BY is non-empty, lines are also terminated with the FIELDS TERMINATED BY string.
• If the FIELDS TERMINATED BY and FIELDS ENCLOSED BY values are both empty (''), a fixed-row (non-delimited) format is used. With fixed-row format, no delimiters are used between fields (but you can still have a line terminator). Column values are read and written using the "display" widths of the columns. For example, if a column is declared as INT(7), values for the column are written using seven-character fields; on input, values are obtained by reading seven characters.
LINES TERMINATED BY is still used to separate lines. If a line does not contain all the fields, the rest of the columns are set to their default values. If you do not have a line terminator, you should set it to ''; in this case, the text file must contain all fields for each row.
The fixed-row format also affects handling of NULL values, as described below. Note that fixed-length format does not work if you are using a multi-byte character set.
Handling of NULL values varies according to the FIELDS and LINES options in use:
• For the default FIELDS and LINES values, NULL is written as \N on output, and \N is read as NULL on input (assuming the ESCAPED BY character is '\').
• If FIELDS ENCLOSED BY is not empty, a field containing the literal word NULL as its value is read as a NULL value. This differs from the word NULL enclosed within FIELDS ENCLOSED BY characters, which is read as the string 'NULL'.
• If FIELDS ESCAPED BY is empty, NULL is written as the word NULL.
• With fixed-row format (used when FIELDS TERMINATED BY and FIELDS ENCLOSED BY are both empty), NULL is written as an empty string. Note that this causes NULL values and empty strings in the table to become indistinguishable in the file, because both are written as empty strings. If you need to be able to tell the two apart when reading the file back in, you should not use fixed-row format.
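The first rule above (default settings: NULL written as \N, \N read back as NULL) can be expressed as a tiny round-trip sketch:

```python
# Default-settings NULL round trip: None is dumped as the two
# characters backslash-N, and a bare \N field is loaded as None.
def dump_field(value):
    return r"\N" if value is None else str(value)

def load_field(text):
    return None if text == r"\N" else text
```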
Some cases are not supported by LOAD DATA INFILE:
• Fixed-size rows (FIELDS TERMINATED BY and FIELDS ENCLOSED BY both empty) together with BLOB or TEXT columns.
• If you specify one separator that is the same as, or a prefix of, another, LOAD DATA INFILE may not interpret the input properly. For example, the following FIELDS clause would cause problems:
Fields terminated by '"' enclosed by '"'
• If FIELDS ESCAPED BY is empty, a field value that contains an occurrence of FIELDS ENCLOSED BY or LINES TERMINATED BY followed by the FIELDS TERMINATED BY value causes LOAD DATA INFILE to stop reading a field or line too early. This happens because LOAD DATA INFILE cannot determine properly where the field or line value ends.
The following example loads all columns of the persondata table:
Sqlcli> load data infile 'persondata.txt' into table persondata;
By default, when no column list is provided at the end of the LOAD DATA INFILE statement, the input lines are expected to contain a field for each table column. If you want to load only some of a table's columns, specify a column list:
Sqlcli> load data infile 'persondata.txt'
    -> into table persondata (col1, col2, ...);
You must also specify a column list if the order of the fields in the input file differs from the order of the columns in the table. Otherwise, GBASE cannot tell how to match input fields with table columns.
The column list can contain either column names or user variables, and the SET clause is supported. This enables you to assign input values to user variables and transform those values before assigning the result to columns.
User variables in the SET clause can be used in several ways. The following example uses the first column of the data file directly for the value of t1.column1 and assigns the second column to a user variable, which is divided by 100 before being used for the value of t1.column2:
load data infile 'file.txt'
    into table t1
    (column1, @var1)
    set column2 = @var1/100;
The SET clause can supply values not derived from the input file. The following statement sets column3 to the current date and time:
load data infile 'file.txt'
    into table t1
    (column1, column2)
    set column3 = current_timestamp;
You can also discard an input value by assigning it to a user variable and not assigning the variable to a table column:
load data infile 'file.txt'
    into table t1
    (column1, @dummy, column2, @dummy, column3);
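When a SET clause is not available, the same effect can be approximated by preprocessing the file before loading it. A hedged sketch of the earlier @var1/100 example (tab-delimited input assumed):

```python
# Preprocessing equivalent of: (column1, @var1) SET column2 = @var1/100.
# Each input line holds two tab-separated fields; the second is divided
# by 100 before the line is written back out for loading.
def preprocess(line):
    col1, var1 = line.rstrip("\n").split("\t")
    return "%s\t%s\n" % (col1, float(var1) / 100)
```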

Use of the column/variable list and SET clause is subject to the following restrictions:

• Assignments in the SET clause should have only column names on the left side of the assignment operator.
• Subqueries can be used on the right side of SET assignments. A subquery that returns a value to be assigned to a column may be a scalar subquery only. Also, you cannot use a subquery to select from the table that is being loaded.
• Lines ignored because of an IGNORE clause are not processed for the column/variable list or SET clause.
• User variables cannot be used when loading data in fixed-row format, because user variables do not have a display width.
When processing an input line, LOAD DATA splits it into fields and uses the values according to the column/variable list and SET clause, if they are present. The resulting row is then inserted into the table. If the table has BEFORE INSERT or AFTER INSERT triggers, they are activated before and after inserting the row, respectively.

If an input line has too many fields, the extra fields are ignored and the number of warnings is incremented.
If an input line has too few fields, the columns for which input fields are missing are set to their default values. Default value assignment is described in the CREATE TABLE syntax.

An empty field value is interpreted differently from a missing field value:

• For string types, the column is set to the empty string.
• For numeric types, the column is set to 0.
• For date and time types, the column is set to the appropriate "zero" value for the type. See the date and time types.

These are the same values that result if you assign an empty string explicitly to a string, numeric, or date or time type in an INSERT or UPDATE statement.
TIMESTAMP columns are set to the current date and time only in two cases: when the column has a NULL value (that is, \N), or (for the first TIMESTAMP column only) when a column list is specified and the TIMESTAMP column is omitted from it.

LOAD DATA INFILE regards all input as strings, so you cannot use numeric values for ENUM or SET columns the way you can with INSERT statements. All ENUM and SET values must be specified as strings!

When the LOAD DATA INFILE statement finishes, it returns an information string in the following format:
Records: 1 Deleted: 0 Skipped: 0 Warnings: 0
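If a client program needs those counters, the information string is easy to pick apart (a sketch; label casing may vary between versions, so the keys are lower-cased):

```python
import re

# Parse "Records: 1 Deleted: 0 Skipped: 0 Warnings: 0" into a dict.
def parse_info(info):
    return {label.lower(): int(count)
            for label, count in re.findall(r"(\w+):\s*(\d+)", info)}
```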
