MySQL's Load data infile

Source: Internet
Author: User
Tags file permissions

The LOAD DATA infile statement reads from a text file into a table at a high speed. If you specify the local keyword, read the file from the client host. If local is not specified, the file must be located on the server. (Local is available in MySQL3.22.6 or later versions.) )

For security reasons, when reading a text file located on the server, the file must be in the database directory or read by everyone. Also, in order to use the load DATA INFILE on the files on the server, you must have file permissions on the server host.

The first time with load data infile, thought just to write the INSERT statement into a file, and then use load data infile the file into the database is OK, and generated a content similar to the same as the N-sentence INSERT INTO table_name (' ID ', ' Name ', ' content ') VALUES (1, "A", "abc"), (2, "B", "ABC"), (3, "C", "abc"), composed of a. sql file. Then execute LOAD DATA INFILE ' file_name.sql ' into TABLE table_name in PHP; Find out that always executes the error, really dizzy, do not know how, had to read the MySQL application manual in detail:

If you specify a fields clause, each of its clauses (TERMINATED by, [optionally] enclosed by and escaped by) is also optional, except that you must specify at least one of them.
If you do not specify a fields clause, the default value is the same as if you write:
Fields TERMINATED by ' \ t ' enclosed by ' \ escaped by ' \ \ '
If you do not specify a lines clause, the default value is the same as if you write:
LINES TERMINATED by ' \ n '
In other words, the default value causes the input to be read, and the LOAD DATA infile behaves as follows:
Look for row boundaries at line breaks
To divide rows into fields at a locator
Do not expect fields to be encapsulated by any quotation mark characters
A locator, line feed, or "\" that starts with "\" is interpreted as part of the literal character of the field value

Only to know that my SQL content is not written according to the default settings of load data infile, for example, the default setting, the fields in each sentence INSERT statement are separated by tabs and the contents are not enclosed in any quotation marks (that is, surrounded), but my is separated by commas and enclosed in double quotation marks, No wonder the error is performed.
The SQL execution statement is then written as mysql_query (' Load data local infile ' file_name.sql "into table ' table_name ' fields TERMINATED by", "enclosed by \ ' "\ ' escaped by" \\\ "LINES TERMINATED by" \ n "Starting by" "');
This is a successful execution, successfully inserted a number of records into the specified data table, but found a problem, the inserted data is messy, the field and the content to be inserted does not meet my requirements, and there are a lot of the field name is inserted into the field, alas, Had to go back and read the MySQL load data infile usage, finally figured out, the original file_name.sql inside only need to press a certain format to write the content in the line, not the entire SQL execution statements are written in, really stupid! ^_^
So, the content was replaced
1, "A", "ABC"
2, "B", "ABC"
3, "C", "ABC"
And the content should be closely related to the fields in the data table, both in quantity and in order.
On the local server (I use the window host) on the test, OK, the operation was successful!
Then upload the program to the network server (Linux host), one to execute, prompt:
Can ' t get stat of ... ' (errcode:13)
Initially thought is the SQL file permissions or the root user rights of the MySQL problem, and then think wrong ah, MySQL root user is superuser, must have permissions, the problem is in the SQL file permissions, and then change the permissions of SQL to 777, after the operation or not.
Online search, there is said to put the file in the/var/lib/mysql in the line, a try, really, but I can not in the PHP page SQL file generation put into the/var/lib/mysql, racking up the mind, finally finally found a solution on the Internet:

Using local will be slower than allowing the server to access files directly, because the contents of the file must be transferred from the client host to the server host. On the other hand, you do not need file permissions to load local files.
You can also load a data file using the Mysqlimport utility, which is operated by sending a load data infile command to the server. The--local option allows Mysqlimport to read data from the client host. If the client and server support the compression protocol, you can specify--compress to get better performance on slower networks.

In fact, the method is very simple, that is, load data infile written as load data local infile OK.

here is the content supplement:

LOAD DATA [Low_priority | CONCURRENT] [LOCAL] INFILE ' file_name.txt '
[REPLACE | IGNORE]
Into TABLE tbl_name
[Fields
[TERMINATED by ' string ']
[[optionally] enclosed by ' char ']
[Escaped by ' char ']
]
[LINES
[Starting by ' string ']
[TERMINATED by ' string ']
]
[IGNORE number LINES]
[(Col_name_or_user_var,...)]
[SET col_name = expr,...]]

The LOAD DATA INFILE statement reads rows from a text file into a table at a high speed. The file name must be a literal string.

For more information on the efficiency of the insert and load data INFILE and the speed of the load data INFILE, refer to the accelerated INSERT statement in the Administrator's Manual.

The character set indicated by the system variable character_set_database is used to interpret the information in the file. Set NAMES and set character_set_client do not affect the interpretation of the input.

Users can also load data files using the Import utility, which is implemented by sending a load data INFILE command to the server. The--local option allows import to read data files from the client host. If the client and server support the compression protocol, the user can specify the--compress option to achieve better performance on slower networks. Refer to the Import-data importer in the administrator's Manual.
If the user specifies the keyword low_priority,load the execution of the DATA statement will be deferred until no other client is reading the table.

If a gssys table satisfies a condition that is inserted at the same time (that is, the table has free blocks in the middle), and you specify concurrent for the Gssys table, other threads retrieve the data from the table when load data is executing. Even if no other thread is using this table at the same time, using this option will slightly affect the performance of the load DATA.

If the local keyword is specified, it interprets the connected client:
• If local is specified, the client component on the client host reads the file and sends it to the server. You can give the full path to the file to determine its exact location. If a relative path is given, the file name is relative to the directory where the client component was started.
• If no local is specified, the file is located on the server's host and is read directly by the server.
When you locate a file from a server host, the server uses the following rules:
• If a full path is given, the server uses that pathname.
• If given a relative path to one or more predecessor artifacts, the server searches for files in the relative server's data directory.
• If given a file name that does not have a predecessor, the server searches for the file from the database directory of the current database.
Attention:

These rules imply that a file given as '/gsfile.txt ' is read from the server's data directory, however, a file given with ' Gsfile.txt ' is read from the data directory of the current database. For example, the following load DATA statement reads the file ' Data.txt ' from the DB1 database directory, because DB1 is the current database, even if the statement explicitly loads the file into the table in the DB2 database, it is also read from the DB1 directory:

sqlcli> use DB1;
sqlcli> LOAD DATA INFILE ' data.txt ' into TABLE db2.gs_table;

Attention:

When you specify a Windows pathname, you use a slash instead of a backslash. If you want to use a backslash, you must double-write.
For security reasons, when reading a text file located on the server, the file must be located in the database directory or can be read by all users. That is, when you execute load DATA INFILE on a file on the server, the user must obtain file permissions.
Refer to the permissions provided by Gbase in the Administrator's Manual.

The local speed is slightly slower than the server directly accessing the file because the contents of the file must be routed through a client-to-server connection. On the other hand, for local files, you do not need to obtain file permissions.

LOCAL can only work if both the server and the client are allowed. For example, if kernel is started, the local does not work--local-infile=0. Refer to the Load DATA LOCAL security issue in the Administrator's Manual.

The REPLACE and ignore keywords handle input records that duplicate the primary key value that already exists.

If replace is specified, the input row replaces the existing row (that is, the row with the same primary index value as the existing line). Refer to replace syntax.

If ignore is specified, the input rows that are duplicated with the existing row primary key values are skipped. If you do not specify either of these, the action behavior depends on whether the local keyword is specified. No local is specified, and if duplicate key values are found, an error is generated and the remainder of the text file is ignored. If local is specified, the default action behavior is the same as specifying ignore, because during the operation, the server has no way to terminate the transfer of the file.

If you want the foreign KEY constraint to be ignored in the mount operation, you can execute the SET FOREIGN_KEY_CHECKS=0 statement before you execute load DATA.
If the user uses load DATA INFILE on an empty Gssys table, all non-unique indexes are created in batches (like repair). This can usually make the load DATA INFILE faster when there are many indexes. Normally very fast, but there are also extreme cases where users can use alter TABLE before loading a file. DISABLE KEYS close them and use alter TABLE after loading the file. ENABLE KEYS to rebuild the index, which accelerates index creation. Refer to the accelerated INSERT statement in the Administrator's Manual.

LOAD DATA INFILE is select ... into OUTFILE counter operation. Refer to select syntax. Use SELECT ... Into OUTFILE writes data from a database to a file. Use load DATA INFILE to read the file into the database. The syntax for the fields and lines clauses of the two commands is the same. Two clauses are optional, but if two are specified at the same time, the fields clause must appear before the lines clause.
If the user specifies a fields clause, its clauses (TERMINATED by, [optionally] enclosed by and escaped by) are also optional, but the user must specify at least one of them.
If the user does not specify a fields clause, the default is as if the following statement is used:
Fields TERMINATED by ' \ t ' enclosed by ' \ escaped by ' \ \ '
If the user does not specify a lines clause, the default is as if the following statement is used:
LINES TERMINATED by ' \ n ' starting by '
In other words, when the input is read, the default load DATA INFILE behaves as follows:
• Look for line boundaries at line breaks.
• Do not omit any line prefixes.
• Separate rows into fields at tabs.
• The field is not considered to be encapsulated by any quotation mark characters.
• A literal character that interprets a locator, newline character, or ' \ ' that begins with "\" as a field value.
Conversely, when the output is written, the default value causes the Select ... into OUTFILE behaves as follows:
• Add tabs between field values.
• Encapsulate a field without any quotation mark characters.
• Use "\" to escape the locator, line feed, or ' \ ' character instances that appear in the field values.
• Add a newline character at the end of the line.
Attention:

In order to write the fields escaped by ' \ \ ', the user must specify two backslashes, which will be read in as a backslash.
Attention:

If you are a text file for a Windows system, you may have to use lines TERMINATED by ' \ r \ n ' to read the file because the Windows system is characterized by using two characters as the line terminator. In some programs, you might use \ r as a line terminator, such as WordPad, when writing a file. When reading this type of file, you need to use lines TERMINATED by ' \ R '.
If all the rows to be read have prefixes that the user wishes to ignore, you can use lines starting by ' prefix_string ' to skip the prefix (and anything in front of it). If a row does not have the prefix, the entire row is skipped. Note that prefix_string may be in the middle of the line!
For example:
sqlcli> LOAD DATA INFILE '/tmp/test.txt '
To TABLE test LINES starting by "XXX";

Use it to read a file that contains the following:

xxx "Row", 1
something xxx "Row", 2
You can get the data ("row", 1) and ("Row", 2).

IGNORE number LINES This option can be used to ignore the line at the beginning of the file. For example, you can use ignore 1 LINES to skip the header row that contains the column name:
sqlcli> LOAD DATA INFILE '/tmp/test.txt '
into TABLE test IGNORE 1 LINES;
When a user uses Select ... into OUTFILE and load data INFILE to write data from a database into a file, and then read it into the database from a file, the field and row processing options for the two commands must match. Otherwise, LOAD DATA INFILE will not be able to interpret the file contents correctly. Assume that the user uses Select ... into OUTFILE writes data to a file in a comma-delimited field:
Sqlcli> SELECT * into OUTFILE ' data.txt '
--TERMINATED by ', '
from Table2;
In order for a comma-delimited file to be read back, the correct statement should be:
sqlcli> LOAD DATA INFILE ' data.txt ' into TABLE table2
---TERMINATED by ', ';
If the user tries to read the file with the statement shown below, it will not work because the command load DATA INFILE distinguishes the field value with the locator:
sqlcli> LOAD DATA INFILE ' data.txt ' into TABLE table2
--Fields TERMINATED by ' \ t ';
The possible result is that each input row is interpreted as a separate field.
The LOAD DATA INFILE can also be used to read files obtained from external sources. For example, in DBASE format, the fields are separated by commas and surrounded by double quotation marks. If the lines in the file are terminated with a newline character, the following shows the fields and row processing options that the user will use to mount the file:
sqlcli> LOAD DATA INFILE ' data.txt ' into TABLE tbl_name
--TERMINATED by ', ' enclosed by ' "'
-LINES TERMINATED by ' \ n ';

Any field and row handling options can specify an empty string ('). If not empty, the fields [optionally] enclosed by and fields escaped by value must be a single character. Fields TERMINATED by and lines TERMINATED by value can exceed one character. For example, to write a line terminated by a carriage return newline, or to read a file containing such a line, you should specify a lines TERMINATED by ' \ r \ n ' clause.
fields [optionally] enclosed a reference to the control field. For output (SELECT ... Into OUTFILE), if the user omits the word optionally, all fields are surrounded by the enclosed by character. An example of such an output file (with a comma as the field delimiter) is as follows:
"1", "a string", "100.20"
"2", "a string containing a, comma", "102.20"
"3", "a string containing a \" quote "," 102.20 "
"4", "a string containing a \", Quote and comma "," 102.20 "

If the user specifies that the optionally,enclosed by character is used only to wrap a field such as a string type (such as Char,binary,text or enum):
1, "A string", 100.20
2, "A string containing a, comma", 102.20
3, "a string containing a \" quote ", 102.20
4, "a string containing a \", Quote and comma ", 102.20
Attention:

The enclosed by character that appears in a field value is escaped by using the escaped by character as its prefix. Also note that if the user specifies an empty escaped by value, an output file that cannot be read correctly by load DATA INFILE may be generated. For example, if the escape character is empty, the output shown above becomes the output shown below. Notice the second field in line fourth, which contains a comma followed by a quotation mark, which looks like the end of a field:

1, "A string", 100.20
2, "A string containing a, comma", 102.20
3, "A string containing a" quote ", 102.20
4, "A string containing a", quote and comma ", 102.20
For input, the enclosed by character, if present, is stripped from the end of the field value. (This is true regardless of whether optionally is specified, and optionally does not affect the input interpretation.) If the escaped by character exists before the enclosed by character, it is interpreted as part of the current field value.

If the field starts with the enclosed character, as long as the field or row terminated by sequence is followed, the character instance is considered to be used to terminate a field value. To be clear, if you want to use the enclosed by character in a field, you can repeat the character two times, then they will be interpreted as a single enclosed by character processing. For example, if you specify enclosed by ' ", the quotation marks are handled as follows:
"The" "Big" "Boss", "big" boss
The "big" boss--the "big" boss
The "big" "Boss" and "Big" "Boss"

The fields escaped by controls how special characters are written or read. If the fields escaped by character is not empty, it will be used as a prefix for the following output characters:

Fields escaped by character
fields [optionally] enclosed by character.
Fields The first character of the TERMINATED by and lines TERMINATED by value.
ASCII 0 (actually writes ASCII ' 0 ' instead of a 0-value byte after escaping the character).
If the fields escaped by character is empty, no characters are escaped and the null value is still output as null instead of \ n. Specifying an empty escape character may not be a good method, especially if the user's data field value contains any characters from the list just before.
For input values, if the field escaped by character is not a null character, the occurrence of this character is stripped, and the following characters are used as part of the value of the fields. The exception is an escaped ' 0 ' or ' n ' (for example, \ n or \ r), at which point the escape character is ' \ '). These sequences are interpreted as ASCII NUL (a 0 value byte) and null. The rules for null processing are described at the back of this section.
For more "\" Escaped syntax information, view literal values.
In some cases, fields interact with row processing:
• If the lines TERMINATED by is an empty string, the fields TERMINATED by is non-empty, the rows end with the fields TERMINATED by.
• A fixed line (no delimiter) format will be used if the fields TERMINATED by and fields enclosed by values are empty ('). When using a fixed row format, no delimiter is used between the fields (but the user still has a row terminator). The column values are written and read using the display width of the column. For example, if a column is defined as int (7), the value of the column will be written using a 7-character field. For input, the column value is obtained by reading 7 characters.
LINES TERMINATED by is still used to detach rows. If a row does not contain all the fields, the remainder of the column is set to their default values. If the user does not have a row terminator, the user should set it to '. In this case, the text file must contain all the fields for each line.
The fixed row format also affects the handling of null values; see below. Note that if the user is using a multibyte character set, the fixed-length format will not work.
There are many processing of NULL values, depending on the fields and lines options used by the user:
• For the default fields and lines values, NULL is written to \ n when output, and \ n is read as null when read-in (assuming the escaped by character is "\").
• If field enclosed by is not a null value, the fields containing the literal word null value are read as null values. This is different from the word null surrounded by the fields enclosed by character. The word is read as a string ' NULL '.
• If the fields escaped by is empty, the null value is written as the word null.
• When using a fixed line format (when both fields TERMINATED by and fields enclosed by are null values), NULL is written as an empty string. Note that this causes null values and empty strings in the table to be indistinguishable when the file is written, because both are written as empty strings. If the user needs to be able to distinguish both when reading the file and returning, the user should not use a fixed row format.
Some cases that cannot be supported by the load DATA INFILE:
• Fixed-size record lines (Fields TERMINATED by and fields enclosed by are empty) and blob or text columns.
• If the user specifies a delimiter that is the same as another, or another prefix, LOAD DATA INFILE may not interpret the input correctly. For example, the following fields clause will cause problems:
Fields TERMINATED by ' "' enclosed by '" '

• If fields escaped by is empty, a field value contains the value of either the enclosed by or the lines TERMINATED by followed by the values of the domain TERMINATED by, which will cause load DATA infi LE prematurely stops reading a field or row. This is because load DATA INFILE does not know where the field or row value ends.
The following example loads all the columns of the Persondata table:
sqlcli> LOAD DATA INFILE ' persondata.txt ' into TABLE persondata;
By default, when a field column is not provided after the load data INFILE statement, load data INFILE considers the input row to contain all the fields in the table column. If the user wants to load some of the columns in the table, specify a field list:
sqlcli> LOAD DATA INFILE ' Persondata.txt '
Into TABLE persondata (col1,col2,...);
If the field order of the input files differs from the order of the columns in the table, the user must also specify a field list. Otherwise, Gbase does not know how to match the input field to the columns in the table.
The column list can contain either a name or a user variable, and a SET clause is supported. This allows you to assign values to user variables with input values and transform them before assigning the results to the columns.
User variables in the SET clause have multiple uses. The following example takes the first column of the data file directly as the value of the T1.COLUMN1 and assigns the second column to a user variable that takes a division operation before the value as T2.column2
LOAD DATA INFILE ' file.txt '
into TABLE t1
(Column1, @var1)
SET column2 = @var1/100;
The SET clause can provide a value that is not derived from the input file. The following statement sets Column3 to the current date and time:
LOAD DATA INFILE ' file.txt '
into TABLE t1
(Column1, Column2)
SET column3 = Current_timestamp;
By assigning an input value to a user variable, you can discard the input value and not give the value to a column in the table:
LOAD DATA INFILE ' file.txt '
into TABLE t1
(Column1, @dummy, Column2, @dummy, Column3);

Use the column/variable list and the SET clause to be restricted:

The assignment column names in the set clause should only be on the left side of the assignment operator.
• In Set assignment statements, subqueries can be used. This query returns a value that will be assigned to the column, which may be just a scalar query. You cannot use a subquery to query the table that will be imported.
• For a column/variable list or SET clause, rows that are ignored because the IGNORE clause is used will not be processed.
• User variables cannot be used when the imported data is in a row-invariant format because the user variable does not display a width.
When the input row is processed, LOAD DATA divides the rows into fields, and if a column/variable list and a SET clause are provided, the values are used according to it. The resulting row is then inserted into the table. If this table has before insert or after insert triggers, they are activated before and after the row is inserted.

If you enter too many fields for the record line, the extra fields are ignored and the number of warnings is increased.
If one input row has fewer fields, the column with no input field will be set to the default value. The default value assignment is described in the Create TABLE syntax.

An empty field value differs from the interpretation of missing field values:

• For string types, the column is set to an empty string.
• For numeric types, the column is set to 0.
• For date and time types, the column is set to a value of "0" that is appropriate for the column type. Reference date and Time type.

If you explicitly assign an empty string to a string, number, or date or time type in an INSERT or UPDATE statement, the user will get the same result as above.
Only in two cases the timestamp column is set to the current date and time. One case when the column has a null value (that is, \ n), the other is (for the first TIMESTAMP column only), and when a field manifest is specified, the TIMESTAMP column is omitted from the field list.

LOAD DATA INFILE that all input is a string, and therefore, for an enum or set column, the user cannot set a numeric value for it in the form of an INSERT statement. All enum and set must be specified as a string!

When the load DATA INFILE query ends, it returns a string of information in the following format:
Records:1 deleted:0 skipped:0 warnings:0

Http://www.jb51.net/article/18234.htm

MySQL's Load data infile

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.