This post shares details about the big-data bulk import methods and the corresponding programming approaches for MSSQL, MySql, and Oracle.
1: MSSQL
SQL Syntax:
BULK INSERT [ database_name . [ schema_name ] . | schema_name . ] [ table_name | view_name ]
    FROM 'data_file'
    [ WITH
      (
        [ [ , ] BATCHSIZE = batch_size ]
        [ [ , ] CHECK_CONSTRAINTS ]
        [ [ , ] CODEPAGE = { 'ACP' | 'OEM' | 'RAW' | 'code_page' } ]
        [ [ , ] DATAFILETYPE = { 'char' | 'native' | 'widechar' | 'widenative' } ]
        [ [ , ] FIELDTERMINATOR = 'field_terminator' ]
        [ [ , ] FIRSTROW = first_row ]
        [ [ , ] FIRE_TRIGGERS ]
        [ [ , ] FORMATFILE = 'format_file_path' ]
        [ [ , ] KEEPIDENTITY ]
        [ [ , ] KEEPNULLS ]
        [ [ , ] KILOBYTES_PER_BATCH = kilobytes_per_batch ]
        [ [ , ] LASTROW = last_row ]
        [ [ , ] MAXERRORS = max_errors ]
        [ [ , ] ORDER ( { column [ ASC | DESC ] } [ ,...n ] ) ]
        [ [ , ] ROWS_PER_BATCH = rows_per_batch ]
        [ [ , ] ROWTERMINATOR = 'row_terminator' ]
        [ [ , ] TABLOCK ]
        [ [ , ] ERRORFILE = 'file_name' ]
      )
    ]
SQL example:
BULK INSERT TableName FROM 'd:\mydata.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', CHECK_CONSTRAINTS)
SELECT * FROM TableName
Since C# provides SqlBulkCopy, those of us who are not DBAs more often call it from code.
C# code:
A C# call example with details; the following code is extracted from CYQ.Data:
using (SqlBulkCopy sbc = new SqlBulkCopy(con, (keepID ? SqlBulkCopyOptions.KeepIdentity : SqlBulkCopyOptions.Default) | SqlBulkCopyOptions.FireTriggers, sqlTran))
{
    sbc.BatchSize = 100000;
    sbc.DestinationTableName = SqlFormat.Keyword(mdt.TableName, DalType.MsSql);
    sbc.BulkCopyTimeout = AppConfig.DB.CommandTimeout;
    foreach (MCellStruct column in mdt.Columns)
    {
        sbc.ColumnMappings.Add(column.ColumnName, column.ColumnName);
    }
    sbc.WriteToServer(mdt);
}
There are five details:
1: Transaction:
For a single transaction, the constructor can simply take a connection string.
If the bulk copy needs to join an external transaction (for example, deleting first and then inserting within the same transaction), you need to construct the Connection and Transaction objects yourself and pass them in.
2: Whether the insert fires triggers:
Pass in SqlBulkCopyOptions.FireTriggers to enable this.
3: Others: batch size, timeout, and whether to keep the identity (primary key ID) values, as sketched below.
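To tie these points together, here is a minimal sketch (not the CYQ.Data implementation; the connection string, the table name "MyTable", and the DataTable variable dt are placeholders) showing SqlBulkCopy joining an external SqlTransaction with KeepIdentity and FireTriggers enabled:
// A minimal sketch, assuming System.Data and System.Data.SqlClient are referenced
// and "dt" is a System.Data.DataTable whose schema matches the destination table.
using (SqlConnection con = new SqlConnection(connectionString))
{
    con.Open();
    using (SqlTransaction tran = con.BeginTransaction())
    {
        try
        {
            // Detail 1: the delete and the bulk insert share the same external transaction.
            using (SqlCommand cmd = new SqlCommand("DELETE FROM MyTable", con, tran))
            {
                cmd.ExecuteNonQuery();
            }
            using (SqlBulkCopy sbc = new SqlBulkCopy(con, SqlBulkCopyOptions.KeepIdentity | SqlBulkCopyOptions.FireTriggers, tran))
            {
                sbc.DestinationTableName = "MyTable"; // Placeholder table name.
                sbc.BatchSize = 100000;               // Detail 3: batch size.
                sbc.BulkCopyTimeout = 60;             // Detail 3: timeout in seconds.
                foreach (DataColumn column in dt.Columns)
                {
                    sbc.ColumnMappings.Add(column.ColumnName, column.ColumnName);
                }
                sbc.WriteToServer(dt);                // Detail 2: FireTriggers makes insert triggers fire.
            }
            tran.Commit();
        }
        catch
        {
            tran.Rollback();
            throw;
        }
    }
}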
A possible database crash:
Along the way I ran into a big pitfall:
When the incoming data is too long for a too-short column and a binary data truncation error is produced, the database service can stop (maybe a special case, maybe not).
So use it with caution and validate the length of external data first.
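One defensive measure is to validate string lengths against the destination column sizes before calling WriteToServer. A minimal sketch (not from CYQ.Data) using the standard DataColumn.MaxLength property, assuming it has been set to match the destination table:
// Reject over-long strings up front instead of letting the server choke on them.
foreach (DataRow row in dt.Rows)
{
    foreach (DataColumn col in dt.Columns)
    {
        if (col.DataType == typeof(string) && col.MaxLength > 0)
        {
            string text = row[col] as string;
            if (text != null && text.Length > col.MaxLength)
            {
                throw new InvalidOperationException(
                    string.Format("Column '{0}': value length {1} exceeds the maximum of {2}.",
                                  col.ColumnName, text.Length, col.MaxLength));
            }
        }
    }
}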
2: MySql
This is a sad story about MySql batch import: there were several pitfalls that I did not track down and solve until today.
SQL Syntax:
LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'data.txt'
    [REPLACE | IGNORE]
    INTO TABLE tbl_name
    [FIELDS
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]
    [IGNORE number LINES]
    [(col_name_or_user_var,...)]
    [SET col_name = expr,...]
Example:
LOAD DATA LOCAL INFILE 'C:\\Users\\cyq\\AppData\\Local\\Temp\\BulkCopy.csv' INTO TABLE `BulkCopy` CHARACTER SET utf8 FIELDS TERMINATED BY '$,$' LINES TERMINATED BY '' (`ID`,`Name`,`CreateTime`,`Sex`)
Although MySql.Data.dll provides MySqlBulkLoader, its source code only generates the LOAD DATA statement and runs it via ADO.NET;
the core pitfall, generating the *.csv data file, is not handled for you, so you may as well generate and execute the statement yourself instead of using it.
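If you take that route, running the generated statement yourself is straightforward. A minimal sketch using MySqlCommand from MySql.Data.dll (the connection string and csvPath are placeholders; also note that LOAD DATA LOCAL INFILE must be enabled on the server, and in newer connector versions in the connection string as well, which is an assumption to verify for your setup):
// Run the generated LOAD DATA statement with plain ADO.NET.
// "csvPath" is the file produced by MDataTableToFile (backslashes already escaped),
// and '$,$' / '\r\n' are assumed to match the separators the CSV was written with.
string loadSql = "LOAD DATA LOCAL INFILE '" + csvPath + "' INTO TABLE `BulkCopy` " +
                 "CHARACTER SET utf8 FIELDS TERMINATED BY '$,$' LINES TERMINATED BY '\\r\\n' " +
                 "(`ID`,`Name`,`CreateTime`,`Sex`)";
using (MySqlConnection con = new MySqlConnection(connectionString))
{
    con.Open();
    using (MySqlCommand cmd = new MySqlCommand(loadSql, con))
    {
        cmd.CommandTimeout = 0; // No timeout for large files.
        cmd.ExecuteNonQuery();
    }
}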
C# code:
The following code is taken from CYQ.Data; it is the piece of code I corrected today:
private static string MDataTableToFile(MDataTable dt, bool keepID, DalType dalType)
{
    string path = Path.GetTempPath() + dt.TableName + ".csv";
    using (StreamWriter sw = new StreamWriter(path, false, new UTF8Encoding(false)))
    {
        MCellStruct ms;
        string value;
        foreach (MDataRow row in dt.Rows)
        {
            for (int i = 0; i < dt.Columns.Count; i++)
            {
                #region Set value
                ms = dt.Columns[i];
                if (!keepID && ms.IsAutoIncrement)
                {
                    continue;
                }
                else if (dalType == DalType.MySql && row[i].IsNull)
                {
                    sw.Write("\\N"); // MySql uses \N to indicate null.
                }
                else
                {
                    value = row[i].ToString();
                    if (ms.SqlType == SqlDbType.Bit)
                    {
                        int v = (value.ToLower() == "true" || value == "1") ? 1 : 0;
                        if (dalType == DalType.MySql)
                        {
                            byte[] b = new byte[1];
                            b[0] = (byte)v;
                            value = System.Text.Encoding.UTF8.GetString(b); // MySql must archive Bit values as bytes.
                        }
                        else
                        {
                            value = v.ToString();
                        }
                    }
                    else
                    {
                        value = value.Replace("\\", "\\\\"); // Handle escape characters.
                    }
                    sw.Write(value);
                }
                if (i != dt.Columns.Count - 1) // Output the separator if it is not the last column.
                {
                    sw.Write(AppConst.SplitChar);
                }
                #endregion
            }
            sw.WriteLine();
        }
    }
    if (Path.DirectorySeparatorChar == '\\')
    {
        path = path.Replace(@"\", @"\\");
    }
    return path;
}
The above code generates the CSV file to be loaded. There are two core pitfalls here, which cost me a lot of time:
1: Bit type data cannot be imported?
2: The auto-increment ID of the first row is reset to 1?
Answers to these two problems cannot be found on the Internet; since I ran into them, I had to solve them myself.
The solution is as follows:
A: First use SELECT ... INTO OUTFILE to export a file, and then use LOAD DATA INFILE to import it back.
At first I opened the exported file with Notepad and casually pressed Ctrl+S before importing it; the problem was the same as with my own file, so I suspected the feature simply wasn't supported.
Until today, when I re-exported the file, did not touch it in between, and imported it directly: it actually worked fine. So I thought it over:
B: Compare the file I generated myself with the file exported by the command, byte by byte in hexadecimal. The result:
For the Bit type, my file stored the characters 0 and 1, which are 30 and 31 in hexadecimal;
the file exported by the command stored 00 and 01. After checking, it turns out that MySql archives Bit values in binary.
So the fix is to convert the value to a byte, turn that byte into a string, and write that out.
So this piece of code was born (the DataTable-to-CSV code found on the Internet does not handle this; I wonder how they ever get it to run. Maybe they never define Bit columns?):
if (ms.SqlType == SqlDbType.Bit)
{
    int v = (value.ToLower() == "true" || value == "1") ? 1 : 0;
    if (dalType == DalType.MySql)
    {
        byte[] b = new byte[1];
        b[0] = (byte)v;
        value = System.Text.Encoding.UTF8.GetString(b); // MySql must archive Bit values as bytes.
    }
    else
    {
        value = v.ToString();
    }
}
In addition, a null value is represented by \N.
With the first problem solved, the second one remained: why was the primary key of the first row reset to 1?
Comparing the two files in hexadecimal again, the result was astonishing:
It was the BOM header, which prevented the first primary key value from being recognized; that value was therefore ignored and replaced with the first auto-increment value.
This also explains why data imported with the identity values kept always showed this bug.
So the solution is to keep StreamWriter from writing a BOM header. How?
The code is as follows:
using (StreamWriter sw = new StreamWriter(path, false, new UTF8Encoding(false)))
{
    // ...................
}
Construct a new UTF8Encoding and pass false for its BOM parameter, instead of using the usual System.Text.Encoding.UTF8.
These details are very well hidden; you simply can't guess them...
3: Oracle
SQL Syntax:
LOAD [DATA]
[ { INFILE | INDDN } { file | * }
  [ STREAM | RECORD | FIXED length [BLOCKSIZE size] | VARIABLE [length] ]
  [ { BADFILE | BADDN } file ]
  [ { DISCARDS | DISCARDMAX } integer ] ]
[ { INDDN | INFILE } ... ]
[ APPEND | REPLACE | INSERT ]
[ RECLEN integer ]
[ { CONCATENATE integer |
    CONTINUEIF { [THIS | NEXT] (start[:end]) | LAST } operator { 'string' | X'hex' } } ]
INTO TABLE [user.]table
[ APPEND | REPLACE | INSERT ]
[ WHEN condition [AND condition]... ]
[ FIELDS [delimiter] ]
( column { RECNUM | CONSTANT value |
           SEQUENCE ( { integer | MAX | COUNT } [, increment] ) |
           [ POSITION ( { start [end] | * [+ integer] } ) ] datatype
           [ TERMINATED [BY] { WHITESPACE | [X]'character' } ]
           [ [OPTIONALLY] ENCLOSED [BY] [X]'character' ]
           [ NULLIF condition ]
           [ DEFAULTIF condition ] }
  [, ...] )
The above configuration is saved into a *.ctl file, which is then executed by the following command:
sqlldr userid=username/password@database control=filename.ctl
C# code:
.NET has three ways to operate Oracle:
1: System.Data.OracleClient (requires the Oracle client to be installed); it has no batch method (x86 and x64 are also distinguished).
2: Oracle.DataAccess (requires the Oracle client to be installed); it includes a batch method (x86 and x64 are also distinguished).
3: Oracle.ManagedDataAccess (no client installation required); it has no batch method (x86 and x64 are not distinguished, but only .NET 4.0 or later is supported).
The batch method of Oracle.DataAccess is OracleBulkCopy. Since its usage is almost the same as SqlBulkCopy's, I will not go into it; a hedged sketch follows.
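For reference, a minimal sketch of OracleBulkCopy from Oracle.DataAccess (the connection string, the table name, and the DataTable dt are placeholders; unlike SqlBulkCopy, it does not accept an external transaction):
// A minimal sketch, assuming Oracle.DataAccess.Client and System.Data are referenced.
using (OracleConnection con = new OracleConnection(connectionString))
{
    con.Open();
    using (OracleBulkCopy obc = new OracleBulkCopy(con))
    {
        obc.DestinationTableName = "BULKCOPY"; // Placeholder table name.
        obc.BatchSize = 100000;
        obc.BulkCopyTimeout = 60;
        foreach (DataColumn column in dt.Columns)
        {
            obc.ColumnMappings.Add(column.ColumnName, column.ColumnName);
        }
        obc.WriteToServer(dt);
    }
}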
If the Oracle client is installed on the machine where the caller runs, it can also be called in the following way.
The process is as follows:
1: Generate the *.csv data file (for details, see the MySql code above).
2: Generate the *.ctl control file: save the generated LOAD DATA control statement into a *.ctl file (a hedged sketch follows this list).
3: Use sqlldr.exe to execute the CTL file. ADO.NET cannot be used here; the CTL file can only be run by launching a process, so this batch method can only be used on its own (outside any transaction).
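As a rough illustration of step 2, here is a sketch that writes out such a control file and builds the sqlldr argument string; the column list, the '$,$' terminator, the date mask, and the credentials are all illustrative assumptions, not CYQ.Data's exact output:
// Build a SQL*Loader control file for the CSV produced earlier (illustrative values only).
string ctl =
    "LOAD DATA" + Environment.NewLine +
    "INFILE 'BulkCopy.csv'" + Environment.NewLine +
    "APPEND INTO TABLE BulkCopy" + Environment.NewLine +
    "FIELDS TERMINATED BY '$,$'" + Environment.NewLine +
    "TRAILING NULLCOLS" + Environment.NewLine +
    "(ID, Name, CreateTime DATE \"YYYY-MM-DD HH24:MI:SS\", Sex)";
string ctlPath = Path.Combine(Path.GetTempPath(), "BulkCopy.ctl");
File.WriteAllText(ctlPath, ctl);

// The argument string handed to sqlldr (see ExeSqlLoader below); credentials are placeholders.
string arg = "userid=scott/tiger@orcl control=\"" + ctlPath + "\"";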
Code for calling a process:
bool hasSqlLoader = false;
private bool HasSqlLoader() // Checks whether the Oracle client (sqlldr) is installed.
{
    hasSqlLoader = false;
    Process proc = new Process();
    proc.StartInfo.FileName = "sqlldr";
    proc.StartInfo.CreateNoWindow = true;
    proc.StartInfo.UseShellExecute = false;
    proc.StartInfo.RedirectStandardOutput = true;
    proc.OutputDataReceived += new DataReceivedEventHandler(proc_OutputDataReceived);
    proc.Start();
    proc.BeginOutputReadLine();
    proc.WaitForExit();
    return hasSqlLoader;
}
void proc_OutputDataReceived(object sender, DataReceivedEventArgs e)
{
    if (!hasSqlLoader)
    {
        hasSqlLoader = e.Data.StartsWith("SQL*Loader:");
    }
}
// Already implemented, but there is no transaction support, so it is not introduced for now.
private bool ExeSqlLoader(string arg)
{
    try
    {
        Process proc = new Process();
        proc.StartInfo.FileName = "sqlldr";
        proc.StartInfo.Arguments = arg;
        proc.Start();
        proc.WaitForExit();
        return true;
    }
    catch
    {
    }
    return false;
}
Summary:
With the spread of big data, batch migration between databases will come up more and more often. So whether you use SQL scripts, write the code yourself, or use a tool such as DBImport, this will become one of the essential skills!
With that in mind, I am sharing the pitfalls I paid for and worked through on this topic, for your reference!