Sharing bulk data import methods and programming details for MSSQL, MySQL, and Oracle


1: MSSQL

SQL syntax:
BULK INSERT
   [ database_name . [ schema_name ] . | schema_name . ] [ table_name | view_name ]
   FROM 'data_file'
   [ WITH
      (
      [ [ , ] BATCHSIZE = batch_size ]
      [ [ , ] CHECK_CONSTRAINTS ]
      [ [ , ] CODEPAGE = { 'ACP' | 'OEM' | 'RAW' | 'code_page' } ]
      [ [ , ] DATAFILETYPE = { 'char' | 'native' | 'widechar' | 'widenative' } ]
      [ [ , ] FIELDTERMINATOR = 'field_terminator' ]
      [ [ , ] FIRSTROW = first_row ]
      [ [ , ] FIRE_TRIGGERS ]
      [ [ , ] FORMATFILE = 'format_file_path' ]
      [ [ , ] KEEPIDENTITY ]
      [ [ , ] KEEPNULLS ]
      [ [ , ] KILOBYTES_PER_BATCH = kilobytes_per_batch ]
      [ [ , ] LASTROW = last_row ]
      [ [ , ] MAXERRORS = max_errors ]
      [ [ , ] ORDER ( { column [ ASC | DESC ] } [ ,...n ] ) ]
      [ [ , ] ROWS_PER_BATCH = rows_per_batch ]
      [ [ , ] ROWTERMINATOR = 'row_terminator' ]
      [ [ , ] TABLOCK ]
      [ [ , ] ERRORFILE = 'file_name' ]
      ) ]

SQL Example:

BULK INSERT TableName FROM 'D:\mydata.txt' WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')

Because C# provides SqlBulkCopy, those of us who are not DBAs are more likely to call it from a program:

C# code:

A C# calling example and its details; the following code is extracted from CYQ.Data:

using (SqlBulkCopy sbc = new SqlBulkCopy(con,
    (keepID ? SqlBulkCopyOptions.KeepIdentity : SqlBulkCopyOptions.Default) | SqlBulkCopyOptions.FireTriggers,
    sqlTran))
{
    sbc.BatchSize = 100000;
    sbc.DestinationTableName = SqlFormat.Keyword(mdt.TableName, DalType.MsSql);
    sbc.BulkCopyTimeout = AppConfig.DB.CommandTimeout;
    foreach (MCellStruct column in mdt.Columns)
    {
        sbc.ColumnMappings.Add(column.ColumnName, column.ColumnName);
    }
    sbc.WriteToServer(mdt);
}

There are five details to note:

1: Transactions

If it is just a standalone operation, the constructor can simply take a connection string.

If it needs to join an external transaction (for example, delete first and then insert, all within the same transaction), you need to construct the connection and transaction objects yourself and pass them in from the calling context (see the sketch after these details).

2: Whether the insert fires triggers

Controlled by passing in SqlBulkCopyOptions.FireTriggers.

3: Other options: batch size, timeout, and whether to keep the identity (primary key) values.
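For detail 1, here is a minimal sketch of joining an external transaction, written against plain ADO.NET rather than the CYQ.Data wrapper; the table name "MyTable", the WHERE clause, and the DataTable are illustrative assumptions of mine, not code from the original article.

// Sketch: delete first, then bulk insert, inside one externally created transaction.
// Requires: using System.Data; using System.Data.SqlClient;
private static void DeleteThenBulkInsert(string connectionString, DataTable dt, int batchNo)
{
    using (SqlConnection con = new SqlConnection(connectionString))
    {
        con.Open();
        using (SqlTransaction tran = con.BeginTransaction())
        {
            try
            {
                using (SqlCommand cmd = new SqlCommand("DELETE FROM MyTable WHERE BatchNo = @b", con, tran))
                {
                    cmd.Parameters.AddWithValue("@b", batchNo);
                    cmd.ExecuteNonQuery();
                }
                // Pass the open connection and the external transaction into SqlBulkCopy.
                using (SqlBulkCopy sbc = new SqlBulkCopy(con, SqlBulkCopyOptions.FireTriggers, tran))
                {
                    sbc.DestinationTableName = "MyTable";
                    sbc.BulkCopyTimeout = 60;
                    sbc.WriteToServer(dt);
                }
                tran.Commit();
            }
            catch
            {
                tran.Rollback();
                throw;
            }
        }
    }
}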

A scenario that can bring the database down:

One of the biggest pits I have run into over the years:

When the incoming data is longer than the target column, a binary-truncation error occurs, and the database service actually stopped (perhaps a special case, perhaps not).

So use it with caution, and do a good job of validating the length of external data before importing (a minimal check is sketched below).
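One possible length check before calling WriteToServer might look like the sketch below; it assumes the DataTable columns carry MaxLength information (for example, filled in via DataAdapter.FillSchema), which is my assumption and not part of the original article.

// Sketch: validate string lengths against the declared column size before bulk copying.
// Requires: using System; using System.Data;
private static void CheckStringLengths(DataTable dt)
{
    foreach (DataRow row in dt.Rows)
    {
        foreach (DataColumn col in dt.Columns)
        {
            // MaxLength is -1 when the column has no declared limit.
            if (col.DataType == typeof(string) && col.MaxLength > 0 && !row.IsNull(col))
            {
                string text = (string)row[col];
                if (text.Length > col.MaxLength)
                {
                    throw new InvalidOperationException(string.Format(
                        "Value too long for column {0}: {1} > {2}",
                        col.ColumnName, text.Length, col.MaxLength));
                }
            }
        }
    }
}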

2: MySQL

Bulk import into MySQL is a sad chapter of my past: there were several pits that I only found and fixed today.

SQL syntax:
LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'data.txt'
    [REPLACE | IGNORE]
    INTO TABLE tbl_name
    [FIELDS
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]
    [IGNORE number LINES]
    [(col_name_or_user_var, ...)]
    [SET col_name = expr, ...]

Example:

LOAD DATA LOCAL INFILE 'c:\\users\\cyq\\appdata\\local\\temp\\bulkcopy.csv' INTO TABLE `bulkcopy` CHARACTER SET UTF8 FIELDS TERMINATED BY '$,$' LINES TERMINATED BY '\n' (`ID`, `Name`, `CreateTime`, `Sex`)

Although MySql.Data.dll provides MySqlBulkLoader, a look at its source shows that it merely generates a LOAD DATA statement and executes it through ADO.NET.

The core pit, generating the *.csv data file, is not handled for you, so you might as well build the statement and execute it yourself and skip the class (a sketch follows).
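A minimal sketch of that do-it-yourself approach, assuming the CSV file has already been written to disk and MySql.Data is referenced; the connection string, table name, and column names are illustrative, and newer versions of Connector/NET may additionally require AllowLoadLocalInfile=true in the connection string.

// Sketch: build the LOAD DATA statement and run it with a plain MySqlCommand.
// Requires: using MySql.Data.MySqlClient;
private static int LoadCsvIntoMySql(string csvPath)
{
    string connStr = "server=localhost;database=test;uid=root;pwd=***;AllowLoadLocalInfile=true";
    using (MySqlConnection con = new MySqlConnection(connStr))
    {
        con.Open();
        string sql = "LOAD DATA LOCAL INFILE '" + csvPath.Replace(@"\", @"\\") + "' " +
                     "INTO TABLE `bulkcopy` CHARACTER SET UTF8 " +
                     "FIELDS TERMINATED BY '$,$' LINES TERMINATED BY '\\n' " +
                     "(`ID`, `Name`, `CreateTime`, `Sex`)";
        using (MySqlCommand cmd = new MySqlCommand(sql, con))
        {
            cmd.CommandTimeout = 0;        // bulk loads can run for a long time
            return cmd.ExecuteNonQuery();  // number of rows loaded
        }
    }
}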

C# code:

The following code, excerpted from CYQ.Data, was revised today:

private static string MDataTableToFile(MDataTable dt, bool keepID, DalType dalType)
{
    string path = Path.GetTempPath() + dt.TableName + ".csv";
    using (StreamWriter sw = new StreamWriter(path, false, new UTF8Encoding(false))) // no BOM
    {
        MCellStruct ms;
        string value;
        foreach (MDataRow row in dt.Rows)
        {
            for (int i = 0; i < dt.Columns.Count; i++)
            {
                #region Setting the value
                ms = dt.Columns[i];
                if (!keepID && ms.IsAutoIncrement)
                {
                    continue;
                }
                else if (dalType == DalType.MySql && row[i].IsNull)
                {
                    sw.Write("\\N"); // MySQL uses \N to represent a NULL value.
                }
                else
                {
                    value = row[i].ToString();
                    if (ms.SqlType == SqlDbType.Bit)
                    {
                        int v = (value.ToLower() == "true" || value == "1") ? 1 : 0;
                        if (dalType == DalType.MySql)
                        {
                            byte[] b = new byte[1];
                            b[0] = (byte)v;
                            value = System.Text.Encoding.UTF8.GetString(b); // MySQL must receive the bit value as a raw byte.
                        }
                        else
                        {
                            value = v.ToString();
                        }
                    }
                    else
                    {
                        value = value.Replace("\\", "\\\\"); // handle the escape character
                    }
                    sw.Write(value);
                }
                if (i != dt.Columns.Count - 1) // not the last column, write the field separator
                {
                    sw.Write(AppConst.SplitChar);
                }
                #endregion
            }
            sw.WriteLine();
        }
    }
    if (Path.DirectorySeparatorChar == '\\')
    {
        path = path.Replace(@"\", @"\\"); // escape backslashes so the path can be embedded in the statement
    }
    return path;
}

The above code generates the CSV file used by the import call. There are two core pits here that cost me a lot of time:

1: Why doesn't bit-type data get imported?

2: Why is the auto-increment ID of the first row reset to 1?

I could not find answers to these two problems anywhere online; they dragged on until today, when I decided they really had to be solved, and then they were.

The solution is this:

A: First use SELECT ... INTO OUTFILE to export a file from MySQL, then use LOAD DATA INFILE to import that file back.

At first I opened the exported file with Notepad to take a look and pressed Ctrl+S; after importing it, the problems were the same as with my own file, so I wondered whether this simply wasn't supported.

Not until today did I export again, import directly without touching the file in between, and find that, surprisingly, it worked fine. So my thinking turned:

B: Compare my generated file against the exported file in hexadecimal. The result:

For bit-type data, my own file contains the characters 0 and 1, which show up as 0x30 and 0x31 in hex.

The exported data shows up as 0x00 and 0x01 in hex; after looking it up, I found that MySQL stores bit columns as binary.

So: put the 0/1 into a byte, convert that byte to a string, and then write it to the file, and it works.

Which produced this piece of code (the DataTable-to-CSV code floating around online does not handle this; I don't know how it runs for them, perhaps their tables define no bit columns?):

if (ms.SqlType == SqlDbType.Bit)
{
    int v = (value.ToLower() == "true" || value == "1") ? 1 : 0;
    if (dalType == DalType.MySql)
    {
        byte[] b = new byte[1];
        b[0] = (byte)v;
        value = System.Text.Encoding.UTF8.GetString(b); // MySQL must receive the bit value as a raw byte.
    }
    else
    {
        value = v.ToString();
    }
}

Also, NULL values are represented by \N.

After solving the first problem, the second remained: why was the primary key of the first row reset to 1?

Again I compared the files in hexadecimal, and the result was an amazing discovery:

It was the BOM header: it corrupted the first primary key value, so that value was ignored and replaced with the first auto-increment value, 1.

This also explains why only the file that had been re-saved (picking up a BOM along the way) showed the bug.

So the solution is to avoid writing a BOM header when using StreamWriter. How?

That gives the following code:

using (StreamWriter sw = new StreamWriter(path, false, new UTF8Encoding(false)))
{
    // ...
}

That is, instead of the usual System.Text.Encoding.UTF8, construct a new UTF8Encoding and pass false (no BOM) to its constructor.

These details are quite obscure; unless someone tells you, you are unlikely to guess them...

3: Oracle

SQL*Loader control file syntax:
LOAD [DATA]
[{INFILE | INDDN} {file | *}
    [STREAM | RECORD | FIXED length [BLOCKSIZE size] | VARIABLE [length]]
    [{BADFILE | BADDN} file]
    {DISCARDS | DISCARDMAX} integer]
[{INDDN | INFILE} ...]
[APPEND | REPLACE | INSERT]
[RECLEN integer]
[{CONCATENATE integer |
  CONTINUEIF {[THIS | NEXT] (start[:end]) | LAST}
    operator {'string' | X'hex'}}]
INTO TABLE [user.]table
[APPEND | REPLACE | INSERT]
[WHEN condition [AND condition] ...]
[FIELDS [delimiter]]
(
    column {
        RECNUM |
        CONSTANT value |
        SEQUENCE ({integer | MAX | COUNT} [, increment]) |
        [POSITION ({start [end] | * [+ integer]})]
        datatype
        [TERMINATED [BY] {WHITESPACE | [X]'character'}]
        [[OPTIONALLY] ENCLOSED [BY] [X]'character']
        [NULLIF condition]
        [DEFAULTIF condition]
    }
    [, ...]
)

The above configuration is saved as a *.ctl file, which is then executed from the command line with sqlldr.
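A typical invocation looks something like the line below; the user name, password, TNS alias, and file names are placeholders of mine, not values from the original article.

sqlldr userid=scott/tiger@orcl control=bulkcopy.ctl log=bulkcopy.log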

C# code:

In .NET there are roughly three ways to access Oracle:

1: System.Data.OracleClient (requires the Oracle client to be installed): no bulk-copy method (and it distinguishes between x86 and x64).

2: Oracle.DataAccess (requires the Oracle client to be installed): has a bulk-copy method (and also distinguishes between x86 and x64).

3: Oracle.ManagedDataAccess (no client installation required): no bulk-copy method (no x86/x64 distinction, but it requires .NET 4.0 or higher).

The bulk-copy class that ships with Oracle.DataAccess is called OracleBulkCopy; since its usage is almost identical to SqlBulkCopy, I will not introduce it in detail (a brief sketch follows for reference).
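For reference only, a minimal sketch of OracleBulkCopy, assuming Oracle.DataAccess is referenced; the connection string, table name, and DataTable are illustrative assumptions, not code from the original article.

// Sketch: usage mirrors SqlBulkCopy.
// Requires: using System.Data; using Oracle.DataAccess.Client;
private static void OracleBulkInsert(DataTable dt)
{
    using (OracleConnection con = new OracleConnection("Data Source=orcl;User Id=scott;Password=***;"))
    {
        con.Open();
        using (OracleBulkCopy obc = new OracleBulkCopy(con))
        {
            obc.DestinationTableName = "MYTABLE";
            obc.BatchSize = 100000;
            obc.BulkCopyTimeout = 120; // seconds
            obc.WriteToServer(dt);     // dt: a DataTable whose columns match MYTABLE
        }
    }
}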

If the server running the calling program has the Oracle client installed, you can also use the following approach:

The process is as follows:

1: Generate the *.csv data file; see the code in the MySQL section, it is reused as-is.

2: Generate the *.ctl control file, i.e. save the generated LOAD DATA statement as a *.ctl file.

3: Use sqlldr.exe to execute the CTL file. The sad part here is that it cannot be called through ADO.NET, only through a separate process, so this bulk import cannot join a transaction and can only be used on its own.

The code for invoking the process:

bool hasSqlLoader = false;
private bool HasSqlLoader() // Detects whether the Oracle client (sqlldr) is installed.
{
    hasSqlLoader = false;
    Process proc = new Process();
    proc.StartInfo.FileName = "sqlldr";
    proc.StartInfo.CreateNoWindow = true;
    proc.StartInfo.UseShellExecute = false;
    proc.StartInfo.RedirectStandardOutput = true;
    proc.OutputDataReceived += new DataReceivedEventHandler(proc_OutputDataReceived);
    proc.Start();
    proc.BeginOutputReadLine();
    proc.WaitForExit();
    return hasSqlLoader;
}

void proc_OutputDataReceived(object sender, DataReceivedEventArgs e)
{
    if (!hasSqlLoader && e.Data != null)
    {
        hasSqlLoader = e.Data.StartsWith("SQL*Loader:");
    }
}

// Already implemented, but there is no transaction support, so it is not introduced for the time being.
private bool ExeSqlLoader(string arg)
{
    try
    {
        Process proc = new Process();
        proc.StartInfo.FileName = "sqlldr";
        proc.StartInfo.Arguments = arg;
        proc.Start();
        proc.WaitForExit();
        return true;
    }
    catch
    {
    }
    return false;
}
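A hypothetical call into ExeSqlLoader might then look like the sketch below; the user name, password, TNS alias, and file paths are placeholders of mine, and the exact argument string the original code assembles is not shown in the article.

// Sketch: run the CTL file generated in step 2 through sqlldr.
string args = string.Format("userid={0}/{1}@{2} control=\"{3}\" log=\"{4}\"",
    "scott", "tiger", "orcl", @"C:\temp\bulkcopy.ctl", @"C:\temp\bulkcopy.log");
if (HasSqlLoader())
{
    bool ok = ExeSqlLoader(args);
}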
Summary:

With the popularization of big data, bulk movement of data between databases is inevitably involved more and more often, so doing it well, whether with SQL scripts, hand-written code, or a tool like DBImport, is becoming a necessary skill!

With that in mind, I am sharing the effort I have put in and the pits I have filled in this area, for everyone's reference!

