Some tips for writing scripts: migrating historical data into sub-tables

Source: Internet
Author: User

Migrating historical data into sub-tables (each log previously lived in a single table of dozens of GB, which had to be split into sub-tables for performance).

Background:

For historical reasons, the previous project used MSSQL. It had a large number of log tables, all of which had to be migrated to MySQL and split into one table per day. Because the data volume is large and the original tables have some design flaws, both query efficiency and execution efficiency had to be taken into account during the migration.

I used the following approach:

Each table is handled by its own function, and that function should take as few parameters as possible.
I originally passed in a start time and a number of days, then computed a sequence of consecutive dates from the timestamp; each date string is used to build the name of the corresponding daily table.
On reflection, it is better to pass in two dates, where the end date is optional and defaults to the day the script is run.
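The article does not show getSqeDate itself. A minimal sketch of such a helper, assuming PHP 7.1 or later and dates in Y-m-d form, could look like this; getSqeDate and dayTableName are hypothetical stand-ins, and the end date defaults to today as suggested above.

```php
<?php
// Hypothetical stand-in for the article's getSqeDate(): returns every day
// from $startDate to $endDate (inclusive) as 'Y-m-d' strings.
// If $endDate is omitted, it defaults to today.
function getSqeDate(string $startDate, ?string $endDate = null): array
{
    $start = new DateTime($startDate);
    $end   = new DateTime($endDate ?? 'today');
    // DatePeriod excludes its end date, so push the end out by one day.
    $end->modify('+1 day');

    $days = [];
    foreach (new DatePeriod($start, new DateInterval('P1D'), $end) as $day) {
        $days[] = $day->format('Y-m-d');
    }
    return $days;
}

// Hypothetical helper: maps a day to the name of its daily sub-table.
function dayTableName(string $prefix, string $day): string
{
    return $prefix . date('Ymd', strtotime($day));
}
```

For example, getSqeDate('2015-11-12', '2015-11-14') returns ['2015-11-12', '2015-11-13', '2015-11-14'], and dayTableName('game_click_log_', '2015-11-12') returns 'game_click_log_20151112'. DatePeriod does the calendar arithmetic, so month and year boundaries need no special handling.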
The code is as follows:

public function moveGameClickLogToMysql()
{
    // Read the migration date range from the request (the data is queried from MSSQL)
    $startDate = $this->input->get_post('StartTime');
    if (!$startDate) {
        exit('Please enter a start date');
    }
    $endDate = $this->input->get_post('EndTime');
    $dateArray = $this->getSqeDate($startDate, $endDate);
    $upTime = (int) $this->input->get_post('UpTime'); // go-live time, Unix timestamp
    $upDate = date('Ymd', $upTime);

    // tgameclicklog
    set_time_limit(0);
    $searchDate = array();
    foreach ($dateArray as $d) {
        $searchDate[] = date('Y-m-d H:i:s', strtotime($d));
    }
    $mysql = $this->load->database('mysqllog', true);
    foreach ($searchDate as $day) {
        $startTime = $day;
        // Midnight of the next day, so rows stamped 23:59:59 are not lost
        $endTime = date('Y-m-d 00:00:00', strtotime($day . ' +1 day'));
        $sqlStart = "SELECT MIN(id) AS start_id FROM tgameclicklog WHERE dtinsert >= '{$startTime}'";
        $sqlEnd = "SELECT MIN(id) AS end_id FROM tgameclicklog WHERE dtinsert >= '{$endTime}'";
        $startRes = $this->mssql->getOne($sqlStart);
        $endRes = $this->mssql->getOne($sqlEnd);
        if (empty($startRes['start_id']) || empty($endRes['end_id'])) {
            continue; // no matching rows, e.g. the next day has no data yet
        }
        $startId = $startRes['start_id'];
        $endId = $endRes['end_id'] - 1;
        echo "Coming in <br/>";
        $oneDaySql = "SELECT * FROM tgameclicklog WHERE id >= {$startId} AND id <= {$endId}";
        $countSql = "SELECT COUNT(id) AS count FROM tgameclicklog WHERE id >= {$startId} AND id <= {$endId}";
        $count = $this->mssql->getOne($countSql);
        $countNum = $count['count'];
        // Query 10,000 rows at a time
        $times = ceil($countNum / 10000);

        // Target MySQL table for this day
        $d = date('Ymd', strtotime($day));
        $tableName = 'game_click_log_' . $d;
        // Clear the table first; on the go-live day keep rows written after go-live
        $where = '';
        if ($upDate == $d) {
            $where = ' WHERE created <= ' . $upTime;
        }
        $mysql->query('DELETE FROM ' . $tableName . $where);

        for ($i = 1; $i <= $times; $i++) {
            $insertData = array();
            // Fetch one page, then insert it in batches
            $partData = $this->mssql->select($oneDaySql, array(), $i, 10000);
            foreach ($partData as $p) {
                // Special handling for the insert time of the original table: convert to a Unix timestamp
                $created = (array) $p['dtinsert'];
                $created = strtotime($created['date']);
                $insertData[] = array(
                    'netbar_id' => $p['iuserid'],
                    'game_id'   => $p['igameid'],
                    'click'     => $p['iclick'],
                    'created'   => $created,
                    // some other columns
                );
            }
            $allData = array_chunk($insertData, 1000); // insert in batches of 1,000 rows
            $num = count($allData);
            for ($n = 0; $n < $num; $n++) {
                $mysql->insert_batch($tableName, $allData[$n]);
            }
        }
        var_dump('Finish insert ' . $d . ' data');
    }
}
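The (array) cast on dtinsert only works if the MSSQL driver returns that column as a PHP DateTime object, which is what PHP's sqlsrv extension does for datetime columns by default; if the column arrives as a plain string, strtotime() on the value is enough. The sketch below handles both cases and reads the timestamp with DateTime::getTimestamp() instead of casting. It assumes the lower-case column keys used in the script and a MySQL column named created (the one the DELETE statement filters on); mapClickRow is a hypothetical helper, not part of the original script.

```php
<?php
// Hypothetical helper: converts one MSSQL row into a row for the MySQL sub-table.
// dtinsert may be a DateTime object (sqlsrv default) or a date string.
function mapClickRow(array $p): array
{
    $dt = $p['dtinsert'];
    $created = $dt instanceof DateTimeInterface
        ? $dt->getTimestamp()        // DateTime returned by the driver
        : strtotime((string) $dt);   // plain string fallback

    return [
        'netbar_id' => $p['iuserid'],
        'game_id'   => $p['igameid'],
        'click'     => $p['iclick'],
        'created'   => $created,     // Unix timestamp, compared against UpTime
        // ... other columns
    ];
}
```

Both input forms produce the same row, so the rest of the loop does not need to know which one the driver returned.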


The code boils down to the following steps:
1. Process the start date and end date and generate an array of consecutive date strings. For example, if I pass in 2015-11-12 and omit the second parameter (the end date), the getSqeDate function generates
array(
    '2015-11-12',
    '2015-11-13',
    '2015-11-14',
    ...
)
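2. Empty the target table before inserting, because the script may have to be run again after going live. After go-live, new records are written straight to MySQL, so MSSQL data from after the go-live time must not be migrated, otherwise duplicate records appear. That is why, on the go-live day, the script only deletes rows created at or before the go-live timestamp (UpTime).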

3. Because the data volume is very large and the MSSQL date column is a string type with no index, one day's data cannot be queried directly by date. The workaround is indirect: look up the smallest primary key ID at or after the start of the day and the smallest ID at or after the start of the next day (the day's last ID is that value minus one), then query the whole day's data using these two IDs as the range condition.
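The ID-range trick relies on an assumption the article leaves implicit: IDs are assigned in the same order as dtinsert, as with an auto-increment key on an append-only log table. Under that assumption, the sketch below simulates the two lookups on an in-memory array; dayIdRange is a hypothetical name, and in the real script each lookup is a SELECT MIN(id) ... WHERE dtinsert >= ... query.

```php
<?php
// Simulates: start_id = MIN(id) WHERE dtinsert >= '<day> 00:00:00'
//            end_id   = MIN(id) WHERE dtinsert >= '<next day> 00:00:00', minus 1
// $rows maps id => dtinsert ('Y-m-d H:i:s'); ids are assumed to grow with dtinsert.
// Returns [start_id, end_id], or null if either boundary cannot be found.
function dayIdRange(array $rows, string $day): ?array
{
    $dayStart  = date('Y-m-d 00:00:00', strtotime($day));
    $nextStart = date('Y-m-d 00:00:00', strtotime($day . ' +1 day'));

    $minIdFrom = function (string $from) use ($rows): ?int {
        $ids = [];
        foreach ($rows as $id => $dtinsert) {
            // 'Y-m-d H:i:s' strings sort correctly as plain text
            if ($dtinsert >= $from) {
                $ids[] = $id;
            }
        }
        return $ids ? min($ids) : null;
    };

    $startId = $minIdFrom($dayStart);
    $nextId  = $minIdFrom($nextStart);
    if ($startId === null || $nextId === null) {
        return null; // e.g. the next day has no rows yet, so the day is skipped
    }
    return [$startId, $nextId - 1];
}
```

For example, with rows 101 => '2015-11-11 23:59:59', 102 => '2015-11-12 00:00:00', 103 => '2015-11-12 23:59:59' and 104 => '2015-11-13 08:00:00', dayIdRange($rows, '2015-11-12') returns [102, 103]. Using the next day's midnight as the upper boundary keeps a row stamped exactly 23:59:59 inside its own day.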

At this point you might think the problem is solved, but it is not. The data volume is huge: the game click log, for example, has on the order of a hundred million records per day, so fetching and inserting a whole day in one go is impractical. The data has to be read in batches and inserted into MySQL in batches.

The key code for the batch query is as follows:

$count = $this->mssql->getOne($countSql);
$countNum = $count['count'];
// Query 10,000 rows at a time
$times = ceil($countNum / 10000);
for ($i = 1; $i <= $times; $i++) {
    $insertData = array();
    // Batch query and insert
    $partData = $this->mssql->select($oneDaySql, array(), $i, 10000);
    // ... some other code ...
    $allData = array_chunk($insertData, 1000); // insert in batches of 1,000 rows
    $num = count($allData);
    for ($n = 0; $n < $num; $n++) {
        $mysql->insert_batch($tableName, $allData[$n]);
    }
    // ... other code ...
}
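The paging arithmetic can be tried out on its own. In the sketch below the fetch and insert steps are passed in as callables, standing in for $this->mssql->select(..., $page, 10000) and $mysql->insert_batch(); migrateInBatches and its parameter names are hypothetical.

```php
<?php
// Reads $total rows page by page and writes each page in smaller chunks.
// $fetchPage(int $page, int $pageSize): array   (page numbers start at 1)
// $insertBatch(array $rows): void
// Returns the number of rows written.
function migrateInBatches(int $total, callable $fetchPage, callable $insertBatch,
                          int $pageSize = 10000, int $chunkSize = 1000): int
{
    $written = 0;
    $pages = (int) ceil($total / $pageSize);
    for ($page = 1; $page <= $pages; $page++) {
        $rows = $fetchPage($page, $pageSize);
        foreach (array_chunk($rows, $chunkSize) as $chunk) {
            $insertBatch($chunk);
            $written += count($chunk);
        }
    }
    return $written;
}
```

With 25 source rows, a page size of 10 and a chunk size of 4, it reads 3 pages and issues 8 insert calls (4+4+2, 4+4+2, 4+1), writing all 25 rows. Keeping the read size and the write size as separate parameters lets each be tuned independently.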

A message is printed after each day has been processed.

In practice I also wrote log entries, for example with error_log:

error_log(date('H:i:s') . " Sync stats to MySQL game click table gameclick_{$date}_log\t", 3, './application/logs/syncgameclick.log');

These entries serve as a record of which days have been migrated.
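A small wrapper around this call, assuming one log line per migrated day; logSync and the message text are hypothetical. With message type 3, error_log() appends the message to the given file as-is and does not add a newline, so the sketch appends PHP_EOL itself.

```php
<?php
// Hypothetical helper: appends one timestamped line per migrated day to a log file.
// Returns false if the line could not be written.
function logSync(string $date, string $logFile): bool
{
    $line = date('Y-m-d H:i:s') . " synced game click table game_click_log_{$date}" . PHP_EOL;
    // Message type 3: append $line to $logFile
    return error_log($line, 3, $logFile);
}
```

Calling logSync($d, './application/logs/syncgameclick.log') at the end of each day's loop leaves one line per finished day in the file.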
