1. Demand
The collection task constructs the storage. To achieve the title, time, content and other information of more than 200 Web site collection information configuration, and inserted in MySQL.
2, Implementation Step 1: Manual implementation of Excel table configuration.
Defines a unique index, such as the ordinal of the first column. Benefits:
1) The serial number can be an index in MySQL.
2) The serial number stipulation, can realize the distributed, 1 person 4 hours. Really can achieve 4 people 1 hours to complete the task. (Real distributed)
This is very important.
With regard to indexes, practice has shown that a unique index that defines unique values for each site entry can further prevent conflicts and ensure uniqueness.
Step 2: Deposit the form into txt.
In the Linux environment, the Dos2unix format is converted to ensure that the Utf-8 encoding is not garbled.
Step 3: Script Implementation One-click Construct SQL statement. 3, Foot source code
#!/bin/shP2p_config_file=./base_config.txtone_line=./output/config_line.txt#read Line by lineCat$P 2p_config_file| while ReadLine DoMkdir-p./outputEcho $line>$ONE _line;#echo line= $lineId_01= ' Cat$ONE _line| Awk-f" " ' {print '} ''; Name_02= ' cat$ONE _line| Awk-f" " ' {print $} ''; Url_03= ' cat$ONE _line| Awk-f" " ' {print $} ''; Lstcharset_04= ' cat$ONE _line| Awk-f" " ' {print $4} ''; Concharset_05= ' cat$ONE _line| Awk-f" " ' {print $} ''; Notice_url_06= ' cat$ONE _line| Awk-f" " ' {print $6} '' Titlexpath_07= ' cat$ONE _line| Awk-f" " ' {print $7} '' Timexpath_08= ' cat$ONE _line| Awk-f" " ' {print $8} '' Contentxpath_09= ' cat$ONE _line| Awk-f" " ' {print $9} '' Touch./tmp.txtEcho $titleXpath _07>>./tmp.txtsed-i"s#\" #\\\ ' #g "./tmp.txttitlexpath_07= ' Cat./tmp.txt '#echo $id _01#echo $name _02#echo $url _03echo "INSERT into Test.mdia_config (ID, source_name, Entry_url, List_charset, Content_charset, channel_id, Media_class, site_id, class_id, List_xpath, Title_xpath, Publish_time_xpath, Content_xpath, Click_count_xpath, Comment_count_xpath , Repost_count_xpath, list_js_enabled, content_js_enabled, Last_deliver_time, Deliver_period, weight, proxy_gather, Delete_flag) VALUES ('$id _01', '$name _02', '$notice _url_06', '$lstcharset _04', ' $concharset _05', ' 1 ', ' 1 ', '$id _01', ' 1 ', ' [\ '$titleXpath _07\ '] ', ' ', '$timeXpath _08 ', '$contentXpath _09', ', ', ', ', ' 0 ', ' 0 ', ' 2016-11-19 05:02:11 ', ' All ', ' 0 ', ' 0 ', ' 0 ');Rm- F $ONE _lineRm- F./tmp.txt Done;
Note the point:
1), read by line;
2), for each column of the read, take a loop to store temporary files, and then loop the method of deletion. (learned from colleagues 2 years ago, very effective)
3), pay attention to the early processing of single quotation mark and double quotation mark in SQL, ensure the SQL statement is valid. This, you can go to navicate inside to verify the SQL statement.
4. Summary
Be able to improve the efficiency of the script, resolutely do not manually typed.
Yes, just a few lines, write a loop. Efficiency at ordinary times, efficiency is seen in detail.
Ming Yi World
Reprint please indicate the source, the original address:
http://blog.csdn.net/laoyang360/article/details/53236018
If you feel this article is helpful, please click on the ' top ' support, your support is I insist on writing the most power, thank you!
"Lazy shell script" six--one-click construction of the bulk SQL statement to be collected