Automatically submit crawled URLs to Baidu via Python

Source: Internet
Author: User


Yesterday a colleague mentioned that if you manually submit your URLs to Baidu, the site's index will go up.

That got me thinking: why not write a Python script and submit them automatically?

The Python (2.x) code is as follows:

import os
import re
import shutil

# File types the crawler should skip downloading
reject_filetype = 'rar,7z,css,js,jpg,jpeg,gif,bmp,png,swf,exe'

def getinfo(webaddress):
    # Build the crawl URL from the domain the user entered (I use my own domain here)
    url = 'http://' + webaddress + '/'
    print 'getting>>>>> ' + url
    # os.path.abspath('.') gives the current directory; the downloaded pages
    # are stored in a folder named after the domain
    websitefilepath = os.path.abspath('.') + '/' + webaddress
    # If that folder already exists, delete it first, otherwise the crawl will not succeed
    if os.path.exists(websitefilepath):
        shutil.rmtree(websitefilepath)  # shutil.rmtree deletes a folder and everything in it
    # Create a temporary file output.txt in the current directory for the wget log
    outputfilepath = os.path.abspath('.') + '/' + 'output.txt'
    fobj = open(outputfilepath, 'w+')
    # Crawl the site with wget; --reject skips the file types defined above
    command = 'wget -r -m -nv --reject=' + reject_filetype + ' -o ' + outputfilepath + ' ' + url
    # os.popen runs the command and stores its output in tmp0
    tmp0 = os.popen(command).readlines()
    print >> fobj, tmp0  # write it into output.txt
    fobj.seek(0)  # rewind before reading, otherwise read() returns nothing
    allinfo = fobj.read()
    # Filter the URLs out of the log with a regular expression
    target_url = re.compile(r'\".*?\"', re.DOTALL).findall(allinfo)
    print target_url
    target_num = len(target_url)
    # Create result.txt in the current directory to store the resulting URLs
    fobj1 = open('result.txt', 'w')
    for i in range(target_num):
        # target_url is a list of quoted strings; [1:-1] strips the quotes.
        # URLs longer than 70 characters are not recorded.
        if len(target_url[i][1:-1]) < 70:
            print >> fobj1, target_url[i][1:-1]  # write the URL to the file
        else:
            print "NO"
    fobj.close()
    fobj1.close()
    # Remove the temporary file output.txt
    if os.path.exists(outputfilepath):
        os.remove(outputfilepath)

if __name__ == "__main__":
    webaddress = raw_input("Input the website address (without \"http:\") > ")
    getinfo(webaddress)
    print "Well done."
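For example, with the script saved as crawl_urls.py (the filename is hypothetical), a run against my own domain looks like this:

$ python crawl_urls.py
Input the website address (without "http:") > www.o2oxy.cn
getting>>>>> http://www.o2oxy.cn/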

After execution, result.txt contains the list of crawled URLs.

Next came a script to push them to Baidu. I logged into the Baidu webmaster platform to find the submission endpoint for my site.

I wrote a quick-and-dirty shell script. I originally wanted to integrate it into the Python script, but on reflection decided not to (a sketch of what that could look like follows the script below).

[root@i9rkbm4yvq43laz script]# cat baiduurl.sh
cd /script && curl -H 'Content-Type:text/plain' --data-binary @result.txt "http://data.zz.baidu.com/urls?site=https://www.o2oxy.cn&token=P03781O3s6Ee" && curl -H 'Content-Type:text/plain' --data-binary @result.txt "http://data.zz.baidu.com/urls?site=https://www.o2oxy.cn&token=P03781O3s6E"
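For reference, had I folded the submission into the Python script, a minimal sketch might look like the following. It assumes the requests library is installed (pip install requests); the endpoint, site, and token values are copied from baiduurl.sh above.

import requests  # assumed available: pip install requests

def submit_to_baidu(result_file='result.txt'):
    # Endpoint, site and token are taken from baiduurl.sh above
    api = 'http://data.zz.baidu.com/urls?site=https://www.o2oxy.cn&token=P03781O3s6Ee'
    with open(result_file) as f:
        urls = f.read()
    # Baidu's submission API expects one URL per line, posted as plain text
    resp = requests.post(api, data=urls, headers={'Content-Type': 'text/plain'})
    print resp.text  # e.g. {"remain":...,"success":...}

if __name__ == '__main__':
    submit_to_baidu()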

The result of running it is as follows:

[root@i9rkbm4yvq43laz script]# sh baiduurl.sh
{"remain":4993750,"success":455}{"remain":4993295,"success":455}

Then I set it up as a scheduled task (see the crontab sketch below).
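A minimal crontab sketch, assuming the script lives in /script as shown above; the exact schedule (here 02:00 daily) is an assumption, so pick an interval that leaves the crawl enough time to finish:

# Run the submission script daily at 02:00 (the schedule is an assumption)
0 2 * * * /bin/sh /script/baiduurl.sh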

One caveat: crawling the URLs is slow; it can take around 10 minutes.

