Classic Computer Vision Papers & a Python Program for Batch-Downloading Them by Category

Last updated: 2018-12-04 Source: Internet

Uploaded by: User


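These materials come from the computer vision literature reading course at the University of Texas at Austin, a lab that most people working in CV will already know. The course page links to a large number of papers, and Xunlei's batch-download feature did not work well on it, so I wrote a multi-threaded Python program that downloads the papers in bulk, sorts them into separate folders, and names each folder after its category. The fastest download speed I saw was 6 MB/s. I recommend connecting through a proxy or VPN before downloading (otherwise some links to sites outside China may be blocked and their files will fail to download). It is best to start the program before going to bed and let it run overnight.
The full set of papers comes to more than 600 MB, so make sure you have enough free disk space.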
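The approach described above (many downloads running in parallel, each file landing in a folder named after its category) could be sketched roughly as follows in Python 3. This is not the author's code: it swaps the one-thread-per-link style for a bounded `ThreadPoolExecutor`, and the names `download_all`, `fetch`, and `max_workers` are hypothetical, introduced only for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlretrieve


def download_all(jobs, fetch=urlretrieve, max_workers=8):
    """Download (url, path) pairs concurrently; return (done, failed) lists.

    `fetch` is any callable taking (url, path). It defaults to
    urllib.request.urlretrieve and can be replaced, e.g. for testing.
    """
    done, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {}
        for url, path in jobs:
            # Create the category folder before a worker writes into it.
            os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
            futures[pool.submit(fetch, url, path)] = (url, path)
        for future in as_completed(futures):
            url, path = futures[future]
            try:
                future.result()
                done.append(path)
            except Exception as exc:
                # One unreachable link should not stop the other downloads.
                failed.append((url, exc))
    return done, failed
```

Capping the number of workers keeps a page with hundreds of links from opening hundreds of connections at once, and collecting failures in a list makes it easy to retry only the links that did not download.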

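I am sharing the resource with everyone below.
The resource page is http://www.cs.utexas.edu/~cv-fall2012/schedule.html , and the Python source is pasted below. Make sure the BeautifulSoup library is installed before trying it. Note that the script is written for Python 2.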

# -*- coding: utf-8 -*-
"""
Created on Wed Jan 09 10:33:29 2013

@author: lanbing510 lxs
"""
from bs4 import BeautifulSoup
import re
import urllib
import urlparse
import os
import sys
import threading
import traceback

full_url='http://www.cs.utexas.edu/~cv-fall2012/schedule.html'
response=urllib.urlopen(full_url)
soup = BeautifulSoup(response.read())

#import codecs, sys
#old=sys.stdout
#sys.stdout = codecs.lookup('utf-8')[-1]( sys.stdout)
#print soup.prettify()

down_count=0

def downLoad(url,path):
    def cbk(a, b, c):
        """Progress callback
        @a: number of data blocks downloaded so far
        @b: size of one data block
        @c: size of the remote file
        """
        per = 100.0 * a * b / c
        if per>100:
            per=100
        #print '%.2f%%' % per
    urllib.urlretrieve(url,path,cbk)
    global down_count
    down_count+=1
    print path.split("\\")[-1],'has download'
    print 'have finished %d files ^_^' % down_count

"""For testing the downLoad function"""
#url='http://www.sina.com.cn'
#local='d:\\sina.html'
#downLoad(url,local)

def main():
    #f=open('path.txt','w+')  # for testing
    threads=[]  # list of download threads
    local_path='.\\lan\\'  # root directory
    c1_path=''  # first-level directory (category)
    c2_path=''  # second-level directory (topic)
    count=0;c=0;  # for test
    for sibling in soup.tr.next_siblings:
        if sibling!='\n':
            #count+=1
            #print "[%d]: %s" %(count,repr(sibling))
            #print type(sibling)
            sibstr=repr(sibling)
            # Category rows are recognised by their background colour.
            if (re.search('rgb\(204, 204, 255\)',sibstr))!=None:
                """Note: the parentheses must be escaped in the regex"""
                #c+=1
                print sibling.a['name']
                c1_path=local_path+repr(sibling.a['name'])
                if os.path.exists(c1_path)==False:
                    # makedirs also creates the root directory if it is missing
                    os.makedirs(c1_path)
                #continue
            slist=sibling.find_all('a')
            if slist!=[]:
                try :
                    c2_path=c1_path+'\\'+repr(slist[0]['name'])
                except KeyError:
                    continue
                #print c2_path
                if os.path.exists(c2_path)==False:
                    os.mkdir(c2_path)
                for li in slist[1:]:
                    temp_url=li.get('href')
                    count+=1
                    if temp_url!=None:
                        patt='http.+|ftp.+|www.+'
                        if re.match(patt,temp_url)==None:
                            # relative link: the file is hosted on the course server
                            temp_url=r'http://www.cs.utexas.edu/~cv-fall2012/'+temp_url
                        url_split_temp=temp_url.split('/')
                        url_sp=url_split_temp[-1] if url_split_temp[-1]!='' else (url_split_temp[-2]+'.html')
                        patt2='\.pdf$|\.html$|\.ppt$|\.doc$|\.docx$|\.pptx$|\.rar$|\.htm$|\.gz$|\.xml$'
                        if re.search(patt2,url_sp)==None:
                            url_sp=url_sp+'.html'
                        temp_path=c2_path+'\\'+url_sp
                    else:continue
                    # print >>f,temp_url,'\n'
                    # print >>f,temp_path,'\n'
                    #print count
                    #print 'c=',c
                    t=threading.Thread(target=downLoad,args=(temp_url,temp_path))
                    threads.append(t)
                    t.start()
    # Wait for every download so error.txt is still open while threads run.
    for t in threads:
        t.join()
    #f.close()

if __name__=='__main__':
    fe=open("error.txt",'w')  # exception messages are written here
    sys.stderr=fe
    try:
        main()
    finally:
        fe.close()
        sys.stderr=sys.stdout
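The part of the script above that turns each link into a download URL and a local file name can be isolated into a small helper. The following is a minimal Python 3 sketch of that logic, not a drop-in replacement: `resolve_link`, `BASE_URL`, and `KNOWN_EXT` are hypothetical names, and unlike the original it uses `urljoin` for relative links, ignores query strings when naming the file, and matches extensions case-insensitively.

```python
import re
from urllib.parse import urljoin, urlsplit

# Base address of the course page, taken from the script above.
BASE_URL = "http://www.cs.utexas.edu/~cv-fall2012/"
# Extensions the script treats as "already a file name".
KNOWN_EXT = re.compile(r"\.(pdf|html?|pptx?|docx?|rar|gz|xml)$", re.IGNORECASE)


def resolve_link(href, base=BASE_URL):
    """Return (absolute_url, local_file_name) for one href from the page."""
    if href.startswith("www."):
        # The script treats bare "www." links as absolute; add a scheme.
        href = "http://" + href
    url = urljoin(base, href)  # relative links resolve against the course site
    parts = [p for p in urlsplit(url).path.split("/") if p]
    # Use the last path segment, or the host name when the path is empty.
    name = parts[-1] if parts else urlsplit(url).netloc
    if not KNOWN_EXT.search(name):
        name += ".html"  # anything else is assumed to be a web page
    return url, name
```

For example, `resolve_link("papers/foo.pdf")` yields a URL under the course site with the file name `foo.pdf`, while a link ending in a slash such as `http://example.org/project/` is saved as `project.html`.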

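This article was originally written in Chinese and published on aliyun.com; an English version is also provided, for informational purposes only. This website makes no express or implied representations or warranties regarding the accuracy, completeness, or reliability of the article or of any translation of it. If you have any concerns or complaints about the article, please email info-contact@alibabacloud.com with a detailed description. Staff will contact you within 5 working days, and infringing content will be removed once verified.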
