Here is the code first. It is only a few lines long: it uses regular expressions to find the tags that hold each problem's number and title, then extracts the number and title from them.
    import urllib.request
    import re
    import dbm

    # Define the URL; %d is replaced by the page number
    URL = 'http://acm.zju.edu.cn/onlinejudge/showProblems.do?contestId=1&pageNumber=%d'
    # Open the persistent dictionary; the 'c' flag opens it for reading and writing and creates it if it does not exist
    db = dbm.open('zoj_list', 'c')

    for index in range(1, 30):
        this_url = URL % (index)  # Substitute the page number into the URL
        html = urllib.request.urlopen(this_url).read()  # read() returns the page HTML as bytes
        html = html.decode('utf-8')  # Decode as UTF-8; without this line the str regex raises an error on bytes
        title = re.compile('<font color="blue">.*</font>')  # Compiled regex for the problem number and title tags
        key = ''
        cnt = 1
        for x in title.findall(html):
            title_parse = re.compile('<[^>]+>')  # Regex that matches any tag
            get = title_parse.sub('', x)  # Strip the tags, leaving only the text
            if cnt % 2 == 0:
                # Even matches are titles: store them under the preceding number
                value = get
                db[key] = value
            else:
                # Odd matches are problem numbers: remember them as the key
                key = get
            cnt += 1

    db.close()  # Close the database so that everything is written to disk
This stores each problem number (the key) and its title (the value) in the persistent dictionary 'zoj_list'.
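The pairing of alternating matches can be tried offline before hitting the site. The sketch below assumes the problem list places each number and each title inside its own <font color="blue"> tag on a separate line; SAMPLE_HTML and pair_matches are made-up names for illustration, and the real page markup may differ.

```python
import re

# Hypothetical sample rows imitating the problem-list markup (an assumption,
# not copied from the real page).
SAMPLE_HTML = '''
<td><a href="#"><font color="blue">1001</font></a></td>
<td><a href="#"><font color="blue">A + B Problem</font></a></td>
<td><a href="#"><font color="blue">1002</font></a></td>
<td><a href="#"><font color="blue">Fire Net</font></a></td>
'''

FONT_RE = re.compile(r'<font color="blue">.*</font>')  # same pattern as the scraper
TAG_RE = re.compile(r'<[^>]+>')                        # matches any tag


def pair_matches(html):
    """Pair alternating matches: 1st, 3rd, ... are numbers; 2nd, 4th, ... are titles."""
    texts = [TAG_RE.sub('', match) for match in FONT_RE.findall(html)]
    # Items at even indexes become keys, items at odd indexes become values.
    return dict(zip(texts[0::2], texts[1::2]))


problems = pair_matches(SAMPLE_HTML)
print(problems)
```

Using zip over the two slices does the same job as the cnt counter in the loop, and it keeps the parsing step separate from the network and database code so it can be checked on its own.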
The code to read this dictionary is as follows:
    import dbm
    db = dbm.open('zoj_list', 'r')
    print(db['1001'])
    for index in range(1001, 2000):
        in_ch = str(index)
        print(db[in_ch])
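Two things are worth knowing when reading the dictionary back: dbm returns values as bytes, and indexing a number that was never stored raises KeyError. A minimal sketch of a more forgiving lookup follows. It builds a small throwaway database in a temporary directory so that it runs without the real 'zoj_list' file; the name zoj_demo and the helper lookup are made up for illustration.

```python
import dbm
import os
import tempfile

# Build a small throwaway database (hypothetical name 'zoj_demo') so the
# sketch does not depend on the scraped 'zoj_list' file.
tmp_dir = tempfile.mkdtemp()
db_path = os.path.join(tmp_dir, 'zoj_demo')

with dbm.open(db_path, 'c') as db:
    db['1001'] = 'A + B Problem'  # str keys and values are stored as bytes
    db['1002'] = 'Fire Net'


def lookup(db, number):
    """Return the title for a problem number as str, or None if it is missing."""
    key = str(number).encode('utf-8')
    if key in db:
        return db[key].decode('utf-8')  # values come back as bytes
    return None


with dbm.open(db_path, 'r') as db:
    titles = [lookup(db, n) for n in range(1001, 1004)]

print(titles)
```

Opening the database in a with block closes it automatically, which also guards against unwritten data when the script exits early.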
Use Python to get the names of all ZOJ problems