I was bored last night and wanted to practice Python, so I wrote a small crawler that fetches the update data from the Xiaodao entertainment network (www.xiaodao.la).
    #!/usr/bin/python
    # coding: utf-8
    import re
    import urllib.request

    # Site host, prepended to each relative link when printing
    head = "www.xiaodao.la"


    # Define a helper that fetches the page source
    def get():
        data = urllib.request.urlopen("http://www.xiaodao.la").read()
        # Decode and strip unneeded text. The whitespace arguments were garbled
        # in the source; stripping spaces and tabs is assumed here, which is
        # why the pattern below contains no spaces.
        page = (
            data.decode("GBK")
            .replace("font-weight:bold;", "")
            .replace(" ", "")
            .replace("\t", "")
            .replace("\r\n", "")
            .replace("#FF0000", "#000000")
            .strip()
        )
        # Return only the content between the two markers. The start marker is
        # shown in translation; it has to match the text on the live page.
        return page[page.find("good Card Sale"):page.find("20160303184868786878.gif")]


    # Fetch the page source once and keep it in `page`
    page = get()
    # print(page)

    # Define the regular expression
    # reg = r'href="(.*?)"style="COLOR:#000000;"title="(.*?)"target="_blank">'
    reg = (r'href="(.*?)"style="COLOR:#000000;"title="(.*?)"target="_blank">(.*?)'
           r'</a></div></td><tdwidth=12.5%align=rightnowrap=nowrapstyle="COLOR:#F00;">(.*?)</td>')
    tmp = re.compile(reg)            # compile the regular expression
    matches = re.findall(tmp, page)  # run the match
    matches = tuple(matches)         # convert the list to a tuple
    print("total matches %d" % len(matches))  # print the number of matches
    # print(matches)

    for i, match in enumerate(matches, start=1):
        print("current %d:" % i)
        print("Title: %s\nAddress: %s  Update time: %s\n" % (match[1], head + match[0], match[3]))
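The clean-then-match step can be tried without touching the network. The sketch below is only an illustration: `SAMPLE_HTML` is made-up markup written to resemble the rows the pattern targets (it is not copied from the real site), and `clean` and `extract_rows` are hypothetical helper names that do not appear in the script above.

```python
import re

# Hypothetical sample markup (an assumption, NOT taken from the real page).
SAMPLE_HTML = (
    '<td><div><a href="/post/1.html" style="COLOR: #000000;" '
    'title="UpdateA" target="_blank">UpdateA</a></div></td>\r\n'
    '<td width=12.5% align=right nowrap=nowrap style="COLOR: #F00;">03-04</td>\r\n'
    '<td><div><a href="/post/2.html" style="COLOR: #FF0000;" '
    'title="UpdateB" target="_blank">UpdateB</a></div></td>\r\n'
    '<td width=12.5% align=right nowrap=nowrap style="COLOR: #F00;">03-03</td>'
)

# Same pattern as in the script; it has no spaces because clean() removes them.
ROW_PATTERN = re.compile(
    r'href="(.*?)"style="COLOR:#000000;"title="(.*?)"target="_blank">(.*?)'
    r'</a></div></td><tdwidth=12.5%align=rightnowrap=nowrapstyle="COLOR:#F00;">(.*?)</td>'
)


def clean(html):
    """Strip spaces, tabs and CRLF, and turn highlighted (red) rows black."""
    return (
        html.replace(" ", "")
        .replace("\t", "")
        .replace("\r\n", "")
        .replace("#FF0000", "#000000")
        .strip()
    )


def extract_rows(html):
    """Return (href, title, link_text, update_time) tuples from the cleaned markup."""
    return ROW_PATTERN.findall(clean(html))


rows = extract_rows(SAMPLE_HTML)
for href, title, _text, updated in rows:
    print("Title: %s  Address: %s  Update time: %s" % (title, "www.xiaodao.la" + href, updated))
```

Because every space is removed before matching, titles that contain spaces would come out run together. That is harmless for Chinese titles but worth keeping in mind when reusing the approach on other pages.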
A simple Python crawler that fetches update data from the Xiaodao entertainment network