I wanted to read the novel, so I downloaded an e-book of it, a single file named "Shu Shan Jian Xia Chuan.txt".
However, once I started reading, I found that the file was quite large and the e-book reader loaded it slowly. It was not divided into separate chapters, so I thought about how to split it into separate files.
That is nothing more than reading the file, matching the chapter headings with a regular expression, and writing each piece to its own file.
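The read, match, and split steps can be sketched roughly as follows. This is only an illustration in Python 3: the heading pattern ("Chapter" followed by a number), the helper names `split_chapters` and `write_chapters`, and the sample text are all assumptions, since the post does not show what the headings in the e-book look like.

```python
import re
from pathlib import Path

# Assumed heading format: a line that starts with "Chapter <number>".
# The real e-book's headings are not shown in the post, so adjust this.
HEADING_RE = re.compile(r"^Chapter \d+.*$", re.MULTILINE)


def split_chapters(text):
    """Return a list of (heading, body) pairs, one per chapter."""
    matches = list(HEADING_RE.finditer(text))
    chapters = []
    for i, m in enumerate(matches):
        # A chapter body runs from the end of its heading to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters.append((m.group().strip(), text[m.end():end].strip()))
    return chapters


def write_chapters(chapters, out_dir):
    """Write each chapter to its own numbered .txt file in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for n, (heading, body) in enumerate(chapters, start=1):
        (out / f"{n:03d}.txt").write_text(heading + "\n" + body, encoding="utf-8")


# Made-up sample text standing in for the large e-book file.
sample = "Preface\nChapter 1 The Mountain\nText one.\nChapter 2 The Sword\nText two.\n"
chapters = split_chapters(sample)
```

Anything before the first heading (the "Preface" line here) is dropped by this sketch; keeping it would need one extra slice from the start of the text to the first match.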
Thinking about the code, I decided this method would be slow. It would be better to grab the text from an online reading site. So I found "Shu Shan Jian Xia Chuan --- Huanzhu Louzhu --- Tianya online library" and turned the file-splitting problem into a screen-scraping problem.
Code:
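# Python 2 script: urllib.urlopen, xrange, file() and the print statement
# do not exist in this form in Python 3.
from urllib import urlopen
import re

# Lookbehind/lookahead assertions keep only the text between the markers.
# The case of the markers must match the page's actual markup.
titleRe = re.compile('(?<="biaoti">).+?(?=</span>)')
contentRe = re.compile("(?<='content'>).+?(?=</td>)", re.DOTALL)
dirPath = 'f:\\shushanjianxiazhuan\\'
urlPath = 'http://www.tiany1_k.com/wuxia/huanzhulouzhu/shushanjianxiazhuan/'

# Pages are numbered 1.htm to 309.htm, one chapter per page.
for x in xrange(1, 310):
    x = str(x)
    url = urlPath + x + '.htm'
    page = urlopen(url).read()
    title = titleRe.search(page).group()
    content = contentRe.search(page).group()
    # Turn HTML line breaks into real newlines.
    content = content.replace('<BR>', '\n')
    # One output file per chapter: page number followed by the title.
    f = file(dirPath + x + title + '.txt', 'w')
    f.write(title + '\n' + content)
    f.close()
    print title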
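The extraction step of the script can be tried offline, without fetching anything. The sketch below is a Python 3 illustration, not the original code: the function name `extract_chapter` and the sample page fragment are made up, and the real markup of the site is assumed to look like what the two regular expressions expect. It also returns `None` when a marker is missing instead of failing on `.group()`.

```python
import re

# Same lookbehind/lookahead idea as the script: match only the text
# between the markers. IGNORECASE is added here as an assumption, because
# the exact case of the tags on the real pages is not known.
title_re = re.compile(r'(?<="biaoti">).+?(?=</span>)', re.IGNORECASE)
content_re = re.compile(r"(?<='content'>).+?(?=</td>)", re.DOTALL | re.IGNORECASE)


def extract_chapter(page):
    """Return (title, content) from one page, or None if a marker is missing."""
    title_match = title_re.search(page)
    content_match = content_re.search(page)
    if title_match is None or content_match is None:
        return None
    # Turn HTML line breaks into real newlines.
    content = re.sub(r"<br\s*/?>", "\n", content_match.group(), flags=re.IGNORECASE)
    return title_match.group().strip(), content.strip()


# Hypothetical page fragment, written to fit the patterns above.
sample_page = (
    '<span class="biaoti">Chapter One</span>'
    "<table><tr><td class='content'>First line.<BR>Second line.</td></tr></table>"
)
result = extract_chapter(sample_page)
missing = extract_chapter("<html></html>")
```

In Python 3 the page itself would come from `urllib.request.urlopen(url).read()`, which returns bytes. Those need decoding with the site's encoding, which the post does not state, before they are passed to `extract_chapter`.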
The Master, standing by the river, said: "Shu Shan" is a super, super romantic and magnificent work, but unfortunately I was born two thousand years too early.