For school work I needed to set up an intranet OJ (online judge) server, and I chose the open-source HUSTOJ. The problems came from the HUSTOJ free problem set XML files, but many of them failed on import, for reasons I have not tracked down. I also wanted to add the NOIP problems from previous years to the problem set, but only three or four NOIP XML files were available. COGS hosts almost every past NOIP, so I thought of using Python + pyquery to crawl those problems and convert them to XML. I picked pyquery over BeautifulSoup because its syntax is close to jQuery, which makes it more convenient for me to use, and it may also be faster.
Ver0.9 is complete, but because the COGS page format is not uniform, my own testing turned up a lot of errors, and it still needs further work.
Ver1.0 is meant to fix these errors and crawl the problems as accurately as possible. Fetching the test data and solving the import problem can be considered later.
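One way to catch bad crawls before import might look like the sketch below: check each crawled problem for empty fields, then write the rest to XML with the standard library. The element names (`fps`, `item`, `title`, and so on) are my assumption of what the HUSTOJ import expects; compare them against a file exported from HUSTOJ before relying on them. The helper names `missing_fields` and `build_fps` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed element names; verify against a file exported from HUSTOJ.
FIELDS = ("title", "description", "input", "output",
          "sample_input", "sample_output")
REQUIRED = ("title", "description", "sample_input", "sample_output")


def missing_fields(problem):
    """List the required fields that are absent or blank."""
    return [name for name in REQUIRED if not problem.get(name, "").strip()]


def build_fps(problems):
    """Serialize complete problems to XML; return (xml_text, skipped_titles)."""
    root = ET.Element("fps", version="1.2")
    skipped = []
    for problem in problems:
        if missing_fields(problem):
            # Leave incomplete crawls out so they can be fixed by hand.
            skipped.append(problem.get("title", ""))
            continue
        item = ET.SubElement(root, "item")
        for name in FIELDS:
            ET.SubElement(item, name).text = problem.get(name, "")
    return ET.tostring(root, encoding="unicode"), skipped


good = {"title": "A+B Problem", "description": "Sum two integers.",
        "input": "Two integers.", "output": "Their sum.",
        "sample_input": "1 2", "sample_output": "3"}
bad = {"title": "Broken", "description": ""}

xml_text, skipped = build_fps([good, bad])
print(skipped)
```

Collecting the skipped titles instead of raising an error lets one run cover the whole site and leaves a list of pages to review manually.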
Crawling online OJ problems with Python