1. Background
Project requirement: use the GitHub repository API to collect repo data for analysis. After a day of study I got it working, although the approach is still fairly inefficient.
The repository listing API on GitHub returns the details of each repo in JSON format. I could not find a way to parse a response that holds multiple JSON objects, so I fell back on a crude split + re approach. If you know a better method, leave a comment and let's discuss it!
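One candidate for a better method: the response body is a single JSON array of repository objects, so Python's standard json module can parse it in one call. The following is a minimal sketch rather than a tested replacement. The extract_repo_urls name and the trimmed sample string (including its id values) are made up for illustration.

```python
import json


def extract_repo_urls(body):
    """Return the top-level "url" field of every repo object in a JSON array."""
    repos = json.loads(body)
    # Nested objects such as "owner" also carry a "url" key, but indexing the
    # top-level object only picks up the repo's own API address.
    return [repo["url"] for repo in repos if "url" in repo]


# Hypothetical, heavily trimmed sample of a /repositories response.
# The id values are invented; a real response carries many more fields.
sample = """
[
  {"id": 101, "full_name": "wycats/merb-core",
   "owner": {"login": "wycats", "url": "https://api.github.com/users/wycats"},
   "url": "https://api.github.com/repos/wycats/merb-core"},
  {"id": 102, "full_name": "rubinius/rubinius",
   "owner": {"login": "rubinius", "url": "https://api.github.com/users/rubinius"},
   "url": "https://api.github.com/repos/rubinius/rubinius"}
]
"""

for url in extract_repo_urls(sample):
    print(url)
```

Because the parser understands the nesting, there is no need to filter on the substring repos to tell a repo URL apart from an owner URL.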
2. Code
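import os
import re

def GetUrl(num):
    # Fetch one page of the public repository list with curl.
    # The URL is quoted so the shell does not interpret the '?'.
    response = os.popen('curl -s -G "https://api.github.com/repositories?since=%d"' % num).read()
    pattern = '"url"'
    pattern1 = 'repos'
    # Each JSON field sits on its own line, so splitting on ',\n' gives one field per piece.
    for field in response.split(',\n'):
        # Keep only the "url" fields that point at a repo.
        if pattern in field and pattern1 in field:
            # The second quoted string in the piece is the URL itself.
            text = re.compile('"(.*?)"').findall(field)[1]
            print(text)

if __name__ == '__main__':
    GetUrl(1000)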
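Here, num is passed as the since parameter: the API returns repositories whose ID is greater than num, so it acts as a cursor rather than a page number. We can loop, raising num to the last ID seen each time, to keep extracting repos indefinitely. The GitHub API rate-limits requests, so this works only as long as the loop stays within that limit.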
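The loop described above could look roughly like the sketch below. It is an illustration under assumptions, not the code used for this post: it swaps curl for urllib and parses the response with the json module so that the last repository ID on each page can be read. The names fetch_page and crawl, the max_pages cap, and the 60-second default delay are all made up here. The delay assumes unauthenticated requests are limited to about 60 per hour, which should be checked against the current GitHub documentation.

```python
import json
import time
from urllib.request import urlopen

API = "https://api.github.com/repositories?since=%d"


def fetch_page(since):
    """Download one page of public repos whose id is greater than `since`."""
    with urlopen(API % since) as resp:
        return json.loads(resp.read().decode("utf-8"))


def crawl(start, max_pages, fetch=fetch_page, delay=60.0):
    """Yield repo API URLs page by page, advancing `since` to the last id seen.

    `fetch` is injectable so the loop can be exercised without network access.
    `delay` is the pause between requests in seconds (an assumed value).
    """
    since = start
    for page_no in range(max_pages):
        page = fetch(since)
        if not page:
            break  # an empty page means there is nothing left to list
        for repo in page:
            yield repo["url"]
        since = page[-1]["id"]  # the next request starts after this id
        if page_no + 1 < max_pages:
            time.sleep(delay)  # pace the requests to respect the rate limit
```

For example, iterating over crawl(1000, max_pages=3) would make at most three requests, starting from the same since value as GetUrl(1000).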
The result is as follows (the extracted repo API addresses):
https://api.github.com/repos/wycats/merb-core
https://api.github.com/repos/rubinius/rubinius
https://api.github.com/repos/mojombo/god
https://api.github.com/repos/vanpelt/jsawesome
https://api.github.com/repos/wycats/jspec
https://api.github.com/repos/defunkt/exception_logger
https://api.github.com/repos/defunkt/ambition
https://api.github.com/repos/technoweenie/restful-authentication
https://api.github.com/repos/technoweenie/attachment_fu
https://api.github.com/repos/topfunky/bong
https://api.github.com/repos/Caged/microsis
https://api.github.com/repos/anotherjesse/s3
https://api.github.com/repos/anotherjesse/taboo
https://api.github.com/repos/anotherjesse/foxtracs
https://api.github.com/repos/anotherjesse/fotomatic
https://api.github.com/repos/mojombo/glowstick
https://api.github.com/repos/defunkt/starling
https://api.github.com/repos/wycats/merb-more
https://api.github.com/repos/macournoyer/thin
https://api.github.com/repos/jamesgolick/resource_controller
https://api.github.com/repos/jamesgolick/markaby
https://api.github.com/repos/jamesgolick/enum_field
https://api.github.com/repos/defunkt/subtlety
https://api.github.com/repos/defunkt/zippy
https://api.github.com/repos/defunkt/cache_fu
https://api.github.com/repos/KirinDave/phosphor