1. Background
The project needs to call GitHub's repository API so that repo data can be extracted for analysis. After a day of study I finally solved the problem, although the approach is still fairly inefficient.
The GitHub API endpoint that lists repos returns the details of each repo in JSON format. I could not find a way to parse a response that contains multiple JSON objects, so I fell back on a clumsy approach: split plus re. If you have a better way, feel free to leave a message to discuss!
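One possible alternative to split plus re: the listing appears to be a single JSON array, so Python's standard json module may be able to parse the whole response in one call. The following is only a sketch under that assumption, written for Python 3. SAMPLE_BODY is a made-up, heavily trimmed stand-in for a real response, and extract_repo_urls is a hypothetical helper name, not something from the original script.

```python
import json

# Hypothetical, heavily trimmed stand-in for one response body; the real
# payload carries many more fields per repository.
SAMPLE_BODY = """
[
  {"name": "merb-core",
   "owner": {"url": "https://api.github.com/users/wycats"},
   "url": "https://api.github.com/repos/wycats/merb-core"},
  {"name": "rubinius",
   "owner": {"url": "https://api.github.com/users/rubinius"},
   "url": "https://api.github.com/repos/rubinius/rubinius"}
]
"""


def extract_repo_urls(body):
    """Return the top-level "url" of every repository object in the array."""
    repos = json.loads(body)  # one call parses the whole array
    return [repo["url"] for repo in repos]


if __name__ == "__main__":
    for url in extract_repo_urls(SAMPLE_BODY):
        print(url)
```

Because the lookup is by key on each top-level object, the nested owner "url" is never picked up by mistake, which is the case the string matching has to guard against with the 'repos' check.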
2. Code
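import os
import re

def GetUrl(num):
    # Fetch one page of the public repository listing with curl.
    body = os.popen('curl -g "https://api.github.com/repositories?since=%d"' % num).read()
    pattern = '"url"'
    pattern1 = 'repos'
    # The response is pretty-printed JSON, so each piece holds roughly one "key": value pair.
    for i in body.split(',\n'):
        # Keep only the repo's own "url" field, which points at .../repos/owner/name.
        if pattern in i and pattern1 in i:
            # The second quoted string in the piece is the URL itself.
            text = re.compile('"(.*?)"').findall(i)[1]
            print(text)

if __name__ == '__main__':
    GetUrl(1000)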
Here num is passed as the since parameter, which is a repository ID: the API returns only repos whose ID is greater than it. By looping and steadily increasing num, we can keep extracting repos indefinitely. GitHub's API is rate limited, so the loop has to be paced, but within that limit this is a workable approach.
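The loop described above could be sketched roughly as follows, in Python 3. This is an illustration, not the original script: crawl, fetch_from_github, max_pages and delay are all hypothetical names, and the sketch assumes each repo object carries an "id" and a "url" field. The fetcher is passed in as an argument so the paging logic can be tried without touching the network.

```python
import json
import time
import urllib.request

API = "https://api.github.com/repositories?since=%d"


def fetch_from_github(since):
    """Fetch one page of public repos with an ID greater than `since`."""
    with urllib.request.urlopen(API % since) as resp:
        return json.loads(resp.read().decode("utf-8"))


def crawl(fetch_page, since=0, max_pages=3, delay=0.0):
    """Collect repo URLs page by page, advancing `since` to the last ID seen."""
    urls = []
    for _ in range(max_pages):
        repos = fetch_page(since)
        if not repos:  # an empty page means nothing is left
            break
        urls.extend(repo["url"] for repo in repos)
        since = repos[-1]["id"]  # the next page starts after the last repo seen
        time.sleep(delay)  # pause between requests to stay under the rate limit
    return urls
```

A call such as crawl(fetch_from_github, since=1000, max_pages=2, delay=1.0) would then walk two pages starting after repository ID 1000. Taking the next since value from the last ID actually returned, rather than adding a fixed step to num, avoids skipping or repeating repos when IDs are not contiguous.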
The output looks like this (the API URLs of the extracted repos):
https://api.github.com/repos/wycats/merb-core
https://api.github.com/repos/rubinius/rubinius
https://api.github.com/repos/mojombo/god
https://api.github.com/repos/vanpelt/jsawesome
https://api.github.com/repos/wycats/jspec
https://api.github.com/repos/defunkt/exception_logger
https://api.github.com/repos/defunkt/ambition
https://api.github.com/repos/technoweenie/restful-authentication
https://api.github.com/repos/technoweenie/attachment_fu
https://api.github.com/repos/topfunky/bong
https://api.github.com/repos/Caged/microsis
https://api.github.com/repos/anotherjesse/s3
https://api.github.com/repos/anotherjesse/taboo
https://api.github.com/repos/anotherjesse/foxtracs
https://api.github.com/repos/anotherjesse/fotomatic
https://api.github.com/repos/mojombo/glowstick
https://api.github.com/repos/defunkt/starling
https://api.github.com/repos/wycats/merb-more
https://api.github.com/repos/macournoyer/thin
https://api.github.com/repos/jamesgolick/resource_controller
https://api.github.com/repos/jamesgolick/markaby
https://api.github.com/repos/jamesgolick/enum_field
https://api.github.com/repos/defunkt/subtlety
https://api.github.com/repos/defunkt/zippy
https://api.github.com/repos/defunkt/cache_fu
https://api.github.com/repos/KirinDave/phosphor