1. Background
The project needs access to GitHub's repo API so that repo data can be extracted for analysis. After a day of research I finally got it working, although the approach is still fairly inefficient.
GitHub's API for listing repos returns the details of each repo in JSON format. I have not yet found a way to parse JSON data that contains multiple objects, so I fell back on a cruder method: split plus re. If you know a better way, feel free to send me a message to discuss!
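As one possible answer to the question above, here is a minimal sketch, not the method used in this post. It assumes the response body is a single JSON array of repo objects, in which case Python's standard json module can parse the whole thing in one call. The function name extract_repo_urls and the sample data (including the id values) are hypothetical and for illustration only.

```python
import json

def extract_repo_urls(body):
    """Return the "url" field of every repo object in a JSON array string."""
    repos = json.loads(body)  # parses the whole array in one call
    return [repo["url"] for repo in repos if "url" in repo]

# Hypothetical, trimmed-down sample standing in for a real API response.
sample = '''[
  {"id": 1, "name": "merb-core", "url": "https://api.github.com/repos/wycats/merb-core"},
  {"id": 2, "name": "rubinius", "url": "https://api.github.com/repos/rubinius/rubinius"}
]'''

for url in extract_repo_urls(sample):
    print(url)
```

Parsing with json avoids depending on how the response happens to be laid out across lines, which the split-based approach relies on.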
2. Code
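import os
import re

def GetUrl(num):
    # Fetch one batch of public repos with curl, passing num as "since".
    cmd = "curl -g 'https://api.github.com/repositories?since=%d'" % num
    text = os.popen(cmd).read()
    pattern = '"url"'
    pattern1 = 'repos'
    # Split on ",\n" so that each chunk holds roughly one JSON field.
    for line in text.split(',\n'):
        # Keep only the chunks that hold a repo's own "url" field.
        if pattern in line and pattern1 in line:
            # text1 = line.split(':')
            # The second quoted string in the chunk is the URL itself.
            url = re.compile('"(.*?)"').findall(line)[1]
            print(url)

if __name__ == '__main__':
    GetUrl(1000)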
The value of num is passed as the since parameter, which is a repo ID: the API returns repos whose IDs are greater than it. By looping and repeatedly increasing num, we can keep extracting more repos. Because GitHub limits traffic on its API, this is a workable approach.
The output looks like this (the extracted repo API URLs):
https://api.github.com/repos/wycats/merb-core
https://api.github.com/repos/rubinius/rubinius
https://api.github.com/repos/mojombo/god
https://api.github.com/repos/vanpelt/jsawesome
https://api.github.com/repos/wycats/jspec
https://api.github.com/repos/defunkt/exception_logger
https://api.github.com/repos/defunkt/ambition
https://api.github.com/repos/technoweenie/restful-authentication
https://api.github.com/repos/technoweenie/attachment_fu
https://api.github.com/repos/topfunky/bong
https://api.github.com/repos/Caged/microsis
https://api.github.com/repos/anotherjesse/s3
https://api.github.com/repos/anotherjesse/taboo
https://api.github.com/repos/anotherjesse/foxtracs
https://api.github.com/repos/anotherjesse/fotomatic
https://api.github.com/repos/mojombo/glowstick
https://api.github.com/repos/defunkt/starling
https://api.github.com/repos/wycats/merb-more
https://api.github.com/repos/macournoyer/thin
https://api.github.com/repos/jamesgolick/resource_controller
https://api.github.com/repos/jamesgolick/markaby
https://api.github.com/repos/jamesgolick/enum_field
https://api.github.com/repos/defunkt/subtlety
https://api.github.com/repos/defunkt/zippy
https://api.github.com/repos/defunkt/cache_fu
https://api.github.com/repos/KirinDave/phosphor
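
The looping idea mentioned earlier could be sketched roughly as follows. This is an illustration under stated assumptions, not code from the original script. It assumes each repo object carries an "id" and a "url" field, and that the next request should continue from the largest ID seen so far. The names crawl_repo_urls, fetch_page, fake_fetch and FAKE_REPOS are hypothetical, and the fake fetcher stands in for the real request so the sketch runs without network access.

```python
def crawl_repo_urls(fetch_page, start=0, max_pages=3):
    """Collect repo URLs by repeatedly advancing the `since` value.

    fetch_page(since) is a caller-supplied function (hypothetical here) that
    returns a list of repo dicts, each with "id" and "url" keys.
    """
    since = start
    urls = []
    for _ in range(max_pages):      # cap the loop so it cannot run forever
        repos = fetch_page(since)
        if not repos:               # empty batch: nothing left to fetch
            break
        urls.extend(repo["url"] for repo in repos)
        # Continue after the largest ID seen in this batch.
        since = max(repo["id"] for repo in repos)
    return urls

# Made-up data standing in for the real API so the sketch runs offline.
FAKE_REPOS = [{"id": i, "url": "https://api.github.com/repos/example/repo%d" % i}
              for i in range(1, 8)]

def fake_fetch(since, page_size=3):
    """Return up to page_size fake repos whose ID is greater than since."""
    return [r for r in FAKE_REPOS if r["id"] > since][:page_size]

for url in crawl_repo_urls(fake_fetch):
    print(url)
```

Capping the number of batches with max_pages keeps the loop bounded, which matters when the API limits traffic. In real use, fetch_page would wrap the actual request and parse its response.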