Using Python to obtain a list of GitHub repositories

Source: Internet
Author: User

1. Background

The project required access to GitHub's repo API so that repo data could be extracted for analysis. After a day of research I finally solved the problem, although the solution is still fairly inefficient.

GitHub's repo-listing API returns the details of every repo in JSON format. I had not found a way to parse JSON data containing multiple records, so I fell back on a cruder approach: split() combined with the re module. If you have a better way, feel free to send a message to discuss!
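One possible alternative to split() and re is Python's standard json module, which parses the whole response in a single call. The sketch below is not the author's method; it assumes Python 3 and that the response is a JSON array of objects that each carry a "url" field. The function name extract_repo_urls and the sample data are made up for illustration and are not real API output.

```python
import json


def extract_repo_urls(payload):
    """Return the API URL of every repository in a JSON array string."""
    repos = json.loads(payload)  # list of dicts, one per repository
    # Only the top-level "url" is read, so the nested owner URL is ignored.
    return [repo["url"] for repo in repos]


# Hand-written sample shaped like the API response (hypothetical data).
sample = """
[
  {"id": 1, "name": "demo-a",
   "url": "https://api.github.com/repos/example/demo-a",
   "owner": {"login": "example", "url": "https://api.github.com/users/example"}},
  {"id": 2, "name": "demo-b",
   "url": "https://api.github.com/repos/example/demo-b",
   "owner": {"login": "example", "url": "https://api.github.com/users/example"}}
]
"""

for url in extract_repo_urls(sample):
    print(url)
```

Because the parsed result is an ordinary list of dictionaries, other fields can be read the same way without further string handling.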

2. Code

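import re
import os

def GetUrl(num):
    # Fetch one batch of repos with curl; the URL is quoted so the shell
    # does not try to interpret the '?' character.
    cmd = "curl -g 'https://api.github.com/repositories?since=%d'" % num
    text = os.popen(cmd).read()
    pattern = '"url"'
    pattern1 = 'repos'
    # The response has one field per line, separated by ',\n'.
    lines = text.split(',\n')
    for i in lines:
        # Keep only the "url" fields that point at a repo.
        if pattern in i and pattern1 in i:
            # The second quoted string on the line is the URL itself.
            url = re.compile('"(.*?)"').findall(i)[1]
            print(url)

if __name__ == '__main__':
    GetUrl(1000)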

The value of num is passed as the since parameter, which is a repository ID: the API returns repositories whose ID is greater than it. By looping and increasing num each time, we can keep extracting repos. Because GitHub rate-limits its API, fetching batch by batch like this is a feasible approach.

The output looks like this (the extracted repo API addresses):
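The loop described above could be sketched as follows. This is a hedged Python 3 illustration, not the author's code: fetch_page, collect_repo_urls, and the delay parameter are names introduced here, and the sketch assumes each returned object has "id" and "url" fields and that the next request should start from the largest ID seen so far.

```python
import json
import time
import urllib.request

API = "https://api.github.com/repositories?since=%d"


def fetch_page(since):
    """Download one batch of repositories with an ID greater than `since`."""
    with urllib.request.urlopen(API % since) as resp:
        return json.loads(resp.read().decode("utf-8"))


def collect_repo_urls(start, max_pages, fetch=fetch_page, delay=0.0):
    """Collect repo URLs over at most `max_pages` requests."""
    urls = []
    since = start
    for _ in range(max_pages):
        repos = fetch(since)
        if not repos:
            break  # nothing left to list
        urls.extend(repo["url"] for repo in repos)
        # Continue after the largest repository ID in this batch.
        since = max(repo["id"] for repo in repos)
        time.sleep(delay)  # pause between requests to respect the rate limit
    return urls
```

Calling collect_repo_urls(1000, 3, delay=2.0) would issue up to three requests starting from ID 1000. The fetch argument is there so the loop can be tried with a stand-in function instead of the network.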

https://api.github.com/repos/wycats/merb-core

https://api.github.com/repos/rubinius/rubinius

https://api.github.com/repos/mojombo/god

https://api.github.com/repos/vanpelt/jsawesome

https://api.github.com/repos/wycats/jspec

https://api.github.com/repos/defunkt/exception_logger

https://api.github.com/repos/defunkt/ambition

https://api.github.com/repos/technoweenie/restful-authentication

https://api.github.com/repos/technoweenie/attachment_fu

https://api.github.com/repos/topfunky/bong

https://api.github.com/repos/Caged/microsis

https://api.github.com/repos/anotherjesse/s3

https://api.github.com/repos/anotherjesse/taboo

https://api.github.com/repos/anotherjesse/foxtracs

https://api.github.com/repos/anotherjesse/fotomatic

https://api.github.com/repos/mojombo/glowstick

https://api.github.com/repos/defunkt/starling

https://api.github.com/repos/wycats/merb-more

https://api.github.com/repos/macournoyer/thin

https://api.github.com/repos/jamesgolick/resource_controller

https://api.github.com/repos/jamesgolick/markaby

https://api.github.com/repos/jamesgolick/enum_field

https://api.github.com/repos/defunkt/subtlety

https://api.github.com/repos/defunkt/zippy

https://api.github.com/repos/defunkt/cache_fu

https://api.github.com/repos/KirinDave/phosphor
